pictuga
|
3190d1ec5a
|
feeds: remove useless if(len) before loop
|
2020-06-02 13:57:45 +02:00 |
pictuga
|
9815794a97
|
sheet.xsl: make text more self explanatory
|
2020-05-27 21:42:00 +02:00 |
pictuga
|
758b6861b9
|
sheet.xsl: fix text alignment
|
2020-05-27 21:36:11 +02:00 |
pictuga
|
ce4cf01aa6
|
crawler: clean up encoding detection code
|
2020-05-27 21:35:24 +02:00 |
pictuga
|
dcfdb75a15
|
crawler: fix Chinese encoding support
|
2020-05-27 21:34:43 +02:00 |
pictuga
|
4ccc0dafcd
|
Basic help for sub-lib interactive use
|
2020-05-26 19:34:20 +02:00 |
pictuga
|
2fe3e0b8ee
|
feeds: clean up other stylesheets before putting ours
|
2020-05-26 19:26:36 +02:00 |
pictuga
|
ad3ba9de1a
|
sheet.xsl: add <select/> to use :firstlink
|
2020-05-13 12:33:12 +02:00 |
pictuga
|
68c46a1823
|
morss: remove deprecated twitter/fb link handling
|
2020-05-13 12:31:09 +02:00 |
pictuga
|
91be2d229e
|
morss: ability to use first link from desc instead of default link
|
2020-05-13 12:29:53 +02:00 |
pictuga
|
038f267ea2
|
Rename :theforce into :force
|
2020-05-13 11:49:15 +02:00 |
pictuga
|
22005065e8
|
Use etree.tostring 'method' arg
Gives appropriately formatted html code.
Some pages might otherwise be rendered as blank.
|
2020-05-13 11:44:34 +02:00 |
pictuga
|
7d0d416610
|
morss: cache articles for 24hrs
Also make it possible to refetch articles, regardless of cache
|
2020-05-12 21:10:31 +02:00 |
pictuga
|
5dac4c69a1
|
crawler: more code comments
|
2020-05-12 20:44:25 +02:00 |
pictuga
|
36e2a1c3fd
|
crawler: increase size limit from 100KiB to 500KiB
I'm looking at you, worldbankgroup.csod.com/ats/careersite/search.aspx
|
2020-05-12 19:34:16 +02:00 |
pictuga
|
83dd2925d3
|
readabilite: better parsing
Keeping blank_text keeps the tree more as-is, making the final output closer to expectations
|
2020-05-12 14:15:53 +02:00 |
pictuga
|
e09d0abf54
|
morss: remove deprecated piece of code
|
2020-05-07 16:05:30 +02:00 |
pictuga
|
ff26a560cb
|
Shift Safari workaround to morss.py
|
2020-05-07 16:04:54 +02:00 |
pictuga
|
74d7a1eca2
|
sheet.xsl: fix word wrap
|
2020-05-06 16:58:28 +02:00 |
pictuga
|
eba295cba8
|
sheet.xsl: fixes for Safari
|
2020-05-06 12:01:27 +02:00 |
pictuga
|
f27631954e
|
.htaccess: bypass Safari RSS detection
|
2020-05-06 11:47:24 +02:00 |
pictuga
|
c74abfa2f4
|
sheet.xsl: use CDATA for js code
|
2020-05-06 11:46:38 +02:00 |
pictuga
|
1d5272c299
|
sheet.xsl: allow zooming on mobile
|
2020-05-04 14:44:43 +02:00 |
pictuga
|
f685139137
|
crawler: use UPSERT statements
Avoid potential race conditions
|
2020-05-03 21:27:45 +02:00 |
pictuga
|
73b477665e
|
morss: separate :clip with <hr> instead of stars
|
2020-05-02 19:19:54 +02:00 |
pictuga
|
b425992783
|
morss: don't follow alt=rss with custom feeds
To have the same page as with :get=page and to avoid shitty feeds
|
2020-05-02 19:18:58 +02:00 |
pictuga
|
271ac8f80f
|
crawler: comment code a bit
|
2020-05-02 19:18:01 +02:00 |
pictuga
|
64e41b807d
|
crawler: handle http:/ (single slash)
Fixing one more corner case! malayalam.oneindia.com
|
2020-05-02 19:17:15 +02:00 |
pictuga
|
a2c4691090
|
sheet.xsl: dir=auto for RTL languages (Arabic, etc.)
|
2020-04-29 15:01:33 +02:00 |
pictuga
|
b6000923bc
|
README: clean up deprecated code
|
2020-04-28 22:31:11 +02:00 |
pictuga
|
27a42c47aa
|
morss: use final request url
Code is not very elegant...
|
2020-04-28 22:30:21 +02:00 |
pictuga
|
c27c38f7c7
|
crawler: return dict instead of tuple
|
2020-04-28 22:29:07 +02:00 |
pictuga
|
a1dc96cb50
|
feeds: remove mimetype from function call as no longer used
|
2020-04-28 22:07:25 +02:00 |
pictuga
|
749acc87fc
|
Centralize url clean up in crawler.py
|
2020-04-28 22:03:49 +02:00 |
pictuga
|
c186188557
|
README: warning about lxml installation
|
2020-04-28 21:58:26 +02:00 |
pictuga
|
cb69e3167f
|
crawler: accept non-ascii urls
Covering one more corner case!
|
2020-04-28 14:47:23 +02:00 |
pictuga
|
c3f06da947
|
morss: process(): specify encoding for clarity
|
2020-04-28 14:45:00 +02:00 |
pictuga
|
44a3e0edc4
|
readabilite: specify in- and out-going encoding
|
2020-04-28 14:44:35 +02:00 |
pictuga
|
4a9b505499
|
README: update python lib instructions
|
2020-04-27 18:12:14 +02:00 |
pictuga
|
818cdaaa9b
|
Make it possible to call sub-libs in non-interactive mode
Run `python -m morss.feeds http://lemonde.fr` and so on
|
2020-04-27 18:00:14 +02:00 |
pictuga
|
2806c64326
|
Make it possible to directly run sub-libs (feeds, crawler, readabilite)
Run `python -im morss.feeds http://website.sample/rss.xml` and so on
|
2020-04-27 17:19:31 +02:00 |
pictuga
|
d39d7bb19d
|
sheet.xsl: limit overflow
|
2020-04-25 15:27:49 +02:00 |
pictuga
|
e5e3746fc6
|
sheet.xsl: show plain url
|
2020-04-25 15:27:13 +02:00 |
pictuga
|
960c9d10d6
|
sheet.xsl: customize output feed form
|
2020-04-25 15:26:47 +02:00 |
pictuga
|
0e7a5b9780
|
sheet.xsl: wrap header in <header>
|
2020-04-25 15:24:57 +02:00 |
pictuga
|
186bedcf62
|
sheet.xsl: smarter html reparser
|
2020-04-25 15:22:25 +02:00 |
pictuga
|
5847e18e42
|
sheet: improved feed address output (w/ c/c)
|
2020-04-25 15:21:47 +02:00 |
pictuga
|
f6bc23927f
|
readabilite: drop dangerous tags (script, style)
|
2020-04-25 12:25:02 +02:00 |
pictuga
|
c86572374e
|
readabilite: minimum score requirement
|
2020-04-25 12:24:36 +02:00 |
pictuga
|
59ef5af9e2
|
feeds: fix bug when deleting attr in html
|
2020-04-24 22:12:05 +02:00 |