pictuga
|
7f4589c578
|
crawler: return dict instead of tuple
|
2020-04-28 22:10:20 +02:00 |
pictuga
|
a1dc96cb50
|
feeds: remove mimetype from function call as no longer used
|
2020-04-28 22:07:25 +02:00 |
pictuga
|
749acc87fc
|
Centralize URL cleanup in crawler.py
|
2020-04-28 22:03:49 +02:00 |
pictuga
|
c186188557
|
README: warning about lxml installation
|
2020-04-28 21:58:26 +02:00 |
pictuga
|
cb69e3167f
|
crawler: accept non-ascii urls
Covering one more corner case!
|
2020-04-28 14:47:23 +02:00 |
pictuga
|
c3f06da947
|
morss: process(): specify encoding for clarity
|
2020-04-28 14:45:00 +02:00 |
pictuga
|
44a3e0edc4
|
readabilite: specify in- and out-going encoding
|
2020-04-28 14:44:35 +02:00 |
pictuga
|
4a9b505499
|
README: update python lib instructions
|
2020-04-27 18:12:14 +02:00 |
pictuga
|
818cdaaa9b
|
Make it possible to call sub-libs in non-interactive mode
Run `python -m morss.feeds http://lemonde.fr` and so on
|
2020-04-27 18:00:14 +02:00 |
pictuga
|
2806c64326
|
Make it possible to directly run sub-libs (feeds, crawler, readabilite)
Run `python -im morss.feeds http://website.sample/rss.xml` and so on
|
2020-04-27 17:19:31 +02:00 |
pictuga
|
d39d7bb19d
|
sheet.xsl: limit overflow
|
2020-04-25 15:27:49 +02:00 |
pictuga
|
e5e3746fc6
|
sheet.xsl: show plain url
|
2020-04-25 15:27:13 +02:00 |
pictuga
|
960c9d10d6
|
sheet.xsl: customize output feed form
|
2020-04-25 15:26:47 +02:00 |
pictuga
|
0e7a5b9780
|
sheet.xsl: wrap header in <header>
|
2020-04-25 15:24:57 +02:00 |
pictuga
|
186bedcf62
|
sheet.xsl: smarter html reparser
|
2020-04-25 15:22:25 +02:00 |
pictuga
|
5847e18e42
|
sheet: improved feed address output (w/ c/c)
|
2020-04-25 15:21:47 +02:00 |
pictuga
|
f6bc23927f
|
readabilite: drop dangerous tags (script, style)
|
2020-04-25 12:25:02 +02:00 |
pictuga
|
c86572374e
|
readabilite: minimum score requirement
|
2020-04-25 12:24:36 +02:00 |
pictuga
|
59ef5af9e2
|
feeds: fix bug when deleting attr in html
|
2020-04-24 22:12:05 +02:00 |
pictuga
|
6a0531ca03
|
crawler: randomize user agent
|
2020-04-24 11:28:39 +02:00 |
pictuga
|
8187876a06
|
crawler: stop at first alternative link
Should save a few ms and the first one is usually (?) the most relevant/generic
|
2020-04-23 11:23:45 +02:00 |
pictuga
|
325a373e3e
|
feeds: add SyntaxError catch
|
2020-04-20 16:15:15 +02:00 |
pictuga
|
2719bd6776
|
crawler: fix Chinese encoding
|
2020-04-20 16:14:55 +02:00 |
pictuga
|
285e1e5f42
|
docker: pip install local
|
2020-04-19 13:25:53 +02:00 |
pictuga
|
41a63900c2
|
README: improve docker instructions
|
2020-04-19 13:01:08 +02:00 |
pictuga
|
ec8edb02f1
|
Various small bug fixes
|
2020-04-19 12:54:02 +02:00 |
pictuga
|
d01b943597
|
Remove leftover threading var
|
2020-04-19 12:51:11 +02:00 |
pictuga
|
b361aa2867
|
Add timeout to :get
|
2020-04-19 12:50:26 +02:00 |
pictuga
|
4ce3c7cb32
|
Small code clean ups
|
2020-04-19 12:50:05 +02:00 |
pictuga
|
7e45b2611d
|
Disable multi-threading
Impact was mostly negative due to locks
|
2020-04-19 12:29:52 +02:00 |
pictuga
|
036e5190f1
|
crawler: remove unused code
|
2020-04-18 21:40:02 +02:00 |
pictuga
|
e99c5b3b71
|
morss: more sensible default MAX/LIM values
|
2020-04-18 17:21:45 +02:00 |
pictuga
|
4f44df8d63
|
Make all ports default to 8080
|
2020-04-18 17:15:59 +02:00 |
pictuga
|
497c14db81
|
Add Dockerfile & how-to in README
|
2020-04-18 17:04:44 +02:00 |
pictuga
|
a4e1dba8b7
|
sheet.xsl: improve url display
|
2020-04-16 10:33:36 +02:00 |
pictuga
|
7375adce33
|
sheet.xsl: fix & improve
|
2020-04-15 23:34:28 +02:00 |
pictuga
|
663212de0a
|
sheet.xsl: various cosmetic improvements
|
2020-04-15 23:22:45 +02:00 |
pictuga
|
4a2ea1bce9
|
README: add gunicorn instructions
|
2020-04-15 22:31:21 +02:00 |
pictuga
|
fe82b19c91
|
Merge .xsl & html template
Turns out they somehow serve a similar purpose
|
2020-04-15 22:30:45 +02:00 |
pictuga
|
0b31e97492
|
morss: remove debug code in http file handler
|
2020-04-14 23:20:03 +02:00 |
pictuga
|
b0ad7c259d
|
Add README & LICENSE to data_files
|
2020-04-14 19:34:12 +02:00 |
pictuga
|
bffb23f884
|
README: how to use cli
|
2020-04-14 18:21:32 +02:00 |
pictuga
|
59139272fd
|
Auto-detect the location of www/
Either ../www or /usr/share/morss
Adapted README accordingly
|
2020-04-14 18:07:19 +02:00 |
pictuga
|
39b0a1d7cc
|
setup.py: fix deps & files
|
2020-04-14 17:36:42 +02:00 |
pictuga
|
65803b328d
|
New git url and updated date in provided index.html
|
2020-04-13 15:30:32 +02:00 |
pictuga
|
e6b7c0eb33
|
Fix app definition for uwsgi
|
2020-04-13 15:30:09 +02:00 |
pictuga
|
67c096ad5b
|
feeds: add fake path to default html parser
Without it, some websites were accidentally matching it (false positives)
|
2020-04-12 13:00:56 +02:00 |
pictuga
|
f018437544
|
crawler: make mysql backend thread safe
|
2020-04-12 12:53:05 +02:00 |
pictuga
|
8e5e8d24a4
|
Timezone fixes
|
2020-04-10 20:33:59 +02:00 |
pictuga
|
ee78a7875a
|
morss: focus on the most recent feed items
|
2020-04-10 16:08:13 +02:00 |