pictuga
|
2719bd6776
|
crawler: fix Chinese encoding
|
2020-04-20 16:14:55 +02:00 |
pictuga
|
285e1e5f42
|
docker: pip install local
|
2020-04-19 13:25:53 +02:00 |
pictuga
|
41a63900c2
|
README: improve docker instructions
|
2020-04-19 13:01:08 +02:00 |
pictuga
|
ec8edb02f1
|
Various small bug fixes
|
2020-04-19 12:54:02 +02:00 |
pictuga
|
d01b943597
|
Remove leftover threading var
|
2020-04-19 12:51:11 +02:00 |
pictuga
|
b361aa2867
|
Add timeout to :get
|
2020-04-19 12:50:26 +02:00 |
pictuga
|
4ce3c7cb32
|
Small code cleanups
|
2020-04-19 12:50:05 +02:00 |
pictuga
|
7e45b2611d
|
Disable multi-threading
Impact was mostly negative due to locks
|
2020-04-19 12:29:52 +02:00 |
pictuga
|
036e5190f1
|
crawler: remove unused code
|
2020-04-18 21:40:02 +02:00 |
pictuga
|
e99c5b3b71
|
morss: more sensible default MAX/LIM values
|
2020-04-18 17:21:45 +02:00 |
pictuga
|
4f44df8d63
|
Make all ports default to 8080
|
2020-04-18 17:15:59 +02:00 |
pictuga
|
497c14db81
|
Add Dockerfile & how-to in README
|
2020-04-18 17:04:44 +02:00 |
pictuga
|
a4e1dba8b7
|
sheet.xsl: improve url display
|
2020-04-16 10:33:36 +02:00 |
pictuga
|
7375adce33
|
sheet.xsl: fix & improve
|
2020-04-15 23:34:28 +02:00 |
pictuga
|
663212de0a
|
sheet.xsl: various cosmetic improvements
|
2020-04-15 23:22:45 +02:00 |
pictuga
|
4a2ea1bce9
|
README: add gunicorn instructions
|
2020-04-15 22:31:21 +02:00 |
pictuga
|
fe82b19c91
|
Merge .xsl & html template
Turns out they somehow serve a similar purpose
|
2020-04-15 22:30:45 +02:00 |
pictuga
|
0b31e97492
|
morss: remove debug code in http file handler
|
2020-04-14 23:20:03 +02:00 |
pictuga
|
b0ad7c259d
|
Add README & LICENSE to data_files
|
2020-04-14 19:34:12 +02:00 |
pictuga
|
bffb23f884
|
README: how to use the CLI
|
2020-04-14 18:21:32 +02:00 |
pictuga
|
59139272fd
|
Auto-detect the location of www/
Either ../www or /usr/share/morss
Adapted README accordingly
|
2020-04-14 18:07:19 +02:00 |
pictuga
|
39b0a1d7cc
|
setup.py: fix deps & files
|
2020-04-14 17:36:42 +02:00 |
pictuga
|
65803b328d
|
New git URL and updated date in provided index.html
|
2020-04-13 15:30:32 +02:00 |
pictuga
|
e6b7c0eb33
|
Fix app definition for uwsgi
|
2020-04-13 15:30:09 +02:00 |
pictuga
|
67c096ad5b
|
feeds: add fake path to default html parser
Without it, some websites were accidentally matching it (false positives)
|
2020-04-12 13:00:56 +02:00 |
pictuga
|
f018437544
|
crawler: make mysql backend thread safe
|
2020-04-12 12:53:05 +02:00 |
pictuga
|
8e5e8d24a4
|
Timezone fixes
|
2020-04-10 20:33:59 +02:00 |
pictuga
|
ee78a7875a
|
morss: focus on the most recent feed items
|
2020-04-10 16:08:13 +02:00 |
pictuga
|
9e7b9d95ee
|
feeds: properly use html template
|
2020-04-09 20:00:51 +02:00 |
pictuga
|
987a719c4e
|
feeds: try all parsers regardless of content type
Turns out some websites send the wrong content type (JSON for HTML, HTML for XML, etc.)
|
2020-04-09 19:17:51 +02:00 |
pictuga
|
47b33f4baa
|
morss: specify server output encoding
|
2020-04-09 19:10:45 +02:00 |
pictuga
|
3c7f512583
|
feeds: handle several errors
|
2020-04-09 19:09:10 +02:00 |
pictuga
|
a32f5a8536
|
readabilite: add debug option (also used by :get)
|
2020-04-09 19:08:13 +02:00 |
pictuga
|
63a06524b7
|
morss: various encoding fixes
|
2020-04-09 19:06:51 +02:00 |
pictuga
|
b0f80c6d3c
|
morss: fix csv output encoding
|
2020-04-09 19:05:50 +02:00 |
pictuga
|
78cea10ead
|
morss: replace :getpage with :get
Also provides readabilite debugging
|
2020-04-09 18:43:20 +02:00 |
pictuga
|
e5a82ff1f4
|
crawler: drop auto-referer
It solved some issues but created even more.
|
2020-04-07 10:39:21 +02:00 |
pictuga
|
f3d1f92b39
|
Detect encoding every time
|
2020-04-07 10:38:36 +02:00 |
pictuga
|
7691df5257
|
Use wrapper for http calls
|
2020-04-07 10:30:17 +02:00 |
pictuga
|
0ae0dbc175
|
README: mention csv output
|
2020-04-07 09:24:32 +02:00 |
pictuga
|
f1d0431e68
|
morss: drop :html, replaced with :reader
README updated accordingly
|
2020-04-07 09:23:29 +02:00 |
pictuga
|
a09831415f
|
feeds: fix bug when mimetype matches nothing
|
2020-04-06 18:53:07 +02:00 |
pictuga
|
bfad6b7a4a
|
readabilite: clean before counting
To remove links which are not kept anyway
|
2020-04-06 16:55:39 +02:00 |
pictuga
|
6b8c3e51e7
|
readabilite: fix threshold feature
Awkward typo...
|
2020-04-06 16:52:06 +02:00 |
pictuga
|
dc9e425247
|
readabilite: don't clean out the top 10% of nodes
Loosen up the code once again to limit overkill
|
2020-04-06 14:26:28 +02:00 |
pictuga
|
2f48e18bb1
|
readabilite: put scores directly in html node
Probably slower but makes code somewhat cleaner...
|
2020-04-06 14:21:41 +02:00 |
pictuga
|
31cac921c7
|
README: remove ref to iTunes
|
2020-04-05 22:20:33 +02:00 |
pictuga
|
a82ec96eb7
|
Delete feedify.py leftover code
iTunes integration untested, unreliable and not working...
|
2020-04-05 22:16:52 +02:00 |
pictuga
|
aad2398e69
|
feeds: turns out lxml.etree doesn't have drop_tag
|
2020-04-05 21:50:38 +02:00 |
pictuga
|
eeac630855
|
crawler: add more "realistic" headers
|
2020-04-05 21:11:57 +02:00 |