pictuga
a41c2a3a62
morss: fix twitter link detection
2020-03-20 12:26:19 +01:00
pictuga
dd2651061f
feeds & morss: clean up comments/empty lines
2020-03-20 12:25:48 +01:00
pictuga
5865af64f9
Fix indent output for html/xml
2020-03-20 12:18:13 +01:00
pictuga
b3b90c067a
morss.py: remove "useless" functions
...
Have to keep the code clean
2020-03-20 11:19:06 +01:00
pictuga
bda51b0fc7
feeds & morss: many encoding/tostring fixes
2020-03-19 12:53:25 +01:00
pictuga
d26795dce8
morss: from feedify to feeds
...
Also scrap obsolete feedify code
2020-03-19 10:27:44 +01:00
pictuga
9dbe061fd6
Remove markdown-related code
...
Time to clean up the code and stop with those non-core features
They just make the code harder to maintain
2020-03-18 16:47:00 +01:00
pictuga
e606c5eefb
feeds: various small cleanup/fixes
2018-11-18 15:14:38 +01:00
pictuga
3581f34db7
Various feeds.py related fixes
2018-11-11 16:46:23 +01:00
pictuga
679628c7fa
Small code clean up
2018-11-11 16:11:00 +01:00
pictuga
399e867c94
morss: add py2 indication
2018-11-11 16:07:25 +01:00
pictuga
221e1f85ad
feeds: fix implementation in morss
2018-11-11 15:26:09 +01:00
pictuga
4e144487db
Test for feedify support first
...
Otherwise might never be called if the content-type is also supported
2018-10-25 01:17:24 +02:00
pictuga
e72ca3f984
morss: improved output type
2018-09-30 22:02:29 +02:00
pictuga
2ccf36617a
morss: improve http parameter parsing
2018-09-30 22:01:19 +02:00
pictuga
2d5bf7b38b
Fix xml detection regex
...
Also (dirtily) fixes #18 for now
2017-11-04 14:21:05 +01:00
pictuga
194465544a
crawler: separate CacheHandler and actual caching
...
Default cache is now just an in-memory {}
2017-11-04 12:41:56 +01:00
pictuga
2d7d0fcdca
morss: fix cgi in python 3
...
Needs explicit [] in py3
2017-11-04 12:27:47 +01:00
pictuga
f563040809
readabilite: threshold to detect if it contains an article
...
Useful for video/image-based pages
2017-10-28 01:30:21 +02:00
pictuga
64babd6713
morss: make readabilite links absolute
2017-07-29 14:37:37 +02:00
pictuga
d3bc2926fc
Remove :hungry
...
Mostly useless. If you need it, you probably don't need morss in the first place...
2017-03-25 13:52:58 -10:00
pictuga
167e3e4a15
feedify: accept xpath rules passed as parameters
2017-03-20 20:56:48 -10:00
pictuga
08f08ef704
improve morss url detection regex
2017-03-20 20:51:13 -10:00
pictuga
1b4341f741
accept query_string in morss cgi
2017-03-20 20:50:04 -10:00
pictuga
5e61686373
Only use full feed for articles & feedify
...
Sometimes using a referrer and/or user agent makes some dumb websites return different content (hello feedburner)
2017-03-18 23:43:28 -10:00
pictuga
0b6e553054
Move iTunes code to feedify.py
2017-03-18 23:41:37 -10:00
pictuga
d4937812a8
Remove HTTPError code
...
Only used to look nice but useless (inherits from IOError anyway)
2017-03-18 23:39:32 -10:00
pictuga
67f5a21019
Move build_opener to crawler
...
Forgotten
2017-03-18 23:03:04 -10:00
pictuga
2003e2760b
Move custom_handler to crawler
...
Makes more sense. Easier to reuse. Also cleaned up the code a bit
2017-03-18 22:51:27 -10:00
pictuga
f4abc4e8a4
Detect encoding (using crawler) before readabilite
2017-03-11 02:30:57 -10:00
pictuga
385f9eb39a
morss: use crawler strict accept for feed
2017-03-08 19:05:48 -10:00
Florian Muenchbach
993ac638a3
Added override for auto-detected character encoding of parsed pages.
2017-03-08 18:45:20 -10:00
pictuga
627163abff
Make cache settings in morss nicer
2017-03-08 18:09:24 -10:00
pictuga
e5f8e43659
Shifted the <link rel='alternate'/> redirect to crawler
...
Now using MIMETYPE var from crawler within morss.py
2017-03-08 18:03:34 -10:00
pictuga
a8ac2ed1ca
Turn FeedBefore/After into ItemBefore/After
...
To reduce the number of loops
2017-02-28 23:24:32 -10:00
pictuga
fcc5e8a076
Add "Feed/Item" in functions name
...
To make it instantly clearer what they work on
2017-02-28 23:23:15 -10:00
pictuga
60e3311e97
Use readabilite properly
...
Not through some weird wrapper anymore
2017-02-28 22:45:26 -10:00
pictuga
dc8423550f
Support xml starting with \s
2017-02-25 19:04:32 -10:00
pictuga
b14381f575
Use internal readability fork
...
Much simpler, doesn't clean the html, probably less efficient, but much faster
2016-05-31 02:50:03 +02:00
pictuga
2b9bfb47e5
Remove :smart and etag headers
...
Dirty code, not very useful. Use simple cache-control instead.
2016-05-31 02:47:49 +02:00
pictuga
4ff80cec86
Check argv length before using it
2016-05-31 02:46:28 +02:00
pictuga
466d8e47d6
Also make buriy's readability port compatible
...
Should be faster, and it now supports py3
2015-08-29 18:33:12 +02:00
pictuga
95d9d847e9
:proxy implies :keep
2015-08-29 17:48:07 +02:00
pictuga
624fa47f4f
Allow CLI change of the www/ path
2015-08-28 19:22:55 +02:00
pictuga
31fc939d52
Allow CLI change of the http server port
2015-08-28 19:22:23 +02:00
pictuga
4f9000beed
Comment code of launching modes
2015-08-28 19:18:09 +02:00
pictuga
5e87b56a03
Return error code in plain text in file server
2015-08-28 19:16:15 +02:00
pictuga
ffda3fac7e
Improve file detection in web server
2015-08-28 19:15:40 +02:00
pictuga
6741a408dd
Remove now-useless ca-cert file path
2015-08-28 19:13:54 +02:00
Massimo Vannucci
098a306c91
Fixed typo
2015-08-05 23:24:44 +02:00
pictuga
5c2151ffd6
Broadly improve feedsportal url decoder
2015-06-14 20:32:47 +08:00
pictuga
ae062ebe90
Remove deprecated https error catch
2015-04-07 18:59:37 +08:00
pictuga
7a3b257328
Make :mono use basic loop
...
Makes profiling easier
2015-04-07 18:16:08 +08:00
pictuga
2f86a2a44b
Remove useless obscure cgi code
2015-04-07 09:49:44 +08:00
pictuga
131ba09207
Change :cache mode behavior
...
Makes underlying code way cleaner
2015-04-07 09:38:22 +08:00
pictuga
cafb87d561
Fix sqlite relative path in cgi
2015-04-07 09:37:25 +08:00
pictuga
decb3f15f6
Move the mod_cgi files to /cgi/
2015-04-07 09:36:00 +08:00
pictuga
b267791199
Remove hashbang from __init__.py
2015-04-07 09:34:22 +08:00
pictuga
acae47dc79
2to3: fix cli_app string print
2015-04-06 23:27:15 +08:00
pictuga
32aa96afa7
Cache HTTP content using a custom Handler
...
Much much cleaner. Nothing comparable
2015-04-06 23:26:12 +08:00
pictuga
1b4fc88ad0
Replace MetaRedirect handler with two cleaner ones
...
One for <meta http-equiv> and one for HTTP 'refresh' header
2015-04-06 23:03:17 +08:00
pictuga
f2fe4fc364
Drop HTTPS SSL certificate verification
...
Breaks everything with python 3. Now built-in in recent python 2.7.9 and python 3.4-ish
2015-04-06 22:54:59 +08:00
pictuga
2e3b766a0a
http-server port as a var, print port on startup
2015-03-24 23:20:06 +08:00
pictuga
656b29e0ef
2to3: using unicode/str to please py3
2015-03-11 01:05:02 +08:00
pictuga
cbeb01e555
2to3: fix urllib header retrieval
2015-03-11 01:03:16 +08:00
pictuga
6ae60d0343
2to3: py3-compatible readability fork
2015-03-03 01:03:03 +08:00
pictuga
dbb3883516
2to3: urllib mimetype
2015-03-03 00:55:58 +08:00
pictuga
071288015b
2to3: morss.py port xrange
2015-02-25 18:41:49 +08:00
pictuga
803d6e37c4
2to3: morss.py port most default libs
2015-02-25 18:36:27 +08:00
pictuga
27cf8f6498
2to3: (iter)items to list
2015-02-25 12:02:53 +08:00
pictuga
3fb90cb7b4
2to3: local import
2015-02-25 11:57:10 +08:00
pictuga
47c8a511ff
2to3: print's
2015-02-25 11:57:10 +08:00
pictuga
604b03e2ba
Delete desc when :keep=False
...
Still needed for Firefox, because an empty <desc/> still shows up instead of the content in feed preview
2015-02-24 00:38:34 +08:00
pictuga
83ed440e67
Fix issue when desc and content empty
...
Wouldn't put fetched article in feed
2015-02-24 00:38:02 +08:00
pictuga
5c23f90f0b
Disable options filtering by default
...
But still provide sample code
2015-02-21 02:01:32 +08:00
pictuga
149117029c
Improve logging of fetching errors
2015-02-21 01:58:45 +08:00
pictuga
d5269964fc
Make :theforce also bypass http errors
2015-02-21 01:58:16 +08:00
pictuga
f0dcb9912e
Fix cached errors handling
2015-02-21 01:57:33 +08:00
pictuga
f62aedda12
Double HTTP timeout
...
Better slow than nothing (especially when running on a personal computer)
2015-02-21 01:55:53 +08:00
pictuga
76c4211a04
Make :hungry more useful
2015-02-21 01:55:25 +08:00
pictuga
ef946c0712
XML pretty-print in separate option
...
Who reads plain XML anyway?
2015-02-20 17:38:39 +08:00
pictuga
ec5f5b865f
Make it easy to restrict available options
2014-11-21 22:01:03 +01:00
pictuga
105ca67744
Move facebook token to own script
...
To a PHP script actually. Not sure why PHP. Keeps morss' code cleaner. This piece of code had nothing to do in there, and didn't bring any advantage.
2014-11-19 20:09:27 +01:00
pictuga
8131ea2244
HTTPS SSL certificate validation
...
Specific error message added
2014-11-19 11:59:59 +01:00
pictuga
1b26c5f0e3
Split SimpleDownload in a lot of Handlers
...
Cleaner code, easier to edit, more flexibility. Paves the way to SSL certificates validation.
Still have to clean up the code of AcceptHeadersHandler.
2014-11-19 11:57:40 +01:00
pictuga
f46576168a
Add :mono to disable multithreading
...
Convenient to have linear logging
2014-11-10 23:14:54 +01:00
pictuga
5dd262139d
Add HTTP error code to download error message
2014-11-09 15:45:01 +01:00
pictuga
6d5bb2b3c5
Print error message in wsgi mode
2014-11-09 15:44:42 +01:00
pictuga
a820cf6812
Run :strip in After
...
Makes more sense
2014-11-09 15:01:50 +01:00
pictuga
5eefe2c916
Log more when using wsgi
2014-11-08 21:22:34 +01:00
pictuga
6f2061ff37
Fix :smart
...
Wasn't using the right way
2014-11-08 21:22:07 +01:00
pictuga
40834eeb93
Split After into Before/After
...
Needed since several options have to run before the actual fetching (because no one needs to fetch the articles of to-be-dropped items)
2014-11-08 20:31:29 +01:00
pictuga
f20fb9cdf6
Use more stable loop-over-list in Gather
2014-11-08 20:30:36 +01:00
pictuga
6a40731248
Return output when DEBUG is on
...
Much more convenient to actually debug
2014-11-07 18:44:59 +01:00
pictuga
d3eb2dd88d
Implement :smart to save bandwidth
2014-11-07 18:40:44 +01:00
pictuga
67fc5f06f8
Run "After" even when debug mode is on
2014-11-06 21:15:16 +01:00
pictuga
ad2673f474
Add :empty to remove all items
...
This is completely useless...
2014-11-06 21:14:41 +01:00
pictuga
ecfda1d05a
Add :strip to remove desc and content
2014-11-06 21:14:20 +01:00
pictuga
1a8ee716f3
Add "search" option
...
PLEASE NOTE that this is case sensitive and performs a very basic search ("is xyz in the title?"). Don't use this for fine filtering.
Also fixed an issue with After(): some functions were removing items from the feed while looping over the feed items, which caused annoying item-skipping issues.
2014-11-06 21:11:23 +01:00
pictuga
0e22bb4316
Cache: catch json parse errors
2014-09-28 12:03:58 +02:00