202 Commits (b42599278324949c5e12989f7ad3f7a62954cdba)

Author SHA1 Message Date
pictuga a41c2a3a62 morss: fix twitter link detection 2020-03-20 12:26:19 +01:00
pictuga dd2651061f feeds & morss: clean up comments/empty lines 2020-03-20 12:25:48 +01:00
pictuga 5865af64f9 Fix indent output for html/xml 2020-03-20 12:18:13 +01:00
pictuga b3b90c067a morss.py: remove "useless" functions
Have to keep the code clean
2020-03-20 11:19:06 +01:00
pictuga bda51b0fc7 feeds & morss: many encoding/tostring fixes 2020-03-19 12:53:25 +01:00
pictuga d26795dce8 morss: from feedify to feeds
Also scrap obsolete feedify code
2020-03-19 10:27:44 +01:00
pictuga 9dbe061fd6 Remove markdown-related code
Time to clean up the code and stop with those non-core features
They just make the code harder to maintain
2020-03-18 16:47:00 +01:00
pictuga e606c5eefb feeds: various small cleanup/fixes 2018-11-18 15:14:38 +01:00
pictuga 3581f34db7 Various feeds.py related fixes 2018-11-11 16:46:23 +01:00
pictuga 679628c7fa Small code clean up 2018-11-11 16:11:00 +01:00
pictuga 399e867c94 morss: add py2 indication 2018-11-11 16:07:25 +01:00
pictuga 221e1f85ad feeds: fix implementation in morss 2018-11-11 15:26:09 +01:00
pictuga 4e144487db Test for feedify support first
Otherwise might never be called if the content-type is also supported
2018-10-25 01:17:24 +02:00
pictuga e72ca3f984 morss: improved output type 2018-09-30 22:02:29 +02:00
pictuga 2ccf36617a morss: improve http parameter parsing 2018-09-30 22:01:19 +02:00
pictuga 2d5bf7b38b Fix xml detection regex
Also (dirtily) fixes #18 for now
2017-11-04 14:21:05 +01:00
pictuga 194465544a crawler: separate CacheHandler and actual caching
Default cache is now just an in-memory {}
2017-11-04 12:41:56 +01:00
pictuga 2d7d0fcdca morss: fix cgi in python 3
Needs explicit [] in py3
2017-11-04 12:27:47 +01:00
pictuga f563040809 readabilite: threshold to detect if it contains an article
Useful for videos/images-based images
2017-10-28 01:30:21 +02:00
pictuga 64babd6713 morss: make readabilite links absolute 2017-07-29 14:37:37 +02:00
pictuga d3bc2926fc Remove :hungry
Mostly useless. If you need it, you might as well not need to use morss in the first place...
2017-03-25 13:52:58 -10:00
pictuga 167e3e4a15 feedify: accept xpath rules passed as parameters 2017-03-20 20:56:48 -10:00
pictuga 08f08ef704 improve morss url detection regex 2017-03-20 20:51:13 -10:00
pictuga 1b4341f741 accept query_string in morss cgi 2017-03-20 20:50:04 -10:00
pictuga 5e61686373 Only use full feed for articles & feedify
Sometimes using referrer and/or useragent makes some dumb websites return different content (hello feedburner)
2017-03-18 23:43:28 -10:00
pictuga 0b6e553054 Move iTunes code to feedify.py 2017-03-18 23:41:37 -10:00
pictuga d4937812a8 Remove HTTPError code
Only used to look nice but useless (inherits from IOError anyway)
2017-03-18 23:39:32 -10:00
pictuga 67f5a21019 Move build_opener to crawler
Forgotten
2017-03-18 23:03:04 -10:00
pictuga 2003e2760b Move custom_handler to crawler
Makes more sense. Easier to reuse. Also cleaned up the code a bit
2017-03-18 22:51:27 -10:00
pictuga f4abc4e8a4 Detect encoding (using crawler) before readabilite 2017-03-11 02:30:57 -10:00
pictuga 385f9eb39a morss: use crawler strict accept for feed 2017-03-08 19:05:48 -10:00
Florian Muenchbach 993ac638a3 Added override for auto-detected character encoding of parsed pages. 2017-03-08 18:45:20 -10:00
pictuga 627163abff Make cache settings in morss nicer 2017-03-08 18:09:24 -10:00
pictuga e5f8e43659 Shifted the <link rel='alternate'/> redirect to crawler
Now using MIMETYPE var from crawler within morss.py
2017-03-08 18:03:34 -10:00
pictuga a8ac2ed1ca Turn FeedBefore/After into ItemBefore/After
To reduce the number of loops
2017-02-28 23:24:32 -10:00
pictuga fcc5e8a076 Add "Feed/Item" in functions name
To make it instantly clearer what they work on
2017-02-28 23:23:15 -10:00
pictuga 60e3311e97 Use readabilite properly
Not thru some weird wrapper anymore
2017-02-28 22:45:26 -10:00
pictuga dc8423550f Support xml starting with \s 2017-02-25 19:04:32 -10:00
pictuga b14381f575 Use internal readability fork
Much simpler, doesn't clean the html, probably less efficient, but much faster
2016-05-31 02:50:03 +02:00
pictuga 2b9bfb47e5 Remove :smart and etag headers
Dirty code, not very useful. Use simple cache-control instead.
2016-05-31 02:47:49 +02:00
pictuga 4ff80cec86 Check argv length before using it 2016-05-31 02:46:28 +02:00
pictuga 466d8e47d6 Also make buriy's readability port compatible
Should be faster, and it now supports py3
2015-08-29 18:33:12 +02:00
pictuga 95d9d847e9 :proxy implies :keep 2015-08-29 17:48:07 +02:00
pictuga 624fa47f4f Allow CLI change of the www/ path 2015-08-28 19:22:55 +02:00
pictuga 31fc939d52 Allow CLI change of the http server port 2015-08-28 19:22:23 +02:00
pictuga 4f9000beed Comment code of launching modes 2015-08-28 19:18:09 +02:00
pictuga 5e87b56a03 Return error code in plain text in file server 2015-08-28 19:16:15 +02:00
pictuga ffda3fac7e Improve file detection in web server 2015-08-28 19:15:40 +02:00
pictuga 6741a408dd Remove now-useless ca-cert file path 2015-08-28 19:13:54 +02:00
Massimo Vannucci 098a306c91 Fixed typo 2015-08-05 23:24:44 +02:00
pictuga 5c2151ffd6 Improve widely feedsportal url decoder 2015-06-14 20:32:47 +08:00
pictuga ae062ebe90 Remove deprecated https error catch 2015-04-07 18:59:37 +08:00
pictuga 7a3b257328 Make :mono use basic loop
Makes profiling easier
2015-04-07 18:16:08 +08:00
pictuga 2f86a2a44b Remove useless obscure cgi code 2015-04-07 09:49:44 +08:00
pictuga 131ba09207 Change :cache mode behavior
Makes underlying code way cleaner
2015-04-07 09:38:22 +08:00
pictuga cafb87d561 Fix sqlite relative path in cgi 2015-04-07 09:37:25 +08:00
pictuga decb3f15f6 Move the mod_cgi files to /cgi/ 2015-04-07 09:36:00 +08:00
pictuga b267791199 Remove hashbang from __init__.py 2015-04-07 09:34:22 +08:00
pictuga acae47dc79 2to3: fix cli_app string print 2015-04-06 23:27:15 +08:00
pictuga 32aa96afa7 Cache HTTP content using a custom Handler
Much much cleaner. Nothing comparable
2015-04-06 23:26:12 +08:00
pictuga 1b4fc88ad0 Replace MetaRedirect handler with two cleaner ones
One for <meta http-equiv> and one for HTTP 'refresh' header
2015-04-06 23:03:17 +08:00
pictuga f2fe4fc364 Drop HTTPS SSL certificate verification
Breaks everything with python 3. Now built-in in recent python 2.7.9 and python 3.4-ish
2015-04-06 22:54:59 +08:00
pictuga 2e3b766a0a http-server port as a var, print port on startup 2015-03-24 23:20:06 +08:00
pictuga 656b29e0ef 2to3: using unicode/str to please py3 2015-03-11 01:05:02 +08:00
pictuga cbeb01e555 2to3: fix urllib header retrieval 2015-03-11 01:03:16 +08:00
pictuga 6ae60d0343 2to3: py3-compatible readability fork 2015-03-03 01:03:03 +08:00
pictuga dbb3883516 2to3: urllib mimetype 2015-03-03 00:55:58 +08:00
pictuga 071288015b 2to3: morss.py port xrange 2015-02-25 18:41:49 +08:00
pictuga 803d6e37c4 2to3: morss.py port most default libs 2015-02-25 18:36:27 +08:00
pictuga 27cf8f6498 2to3: (iter)items to list 2015-02-25 12:02:53 +08:00
pictuga 3fb90cb7b4 2to3: local import 2015-02-25 11:57:10 +08:00
pictuga 47c8a511ff 2to3: print's 2015-02-25 11:57:10 +08:00
pictuga 604b03e2ba Delete desc when :keep=False
Still needed for Firefox, cause empty <desc/> still show up instead of content in feed preview
2015-02-24 00:38:34 +08:00
pictuga 83ed440e67 Fix issue when desc and content empty
Wouldn't put fetched article in feed
2015-02-24 00:38:02 +08:00
pictuga 5c23f90f0b Disable options filtering by default
But still provide sample code
2015-02-21 02:01:32 +08:00
pictuga 149117029c Improve logging of fetching errors 2015-02-21 01:58:45 +08:00
pictuga d5269964fc Make :theforce also bypass http errors 2015-02-21 01:58:16 +08:00
pictuga f0dcb9912e Fix cached errors handling 2015-02-21 01:57:33 +08:00
pictuga f62aedda12 Double HTTP timeout
Better slow than nothing (especially when running on a personal computer)
2015-02-21 01:55:53 +08:00
pictuga 76c4211a04 Make :hungry more useful 2015-02-21 01:55:25 +08:00
pictuga ef946c0712 XML pretty-print in separate option
Who reads plain XML anyway?
2015-02-20 17:38:39 +08:00
pictuga ec5f5b865f Make it easy to restrict available options 2014-11-21 22:01:03 +01:00
pictuga 105ca67744 Move facebook token to own script
To a PHP script actually. Not sure why PHP. Keeps morss' code cleaner. This piece of code had nothing to do in there, and didn't bring any advantage.
2014-11-19 20:09:27 +01:00
pictuga 8131ea2244 HTTPS SSL certificate validation
Specific error message added
2014-11-19 11:59:59 +01:00
pictuga 1b26c5f0e3 Split SimpleDownload in a lot of Handlers
Cleaner code, easier to edit, more flexibility. Paves the way to SSL certificates validation.
Still have to clean up the code of AcceptHeadersHandler.
2014-11-19 11:57:40 +01:00
pictuga f46576168a Add :mono to disable multithreading
Convenient to have linear logging
2014-11-10 23:14:54 +01:00
pictuga 5dd262139d Add HTTP error code to download error message 2014-11-09 15:45:01 +01:00
pictuga 6d5bb2b3c5 Print error message in wgi mode 2014-11-09 15:44:42 +01:00
pictuga a820cf6812 Run :strip in After
Makes more sense
2014-11-09 15:01:50 +01:00
pictuga 5eefe2c916 Log more when using wgi 2014-11-08 21:22:34 +01:00
pictuga 6f2061ff37 Fix :smart
Wasn't using the right way
2014-11-08 21:22:07 +01:00
pictuga 40834eeb93 Split After into Before/After
Needed since a bunch of options needed to be run before the actual fetching (cause no-one needs to fetch the articles of to-be-dropped items)
2014-11-08 20:31:29 +01:00
pictuga f20fb9cdf6 Use more stable loop-over-list in Gather 2014-11-08 20:30:36 +01:00
pictuga 6a40731248 Return output when DEBUG is on
Much more convenient to actually debug
2014-11-07 18:44:59 +01:00
pictuga d3eb2dd88d Implement :smart to save bandwidth 2014-11-07 18:40:44 +01:00
pictuga 67fc5f06f8 Run "After" even when debug mode is on 2014-11-06 21:15:16 +01:00
pictuga ad2673f474 Add :empty to remove all items
This is completely useless...
2014-11-06 21:14:41 +01:00
pictuga ecfda1d05a Add :strip to remove desc and content 2014-11-06 21:14:20 +01:00
pictuga 1a8ee716f3 Add "search" option
PLEASE NOTE that this is case sensitive and does a really basic search ("is xyz in the title?"). Don't use this for fine filtering.
Also fixed an issue with After(), due to the fact that some functions were removing items from the feed while looping over the feed items, creating some annoying item-skipping issues.
2014-11-06 21:11:23 +01:00
pictuga 0e22bb4316 Cache: catch json parse errors 2014-09-28 12:03:58 +02:00