Commit Graph

581 Commits (master)

Author SHA1 Message Date
pictuga d12d44a500 Remove unused hash-bangs
Leftovers from debugging
2020-03-19 15:06:28 +01:00
pictuga ee8c57c1fc feeds: avoid convert to self 2020-03-19 12:54:04 +01:00
pictuga bda51b0fc7 feeds & morss: many encoding/tostring fixes 2020-03-19 12:53:25 +01:00
pictuga c09b457168 feeds: fix .dic code
Bug introduced recently…
2020-03-19 11:36:20 +01:00
pictuga b47e40246c feeds: clean up html code handling 2020-03-19 11:35:51 +01:00
pictuga 9cf933723f feeds: clean up time handling
Includes a shameful fix on @property
2020-03-19 11:35:02 +01:00
pictuga d26795dce8 morss: from feedify to feeds
Also scrap obsolete feedify code
2020-03-19 10:27:44 +01:00
pictuga 2704e91a3d readabilite: handle another weird html stuff 2020-03-19 10:24:09 +01:00
pictuga f48961a7e4 feeds: small code cleanups 2020-03-19 10:13:22 +01:00
pictuga fd51f74eb5 feeds: drop facebook-related code
No longer maintained and of little use anyway
2020-03-19 10:03:59 +01:00
pictuga d3fe51cea5 feeds: remove duplicated code 2020-03-19 09:49:52 +01:00
pictuga 449bc3c695 feeds: fix handling of html code 2020-03-19 09:48:53 +01:00
pictuga 13ea52ef80 feeds: add .torss() 2020-03-19 09:47:58 +01:00
pictuga aa2b56b266 feeds: small code cleanups 2020-03-19 09:47:17 +01:00
pictuga 296e9a6c13 feedify.ini: clean up following feeds.py's html intro 2020-03-18 16:52:28 +01:00
pictuga 9dbe061fd6 Remove markdown-related code
Time to clean up the code and stop with those non-core features
They just make the code harder to maintain
2020-03-18 16:47:00 +01:00
pictuga 4a70aa9dfa feeds: auto-parse() 2020-03-18 16:34:40 +01:00
pictuga c2f85da94a feeds: add html support, adapt .tohtml() 2020-03-18 16:33:10 +01:00
pictuga e3528a8f36 feeds: use FeedJSON for .tojson()
Clean up related code
2020-03-18 16:31:36 +01:00
pictuga 2dd9ae202d feeds: add 'mode' to Parsers 2020-03-18 16:24:08 +01:00
pictuga 8e3d32f24c feeds: rename 'ruleset' into 'default_ruleset'
Better reflects its use
2020-03-18 16:22:03 +01:00
pictuga 6ce616106b feeds: disable 'multi' ruleset
RSS ruleset has been cut into 4 rulesets
2020-03-18 16:20:42 +01:00
pictuga 186fa2b408 feedify: remove empty 'path' lines 2020-03-18 16:18:22 +01:00
pictuga e9d46cb6a9 feeds: move mimetypes into .py from .ini 2020-03-18 16:08:42 +01:00
pictuga 7644c550ec feedify: remove id, is_permalink as well 2020-03-17 16:48:36 +01:00
pictuga e460fdc8f4 feeds: remove little-used properties
id, is_permalink
Also a good opportunity to trash the unreliable bool-parsing code
2020-03-17 16:46:42 +01:00
pictuga 1e714ab34b feeds: add ability to convert to another type of feed 2020-03-17 14:02:24 +01:00
pictuga 4ba4d73ce6 feeds: add json support 2020-03-17 14:02:01 +01:00
pictuga 6d5aa8c222 feeds: clean up .append() 2020-03-17 13:59:51 +01:00
pictuga f10727f94a feeds: small code cleanups 2020-03-17 12:26:34 +01:00
pictuga d42e19a165 feeds: better rules handling
"Dynamic" rule set picking, better handling of non-multi rules
2020-03-17 12:23:36 +01:00
pictuga fe46c6c522 feeds: pass parent Feed to Items 2020-03-17 12:22:14 +01:00
pictuga 9c557ea02c feeds: fix function def 2020-03-17 11:08:40 +01:00
pictuga 8a4f86210c feedify.ini: remove utf-8 declaration
Screws up with the parser as it is read as unicode (and xml parser expects bytes)
2020-03-17 11:06:59 +01:00
pictuga ce30952fa2 feeds: make "rule" split clearer
"rrule" var name to tell apart the original "rule" from the parsed one
2020-03-16 17:46:04 +01:00
pictuga 3fb6ff891c feeds: share more code, add comments
Should reduce redundancy
2020-03-16 17:45:08 +01:00
pictuga f5acd2c14c feeds: use RawConfigParser
This one does not try to replace non-std characters (e.g. %)
2020-03-16 17:43:03 +01:00
pictuga 7cb3b29ef2 feeds: remove unused import 2020-03-16 17:38:48 +01:00
pictuga 9cb2d5bb86 feeds: centralize time format/parse
As the same code _should_ apply to most, if not all, parsers
2018-11-18 16:03:02 +01:00
pictuga e606c5eefb feeds: various small cleanup/fixes 2018-11-18 15:14:38 +01:00
pictuga 24c8a0ecd0 feeds: fix typo 2018-11-13 21:23:24 +01:00
pictuga 9a62e6ae75 feeds: remove old code 2018-11-13 21:22:50 +01:00
pictuga adbaed9e54 feeds: put code tgt 2018-11-11 17:24:56 +01:00
pictuga 3581f34db7 Various feeds.py related fixes 2018-11-11 16:46:23 +01:00
pictuga 966559bdd3 feeds: fix remove function in case of no match 2018-11-11 16:33:36 +01:00
pictuga 4fb98bc2ed feeds: fix append content 2018-11-11 16:33:18 +01:00
pictuga 679628c7fa Small code clean up 2018-11-11 16:11:00 +01:00
pictuga 399e867c94 morss: add py2 indication 2018-11-11 16:07:25 +01:00
pictuga c5d8b064ae feeds: fix an error when no match 2018-11-11 15:31:46 +01:00
pictuga c2a6ea7cfe feeds: give example of regex 2018-11-11 15:26:46 +01:00
pictuga 221e1f85ad feeds: fix implementation in morss 2018-11-11 15:26:09 +01:00
pictuga 857bb9c366 feeds: fix remove() unclear function naming 2018-11-11 15:25:03 +01:00
pictuga 75f691b009 feeds: fix multi rules parsing 2018-11-11 15:21:43 +01:00
pictuga 401dfbc1ff feeds: fix atom xhtml handling 2018-11-11 15:21:06 +01:00
pictuga 8aceda4957 feeds: fix feedify.ini 2018-11-11 15:19:41 +01:00
pictuga 024466733c feeds: remove old code 2018-11-09 22:09:59 +01:00
pictuga 92b06bea6d feeds: fix Uniq for merger 2018-11-09 22:05:13 +01:00
pictuga 94372af868 feeds: transitional code for json/csv/html export 2018-11-09 22:04:46 +01:00
pictuga 6d28323e3a feeds: add XML support for merger 2018-11-09 22:04:08 +01:00
pictuga 5a4a86d622 feeds: add base classes for merger 2018-11-09 22:02:44 +01:00
pictuga d321550166 feeds: prepare feedify.ini for merger 2018-11-09 21:53:19 +01:00
pictuga d1aab99b80 feeds: replacement code for descriptors 2018-10-31 22:15:34 +01:00
pictuga 16f3ffa96e feeds: remove further Descriptor code 2018-10-31 22:15:15 +01:00
pictuga 02b7e07097 feeds: fix typo 2018-10-31 22:07:49 +01:00
pictuga 8487a43c6c feeds: remove FeedList(Descriptor) 2018-10-31 22:07:16 +01:00
pictuga 081d560bc4 feeds: create obj to keep FeedItems unique 2018-10-31 21:47:19 +01:00
pictuga cfd758b6b5 feeds: shift easy ones to @property 2018-10-26 19:48:39 +02:00
pictuga 4e144487db Test for feedify support first
Otherwise might never be called if the content-type is also supported
2018-10-25 01:17:24 +02:00
pictuga d13362c4ac feeds: drop .iterchildren
Redundant
2018-10-25 01:16:28 +02:00
pictuga 17856929fe feeds: pretty_print was made a default 2018-10-25 01:16:07 +02:00
pictuga 90110a4661 crawler: reduce max file size 2018-10-25 01:15:09 +02:00
pictuga 91a084e5ed crawler: make py2/3 code distinction clearer 2018-10-25 01:14:46 +02:00
pictuga 5d93d68f62 readabilite: add some function descriptions 2018-10-25 01:12:42 +02:00
pictuga 8d7e1811fd readabilite: update lists
Some code was also meant to be committed earlier
2018-10-25 01:12:08 +02:00
pictuga 72d03f21fe readabilite: forgot count_content
Was meant to be in an earlier commit
2018-10-25 01:11:29 +02:00
pictuga 1d6d0b8ff1 readabilite: move br2p in the cleaning code 2018-10-25 01:09:15 +02:00
pictuga 7d005e9a65 readabilite: run the new cleaning code 2018-10-25 01:08:25 +02:00
pictuga 58fe5243af readabilite: improve cleaning code 2018-10-25 01:07:25 +02:00
pictuga f044c242ef readabilite: simplify scoring loop
For performance
2018-10-25 00:59:39 +02:00
pictuga a6befad136 readabilite: change scoring 2018-10-25 00:57:43 +02:00
pictuga 9e71de8d40 readabilite: improve output 2018-10-24 23:49:16 +02:00
pictuga 787d90fac0 readabilite: some technical improvements for score
Linear, removed misplaced debugging code
2018-10-24 23:47:37 +02:00
pictuga 040d2cb889 readabilite: improve word count 2018-10-23 00:09:34 +02:00
pictuga 9fcef826f5 reader: everything in one file
Including css & js. Should now work by itself
2018-10-22 23:55:14 +02:00
pictuga e72ca3f984 morss: improved output type 2018-09-30 22:02:29 +02:00
pictuga 2ccf36617a morss: improve http parameter parsing 2018-09-30 22:01:19 +02:00
pictuga 945e0dceab crawler: typo in comment 2018-09-30 21:59:50 +02:00
pictuga 5111d40011 feedify: update rules
They obviously no longer worked after so long without updating them...
2018-09-30 21:54:10 +02:00
pictuga f9217102f3 crawler: fix sqlite/binary issue 2017-11-25 19:58:14 +01:00
pictuga 21480f90de Move from gzip to zlib to decompress data
Faster on incomplete files
2017-11-25 19:57:41 +01:00
pictuga d091e74d56 crawler: add MySQL backend
With extra dependency
2017-11-04 14:51:41 +01:00
pictuga f29a107a09 crawler: make SQLiteCache inherit from BaseCache
Saves some time for other cache backends
2017-11-04 14:48:00 +01:00
pictuga 2d5bf7b38b Fix xml detection regex
Also (dirtily) fixes #18 for now
2017-11-04 14:21:05 +01:00
pictuga b7db78f631 crawler: use BLOB in sqlite and drop "buffer"
Can't really remember why "buffer" was introduced in the first place
2017-11-04 13:54:40 +01:00
pictuga 203ba10dbd Explain __init__.py and __main__.py use 2017-11-04 13:17:12 +01:00
pictuga 194465544a crawler: separate CacheHandler and actual caching
Default cache is now just an in-memory {}
2017-11-04 12:41:56 +01:00
pictuga 523b250907 crawler: SQL request in CAPS for readability 2017-11-04 12:36:58 +01:00
pictuga 2d7d0fcdca morss: fix cgi in python 3
Needs explicit [] in py3
2017-11-04 12:27:47 +01:00
pictuga a8c2df7f41 crawler: fix truncated gzip reader
For python 3
2017-11-04 12:07:08 +01:00
pictuga d39d0f4cae crawler: properly define default sqlite file 2017-11-02 22:50:40 +01:00
pictuga f563040809 readabilite: threshold to detect if it contains an article
Useful for videos/images-based images
2017-10-28 01:30:21 +02:00
pictuga 0df6409b0e crawler: use `with con` to commit, journal WAL for perf 2017-10-28 01:28:47 +02:00
pictuga 7b85f692a0 crawler: fix encoding detection 2017-10-27 23:14:08 +02:00
pictuga 840842d246 crawler: limit download to 500KiB
More can only be linked to a fraudulent/incorrect use of the service
2017-10-27 23:12:40 +02:00
pictuga fbe811384a crawler: add (unused) DebugHandler to output headers sent/received
Saves a lot of time when debugging
2017-10-27 23:10:03 +02:00
pictuga 64babd6713 morss: make readabilite links absolute 2017-07-29 14:37:37 +02:00
pictuga 3bfad54add readabilite: change cleaning & code structure
Kinda struggled to make some "nice" code
2017-07-17 00:27:41 +02:00
pictuga 386bafd391 readabilite: write_all use "node" instead of "item" 2017-07-17 00:13:15 +02:00
pictuga a61b259792 readabilite: easy option to highlight the nodes 2017-07-17 00:11:49 +02:00
pictuga c52b47616d readabilite: always return common of 2 best nodes
Better results. Less is not more
2017-07-17 00:10:58 +02:00
pictuga bfdda18b9c readabilite: better explain lowest_common output 2017-07-17 00:08:00 +02:00
pictuga 2afea497a3 readabilite: br2p use "node" instead of "item"
Confusing with rss items otherwise
2017-07-17 00:06:39 +02:00
pictuga 843dc97fbf readabilite: change scoring algorithm
Use 3 groups of keywords instead
2017-07-17 00:01:44 +02:00
pictuga df22396838 Only use chardet on 2k letters
Takes forever otherwise
2017-07-16 23:59:06 +02:00
pictuga 6f0efd5802 crawler: add cookies support
Somehow got dropped when splitting the big handler
2017-03-25 19:51:42 -10:00
pictuga d3bc2926fc Remove :hungry
Mostly useless. If you need it, you might as well not need to use morss in the first place...
2017-03-25 13:52:58 -10:00
pictuga 505b02d70d crawler: remove debugging print() 2017-03-25 13:45:12 -10:00
pictuga 3ca6ed5bb0 readabilite: add author/about to black list 2017-03-24 22:02:41 -10:00
pictuga 4aa25bf3d8 readabilite: clean_html before scoring
Surprisingly efficient
2017-03-24 21:50:46 -10:00
pictuga bfefa8d599 readabilite: add tags to black list 2017-03-24 21:50:26 -10:00
pictuga 91da0f36dc readabilite: comment the clean_html function 2017-03-24 21:50:01 -10:00
pictuga 67889a1d14 readabilite: drop useless tags
This extra cluster actually jams the algorithm
2017-03-24 21:49:14 -10:00
pictuga 167e3e4a15 feedify: accept xpath rules passed as parameters 2017-03-20 20:56:48 -10:00
pictuga bf3ef586c2 feedify: remove unused downloader 2017-03-20 20:53:52 -10:00
pictuga 08f08ef704 improve morss url detection regex 2017-03-20 20:51:13 -10:00
pictuga 1b4341f741 accept query_string in morss cgi 2017-03-20 20:50:04 -10:00
pictuga f965566054 feedify: make function use clearer 2017-03-20 20:19:08 -10:00
pictuga d6882e0a6a readabilite: (try to) improve detection
Kinda hopeless
2017-03-19 02:00:31 -10:00
pictuga 79a8ada9f4 readabilite: add tags to score 2017-03-19 01:57:54 -10:00
pictuga 4a5150e030 readabilite: fix iter while iterating 2017-03-19 01:56:33 -10:00
pictuga e65c88abf8 readabilite: fix re.match 2017-03-19 01:55:40 -10:00
pictuga 9c331300eb crawler: move UAHandler to basic
Fuck u feedburner
2017-03-19 01:49:17 -10:00
pictuga 5e61686373 Only use full feed for articles & feedify
Sometimes using referrer and/or useragent makes some dumb websites return different content (hello feedburner)
2017-03-18 23:43:28 -10:00
pictuga 0b6e553054 Move iTunes code to feedify.py 2017-03-18 23:41:37 -10:00
pictuga d4937812a8 Remove HTTPError code
Only used to look nice but useless (inherits from IOError anyway)
2017-03-18 23:39:32 -10:00
pictuga 99f3c519f2 crawler: fix accept code 2017-03-18 23:37:51 -10:00
pictuga 67f5a21019 Move build_opener to crawler
Forgotten
2017-03-18 23:03:04 -10:00
pictuga f7d570d4c8 crawler: add some broken as rss mimetype
Seen out there
2017-03-18 23:00:13 -10:00
pictuga 2003e2760b Move custom_handler to crawler
Makes more sense. Easier to reuse. Also cleaned up a bit the code
2017-03-18 22:51:27 -10:00
pictuga e1a13a623c crawler: remove inefficient feedburner-specific code 2017-03-18 22:31:03 -10:00
pictuga 367f86987d readabilite: spread score to all ancestors
Instead of just parents and grandparents
2017-03-18 22:24:38 -10:00
pictuga e3ab3c6823 crawler: use fewer ternary operators
Inherited from fork
2017-03-18 22:23:39 -10:00
pictuga 65055290d4 crawler: better use of chardet
Scan whole doc since beginning of html pages tends to be too regular. Ignore ASCII detection for the same reason.
2017-03-18 22:19:54 -10:00
pictuga 9ee6ff60e1 crawler: 301 http code doesn't respect headers
More or less according to the specs
2017-03-18 22:18:10 -10:00
pictuga f4abc4e8a4 Detect encoding (using crawler) before readabilite 2017-03-11 02:30:57 -10:00
pictuga c952b85d92 crawler: cache 301 HTTP code, for a week 2017-03-09 09:37:05 -10:00
pictuga e8023e4336 crawler: remove unused NotInCache error-class 2017-03-09 09:35:40 -10:00
pictuga 385f9eb39a morss: use crawler strict accept for feed 2017-03-08 19:05:48 -10:00
Florian Muenchbach 993ac638a3 Added override for auto-detected character encoding of parsed pages. 2017-03-08 18:45:20 -10:00
pictuga 627163abff Make cache settings in morss nicer 2017-03-08 18:09:24 -10:00
pictuga e5f8e43659 Shifted the <link rel='alternate'/> redirect to crawler
Now using MIMETYPE var from crawler within morss.py
2017-03-08 18:03:34 -10:00
pictuga fb8825b410 crawler: parse html to get http-equiv
For sure slower, but way cleaner (and probably more stable)
2017-03-08 17:50:57 -10:00
pictuga f4f6a86147 feeds: make wheezy.template mandatory
Cleaner code. Less confusing.
2017-03-08 15:38:59 -10:00
pictuga ad9bf946ec crawler: use chardet again
Always nice in case no encoding is specified. Somehow got dropped with commit 245ba99. Most probably by accident
2017-03-08 11:37:12 -10:00
pictuga 3fc89d5359 readabilite: improve score for <p>
Helps a lot with bbc, le monde. Might backfire on other websites tho...
2017-03-01 18:02:45 -10:00
pictuga a8ac2ed1ca Turn FeedBefore/After into ItemBefore/After
To reduce the number of loops
2017-02-28 23:24:32 -10:00
pictuga fcc5e8a076 Add "Feed/Item" in functions name
To make it instantly clearer what they work on
2017-02-28 23:23:15 -10:00
pictuga 60e3311e97 Use readabilite properly
Not thru some weird wrapper anymore
2017-02-28 22:45:26 -10:00
pictuga dc8423550f Support xml starting with \s 2017-02-25 19:04:32 -10:00
pictuga e0f533ca31 readabilite: test to replace <br/> with div 2017-02-25 18:16:15 -10:00
pictuga c6c113b8a8 readabilite: function to clean up the html code 2017-02-25 18:15:33 -10:00
pictuga 58d9f65735 readabilite: explain the use of .tail 2017-02-25 18:14:13 -10:00
pictuga a5aec8c7a6 readability: more keywords to the filter list
Also fixed indentation
2017-02-25 18:13:15 -10:00
pictuga 026903ce73 crawler: change http header after uncompressing
Change content-encoding to "identity"
2017-02-25 18:10:43 -10:00
pictuga e71fc967ce readabilite: shift "good" tags to a var (list)
So that this list can later be re-used
2017-02-25 18:07:28 -10:00
pictuga b14381f575 Use internal readability fork
Much simpler, doesn't clean the html, probably less efficient, but much faster
2016-05-31 02:50:03 +02:00
pictuga 2b9bfb47e5 Remove :smart and etag headers
Dirty code, not very useful. Use simple cache-control instead.
2016-05-31 02:47:49 +02:00
pictuga 4ff80cec86 Check argv length before using it 2016-05-31 02:46:28 +02:00
pictuga 466d8e47d6 Also make buriy's readability port compatible
Should be faster, and it now supports py3
2015-08-29 18:33:12 +02:00
pictuga 95d9d847e9 :proxy implies :keep 2015-08-29 17:48:07 +02:00
pictuga 8a1c00abf0 Typo in python version check 2015-08-28 19:29:09 +02:00
pictuga 624fa47f4f Allow CLI change of the www/ path 2015-08-28 19:22:55 +02:00
pictuga 31fc939d52 Allow CLI change of the http server port 2015-08-28 19:22:23 +02:00
pictuga 4f9000beed Comment code of launching modes 2015-08-28 19:18:09 +02:00
pictuga 5e87b56a03 Return error code in plain text in file server 2015-08-28 19:16:15 +02:00
pictuga ffda3fac7e Improve file detection in web server 2015-08-28 19:15:40 +02:00
pictuga 6741a408dd Remove now-useless ca-cert file path 2015-08-28 19:13:54 +02:00
Massimo Vannucci 8656e53b84 Correct Python version check 2015-08-05 23:36:11 +02:00
Massimo Vannucci 098a306c91 Fixed typo 2015-08-05 23:24:44 +02:00
pictuga 5c2151ffd6 Improve widely feedsportal url decoder 2015-06-14 20:32:47 +08:00
pictuga 8418212475 Use good path for html template access 2015-05-04 22:26:31 +08:00
pictuga 931fd53da6 Fix 304-cache handling
To make sure that the cached request also gets processed (by GZip and stuff)
2015-05-04 22:25:26 +08:00
pictuga ae062ebe90 Remove deprecated https error catch 2015-04-07 18:59:37 +08:00
pictuga 7a3b257328 Make :mono use basic loop
Makes profiling easier
2015-04-07 18:16:08 +08:00
pictuga 2f86a2a44b Remove useless obscure cgi code 2015-04-07 09:49:44 +08:00
pictuga 131ba09207 Change :cache mode behavior
Makes underlying code way cleaner
2015-04-07 09:38:22 +08:00
pictuga cafb87d561 Fix sqlite relative path in cgi 2015-04-07 09:37:25 +08:00
pictuga decb3f15f6 Move the mod_cgi files to /cgi/ 2015-04-07 09:36:00 +08:00
pictuga b267791199 Remove hashbang from __init__.py 2015-04-07 09:34:22 +08:00
pictuga acae47dc79 2to3: fix cli_app string print 2015-04-06 23:27:15 +08:00
pictuga 32aa96afa7 Cache HTTP content using a custom Handler
Much much cleaner. Nothing comparable
2015-04-06 23:26:12 +08:00
pictuga 006478d451 2to3: fix feeds.py string handling
Use bytes strings
2015-04-06 23:13:46 +08:00
pictuga a35225a234 2to3: fix feedify string handling 2015-04-06 23:12:50 +08:00
pictuga 1b4fc88ad0 Replace MetaRedirect handler with two cleaner ones
One for <meta http-equiv> and one for HTTP 'refresh' header
2015-04-06 23:03:17 +08:00
pictuga f2fe4fc364 Drop HTTPS SSL certificate verification
Breaks everything with python 3. Now built-in in recent python 2.7.9 and python 3.4-ish
2015-04-06 22:54:59 +08:00
pictuga 88af80e817 feeds: no need to decode xml strings
It even makes python3 lxml get angry
2015-04-06 22:37:33 +08:00
pictuga 1335b3fdda feedify: use better relative path for the .ini 2015-04-06 22:19:13 +08:00
pictuga c41c0761b6 feedify: don't insert useless url when none is found 2015-04-06 22:15:59 +08:00
pictuga dbc92068f0 feedify: explanation of methods' purpose
Kinda messy when reading code after a year
2015-04-06 22:11:31 +08:00
pictuga 9d64c31947 Feeds: use crawler.py encoding detection 2015-03-24 23:23:40 +08:00
pictuga 29d9e4702f Force enc det to return utf-8 rather than nothing 2015-03-24 23:22:56 +08:00
pictuga 2e3b766a0a http-server port as a var, print port on startup 2015-03-24 23:20:06 +08:00
pictuga b3572e143d New way of calling the program
python -m morss, python morss/main.py
2015-03-11 14:23:14 +08:00
pictuga 656b29e0ef 2to3: using unicode/str to please py3 2015-03-11 01:05:02 +08:00
pictuga cbeb01e555 2to3: fix urllib header retrieval 2015-03-11 01:03:16 +08:00
pictuga 6ae60d0343 2to3: py3-compatible readability fork 2015-03-03 01:03:03 +08:00
pictuga 28bb4b8647 2to3: csv (with if python 3) 2015-03-03 00:59:33 +08:00
pictuga 2f542005d1 2to3: urllib host 2015-03-03 00:59:00 +08:00
pictuga 9bc5b0c7f7 2to3: ordereddict fallback was for python2.6 2015-03-03 00:57:09 +08:00
pictuga dbb3883516 2to3: urllib mimetype 2015-03-03 00:55:58 +08:00
pictuga 7bd448789d 2to3: first attempt to fix strings 2015-02-26 00:50:23 +08:00
pictuga 071288015b 2to3: morss.py port xrange 2015-02-25 18:41:49 +08:00
pictuga 803d6e37c4 2to3: morss.py port most default libs 2015-02-25 18:36:27 +08:00
pictuga 327b8504c4 2to3: feeds.py port urllib2 2015-02-25 18:22:38 +08:00
pictuga 4f6f8bd41b 2to3: feedify.py port http-related lib 2015-02-25 18:16:35 +08:00
pictuga a0f2e0d995 2to3: crawler.py improve except 2015-02-25 18:07:09 +08:00
pictuga 6a06b742f9 2to3: crawler.py port try as 2015-02-25 18:03:54 +08:00
pictuga c2d85e2bf9 2to3: crawler.py port httplib 2015-02-25 18:02:29 +08:00
pictuga 4f224888d8 2to3: crawler.py port urllib2 and StringIO 2015-02-25 17:53:36 +08:00
pictuga 27cf8f6498 2to3: (iter)items to list 2015-02-25 12:02:53 +08:00
pictuga 3fb90cb7b4 2to3: local import 2015-02-25 11:57:10 +08:00
pictuga 47c8a511ff 2to3: print's 2015-02-25 11:57:10 +08:00
pictuga 604b03e2ba Delete desc when :keep=False
Still needed for Firefox, cause empty <desc/> still show up instead of content in feed preview
2015-02-24 00:38:34 +08:00
pictuga 83ed440e67 Fix issue when desc and content empty
Wouldn't put fetched article in feed
2015-02-24 00:38:02 +08:00
pictuga 5c23f90f0b Disable options filtering by default
But still provide sample code
2015-02-21 02:01:32 +08:00
pictuga 149117029c Improve logging of fetching errors 2015-02-21 01:58:45 +08:00
pictuga d5269964fc Make :theforce also bypass http errors 2015-02-21 01:58:16 +08:00
pictuga f0dcb9912e Fix cached errors handling 2015-02-21 01:57:33 +08:00
pictuga f62aedda12 Double HTTP timeout
Better slow than nothing (especially when running on a personal computer)
2015-02-21 01:55:53 +08:00
pictuga 76c4211a04 Make :hungry more useful 2015-02-21 01:55:25 +08:00
pictuga 446dd9fb3f Fix typo in FeedListDescriptor
Thanks @tehsphinx. Fixes #4.
2015-02-20 17:41:14 +08:00
pictuga ef946c0712 XML pretty-print in separate option
Who reads plain XML anyway?
2015-02-20 17:38:39 +08:00
pictuga fcf4197801 Populate __init__.py 2015-02-19 13:05:59 +08:00
pictuga ec5f5b865f Make it easy to restrict available options 2014-11-21 22:01:03 +01:00
pictuga 105ca67744 Move facebook token to own script
To a PHP script actually. Not sure why PHP. Keeps morss' code cleaner. This piece of code had nothing to do in there, and didn't bring any advantage.
2014-11-19 20:09:27 +01:00
pictuga a9654ea578 Fix encoding detection in feedify 2014-11-19 12:25:18 +01:00
pictuga 8131ea2244 HTTPS SSL certificate validation
Specific error message added
2014-11-19 11:59:59 +01:00
pictuga 1b26c5f0e3 Split SimpleDownload in a lot of Handlers
Cleaner code, easier to edit, more flexibility. Paves the way to SSL certificates validation.
Still have to clean up the code of AcceptHeadersHandler.
2014-11-19 11:57:40 +01:00
pictuga f46576168a Add :mono to disable multithreading
Convenient to have linear logging
2014-11-10 23:14:54 +01:00
pictuga 5dd262139d Add HTTP error code to download error message 2014-11-09 15:45:01 +01:00
pictuga 6d5bb2b3c5 Print error message in wgi mode 2014-11-09 15:44:42 +01:00
pictuga a820cf6812 Run :strip in After
Makes more sense
2014-11-09 15:01:50 +01:00
pictuga 607df4b123 Fix Twitter
They changed the html structure of the profile pages
2014-11-09 15:00:38 +01:00
pictuga 5eefe2c916 Log more when using wgi 2014-11-08 21:22:34 +01:00
pictuga 6f2061ff37 Fix :smart
Wasn't using the right way
2014-11-08 21:22:07 +01:00
pictuga 40834eeb93 Split After into Before/After
Needed since a bunch of options needed to be run before the actual fetching (cause no-one needs to fetch the articles of to-be-dropped items)
2014-11-08 20:31:29 +01:00
pictuga f20fb9cdf6 Use more stable loop-over-list in Gather 2014-11-08 20:30:36 +01:00
pictuga 6a40731248 Return output when DEBUG is on
Much more convenient to actually debug
2014-11-07 18:44:59 +01:00
pictuga d3eb2dd88d Implement :smart to save bandwidth 2014-11-07 18:40:44 +01:00
pictuga 67fc5f06f8 Run "After" even when debug mode is on 2014-11-06 21:15:16 +01:00
pictuga ad2673f474 Add :empty to remove all items
This is completely useless...
2014-11-06 21:14:41 +01:00
pictuga ecfda1d05a Add :strip to remove desc and content 2014-11-06 21:14:20 +01:00
pictuga 1a8ee716f3 Add "search" option
PLEASE NOTE that this is case sensitive and does really basic research ("is xyz in the title?"). Don't use this for fine filtering.
Also fixed an issue with After(), due to the fact that some functions were removing items from the feed while looping over the feed items, creating some annoying item-skipping issues.
2014-11-06 21:11:23 +01:00
pictuga 690bf43977 reader: show desc if no content is available 2014-10-26 19:22:57 +01:00
pictuga 0e22bb4316 Cache: catch json parse errors 2014-09-28 12:03:58 +02:00
pictuga 5f8288eecb Add :hungry to fill feeds with long intros 2014-06-28 01:43:31 +02:00
pictuga ac69b28f1b Pass options to Fill 2014-06-28 01:43:09 +02:00
pictuga 6cc3e7eb93 Fix :callback and add content-type 2014-06-28 01:20:47 +02:00
pictuga 0ec7c2f3e6 Fix :callback crash 2014-06-28 01:13:29 +02:00
pictuga 484432d804 Add :callback for JSONP calls 2014-06-28 00:59:57 +02:00
pictuga 226441d821 Add :cors for cross-domain XHR (with README update) 2014-06-28 00:59:13 +02:00
pictuga 230659a34b Reenable args with values 2014-06-28 00:58:37 +02:00
pictuga 38b90e0e4c Fix template syntax 2014-06-22 20:23:32 +02:00
pictuga d877e856d3 Fix feed.items.append since pep8
The underscore naming convention was not yet applied in that function
2014-06-22 20:13:36 +02:00
pictuga ee3b2590d0 Remove useless line-break (pep8) 2014-06-22 20:00:44 +02:00
pictuga 5a0084c7cc Fix isPermaLink in feedify 2014-06-22 19:54:13 +02:00
pictuga e991d356f4 Fix duckduckgo layout in .ini 2014-06-22 19:53:53 +02:00
pictuga ecabbc0175 Replace <a> with <span> in reader with :noref 2014-06-22 19:42:52 +02:00
pictuga 6352ef28a9 Use pep8-like layout for .ini 2014-06-22 02:14:11 +02:00
pictuga 3ca5dbaf31 Raise ImportError when missing dependency for call 2014-06-22 02:04:14 +02:00
pictuga 9f51448160 Use xrange where applicable (faster) 2014-06-22 02:02:43 +02:00
pictuga f01efb7334 Make most of the code pep8-compliant
Thanks a lot to github.com/SamuelMarks for his nice work
2014-06-22 01:59:01 +02:00
pictuga da0a8feadd Replace TABS with FOUR SPACES in .py
(you might want to use: git diff -w)
2014-06-21 18:35:59 +02:00
pictuga da857f8bb2 Remove useless odata var in morss/morss.py 2014-06-21 18:25:50 +02:00
pictuga 286b90ab8e Fix typo in error raising message 2014-06-21 16:29:05 +02:00
pictuga cc27483143 Remove unused imports 2014-06-21 16:13:54 +02:00
pictuga 1cf959ce5b Fix item.link deletion 2014-06-21 16:08:37 +02:00
pictuga de5b75162c Add :ad mode (as an example)
Not really useful, but shows how to quickly add/remove items from the feed
2014-06-16 14:07:59 +02:00
pictuga 850d574424 Add one comment
Was waiting to be committed for months...
2014-06-16 14:07:23 +02:00
pictuga 45478b592e Remove cache-redirect
Some kind of no-longer-working code left-over
2014-06-16 14:06:42 +02:00
pictuga 8270685ac6 Use longer timeout for xml fetching 2014-06-16 14:03:24 +02:00
pictuga 0e3751c712 Remove useless comment 2014-06-16 14:02:54 +02:00
pictuga 862fe3cae4 Use more recent user-agent 2014-06-16 14:01:01 +02:00
pictuga 7211093cc5 Add :smart :noref modes, update README 2014-06-16 14:00:02 +02:00
pictuga f991802d9e Try to use less server-specific code for FB tokens 2014-06-16 13:57:53 +02:00
pictuga 9285525256 Unify internal/external errors 2014-06-16 13:55:59 +02:00
pictuga cdef40fbbe Fix Cache saving crash
Because was deleting values of a dict while looping over its values...
2014-06-07 19:14:31 +02:00
pictuga f90958149e Add :reader
Uses wheezy.template, which is said to be fast and light. Provided template file is really basic, custom css suggested.
2014-05-29 14:12:16 +02:00
pictuga b66ac2bc5e Make it possible not to use caching 2014-05-24 19:13:41 +02:00
pictuga 25fdca4bf0 Add do-it-all function
For quick lib use
2014-05-24 19:02:22 +02:00
pictuga 26c91070f5 Time-based Cache
Solves the :proxy issue for good. More convenient, more flexible
2014-05-24 19:01:21 +02:00
pictuga 5e64696031 Fix '/morss.py/' url fixer 2014-05-22 22:53:36 +02:00
pictuga 364fbc4ba6 Remove apparent limit
Cause no longer works, cause of all-bool args introduced earlier
2014-05-22 22:52:49 +02:00
pictuga b03d865b7b Get rid of ParseOptions()
That thing wasn't nice, and depended too much on the various use cases. The new approach is to turn morss into a library and turn the use cases into some pre-implemented lib usages
2014-05-22 22:44:59 +02:00
pictuga 3c48c58127 Remove useless HOLD var
Was needed in DEBUG at some point
2014-05-21 12:19:49 +02:00
pictuga e8e7f170a6 Include super dumb http file server
For index.html, other files can be added, but everything has to be hard-coded (mimetype included)
2014-05-18 12:34:23 +02:00
pictuga c41a1fe226 Support for wikipedia featured articles feed
Should work with most wikipedias
2014-05-18 12:17:14 +02:00
pictuga d8a3c4e9af Add support for Google News 2014-05-18 11:58:45 +02:00
pictuga bbf1ffbb15 Remove 'persistent' and 'dic' arg in Cache
'dic' was mostly intended for facebook now-bygone advanced buggy token storage. 'persistent' was needed by fb and 'proxy' mode, but a small workaround was found for the proxy mode (basically making sure the cache object is always at least 5-item long)
2014-05-15 00:54:40 +02:00
pictuga 76e7f1ea00 Try to use more generic 302/303 redirections
Still far from being great, but at least I can use it on both morss.it and test.morss.it now
2014-05-14 15:05:14 +02:00