308 Commits

Author SHA1 Message Date
04840d9843 More flexible parameters can be passed
Multiple parameters can now be passed. The HTTP "API" has been improved, and URLs now have to look like "http://<path to morss>/:<param1>:<param2>/<url>". The parameter-parsing code is now much cleaner. The debug toggle is now a variable, which can be changed with parameters. HTTP logging is also no longer written to a file, which tended to grow far too fast and lacked an "error 403 protection"; instead, the ":debug" parameter can be passed in the URL, and the page is then delivered as "text/plain" with the debug output written into it. Some logging therefore had to be moved around, so that nothing is output while the HTTP headers are being defined.
2013-09-15 15:38:03 +02:00
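The URL scheme described in this commit could be parsed roughly as follows. This is only a sketch: `parse_options` is a hypothetical helper name, the option names `debug` and `cache` in the usage line are placeholders, and the actual parsing code in morss may differ.

```python
def parse_options(path):
    """Split '/:<param1>:<param2>/<url>' into (options, url).

    Hypothetical helper, not the actual morss code.
    """
    path = path.lstrip('/')
    if not path.startswith(':'):
        # No parameters given: the whole path is the feed url.
        return [], path
    # Everything up to the first '/' is the ':'-separated parameter list.
    params, _, url = path.partition('/')
    options = [p for p in params.split(':') if p]
    return options, url


options, url = parse_options('/:debug:cache/http://example.com/feed.xml')
```

Splitting on the first `/` only keeps the slashes of the feed URL itself intact.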
c25aec7107 Only perform <meta> redirects on html pages 2013-09-15 15:33:14 +02:00
3176c2a8e8 Fix bad characters detection
Now works with any encoding, no longer restricted to utf-8. Uses a regex to find the encoding (not perfect, but rather fast, since it only runs on a substring).
2013-09-15 14:57:37 +02:00
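A minimal sketch of the idea, assuming the encoding is declared near the start of the document as `charset=...` or `encoding="..."`. The pattern, the `detect_encoding` name, and the 1000-byte window are illustrative assumptions, not the code from this commit.

```python
import re

# Assumed pattern: matches `charset=...` (HTML) or `encoding="..."` (XML).
ENC_RE = re.compile(rb'(?:charset|encoding)\s*=\s*["\']?([\w-]+)', re.I)


def detect_encoding(data, default='utf-8', head=1000):
    """Guess the encoding of raw bytes by scanning only the first `head` bytes."""
    match = ENC_RE.search(data[:head])
    if match is None:
        return default
    return match.group(1).decode('ascii').lower()
```

Searching only a short prefix keeps the regex cheap even on large pages, at the cost of missing declarations that appear later.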
3ba74649f6 Test if linked pages are text documents
Useful for feeds such as HackerNews
2013-09-10 15:25:55 +02:00
1b7fdad6a8 Improve broken XML support
The TPB feed is a good example <http://rss.thepiratebay.sx/blog>. Now supports ampersands in feeds, using the "recover" mode in etree.parse. Broken utf-8 strings in feeds are now also supported.
2013-09-08 15:48:34 +02:00
5ebd84ee55 Fix broken feeds.py calls for items count 2013-09-08 15:47:15 +02:00
fe89a70f24 Add help for new classes 2013-09-01 19:00:22 +02:00
50f3c5a552 Use descriptors for lists and to replace property
Much nicer. Less duplicate code. More transparent. Big commit.
2013-09-01 18:52:07 +02:00
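To illustrate how a descriptor can replace a series of near-identical properties, here is a sketch using the standard library's `xml.etree.ElementTree`. The `TextField` and `Item` classes are hypothetical and are not the classes introduced by this commit.

```python
import xml.etree.ElementTree as ET


class TextField:
    """Descriptor exposing the text of a child element as an attribute."""

    def __init__(self, tag):
        self.tag = tag

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not on an instance
        node = instance.xml.find(self.tag)
        return None if node is None else node.text

    def __set__(self, instance, value):
        node = instance.xml.find(self.tag)
        if node is None:
            # Create the child element the first time it is assigned.
            node = ET.SubElement(instance.xml, self.tag)
        node.text = value


class Item:
    # One line per field instead of a getter/setter pair per property.
    title = TextField('title')
    link = TextField('link')

    def __init__(self, xml):
        self.xml = xml


item = Item(ET.fromstring('<item><title>Hello</title></item>'))
item.link = 'http://example.com/'
```

The get/set logic lives in one place, which is where the reduction in duplicate code comes from.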
a94d659bc8 Make negation in README more obvious 2013-08-25 00:01:00 +02:00
d3c163fb74 Use ETag for user-side caching
Pretty hard-coded ETag use. The ETag is just a timestamp, and the server checks whether it's recent enough.
2013-08-24 23:43:32 +02:00
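The timestamp-as-ETag idea can be sketched like this. The function names and the `max_age` parameter are assumptions made for illustration; the commit does not show how the check is actually written.

```python
import time


def make_etag(now=None):
    """Return an ETag that is simply the current Unix timestamp."""
    return str(int(time.time() if now is None else now))


def is_fresh(etag, max_age, now=None):
    """True if the client's ETag is a timestamp less than max_age seconds old."""
    now = time.time() if now is None else now
    try:
        stamp = int(etag.strip('"'))
    except (AttributeError, ValueError):
        return False  # missing or malformed ETag: regenerate the page
    return 0 <= now - stamp < max_age
```

A server using this scheme would answer `304 Not Modified` when `is_fresh()` returns True for the value of the `If-None-Match` request header, and otherwise send the page with a new ETag.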
e2c3375eb6 Log url earlier
Now logging it in both use cases
2013-08-24 23:41:40 +02:00
0c6e28205a Use seconds for every parameter 2013-08-24 23:40:37 +02:00
b350602232 Remove legacy "xml map" declaration 2013-08-24 23:16:23 +02:00
1ba22516fe Small help for etag handler 2013-07-19 00:02:52 +02:00
90efb84c57 Don't log word counts
Nobody cares
2013-07-18 23:55:58 +02:00
9e324465e4 Use etag/last-modified to fetch xml feeds 2013-07-18 23:54:13 +02:00
70df746416 Accept None as value to cache 2013-07-18 23:51:11 +02:00
71129b5898 Fix headers definition
Based on what's done inside urllib2.py.
2013-07-17 14:41:29 +02:00
d3213ea1e7 Implement user-agent in HTMLDownloader
It was forgotten in the previous commit
2013-07-17 14:40:29 +02:00
918dede4be Extend urllib2 to download pages, use gzip
Cleaner than the previous dirty function. Handles decoding, gzip decompression, and meta redirects (e.g. Washington Post). Might need extra testing.
2013-07-16 23:33:45 +02:00
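The commit extends urllib2 (Python 2); the sketch below only illustrates two of the steps it mentions, gzip decompression and meta-redirect detection, using the Python 3 standard library. The helper names and the regex are assumptions, not the handler code itself.

```python
import gzip
import re

# Assumed pattern for <meta http-equiv="refresh" content="0; url=...">.
META_REFRESH = re.compile(
    rb'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    rb'content=["\']?\d+\s*;\s*url=["\']?([^"\'> ]+)', re.I)


def decode_body(raw, content_encoding=''):
    """Decompress the body if the server sent it with Content-Encoding: gzip."""
    if content_encoding.lower() == 'gzip':
        raw = gzip.decompress(raw)
    return raw


def find_meta_redirect(html):
    """Return the target of a <meta> refresh redirect, or None if there is none."""
    match = META_REFRESH.search(html)
    return match.group(1).decode('utf-8', 'replace') if match else None


page = b'<html><head><meta http-equiv="refresh" content="0; url=http://example.com/next"></head></html>'
body = decode_body(gzip.compress(page), 'gzip')
target = find_meta_redirect(body)
```

A downloader built this way would fetch `target` again whenever it is not None, ideally with a cap on the number of hops.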
1fa8c4c535 Remove cleanXML()
This function is way too strong, and no longer needed (even for the targeted feed). It led to other bugs with other feeds, where needed spaces were stripped.
2013-07-15 11:10:19 +02:00
0718303eb7 Use ' instead of " when possible 2013-07-14 19:00:16 +02:00
7275bb1a59 Better content insertion
Also takes care of description, by creating one, when missing.
2013-07-14 18:58:48 +02:00
054f5c0846 Detect provided content with word count
This replaces the character count.
2013-07-14 18:57:12 +02:00
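One way to decide by word count whether a feed item already carries its full content is sketched below. The tag-stripping regex and the `min_words` threshold are assumptions; the commit does not state the actual values or method.

```python
import re

TAG_RE = re.compile(r'<[^>]+>')


def count_words(html):
    """Count whitespace-separated words once the markup is stripped."""
    text = TAG_RE.sub(' ', html)
    return len(text.split())


def has_full_content(html, min_words=100):
    """Treat an item as already complete if it holds at least min_words words."""
    return count_words(html) >= min_words
```

Counting words rather than characters makes the check less sensitive to long URLs or heavy markup inside a short teaser.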
7fa183d713 Change morss.py to use feeds.py
No other changes should appear in this commit
2013-07-14 18:44:11 +02:00
8ac7d8b282 Add feeds.py
This is a huge change. Feed parsing is now done in a separate file, much cleaner. The code of the lib tends to repeat itself a lot though. It should be possible to improve it. Code should be more stable.
2013-07-14 18:25:49 +02:00
6e891ef6ff Nicer link display in readme 2013-07-11 14:17:04 +02:00
981e83fd1e Add link to online test version 2013-07-11 14:11:23 +02:00
cf3934a513 Change http output mimetype to xml 2013-06-28 13:34:12 +02:00
1f4c219880 Common code for url/options handling 2013-06-25 13:13:23 +02:00
89662ccbae typo in readme 2013-06-19 22:16:46 +03:00
16f2e3b4c3 todo and newsreader hook update in readme
Updated liferea use to reflect code changes. Link to morss.it as live "preview". Added a todo. Added dependencies list.
2013-06-19 21:12:03 +02:00
9ad9ffaf91 Use proper markdown for links in readme 2013-06-11 13:10:40 +02:00
d2418a47c2 Add support for reddit.com feeds
The linked article is used as the item's content. The original content (with a link to comments) is still available in the "description" of the feed item.
2013-06-11 13:02:47 +02:00
f0b237364f Better annotation of feedburner/feedsportal code 2013-06-11 13:02:16 +02:00
0978e76356 str.decode() within EncDownload() 2013-06-08 17:32:55 +02:00
89354e1528 Use file's built-in readlines() to split file 2013-06-08 17:30:53 +02:00
bbf5c92ba2 Fix lenHTML() with empty string 2013-06-08 17:30:11 +02:00
e05d1c9deb Replace uppercase title with "title-case" 2013-06-02 23:45:41 +02:00
f09dfbacf5 Warning in README: no http server provided 2013-05-23 21:54:11 +02:00
a8feac9811 Detail MAX settings in README 2013-05-23 21:48:45 +02:00
b78f0bfba5 Improve options and limits
New limits are possible: a time limit, a max number of items fetched, and a max number of items taken from cache. Fill's third argument is now Fast=True, which is self-explanatory. (The complexity of the changes made separate commits impossible.)
2013-05-15 17:56:58 +02:00
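How the three limits might interact can be sketched as follows. All names and default values here are hypothetical, and what morss actually does with items beyond a limit is not stated in the commit; in this sketch they are simply skipped.

```python
import time


def process(items, fetch, cache, max_time=5.0, max_fetched=2, max_cached=10):
    """Fill items until a limit is hit; names and defaults are hypothetical."""
    start = time.monotonic()
    fetched = cached = 0
    done = []
    for item in items:
        if item in cache:
            if cached >= max_cached:
                continue  # cache quota used up: skip this item
            cached += 1
        else:
            out_of_time = time.monotonic() - start > max_time
            if fetched >= max_fetched or out_of_time:
                continue  # fetch quota or time budget used up: skip this item
            fetched += 1
            cache[item] = fetch(item)
        done.append(cache[item])
    return done


cache = {'a': 'A'}
result = process(['a', 'b', 'c', 'd'], str.upper, cache, max_fetched=2)
```

Here `str.upper` stands in for the real download step, so the example runs without network access.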
2a71fe07f2 Improve Cache code
Removed _new flag. Slightly more stable and cleaner.
2013-05-15 17:48:39 +02:00
bf647ba5f8 Make Fill return True when it has done something useful 2013-05-15 17:38:52 +02:00
9694a31052 Add 'feedurl' argument to Fill
Was needed for commit f3c2c34
2013-05-15 17:36:00 +02:00
8e2aab55e7 Check url before looking for provided content
Also use the lenHTML() function defined lately
2013-05-15 17:32:42 +02:00
85e40cde4e Check article length is big enough
Avoids replacing rather useful descriptions with empty string
2013-05-15 17:24:27 +02:00
222b1369e5 Support for relative urls in feed 2013-05-15 17:13:57 +02:00
d88719c87f Use urlparse library to check feed urls 2013-05-15 17:12:59 +02:00
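The two commits above deal with URL checking and relative links. A sketch with the Python 3 equivalent of the urlparse library (`urllib.parse`) might look like this; the helper names are hypothetical, and the accepted schemes are an assumption.

```python
from urllib.parse import urljoin, urlparse


def is_valid_feed_url(url):
    """Accept only absolute http(s) urls that have a host part."""
    parts = urlparse(url)
    return parts.scheme in ('http', 'https') and bool(parts.netloc)


def absolute_link(feed_url, link):
    """Resolve an item link against the url of the feed it came from."""
    return urljoin(feed_url, link)
```

`urljoin` leaves links that are already absolute untouched, so it can be applied to every item link without a prior check.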
1506a5c0cd Fix string output in XMLMap 2013-05-05 16:04:42 +02:00