morss

Author	SHA1	Message	Date
pictuga	0657077191	Add support for Twitter. Grabs "feed" from the html page and clips tweet and article together.	2013-09-25 12:37:14 +02:00
pictuga	da14242bcf	Add feedify, and use it in morss	2013-09-25 12:36:21 +02:00
pictuga	9bc4417be3	More flexible xml caching. Now includes a 'type' var, to remember what we did with it (normal, nothing, grabbed xml link, etc.). xml/html mimetypes are now saved in a dict, for easier editing and consistency.	2013-09-25 12:32:40 +02:00
pictuga	edff54a016	Add pushContent in feeds.py. Useful for Twitter (later) for its "clip" toggle, which keeps the original desc/content above the article. Makes it easier to change the content while keeping the original stub in place.	2013-09-25 12:18:22 +02:00
pictuga	208d70d3db	Use separate var in Fill for final url. That way the url can be changed altogether for the article-fetching part, without changing the item link itself. Useful for upcoming Twitter feeds.	2013-09-25 11:51:48 +02:00
pictuga	fd1501a0c0	Check relative url earlier	2013-09-25 11:49:45 +02:00
pictuga	1e621099e0	Log cache hash in Gather	2013-09-25 11:15:11 +02:00
pictuga	3d6d7e70b6	Remove useless "as" in error catch	2013-09-25 11:14:22 +02:00
pictuga	e73cbf56c2	Add 'html' option, useful to see errors on the server	2013-09-25 11:13:33 +02:00
pictuga	03014a8cbf	Typo in UA_HTML var name	2013-09-25 11:11:11 +02:00
pictuga	4a5cbcfd18	Move httplib into common code. Needed for error catching.	2013-09-25 11:10:16 +02:00
pictuga	3fd34ff1a6	decodeHTML works without connection object	2013-09-25 11:08:58 +02:00
pictuga	658f51e5a9	Support feeds handed out as text/html. <http://www.pro-linux.de/rss/index1.xml> and <http://tehrantimes.com/index.php?option=com_ninjarsssyndicator&feed_id=1&format=raw> are on an equal footing…	2013-09-16 00:33:24 +02:00
pictuga	8eb2f7c249	Added another letter to feedsportal table	2013-09-15 19:38:59 +02:00
pictuga	23246ca6c1	Save the key in cache file	2013-09-15 19:20:51 +02:00
pictuga	1b7777c331	Find RSS links within html pages' <head>, and cache those links	2013-09-15 19:19:50 +02:00
pictuga	1bd17f1365	Faster relative link resolution	2013-09-15 19:18:39 +02:00
pictuga	7575291f8f	Log url in Gather. Useful for upcoming commits.	2013-09-15 18:53:35 +02:00
pictuga	532852a408	Use cleaner http error catch. One error type was inheriting from another one.	2013-09-15 18:52:34 +02:00
pictuga	43bf021f23	Catch more http exceptions, such as InvalidURL (subclasses of httplib.HTTPException)	2013-09-15 17:19:33 +02:00
pictuga	9252e75923	Ensure vars in parseOptions are defined. Caused a bug on morss.it's server.	2013-09-15 15:56:08 +02:00
pictuga	75b51fc2c2	Add ability to bypass ETag support. Add the ":force" argument over http to bypass ETag support, which is convenient for debugging.	2013-09-15 15:54:42 +02:00
pictuga	d2de6cf23d	Extra doc for DELAY, for xml cache, & now for etag	2013-09-15 15:45:15 +02:00
pictuga	89187ab6a6	Log generation time	2013-09-15 15:44:25 +02:00
pictuga	04840d9843	More flexible parameters can be passed. Multiple parameters can now be passed. The HTTP "API" has been improved, and urls now have to look like "http://<path to morss>/:<param1>:<param2>/<url>". The code handling parameter parsing is now much cleaner. The debug toggle is now a var, which can be changed with parameters. Also, http logging is no longer written to a file, which tended to grow far too fast and lacked an "error 403 protection"; instead, the ":debug" parameter can be passed in the url, and the page is then delivered as "text/plain" with the debug output written into it. As a result, some logging had to be moved around so that nothing is output while the http headers are being defined.	2013-09-15 15:38:03 +02:00
pictuga	c25aec7107	Only perform <meta> redirects on html pages	2013-09-15 15:33:14 +02:00
pictuga	3ba74649f6	Test if linked pages are text documents. Useful for feeds such as HackerNews.	2013-09-10 15:25:55 +02:00
pictuga	5ebd84ee55	Fix broken feeds.py calls for items count	2013-09-08 15:47:15 +02:00
pictuga	d3c163fb74	Use ETag for user-side caching. Pretty hard-coded ETag use: the ETag is just a timestamp, and the server checks whether it's recent enough.	2013-08-24 23:43:32 +02:00
pictuga	e2c3375eb6	Log url earlier. Now logging it in both use cases.	2013-08-24 23:41:40 +02:00
pictuga	0c6e28205a	Use seconds for every parameter	2013-08-24 23:40:37 +02:00
pictuga	b350602232	Remove legacy "xml map" declaration	2013-08-24 23:16:23 +02:00
pictuga	1ba22516fe	Small help for etag handler	2013-07-19 00:02:52 +02:00
pictuga	90efb84c57	Don't log word counts. Nobody cares.	2013-07-18 23:55:58 +02:00
pictuga	9e324465e4	Use etag/last-modified to fetch xml feeds	2013-07-18 23:54:13 +02:00
pictuga	70df746416	Accept None as value to cache	2013-07-18 23:51:11 +02:00
pictuga	71129b5898	Fix headers definition. Based on what's done inside urllib2.py.	2013-07-17 14:41:29 +02:00
pictuga	d3213ea1e7	Implement user-agent in HTMLDownloader. It was forgotten in the previous commit.	2013-07-17 14:40:29 +02:00
pictuga	918dede4be	Extend urllib2 to download pages, use gzip. Cleaner than the dirty function. Handles decoding, gzip decompression, and meta redirects (e.g. Washington Post). Might need extra testing.	2013-07-16 23:33:45 +02:00
pictuga	1fa8c4c535	Remove cleanXML(). This function is far too aggressive and no longer needed (even for the targeted feed). It led to bugs with other feeds, where needed spaces were stripped.	2013-07-15 11:10:19 +02:00
pictuga	0718303eb7	Use ' instead of " when possible	2013-07-14 19:00:16 +02:00
pictuga	7275bb1a59	Better content insertion. Also takes care of the description, by creating one when missing.	2013-07-14 18:58:48 +02:00
pictuga	054f5c0846	Detect provided content with word count, instead of character count.	2013-07-14 18:57:12 +02:00
pictuga	7fa183d713	Change morss.py to use feeds.py. No other changes should appear in this commit.	2013-07-14 18:44:11 +02:00
pictuga	cf3934a513	Change http output mimetype to xml	2013-06-28 13:34:12 +02:00
pictuga	1f4c219880	Common code for url/options handling	2013-06-25 13:13:23 +02:00
pictuga	d2418a47c2	Add support for reddit.com feeds. The linked article is used as the item content. The original content (with a link to comments) is still available in the "description" of the feed item.	2013-06-11 13:02:47 +02:00
pictuga	f0b237364f	Better annotation of feedsburner/feedsportal code	2013-06-11 13:02:16 +02:00
pictuga	0978e76356	str.decode() within EncDownload()	2013-06-08 17:32:55 +02:00
pictuga	89354e1528	Use file's built-in readlines() to split file	2013-06-08 17:30:53 +02:00

92 Commits (9f22402dcada341b2ab8fc3402447407bdf5567d)
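Commit 04840d9843 describes the URL layout "http://<path to morss>/:<param1>:<param2>/<url>". As a minimal sketch, assuming the path after the morss root is available as a string, one way to split it into parameters and the target url is shown here. It is written in Python 3, while morss itself used Python 2 at the time, and `parse_morss_path` is a hypothetical name, not the project's actual parseOptions code.

```python
def parse_morss_path(path):
    """Split '/:<param1>:<param2>/<url>' into (params, url).

    Hypothetical helper illustrating the layout from commit 04840d9843;
    it is not morss's real parseOptions implementation.
    """
    path = path.lstrip('/')
    params = []
    if path.startswith(':'):
        # Everything before the first '/' is the ':'-separated parameter block.
        block, _, path = path.partition('/')
        # Drop the empty string produced by the leading ':'.
        params = [p for p in block.split(':') if p]
    return params, path


print(parse_morss_path('/:debug:force/http://example.com/feed.xml'))
```

Splitting on the first '/' only is what lets the remaining url keep its own "//" and path segments intact.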
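Commit d3c163fb74 says the ETag is just a timestamp and the server checks whether it is recent enough, and 75b51fc2c2 adds a ":force" bypass. A minimal sketch of that check, under stated assumptions: `make_etag`, `is_fresh`, and the 600-second `DELAY` value are hypothetical, and the real handler in morss may compare timestamps differently.

```python
import time

DELAY = 10 * 60  # hypothetical freshness window, in seconds


def make_etag(now=None):
    """Build an ETag that is simply the generation time in whole seconds."""
    if now is None:
        now = time.time()
    return '"%d"' % int(now)


def is_fresh(if_none_match, force=False, now=None, delay=DELAY):
    """Return True if the client's cached copy is recent enough for a 304."""
    if force or not if_none_match:
        return False  # ':force' or no validator: always regenerate
    try:
        stamp = int(if_none_match.strip('"'))
    except ValueError:
        return False  # not one of our timestamp ETags
    if now is None:
        now = time.time()
    return now - stamp < delay


print(is_fresh(make_etag(1000), now=1060))
```

Because the tag encodes its own age, the server needs no stored state to answer a conditional request; the trade-off is that any change inside the window goes unnoticed until the ETag expires.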