morss

Commit Graph

Author	SHA1	Message	Date
pictuga	78706952fe	Remove "clip" from Fill. Put that in Gather. Also removed from feeds.py. "alone" mode was also added (it removes the description).	2013-10-01 19:45:54 +02:00
pictuga	1b7fe8fbee	Use "options" in Gather instead of "progress". Also made it possible to switch Fill's toggle through parameters.	2013-09-29 15:32:58 +02:00
pictuga	a5a327388a	Add ability not to fetch an item's article	2013-09-25 13:47:05 +02:00
pictuga	0657077191	Add support for twitter. Grabs "feed" from the html page, clips tweet and article together.	2013-09-25 12:37:14 +02:00
pictuga	da14242bcf	Add feedify, and use it in morss	2013-09-25 12:36:21 +02:00
pictuga	9bc4417be3	More flexible xml caching. Now includes a 'type' var, to remember what we did with it (normal, nothing, grabbed xml link, etc). xml/html mimetypes are now saved in a dict, for easier editing and consistency.	2013-09-25 12:32:40 +02:00
pictuga	edff54a016	Add pushContent in feeds.py. Useful for twitter (later) for its "clip" toggle, which keeps the original desc/content above the article. Makes changing the content, while keeping the original stub in place, easier.	2013-09-25 12:18:22 +02:00
pictuga	208d70d3db	Use separate var in Fill for final url. That way the url can be changed altogether for the article-fetching part, without changing the item link itself. Useful for upcoming twitter feeds.	2013-09-25 11:51:48 +02:00
pictuga	fd1501a0c0	Check relative url earlier	2013-09-25 11:49:45 +02:00
pictuga	1e621099e0	Log cache hash in Gather	2013-09-25 11:15:11 +02:00
pictuga	3d6d7e70b6	Remove useless "as" in error catch	2013-09-25 11:14:22 +02:00
pictuga	e73cbf56c2	Add 'html' option, useful to see errors on server	2013-09-25 11:13:33 +02:00
pictuga	03014a8cbf	Typo in UA_HTML var name	2013-09-25 11:11:11 +02:00
pictuga	4a5cbcfd18	Move httplib into common code. Needed for error catch	2013-09-25 11:10:16 +02:00
pictuga	3fd34ff1a6	decodeHTML works without connection object	2013-09-25 11:08:58 +02:00
pictuga	658f51e5a9	Support feeds handed out as text/html. <http://www.pro-linux.de/rss/index1.xml> and <http://tehrantimes.com/index.php?option=com_ninjarsssyndicator&feed_id=1&format=raw> are on an equal footing…	2013-09-16 00:33:24 +02:00
pictuga	8eb2f7c249	Added another letter to feedsportal table	2013-09-15 19:38:59 +02:00
pictuga	23246ca6c1	Save the key in cache file	2013-09-15 19:20:51 +02:00
pictuga	1b7777c331	Find RSS links within html pages' <head>, and cache those links	2013-09-15 19:19:50 +02:00
pictuga	1bd17f1365	Faster relative link resolution	2013-09-15 19:18:39 +02:00
pictuga	7575291f8f	Log url in Gather. Useful for upcoming commits	2013-09-15 18:53:35 +02:00
pictuga	532852a408	Use cleaner http error catch. One error type was inheriting from another one	2013-09-15 18:52:34 +02:00
pictuga	43bf021f23	Catch more http exceptions, such as InvalidURL: subclasses of httplib.HTTPException	2013-09-15 17:19:33 +02:00
pictuga	9252e75923	Ensure vars in parseOptions are defined. Caused a bug on morss.it's server	2013-09-15 15:56:08 +02:00
pictuga	75b51fc2c2	Add ability to bypass ETag support. Add the ":force" argument over http to bypass ETag support, which is convenient to debug code	2013-09-15 15:54:42 +02:00
pictuga	d2de6cf23d	Extra doc for DELAY, for xml cache, & now for etag	2013-09-15 15:45:15 +02:00
pictuga	89187ab6a6	Log generation time	2013-09-15 15:44:25 +02:00
pictuga	04840d9843	More flexible parameters can be passed. Multiple parameters can now be passed. HTTP "API" has been improved, and urls now have to be like "http://<path to morss>/:<param1>:<param2>/<url>". The code handling the parameter parsing is now way cleaner. Debug toggle is now a var, which can be changed with parameters. Also, http logging is no longer done into a file, which tended to grow way too fast while lacking an "error 403 protection"; instead, the parameter ":debug" can be passed in the url, and the page will be delivered as "text/plain" with the debug written into it. Therefore some logging had to be moved around, so as not to output anything during http headers definition.	2013-09-15 15:38:03 +02:00
pictuga	c25aec7107	Only perform <meta> redirects on html pages	2013-09-15 15:33:14 +02:00
pictuga	3ba74649f6	Test if linked pages are text documents. Useful for feeds such as HackerNews	2013-09-10 15:25:55 +02:00
pictuga	5ebd84ee55	Fix broken feeds.py calls for items count	2013-09-08 15:47:15 +02:00
pictuga	d3c163fb74	Use ETag for user-side caching. Pretty hard-coded ETag use. ETag is just a timestamp, and the server checks whether it's recent enough.	2013-08-24 23:43:32 +02:00
pictuga	e2c3375eb6	Log url earlier. Now logging it in both use cases	2013-08-24 23:41:40 +02:00
pictuga	0c6e28205a	Use seconds for every parameter	2013-08-24 23:40:37 +02:00
pictuga	b350602232	Remove legacy "xml map" declaration	2013-08-24 23:16:23 +02:00
pictuga	1ba22516fe	Small help for etag handler	2013-07-19 00:02:52 +02:00
pictuga	90efb84c57	Don't log word counts. Nobody cares	2013-07-18 23:55:58 +02:00
pictuga	9e324465e4	Use etag/last-modified to fetch xml feeds	2013-07-18 23:54:13 +02:00
pictuga	70df746416	Accept None as value to cache	2013-07-18 23:51:11 +02:00
pictuga	71129b5898	Fix headers definition. Based on what's done inside urllib2.py.	2013-07-17 14:41:29 +02:00
pictuga	d3213ea1e7	Implement user-agent in HTMLDownloader. It was forgotten in the previous commit	2013-07-17 14:40:29 +02:00
pictuga	918dede4be	Extend urllib2 to download pages, use gzip. Cleaner than dirty function. Handles decoding, gzip decompression, meta redirects (eg. Washington Post). Might need extra testing.	2013-07-16 23:33:45 +02:00
pictuga	1fa8c4c535	Remove cleanXML(). This function is way too strong, and no longer needed (even for the targeted feed). It led to other bugs with other feeds, where needed spaces were stripped.	2013-07-15 11:10:19 +02:00
pictuga	0718303eb7	Use ' instead of " when possible	2013-07-14 19:00:16 +02:00
pictuga	7275bb1a59	Better content insertion. Also takes care of description, by creating one when missing.	2013-07-14 18:58:48 +02:00
pictuga	054f5c0846	Detect provided content with word count. This is instead of character count.	2013-07-14 18:57:12 +02:00
pictuga	7fa183d713	Change morss.py to use feeds.py. No other changes should appear in this commit	2013-07-14 18:44:11 +02:00
pictuga	cf3934a513	Change http output mimetype to xml	2013-06-28 13:34:12 +02:00
pictuga	1f4c219880	Common code for url/options handling	2013-06-25 13:13:23 +02:00
pictuga	d2418a47c2	Add support for reddit.com feeds. The content of the linked article is used for the content. The original content (with a link to comments) is still available in the "description" of the feed item.	2013-06-11 13:02:47 +02:00
pictuga	f0b237364f	Better annotation of feedburner/feedsportal code	2013-06-11 13:02:16 +02:00
pictuga	0978e76356	str.decode() within EncDownload()	2013-06-08 17:32:55 +02:00
pictuga	89354e1528	Use file's built-in readlines() to split file	2013-06-08 17:30:53 +02:00
pictuga	bbf5c92ba2	Fix lenHTML() with empty string	2013-06-08 17:30:11 +02:00
pictuga	e05d1c9deb	Replace uppercase title with "title-case"	2013-06-02 23:45:41 +02:00
pictuga	b78f0bfba5	Improve options and limits. New limits are possible: time limit, max number of items fetched, and max number of items taken from cache. Fill's third argument is now Fast=True, which is self-explanatory. (Complexity of the changes made separate commits impossible).	2013-05-15 17:56:58 +02:00
pictuga	2a71fe07f2	Improve Cache code. Removed _new flag. Slightly more stable and cleaner.	2013-05-15 17:48:39 +02:00
pictuga	bf647ba5f8	Make Fill return True when it has done something useful	2013-05-15 17:38:52 +02:00
pictuga	9694a31052	Add 'feedurl' argument to Fill. Was needed for commit f3c2c34	2013-05-15 17:36:00 +02:00
pictuga	8e2aab55e7	Check url before looking for provided content. Also use lenHTML() function defined lately	2013-05-15 17:32:42 +02:00
pictuga	85e40cde4e	Check article length is big enough. Avoids replacing rather useful descriptions with empty string	2013-05-15 17:24:27 +02:00
pictuga	222b1369e5	Support for relative urls in feed	2013-05-15 17:13:57 +02:00
pictuga	d88719c87f	Use urlparse library to check feed urls	2013-05-15 17:12:59 +02:00
pictuga	1506a5c0cd	Fix string output in XMLMap	2013-05-05 16:04:42 +02:00
pictuga	adebe23232	Better logging when running as Liferea hook	2013-05-05 15:33:46 +02:00
pictuga	32514941b4	Try to improve support for bogus xml feed	2013-05-05 15:32:57 +02:00
pictuga	b34ecb8ad3	Fix cache crash with one entry with empty value	2013-05-05 15:32:05 +02:00
pictuga	e518f2cced	Better timeout error handling. For older versions of Python	2013-05-05 15:31:11 +02:00
pictuga	03501edccd	Add/fix extra modes. 'progress' mode now works on Chrome. 'cache' mode only relies on cache to load faster.	2013-05-05 15:30:06 +02:00
pictuga	65090870ac	Remove temp debug print statement	2013-05-05 15:28:32 +02:00
pictuga	e77278dda9	Remove leftover SERVER var from source code	2013-05-01 19:31:24 +02:00
pictuga	949582ba19	Add progress view.	2013-05-01 17:57:09 +02:00
pictuga	5ee5dbf359	Cache http errors to save time.	2013-05-01 17:56:03 +02:00
pictuga	2f1ae1ce91	Use less suspicious user-agents.	2013-05-01 17:54:17 +02:00
pictuga	0a97a2a2b5	Support for combined feedsportal and feedburner.	2013-05-01 17:43:43 +02:00
pictuga	93b098ab11	Added http timeout.	2013-04-30 19:54:32 +02:00
pictuga	9f175994c6	Fix regex implementation.	2013-04-30 19:51:29 +02:00
pictuga	93f971896b	Improved feedsportal url recognition.	2013-04-28 10:10:58 +02:00
pictuga	fa7cd957df	Save Cache when it's new. So as to avoid crashes on first fetch.	2013-04-23 00:24:41 +02:00
pictuga	ca90d082c3	Library import list made cleaner.	2013-04-23 00:04:44 +02:00
pictuga	1480bd7af4	Auto-detection of server-mode, better caching. The SERVER variable is no longer needed. RSS .xml file is now cached for a very short time, so as to make loading faster, and hopefully reduce bans a little. Use a more common User-Agent to try to cut down bans. Added ability to test whether a key is in the Cache.	2013-04-23 00:00:07 +02:00
pictuga	a616c96e32	Removed another unused var.	2013-04-22 23:58:20 +02:00
pictuga	f95c5dcf0d	Fixed caching.	2013-04-22 22:56:38 +02:00
pictuga	83d0dcce4d	Delete unused var declaration.	2013-04-22 22:56:21 +02:00
pictuga	2d05653190	Better detection of feedsportal, extra url logging.	2013-04-19 11:44:25 +02:00
pictuga	8ce9812dfd	Meta redirects are now supported.	2013-04-19 11:43:47 +02:00
pictuga	80ba60d295	Better detection of feeds with content provided.	2013-04-19 11:42:54 +02:00
pictuga	d2b74819b4	Improved caching. No longer writes every time a value is added, since it could cause some issues if two instances of the script were run at the same time. Now it only writes when the Cache object is no longer in use (ie. garbage collected).	2013-04-19 11:40:35 +02:00
pictuga	4abf7b699c	Use readability to fetch article content. Makes the whole "xpath rules" thing useless. Almost any feed is now supported. CSS liferea stylesheets are also unneeded now, since readability cleans up html code in a more efficient way. README was updated.	2013-04-19 11:37:43 +02:00
pictuga	17db2584da	Fixed caching. For scary reasons, re-used cache was deleted every time. This is now fixed. Loading is now really fast.	2013-04-16 16:13:42 +02:00
pictuga	5a74babf24	Improved logging on server.	2013-04-16 16:13:14 +02:00
pictuga	7b1c32eac2	Added support for 404 redirect. ie. http://domain.com/bbc.co.uk/feed.xml will redirect to http://domain.com/morss.py/bbc.co.uk/feed.xml and work.	2013-04-16 16:11:34 +02:00
pictuga	af8879049f	Another huge commit. Now uses OOP where it fits. Atom feeds are supported, but no real tests were made. Unix globbing is now possible for urls. Caching is done in a cleaner way. Feedburner links are also replaced. HTML is cleaned in a more efficient way. Code is now much cleaner, using lxml.objectify and a small wrapper to access Atom feeds as if they were RSS feeds (and much faster than feedparser). README has been updated.	2013-04-15 18:51:55 +02:00
pictuga	d6e6d61199	Bypass feedsportal.	2013-04-04 19:29:22 +02:00
pictuga	851dacdfbc	Renamed to .py.	2013-04-04 18:17:12 +02:00
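
Commit 04840d9843 in the table above describes request urls of the form "http://<path to morss>/:<param1>:<param2>/<url>". The sketch below shows one way such a path could be split into its parameters and the target feed url. It is an illustration only, not the code from that commit: the function name parse_request_path is hypothetical, and the assumption that the parameter block ends at the first "/" is inferred from the url shape given in the commit message.

```python
def parse_request_path(path):
    """Split '/:<param1>:<param2>/<url>' into (params, url).

    Hypothetical helper, not taken from morss itself.
    """
    path = path.lstrip('/')
    params = []
    if path.startswith(':'):
        # Assumption: the parameter block runs up to the first '/'.
        head, _, path = path.partition('/')
        # ':debug:force' splits to ['', 'debug', 'force']; drop empty pieces.
        params = [p for p in head.split(':') if p]
    return params, path


if __name__ == '__main__':
    print(parse_request_path('/:debug:force/bbc.co.uk/feed.xml'))
```

With no leading ":" block the parameter list is simply empty and the whole path is treated as the feed url.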

145 Commits (c43bf9f35f3705a26cd7a555f950764fa25dbc70)