pictuga
0657077191
Add support for twitter
Grabs "feed" from the html page, clips tweet and article together.
2013-09-25 12:37:14 +02:00
pictuga
da14242bcf
Add feedify, and use it in morss
2013-09-25 12:36:21 +02:00
pictuga
9bc4417be3
More flexible xml caching
Now includes a 'type' var, to remember what we did with it (normal, nothing, grabbed xml link, etc.). The xml/html mimetypes are now saved in a dict, for easier editing and consistency.
2013-09-25 12:32:40 +02:00
pictuga
edff54a016
Add pushContent in feeds.py
Useful for twitter (later) for its "clip" toggle, which keeps the original desc/content above the article. Makes it easier to change the content while keeping the original stub in place.
2013-09-25 12:18:22 +02:00
pictuga
208d70d3db
Use separate var in Fill for final url
That way the url can be changed altogether for the article-fetching part, without changing the item link itself. Useful for upcoming twitter feeds.
2013-09-25 11:51:48 +02:00
pictuga
fd1501a0c0
Check relative url earlier
2013-09-25 11:49:45 +02:00
pictuga
1e621099e0
Log cache hash in Gather
2013-09-25 11:15:11 +02:00
pictuga
3d6d7e70b6
Remove useless "as" in error catch
2013-09-25 11:14:22 +02:00
pictuga
e73cbf56c2
Add 'html' option, useful to see errors on server
2013-09-25 11:13:33 +02:00
pictuga
03014a8cbf
Typo in UA_HTML var name
2013-09-25 11:11:11 +02:00
pictuga
4a5cbcfd18
Move httplib in common code
Needed for error catch
2013-09-25 11:10:16 +02:00
pictuga
3fd34ff1a6
decodeHTML works without connection object
2013-09-25 11:08:58 +02:00
pictuga
658f51e5a9
Support feeds handed out as text/html
<http://www.pro-linux.de/rss/index1.xml> and <http://tehrantimes.com/index.php?option=com_ninjarsssyndicator&feed_id=1&format=raw> are on an equal footing…
2013-09-16 00:33:24 +02:00
pictuga
8eb2f7c249
Added another letter to feedsportal table
2013-09-15 19:38:59 +02:00
pictuga
23246ca6c1
Save the key in cache file
2013-09-15 19:20:51 +02:00
pictuga
1b7777c331
Find RSS links within html pages' <head>
And cache those links
2013-09-15 19:19:50 +02:00
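A minimal sketch of how feed links could be found in a page's <head>, assuming Python 3 and the standard-library `html.parser`. The names `FeedLinkFinder`, `find_feed_links` and `FEED_TYPES` are hypothetical and are not taken from morss; caching of the links is left out.

```python
from html.parser import HTMLParser

# Assumption: these two mimetypes are the ones worth advertising as feeds.
FEED_TYPES = ('application/rss+xml', 'application/atom+xml')


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> feed tags inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'head':
            self.in_head = True
        elif tag == 'link' and self.in_head:
            attrs = dict(attrs)
            rel = (attrs.get('rel') or '').lower().split()
            mime = (attrs.get('type') or '').lower()
            href = attrs.get('href')
            if 'alternate' in rel and mime in FEED_TYPES and href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == 'head':
            self.in_head = False


def find_feed_links(html):
    """Return the feed urls advertised in the <head> of an html string."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.links
```

The returned hrefs may be relative, so they would still have to be resolved against the page url before being fetched.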
pictuga
1bd17f1365
Faster relative link resolution
2013-09-15 19:18:39 +02:00
pictuga
7575291f8f
Log url in Gather
Useful for upcoming commits
2013-09-15 18:53:35 +02:00
pictuga
532852a408
Use cleaner http error catch
One error type was inheriting from another one
2013-09-15 18:52:34 +02:00
pictuga
43bf021f23
Catch more http exceptions
Such as InvalidURL. Subclasses of httplib.HTTPException
2013-09-15 17:19:33 +02:00
pictuga
9252e75923
Ensure var in parseOptions are defined
Caused a bug on morss.it's server
2013-09-15 15:56:08 +02:00
pictuga
75b51fc2c2
Add ability to bypass ETag support
Add the ":force" argument over http to bypass ETag support, which is convenient to debug code
2013-09-15 15:54:42 +02:00
pictuga
d2de6cf23d
Extra doc for DELAY, for xml cache, & now for etag
2013-09-15 15:45:15 +02:00
pictuga
89187ab6a6
Log generation time
2013-09-15 15:44:25 +02:00
pictuga
04840d9843
More flexible parameters can be passed
Multiple parameters can now be passed. The HTTP "API" has been improved, and urls now have to look like "http://<path to morss>/:<param1>:<param2>/<url>". The code handling parameter parsing is now much cleaner. The debug toggle is now a var, which can be changed with parameters. Also, http logging is no longer written to a file, which tended to grow way too fast and lacked an "error 403 protection"; instead, the parameter ":debug" can be passed in the url, and the page is delivered as "text/plain" with the debug output written into it. Some logging therefore had to be moved around, so as not to output anything while the http headers are being defined.
2013-09-15 15:38:03 +02:00
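A minimal sketch of how a path of the form ":<param1>:<param2>/<url>" could be split into its options and the target url, assuming Python 3. The function name `parse_options` and its return shape are hypothetical; the actual parsing code in morss may differ.

```python
def parse_options(path):
    """Split ':<param1>:<param2>/<url>' into (options, url).

    `path` is the part of the request url that follows the path to morss.
    Paths without a leading ':' carry no options.
    """
    path = path.lstrip('/')
    if not path.startswith(':'):
        return [], path

    # Everything up to the first '/' is the option block, the rest is the url.
    head, _, url = path.partition('/')
    options = [opt for opt in head.split(':') if opt]
    return options, url
```

For example, `parse_options('/:debug:force/http://example.com/feed.xml')` yields the options `debug` and `force` together with the untouched feed url.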
pictuga
c25aec7107
Only perform <meta> redirects on html pages
2013-09-15 15:33:14 +02:00
pictuga
3ba74649f6
Test if linked pages are text documents
Useful for feeds such as HackerNews
2013-09-10 15:25:55 +02:00
pictuga
5ebd84ee55
Fix broken feeds.py calls for items count
2013-09-08 15:47:15 +02:00
pictuga
d3c163fb74
Use ETag for user-side caching
Pretty hard-coded ETag use. The ETag is just a timestamp, and the server checks whether it's recent enough.
2013-08-24 23:43:32 +02:00
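A minimal sketch of the timestamp-as-ETag idea described in this commit, assuming Python 3. The name `is_fresh` and the value given to `DELAY` are hypothetical; only the principle (compare the timestamp sent back by the client with the current time) comes from the commit message.

```python
import time

# Assumption: how long, in seconds, a client-side copy is considered recent enough.
DELAY = 10 * 60


def is_fresh(etag, now=None, delay=DELAY):
    """Return True if the ETag sent by the client is a recent enough timestamp."""
    now = time.time() if now is None else now
    try:
        stamp = float(etag.strip('"'))
    except (AttributeError, ValueError):
        # Missing header or something that is not a timestamp: regenerate.
        return False
    return 0 <= now - stamp < delay
```

When `is_fresh` returns True the server could answer "304 Not Modified" instead of rebuilding the feed; otherwise it would send the page with the current time as the new ETag.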
pictuga
e2c3375eb6
Log url earlier
Now logging it in both use cases
2013-08-24 23:41:40 +02:00
pictuga
0c6e28205a
Use seconds for every parameter
2013-08-24 23:40:37 +02:00
pictuga
b350602232
Remove legacy "xml map" declaration
2013-08-24 23:16:23 +02:00
pictuga
1ba22516fe
Small help for etag handler
2013-07-19 00:02:52 +02:00
pictuga
90efb84c57
Don't log word counts
Nobody cares
2013-07-18 23:55:58 +02:00
pictuga
9e324465e4
Use etag/last-modified to fetch xml feeds
2013-07-18 23:54:13 +02:00
pictuga
70df746416
Accept None as value to cache
2013-07-18 23:51:11 +02:00
pictuga
71129b5898
Fix headers definition
Based on what's done inside urllib2.py.
2013-07-17 14:41:29 +02:00
pictuga
d3213ea1e7
Implement user-agent in HTMLDownloader
It was forgotten in the previous commit
2013-07-17 14:40:29 +02:00
pictuga
918dede4be
Extend urllib2 to download pages, use gzip
Cleaner than the dirty function. Handles decoding, gzip decompression, and meta redirects (e.g. Washington Post). Might need extra testing.
2013-07-16 23:33:45 +02:00
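A minimal sketch of the gzip and decoding steps mentioned in this commit, assuming Python 3 (the commit itself extends the Python 2 `urllib2` module). The function `decode_body` and its parameters are hypothetical; meta redirects are not covered here.

```python
import gzip


def decode_body(raw, content_encoding='', charset='utf-8'):
    """Decompress a response body if it is gzipped, then decode it to text.

    `content_encoding` is the value of the Content-Encoding response header,
    `charset` the encoding announced by the server (assumed utf-8 if unknown).
    """
    if content_encoding.lower() == 'gzip':
        raw = gzip.decompress(raw)
    # Undecodable bytes are replaced rather than raising, so a badly
    # labelled page still yields usable text.
    return raw.decode(charset, errors='replace')
```

A caller would typically send "Accept-Encoding: gzip" with the request and pass the response's Content-Encoding header to `decode_body`.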
pictuga
1fa8c4c535
Remove cleanXML()
This function is way too strong, and no longer needed (even for the targeted feed). It led to other bugs with other feeds, where needed spaces were stripped.
2013-07-15 11:10:19 +02:00
pictuga
0718303eb7
Use ' instead of " when possible
2013-07-14 19:00:16 +02:00
pictuga
7275bb1a59
Better content insertion
Also takes care of description, by creating one, when missing.
2013-07-14 18:58:48 +02:00
pictuga
054f5c0846
Detect provided content with word count
This is instead of character count.
2013-07-14 18:57:12 +02:00
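A minimal sketch of detecting already-provided content by word count rather than character count, assuming Python 3. The names `count_words` and `has_full_content`, the regex-based tag stripping, and the threshold value are all hypothetical.

```python
import re

# Assumption: items with at least this many words already carry the full article.
MIN_WORDS = 200


def count_words(html):
    """Count the words in an html fragment, ignoring the tags themselves."""
    text = re.sub(r'<[^>]+>', ' ', html)
    return len(text.split())


def has_full_content(html, min_words=MIN_WORDS):
    """Return True if the item content looks long enough to skip fetching."""
    return count_words(html) >= min_words
```

Counting words instead of characters keeps long urls or heavy markup in a short stub from being mistaken for a full article.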
pictuga
7fa183d713
Change morss.py to use feeds.py
No other changes should appear in this commit
2013-07-14 18:44:11 +02:00
pictuga
cf3934a513
Change http output mimetype to xml
2013-06-28 13:34:12 +02:00
pictuga
1f4c219880
Common code for url/options handling
2013-06-25 13:13:23 +02:00
pictuga
d2418a47c2
Add support for reddit.com feeds
The content of the linked article is used for the content. The original content (with a link to comments) is still available in the "description" of the feed item.
2013-06-11 13:02:47 +02:00
pictuga
f0b237364f
Better annotation of feedburner/feedsportal code
2013-06-11 13:02:16 +02:00
pictuga
0978e76356
str.decode() within EncDownload()
2013-06-08 17:32:55 +02:00
pictuga
89354e1528
Use file's built-in readlines() to split file
2013-06-08 17:30:53 +02:00