pictuga
1b7fe8fbee
Use "options" in Gather instead of "progress"
Also made it possible to switch Fill's toggle on or off through parameters
2013-09-29 15:32:58 +02:00
pictuga
a5a327388a
Add ability not to fetch an item's article
2013-09-25 13:47:05 +02:00
pictuga
9f22402dca
Twitter: use page <title> as feed title
2013-09-25 12:48:50 +02:00
pictuga
0657077191
Add support for twitter
Grabs "feed" from the html page, clips tweet and article together.
2013-09-25 12:37:14 +02:00
pictuga
da14242bcf
Add feedify, and use it in morss
2013-09-25 12:36:21 +02:00
pictuga
9bc4417be3
More flexible xml caching
Now includes a 'type' var, to remember what was done with the fetched data (normal, nothing, grabbed xml link, etc.). xml/html mimetypes are now saved in a dict, for easier editing and consistency.
2013-09-25 12:32:40 +02:00
pictuga
edff54a016
Add pushContent in feeds.py
Useful for twitter (later) for its "clip" toggle, which keeps the original desc/content above the article. Makes it easier to change the content while keeping the original stub in place.
2013-09-25 12:18:22 +02:00
pictuga
208d70d3db
Use separate var in Fill for final url
That way the url can be changed altogether for the article-fetching part, without changing the item link itself. Useful for upcoming twitter feeds.
2013-09-25 11:51:48 +02:00
pictuga
fd1501a0c0
Check relative url earlier
2013-09-25 11:49:45 +02:00
pictuga
1e621099e0
Log cache hash in Gather
2013-09-25 11:15:11 +02:00
pictuga
3d6d7e70b6
Remove useless "as" in error catch
2013-09-25 11:14:22 +02:00
pictuga
e73cbf56c2
Add 'html' option, useful to see errors on the server
2013-09-25 11:13:33 +02:00
pictuga
03014a8cbf
Typo in UA_HTML var name
2013-09-25 11:11:11 +02:00
pictuga
4a5cbcfd18
Move httplib into common code
Needed for error catch
2013-09-25 11:10:16 +02:00
pictuga
3fd34ff1a6
decodeHTML works without connection object
2013-09-25 11:08:58 +02:00
pictuga
e759cd46c6
Add set/getLinks in FeedItem (base)
2013-09-17 00:45:15 +02:00
pictuga
355bfa5efd
Adding items, creating feeds now possible
Both hackish to do, but they work
2013-09-17 00:44:10 +02:00
pictuga
658f51e5a9
Support feeds handed out as text/html
<http://www.pro-linux.de/rss/index1.xml> and <http://tehrantimes.com/index.php?option=com_ninjarsssyndicator&feed_id=1&format=raw> are on an equal footing…
2013-09-16 00:33:24 +02:00
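A minimal Python 3 sketch of how a feed served as text/html can still be recognised, assuming a mimetype dict like the one described in the xml caching commit. `MIMETYPE`, `FEED_ROOT` and `looks_like_feed` are hypothetical names, not morss's actual code: an xml mimetype is trusted as-is, while an html one triggers a peek at the first bytes for a feed root element.

```python
import re

# Hypothetical mimetype table, grouped by kind (assumed contents)
MIMETYPE = {
    'xml': ['text/xml', 'application/xml', 'application/rss+xml',
            'application/rdf+xml', 'application/atom+xml'],
    'html': ['text/html', 'application/xhtml+xml'],
}

# Root elements of RSS 2.0, Atom and RSS 1.0 (RDF) documents
FEED_ROOT = re.compile(rb'<(rss|feed|rdf:RDF)[\s>]', re.I)


def looks_like_feed(content_type, body):
    """Return True if the response should be parsed as a feed."""
    mime = (content_type or '').split(';')[0].strip().lower()
    if mime in MIMETYPE['xml']:
        return True
    if mime in MIMETYPE['html']:
        # Some servers hand out feeds as text/html: sniff the first bytes
        return FEED_ROOT.search(body[:1024]) is not None
    return False
```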
pictuga
8eb2f7c249
Added another letter to feedsportal table
2013-09-15 19:38:59 +02:00
pictuga
23246ca6c1
Save the key in cache file
2013-09-15 19:20:51 +02:00
pictuga
1b7777c331
Find RSS links within html pages' <head>
And cache those links
2013-09-15 19:19:50 +02:00
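A sketch of feed discovery in a page's `<head>` using only the standard library's `html.parser`. The class and function names, and the two accepted link types, are assumptions for illustration; `<link>` tags outside `<head>` are ignored, and relative hrefs are resolved against the page url.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Link types treated as feeds (assumed list)
FEED_TYPES = ('application/rss+xml', 'application/atom+xml')


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs from <link rel="alternate"> tags inside <head>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_head = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'head':
            self.in_head = True
        elif tag == 'link' and self.in_head:
            attrs = dict(attrs)
            rel = (attrs.get('rel') or '').lower().split()
            if ('alternate' in rel and attrs.get('type') in FEED_TYPES
                    and attrs.get('href')):
                self.links.append(urljoin(self.base_url, attrs['href']))

    def handle_endtag(self, tag):
        if tag == 'head':
            self.in_head = False


def find_feed_links(html, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.links
```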
pictuga
1bd17f1365
Faster relative link resolution
2013-09-15 19:18:39 +02:00
pictuga
7575291f8f
Log url in Gather
Useful for upcoming commits
2013-09-15 18:53:35 +02:00
pictuga
532852a408
Use cleaner http error catch
One error type was inheriting from another one
2013-09-15 18:52:34 +02:00
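Assuming the two exception types in question were urllib2's `HTTPError` and `URLError` (their Python 3 equivalents, `urllib.error.HTTPError` and `urllib.error.URLError`, keep that inheritance), the subclass has to be caught first, otherwise its handler never runs. `fetch` is a hypothetical helper, not morss's code.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return (status, body) on success, (code, None) on an HTTP error,
    or (None, reason) when the request could not be made at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as e:
        # HTTPError subclasses URLError, so it must be caught first
        return e.code, None
    except urllib.error.URLError as e:
        return None, str(e.reason)
```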
pictuga
2eb6e69b5a
full-text with a dash in README
2013-09-15 17:59:17 +02:00
pictuga
43bf021f23
Catch more http exceptions
Such as InvalidURL, i.e. subclasses of httplib.HTTPException
2013-09-15 17:19:33 +02:00
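A sketch of the broader catch in Python 3, where httplib became `http.client`: exceptions such as `InvalidURL` derive from `http.client.HTTPException` rather than from `URLError`, so they need their own clause. `describe_error` is a hypothetical helper that only turns a download exception into a log line.

```python
import http.client
import urllib.error


def describe_error(exc):
    """Map a download exception to a short log message (hypothetical)."""
    if isinstance(exc, urllib.error.HTTPError):
        return 'http error %d' % exc.code
    if isinstance(exc, urllib.error.URLError):
        return 'url error: %s' % exc.reason
    if isinstance(exc, http.client.HTTPException):
        # InvalidURL, BadStatusLine, IncompleteRead, ... all land here
        return 'http exception: %s' % type(exc).__name__
    raise exc
```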
pictuga
9252e75923
Ensure var in parseOptions are defined
Caused a bug on morss.it's server
2013-09-15 15:56:08 +02:00
pictuga
75b51fc2c2
Add ability to bypass ETag support
Add the ":force" argument over http to bypass ETag support, which is convenient for debugging code
2013-09-15 15:54:42 +02:00
pictuga
d2de6cf23d
Extra doc for DELAY, for xml cache, & now for etag
2013-09-15 15:45:15 +02:00
pictuga
89187ab6a6
Log generation time
2013-09-15 15:44:25 +02:00
pictuga
04840d9843
More flexible parameters can be passed
Multiple parameters can now be passed. The HTTP "API" has been improved, and urls now have to look like "http://<path to morss>/:<param1>:<param2>/<url>". The code handling the parameter parsing is now much cleaner. The debug toggle is now a var, which can be changed with parameters. Also, http logging is no longer written to a file, which tended to grow way too fast and lacked an "error 403 protection"; instead, the ":debug" parameter can be passed in the url, and the page will be delivered as "text/plain" with the debug output written into it. Therefore some logging had to be moved around, so as not to output anything while the http headers are being defined.
2013-09-15 15:38:03 +02:00
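A minimal sketch of splitting the path format quoted above into options and target url. `parse_path` is hypothetical, and treating every option as a simple boolean flag (as with ":debug" or ":force") is an assumption.

```python
def parse_path(path):
    """Split '/:param1:param2/<url>' into ({param: True, ...}, url)."""
    path = path.lstrip('/')
    options = {}
    if path.startswith(':'):
        # Everything up to the first '/' is the colon-separated option list
        params, _, path = path.partition('/')
        for name in params.split(':'):
            if name:
                options[name] = True
    return options, path
```

For example, `parse_path('/:debug/http://example.com/feed')` would yield `({'debug': True}, 'http://example.com/feed')`.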
pictuga
c25aec7107
Only perform <meta> redirects on html pages
2013-09-15 15:33:14 +02:00
pictuga
3176c2a8e8
Fix bad characters detection
Now works with any encoding, no longer restricted to utf-8. Uses a regex to find the encoding (not perfect, but rather fast, since it's applied to a substring)
2013-09-15 14:57:37 +02:00
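A sketch of regex-based encoding detection restricted to the first bytes of a document. The pattern, the 1024-byte window and the utf-8 fallback are assumptions, not values taken from morss.

```python
import re

# Matches charset=... (html <meta>, http header) or encoding="..." (xml prolog)
ENCODING_RE = re.compile(rb'(?:charset|encoding)\s*=\s*["\']?([\w-]+)', re.I)


def detect_encoding(data, default='utf-8', peek=1024):
    """Guess the encoding from the first `peek` bytes only."""
    match = ENCODING_RE.search(data[:peek])
    return match.group(1).decode('ascii').lower() if match else default
```

The result can then be used as `data.decode(detect_encoding(data), 'replace')`; an unknown encoding name would raise `LookupError`, which a real implementation would need to handle.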
pictuga
3ba74649f6
Test if linked pages are text documents
Useful for feeds such as HackerNews
2013-09-10 15:25:55 +02:00
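A sketch of such a test based on the Content-Type header, so that linked PDFs or images are not parsed as articles. The helper name and the accepted types are assumptions.

```python
def is_text_document(content_type):
    """True if a Content-Type header denotes a text page worth parsing."""
    mime = (content_type or '').split(';')[0].strip().lower()
    return mime.startswith('text/') or mime in ('application/xhtml+xml',
                                                'application/xml')
```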
pictuga
1b7fdad6a8
Improve broken XML support
The TPB feed is a good example <http://rss.thepiratebay.sx/blog>. Now supports unescaped ampersands in feeds, using the "recover" mode in etree.parse. Broken utf-8 strings in feeds are now also supported.
2013-09-08 15:48:34 +02:00
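The commit relies on lxml's recover mode. Purely for illustration, the sketch below swaps in a different, stdlib-only technique: escaping bare ampersands with a regex before handing the text to `xml.etree.ElementTree`. It does not cover other kinds of breakage (such as undefined HTML entities like `&nbsp;`) that a recovering parser would tolerate, and `parse_broken_feed` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# A '&' that does not start an entity or character reference
BARE_AMP = re.compile(r'&(?!#?\w+;)')


def parse_broken_feed(text):
    """Escape stray ampersands, then parse with the stdlib parser."""
    return ET.fromstring(BARE_AMP.sub('&amp;', text))
```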
pictuga
5ebd84ee55
Fix broken feeds.py calls for items count
2013-09-08 15:47:15 +02:00
pictuga
fe89a70f24
Add help for new classes
2013-09-01 19:00:22 +02:00
pictuga
50f3c5a552
Use descriptors for lists and to replace property
Much nicer. Less duplicate code. More transparent. Big commit.
2013-09-01 18:52:07 +02:00
pictuga
a94d659bc8
Make negation in README more obvious
2013-08-25 00:01:00 +02:00
pictuga
d3c163fb74
Use ETag for user-side caching
Pretty hard-coded ETag use. The ETag is just a timestamp, and the server checks whether it's recent enough.
2013-08-24 23:43:32 +02:00
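A sketch of the timestamp ETag scheme described above. The 10-minute `DELAY` value and the helper names are assumptions; the `force` flag mirrors the ":force" parameter introduced in a later commit. When `client_copy_is_fresh` returns True, the server would answer 304 Not Modified instead of regenerating the feed.

```python
import time

DELAY = 10 * 60  # freshness window in seconds (assumed value)


def make_etag(now=None):
    """The ETag is simply the generation timestamp, as a quoted string."""
    return '"%d"' % int(time.time() if now is None else now)


def client_copy_is_fresh(if_none_match, now=None, force=False):
    """Decide whether the client's cached copy is recent enough."""
    if force or not if_none_match:
        return False
    try:
        stamp = int(if_none_match.strip('"'))
    except ValueError:
        return False  # not one of our timestamp ETags
    now = time.time() if now is None else now
    return now - stamp < DELAY
```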
pictuga
e2c3375eb6
Log url earlier
Now logging it in both use cases
2013-08-24 23:41:40 +02:00
pictuga
0c6e28205a
Use seconds for every parameter
2013-08-24 23:40:37 +02:00
pictuga
b350602232
Remove legacy "xml map" declaration
2013-08-24 23:16:23 +02:00
pictuga
1ba22516fe
Small help for etag handler
2013-07-19 00:02:52 +02:00
pictuga
90efb84c57
Don't log word counts
Nobody cares
2013-07-18 23:55:58 +02:00
pictuga
9e324465e4
Use etag/last-modified to fetch xml feeds
2013-07-18 23:54:13 +02:00
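A sketch of a conditional request with Python 3's urllib, assuming the previous response's ETag and Last-Modified headers were stored in the cache; `conditional_request` and the `cached_headers` dict are hypothetical.

```python
import urllib.request


def conditional_request(url, cached_headers):
    """Build a request that lets the server reply 304 if the feed is unchanged."""
    req = urllib.request.Request(url)
    if cached_headers.get('etag'):
        req.add_header('If-None-Match', cached_headers['etag'])
    if cached_headers.get('last-modified'):
        req.add_header('If-Modified-Since', cached_headers['last-modified'])
    return req
```

With the default opener, a 304 Not Modified reply surfaces as `urllib.error.HTTPError` with `code == 304`, at which point the cached feed can be reused.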
pictuga
70df746416
Accept None as value to cache
2013-07-18 23:51:11 +02:00
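A sketch of why a cache that accepts None needs a separate membership check: once None can be stored, `cache.get(key)` returning None no longer means "not cached". This in-memory `Cache` class is a hypothetical stand-in, not morss's file-based cache.

```python
class Cache:
    """In-memory stand-in for a cache where None is a valid stored value."""

    def __init__(self):
        self._data = {}

    def __contains__(self, key):
        # Use `key in cache` to test for a miss, not `cache.get(key) is None`
        return key in self._data

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
```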
pictuga
71129b5898
Fix headers definition
Based on what's done inside urllib2.py.
2013-07-17 14:41:29 +02:00
pictuga
d3213ea1e7
Implement user-agent in HTMLDownloader
It was forgotten in the previous commit
2013-07-17 14:40:29 +02:00
pictuga
918dede4be
Extend urllib2 to download pages, use gzip
Cleaner than the former dirty function. Handles decoding, gzip decompression and meta redirects (e.g. Washington Post). Might need extra testing.
2013-07-16 23:33:45 +02:00
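A sketch of the response post-processing listed above, using Python 3's `gzip` module. `decode_response` and the meta-refresh pattern are assumptions: the pattern only handles `http-equiv` appearing before `content`, and redirects are only looked for in html responses.

```python
import gzip
import re

# Assumed pattern: only matches when http-equiv comes before content
META_REFRESH = re.compile(
    rb'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    rb'content=["\']?\d+\s*;\s*url=([^"\'>\s]+)',
    re.I)


def decode_response(body, headers):
    """Undo gzip, then return (body, meta-refresh target or None)."""
    if headers.get('Content-Encoding', '').lower() == 'gzip':
        body = gzip.decompress(body)
    target = None
    # Only html pages are searched for <meta> redirects
    if 'html' in headers.get('Content-Type', '').lower():
        match = META_REFRESH.search(body)
        if match:
            target = match.group(1).decode('ascii', 'replace')
    return body, target
```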