pictuga
1b7fe8fbee
Use "options" in Gather instead of "progress"
...
Also made it possible to switch Fill's toggle through parameters
10 years ago
pictuga
a5a327388a
Add ability not to fetch an item's article
10 years ago
pictuga
9f22402dca
Twitter: use page <title> as feed title
10 years ago
pictuga
0657077191
Add support for twitter
...
Grabs "feed" from the html page, clips tweet and article together.
10 years ago
pictuga
da14242bcf
Add feedify, and use it in morss
10 years ago
pictuga
9bc4417be3
More flexible xml caching
...
Now includes a 'type' var, to remember what we did with it (normal, nothing, grabbed xml link, etc). xml/html mimetypes are now saved in a dict, for easier editing and consistency.
10 years ago
pictuga
edff54a016
Add pushContent in feeds.py
...
Useful for twitter (later) for its "clip" toggle, which keeps the original desc/content above the article. Makes changing the content, while keeping the original stub in place, easier.
10 years ago
pictuga
208d70d3db
Use separate var in Fill for final url
...
That way the url can be changed altogether for the article-fetching part, without changing the item link itself. Useful for upcoming twitter feeds.
10 years ago
pictuga
fd1501a0c0
Check relative url earlier
10 years ago
pictuga
1e621099e0
Log cache hash in Gather
10 years ago
pictuga
3d6d7e70b6
Remove useless "as" in error catch
10 years ago
pictuga
e73cbf56c2
Add 'html' option, useful to see errors on the server
10 years ago
pictuga
03014a8cbf
Typo in UA_HTML var name
10 years ago
pictuga
4a5cbcfd18
Move httplib in common code
...
Needed for error catch
10 years ago
pictuga
3fd34ff1a6
decodeHTML works without connection object
10 years ago
pictuga
e759cd46c6
Add set/getLinks in FeedItem (base)
10 years ago
pictuga
355bfa5efd
Adding items, creating feeds now possible
...
Both are hackish to do, but they work
10 years ago
pictuga
658f51e5a9
Support feeds handed out as text/html
...
<http://www.pro-linux.de/rss/index1.xml> and <http://tehrantimes.com/index.php?option=com_ninjarsssyndicator&feed_id=1&format=raw> are on an equal footing…
10 years ago
pictuga
8eb2f7c249
Added another letter to feedsportal table
10 years ago
pictuga
23246ca6c1
Save the key in cache file
10 years ago
pictuga
1b7777c331
Find RSS links within html pages' <head>
...
And cache those links
10 years ago
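A rough sketch of what finding feed links in a page's <head> can look like. This is not the code from the commit: it is written for Python 3 with the standard html.parser module, the names FeedLinkFinder and find_feed_links are made up for illustration, and the caching step is left out.

```python
from html.parser import HTMLParser

# Mimetypes treated as feeds here (an assumption, not taken from the commit).
FEED_TYPES = ('application/rss+xml', 'application/atom+xml')


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> feed tags found in <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'head':
            self.in_head = True
        elif tag == 'link' and self.in_head:
            attrs = dict(attrs)
            rel = (attrs.get('rel') or '').lower().split()
            kind = (attrs.get('type') or '').lower()
            if 'alternate' in rel and kind in FEED_TYPES and attrs.get('href'):
                self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'head':
            self.in_head = False


def find_feed_links(html):
    """Return the feed URLs advertised in the <head> of an html string."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.links
```

The returned hrefs may be relative, so they would still need to be resolved against the page url before fetching.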
pictuga
1bd17f1365
Faster relative link resolution
10 years ago
pictuga
7575291f8f
Log url in Gather
...
Useful for upcoming commits
10 years ago
pictuga
532852a408
Use cleaner http error catch
...
One error type was inheriting from another one
10 years ago
pictuga
2eb6e69b5a
full-text with a dash in README
10 years ago
pictuga
43bf021f23
Catch more http exceptions
...
Such as InvalidURL, and other subclasses of httplib.HTTPException
10 years ago
pictuga
9252e75923
Ensure var in parseOptions are defined
...
Caused a bug on morss.it's server
10 years ago
pictuga
75b51fc2c2
Add ability to bypass ETag support
...
Add the ":force" argument over http to bypass ETag support, which is convenient to debug code
10 years ago
pictuga
d2de6cf23d
Extra doc for DELAY, for xml cache, & now for etag
10 years ago
pictuga
89187ab6a6
Log generation time
10 years ago
pictuga
04840d9843
More flexible parameters can be passed
...
Multiple parameters can now be passed. The HTTP "API" has been improved, and urls now have to look like "http://<path to morss>/:<param1>:<param2>/<url>". The code handling parameter parsing is now much cleaner. The debug toggle is now a var, which can be changed with parameters. Also, http logging is no longer done into a file, which tended to grow way too fast and lacked an "error 403 protection"; instead, the parameter ":debug" can be passed in the url, and the page will be delivered as "text/plain" with the debug output written into it. Therefore some logging had to be moved around, so as not to output anything while the http headers are being defined.
10 years ago
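To illustrate the "/:<param1>:<param2>/<url>" layout described in the commit above, here is a minimal sketch of splitting such a path into options and target url. The function name parse_request and the choice to store each option as True are assumptions made for the example, not the actual parsing code.

```python
def parse_request(path):
    """Split '/:<param1>:<param2>/<url>' into (options, url)."""
    path = path.lstrip('/')
    options = {}
    if path.startswith(':'):
        # Everything up to the first '/' is the option block.
        block, _, path = path.partition('/')
        for name in block.split(':'):
            if name:
                options[name] = True
    return options, path


options, url = parse_request('/:debug:force/http://example.com/feed.xml')
```

A path without a leading ":" block is passed through untouched, with an empty options dict.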
pictuga
c25aec7107
Only perform <meta> redirects on html pages
10 years ago
pictuga
3176c2a8e8
Fix bad characters detection
...
Now works with any encoding, no longer restricted to utf-8. Uses regex to find encoding (not perfect, but rather fast, since it's used on a substring)
10 years ago
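A possible shape for the regex-on-a-substring approach mentioned above, written for Python 3. The pattern, the 1000-byte window and the names detect_encoding and decode_lenient are all assumptions for illustration; the commit's own regex may well differ.

```python
import re

# Matches charset=... (html meta) or encoding=... (xml declaration).
CHARSET_RE = re.compile(rb'(?:charset|encoding)\s*=\s*["\']?([\w-]+)', re.I)


def detect_encoding(data, default='utf-8', window=1000):
    """Guess the encoding of raw bytes from a declaration near the start."""
    match = CHARSET_RE.search(data[:window])
    return match.group(1).decode('ascii').lower() if match else default


def decode_lenient(data):
    """Decode bytes with the detected encoding, replacing bad characters."""
    encoding = detect_encoding(data)
    try:
        return data.decode(encoding, 'replace')
    except LookupError:
        # The declared encoding is unknown to Python: fall back to utf-8.
        return data.decode('utf-8', 'replace')
```

Only the first bytes are scanned, which keeps the search cheap even on large pages, at the cost of missing a declaration that appears later.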
pictuga
3ba74649f6
Test if linked pages are text documents
...
Useful for feeds such as HackerNews
10 years ago
pictuga
1b7fdad6a8
Improve broken XML support
...
TPB feed is a good example <http://rss.thepiratebay.sx/blog>. Now supports ampersand in feed, using the "recover" mode in etree.parse. Broken utf-8 strings in feed are now also supported.
10 years ago
pictuga
5ebd84ee55
Fix broken feeds.py calls for items count
10 years ago
pictuga
fe89a70f24
Add help for new classes
10 years ago
pictuga
50f3c5a552
Use descriptors for lists and to replace property
...
Much nicer. Less duplicate code. More transparent. Big commit.
10 years ago
pictuga
a94d659bc8
Make negation in README more obvious
10 years ago
pictuga
d3c163fb74
Use ETag for user-side caching
...
Pretty hard-coded ETag use. ETag is just a timestamp, and the server checks whether it's recent enough.
10 years ago
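The timestamp-as-ETag idea above can be sketched as follows. The helper names make_etag and is_fresh and the max_age parameter are hypothetical; when the check passes, the server would presumably answer with "304 Not Modified" instead of regenerating the feed.

```python
import time


def make_etag(now=None):
    """Return the current time, in whole seconds, as an ETag string."""
    return str(int(time.time() if now is None else now))


def is_fresh(etag, max_age, now=None):
    """True if the client's ETag is a timestamp younger than max_age seconds."""
    now = time.time() if now is None else now
    try:
        issued = int(etag.strip('"'))
    except (AttributeError, ValueError):
        return False  # missing or malformed ETag: regenerate the feed
    return 0 <= now - issued < max_age
```

Passing `now` explicitly is only there to make the functions easy to test; in a request handler it would be left out.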
pictuga
e2c3375eb6
Log url earlier
...
Now logging it in both use cases
10 years ago
pictuga
0c6e28205a
Use seconds for every parameter
10 years ago
pictuga
b350602232
Remove legacy "xml map" declaration
10 years ago
pictuga
1ba22516fe
Small help for etag handler
10 years ago
pictuga
90efb84c57
Don't log word counts
...
Nobody cares
10 years ago
pictuga
9e324465e4
Use etag/last-modified to fetch xml feeds
10 years ago
pictuga
70df746416
Accept None as value to cache
10 years ago
pictuga
71129b5898
Fix headers definition
...
Based on what's done inside urllib2.py.
10 years ago
pictuga
d3213ea1e7
Implement user-agent in HTMLDownloader
...
It was forgotten in the previous commit
10 years ago
pictuga
918dede4be
Extend urllib2 to download pages, use gzip
...
Cleaner than dirty function. Handles decoding, gzip decompression, meta redirects (eg. Washington Post). Might need extra testing.
10 years ago
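Two of the steps named above, gzip decompression and meta redirects, might look roughly like this in Python 3 (the original targets urllib2). The helper names and the regex are assumptions for illustration; the regex only covers the common attribute order (http-equiv before content).

```python
import gzip
import re

# <meta http-equiv="refresh" content="0; url=...">
META_REFRESH_RE = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.I)


def maybe_gunzip(data):
    """Decompress the body if it starts with the gzip magic bytes."""
    if data[:2] == b'\x1f\x8b':
        return gzip.decompress(data)
    return data


def find_meta_redirect(html):
    """Return the target of a <meta> refresh redirect, or None."""
    match = META_REFRESH_RE.search(html)
    return match.group(1) if match else None
```

Checking the magic bytes rather than trusting the Content-Encoding header is a design choice of this sketch, not something stated in the commit.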