pictuga
78706952fe
Remove "clip" from Fill
Moved it to Gather and removed it from feeds.py. Also added an "alone" mode, which removes the description.
2013-10-01 19:45:54 +02:00
pictuga
1b7fe8fbee
Use "options" in Gather instead of "progress"
Also made it possible to set Fill's toggle through parameters
2013-09-29 15:32:58 +02:00
pictuga
a5a327388a
Add ability not to fetch an item's article
2013-09-25 13:47:05 +02:00
pictuga
0657077191
Add support for twitter
Grabs the "feed" from the HTML page and clips each tweet and its article together.
2013-09-25 12:37:14 +02:00
pictuga
da14242bcf
Add feedify, and use it in morss
2013-09-25 12:36:21 +02:00
pictuga
9bc4417be3
More flexible xml caching
Now includes a 'type' var to remember what was done with it (normal, nothing, grabbed xml link, etc.). The xml/html mimetypes are now saved in a dict, for easier editing and consistency.
2013-09-25 12:32:40 +02:00
pictuga
edff54a016
Add pushContent in feeds.py
Useful later for Twitter and its "clip" toggle, which keeps the original desc/content above the article. Makes it easier to change the content while keeping the original stub in place.
2013-09-25 12:18:22 +02:00
pictuga
208d70d3db
Use separate var in Fill for final url
That way the url used to fetch the article can be changed altogether, without changing the item link itself. Useful for upcoming Twitter feeds.
2013-09-25 11:51:48 +02:00
pictuga
fd1501a0c0
Check relative url earlier
2013-09-25 11:49:45 +02:00
pictuga
1e621099e0
Log cache hash in Gather
2013-09-25 11:15:11 +02:00
pictuga
3d6d7e70b6
Remove useless "as" in error catch
2013-09-25 11:14:22 +02:00
pictuga
e73cbf56c2
Add 'html' option, useful to see errors on the server
2013-09-25 11:13:33 +02:00
pictuga
03014a8cbf
Typo in UA_HTML var name
2013-09-25 11:11:11 +02:00
pictuga
4a5cbcfd18
Move httplib into common code
Needed for error catch
2013-09-25 11:10:16 +02:00
pictuga
3fd34ff1a6
decodeHTML works without connection object
2013-09-25 11:08:58 +02:00
pictuga
658f51e5a9
Support feeds handed out as text/html
<http://www.pro-linux.de/rss/index1.xml> and <http://tehrantimes.com/index.php?option=com_ninjarsssyndicator&feed_id=1&format=raw> are on an equal footing…
2013-09-16 00:33:24 +02:00
pictuga
8eb2f7c249
Added another letter to feedsportal table
2013-09-15 19:38:59 +02:00
pictuga
23246ca6c1
Save the key in cache file
2013-09-15 19:20:51 +02:00
pictuga
1b7777c331
Find RSS links within html pages' <head>
And cache those links
2013-09-15 19:19:50 +02:00
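A minimal Python 3 sketch of looking for feed links in a page's <head>, assuming feeds are advertised with <link rel="alternate"> and an RSS or Atom MIME type. `FeedLinkFinder` and `find_feed_links` are hypothetical names, not morss code, and caching the found links is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds via <link rel="alternate">
FEED_TYPES = {'application/rss+xml', 'application/atom+xml'}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs from <link rel="alternate"> tags inside <head>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_head = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'head':
            self.in_head = True
        elif tag == 'link' and self.in_head:
            attrs = dict(attrs)
            rel = (attrs.get('rel') or '').lower().split()
            if 'alternate' in rel and attrs.get('type') in FEED_TYPES and attrs.get('href'):
                # Resolve relative hrefs against the page URL
                self.links.append(urljoin(self.base_url, attrs['href']))

    def handle_endtag(self, tag):
        if tag == 'head':
            self.in_head = False


def find_feed_links(html, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.links
```

Links found outside <head> are ignored, matching the commit's scope.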
pictuga
1bd17f1365
Faster relative link resolution
2013-09-15 19:18:39 +02:00
pictuga
7575291f8f
Log url in Gather
Useful for upcoming commits
2013-09-15 18:53:35 +02:00
pictuga
532852a408
Use cleaner http error catch
One error type was inheriting from another one
2013-09-15 18:52:34 +02:00
pictuga
43bf021f23
Catch more http exceptions
Such as InvalidURL, i.e. subclasses of httplib.HTTPException.
2013-09-15 17:19:33 +02:00
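In Python 3, httplib is named http.client, and InvalidURL, BadStatusLine, IncompleteRead and friends all derive from http.client.HTTPException, so a single except clause covers them. `safe_call` is a hypothetical helper illustrating this, not morss's own error handling.

```python
import http.client


def safe_call(func, *args):
    """Run func(*args); return (result, None), or (None, error_name) on any HTTPException."""
    try:
        return func(*args), None
    except http.client.HTTPException as e:
        # One clause covers InvalidURL, BadStatusLine, IncompleteRead, etc.,
        # since they all subclass http.client.HTTPException.
        return None, type(e).__name__
```

For instance, creating a connection to 'example.com:abc' fails locally with InvalidURL (non-numeric port), without any network access.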
pictuga
9252e75923
Ensure vars in parseOptions are defined
Caused a bug on morss.it's server
2013-09-15 15:56:08 +02:00
pictuga
75b51fc2c2
Add ability to bypass ETag support
Add the ":force" argument over HTTP to bypass ETag support, which is convenient for debugging code
2013-09-15 15:54:42 +02:00
pictuga
d2de6cf23d
Extra doc for DELAY, for xml cache, & now for etag
2013-09-15 15:45:15 +02:00
pictuga
89187ab6a6
Log generation time
2013-09-15 15:44:25 +02:00
pictuga
04840d9843
More flexible parameters can be passed
Multiple parameters can now be passed. The HTTP "API" has been improved: urls now have to look like "http://<path to morss>/:<param1>:<param2>/<url>". The code handling parameter parsing is now much cleaner. The debug toggle is now a var, which can be changed with parameters. Also, http logging is no longer written to a file, which tended to grow way too fast and lacked an "error 403 protection". Instead, the ":debug" parameter can be passed in the url, and the page will be delivered as "text/plain" with the debug output written into it. Therefore some logging had to be moved around, so as not to output anything while the http headers are being defined.
2013-09-15 15:38:03 +02:00
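A sketch of parsing the ":<param1>:<param2>/<url>" layout described above, assuming it receives only the part of the path after the script (e.g. CGI's PATH_INFO) and that parameters are plain flags. `parse_request_path` is a hypothetical name, not the actual parseOptions code.

```python
def parse_request_path(path):
    """Split '/:param1:param2/<url>' into (set_of_flags, url)."""
    path = path.lstrip('/')
    options = set()
    if path.startswith(':'):
        # Only the first '/' separates the flags from the url
        head, _, path = path.partition('/')
        # ':debug:force' -> {'debug', 'force'}
        options = {opt for opt in head.split(':') if opt}
    return options, path
```

With this layout, '/:debug:force/http://bbc.co.uk/feed.xml' yields the flags {'debug', 'force'} and the url 'http://bbc.co.uk/feed.xml', since the slashes inside the url are left alone.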
pictuga
c25aec7107
Only perform <meta> redirects on html pages
2013-09-15 15:33:14 +02:00
pictuga
3ba74649f6
Test if linked pages are text documents
Useful for feeds such as HackerNews
2013-09-10 15:25:55 +02:00
pictuga
5ebd84ee55
Fix broken feeds.py calls for items count
2013-09-08 15:47:15 +02:00
pictuga
d3c163fb74
Use ETag for user-side caching
ETag use is pretty hard-coded: the ETag is just a timestamp, and the server checks whether it's recent enough.
2013-08-24 23:43:32 +02:00
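A sketch of the timestamp-as-ETag idea, assuming the ETag is sent quoted, compared against the client's If-None-Match header, and that DELAY is a freshness window in seconds (the value below is made up). `make_etag` and `is_fresh` are hypothetical names.

```python
import time

DELAY = 10 * 60  # hypothetical freshness window, in seconds


def make_etag(now=None):
    """The ETag is simply the generation time, as an integer timestamp."""
    return '"%d"' % int(time.time() if now is None else now)


def is_fresh(if_none_match, now=None, delay=DELAY):
    """True if the client's ETag timestamp is younger than `delay` seconds."""
    if not if_none_match:
        return False
    try:
        stamp = int(if_none_match.strip('"'))
    except ValueError:
        return False  # not one of our timestamp ETags
    now = time.time() if now is None else now
    return now - stamp < delay
```

A handler would send make_etag() with each response and reply 304 Not Modified when is_fresh() accepts the incoming header; a ":force" option could simply skip that check.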
pictuga
e2c3375eb6
Log url earlier
Now logging it in both use cases
2013-08-24 23:41:40 +02:00
pictuga
0c6e28205a
Use seconds for every parameter
2013-08-24 23:40:37 +02:00
pictuga
b350602232
Remove legacy "xml map" declaration
2013-08-24 23:16:23 +02:00
pictuga
1ba22516fe
Small help for etag handler
2013-07-19 00:02:52 +02:00
pictuga
90efb84c57
Don't log word counts
Nobody cares
2013-07-18 23:55:58 +02:00
pictuga
9e324465e4
Use etag/last-modified to fetch xml feeds
2013-07-18 23:54:13 +02:00
pictuga
70df746416
Accept None as value to cache
2013-07-18 23:51:11 +02:00
pictuga
71129b5898
Fix headers definition
Based on what's done inside urllib2.py.
2013-07-17 14:41:29 +02:00
pictuga
d3213ea1e7
Implement user-agent in HTMLDownloader
It was forgotten in the previous commit
2013-07-17 14:40:29 +02:00
pictuga
918dede4be
Extend urllib2 to download pages, use gzip
Cleaner than the previous dirty function. Handles decoding, gzip decompression and meta redirects (e.g. Washington Post). Might need extra testing.
2013-07-16 23:33:45 +02:00
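morss extended urllib2's handlers (Python 2). The Python 3 sketch below does not reproduce that handler machinery; it only shows the post-download steps named in the commit (gzip decompression, charset decoding, <meta> refresh detection) as a plain function. The lowercase-keyed headers dict, the regex, and the utf-8 fallback are assumptions, and the regex only handles http-equiv appearing before content.

```python
import gzip
import re

# Matches e.g. <meta http-equiv="refresh" content="0; url=http://...">
META_REFRESH = re.compile(
    rb'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE)
CHARSET = re.compile(r'charset=([\w-]+)', re.IGNORECASE)


def decode_page(body, headers):
    """Return (text, redirect_url) for a downloaded page body (bytes).

    `headers` is a plain dict with lowercase header names (sketch assumption).
    """
    if headers.get('content-encoding') == 'gzip':
        body = gzip.decompress(body)
    redirect = None
    content_type = headers.get('content-type', '')
    if 'html' in content_type:
        # Only look for <meta> redirects in html pages
        match = META_REFRESH.search(body)
        if match:
            redirect = match.group(1).decode('ascii', 'replace')
    charset = CHARSET.search(content_type)
    text = body.decode(charset.group(1) if charset else 'utf-8', 'replace')
    return text, redirect
```

The caller would then re-fetch `redirect` when it is not None.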
pictuga
1fa8c4c535
Remove cleanXML()
This function is way too aggressive, and no longer needed (even for the targeted feed). It led to bugs with other feeds, where needed spaces were stripped.
2013-07-15 11:10:19 +02:00
pictuga
0718303eb7
Use ' instead of " when possible
2013-07-14 19:00:16 +02:00
pictuga
7275bb1a59
Better content insertion
Also takes care of the description, creating one when missing.
2013-07-14 18:58:48 +02:00
pictuga
054f5c0846
Detect provided content with word count
This is instead of character count.
2013-07-14 18:57:12 +02:00
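A sketch of detecting provided content by word count: tags are replaced by spaces and the remaining whitespace-separated words are counted. The 100-word threshold is a made-up value, and both function names are hypothetical.

```python
import re

TAG = re.compile(r'<[^>]+>')


def word_count(html):
    """Count whitespace-separated words in an HTML fragment after removing tags."""
    return len(TAG.sub(' ', html or '').split())


def has_full_content(html, min_words=100):
    # min_words is a hypothetical threshold, not morss's actual value
    return word_count(html) >= min_words
```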
pictuga
7fa183d713
Change morss.py to use feeds.py
No other changes should appear in this commit
2013-07-14 18:44:11 +02:00
pictuga
cf3934a513
Change http output mimetype to xml
2013-06-28 13:34:12 +02:00
pictuga
1f4c219880
Common code for url/options handling
2013-06-25 13:13:23 +02:00
pictuga
d2418a47c2
Add support for reddit.com feeds
The content of the linked article is used for the content. The original content (with a link to comments) is still available in the "description" of the feed item.
2013-06-11 13:02:47 +02:00
pictuga
f0b237364f
Better annotation of feedburner/feedsportal code
2013-06-11 13:02:16 +02:00
pictuga
0978e76356
str.decode() within EncDownload()
2013-06-08 17:32:55 +02:00
pictuga
89354e1528
Use file's built-in readlines() to split file
2013-06-08 17:30:53 +02:00
pictuga
bbf5c92ba2
Fix lenHTML() with empty string
2013-06-08 17:30:11 +02:00
pictuga
e05d1c9deb
Replace uppercase title with "title-case"
2013-06-02 23:45:41 +02:00
pictuga
b78f0bfba5
Improve options and limits
New limits are possible: a time limit, a max number of items fetched, and a max number of items taken from cache. Fill's third argument is now Fast=True, which is self-explanatory. (The complexity of the changes made separate commits impossible.)
2013-05-15 17:56:58 +02:00
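A sketch of how the three limits might interact in a fill loop. `fill_items`, the `fetch`/`from_cache` callbacks, and the default limits are all hypothetical; morss's actual Fill signature is not reproduced here. In this sketch the time limit only gates network fetches, since cache hits are assumed to be cheap.

```python
import time


def fill_items(items, fetch, from_cache, max_time=30, max_fetch=10, max_cached=50):
    """Fill item['content'] until a time or count limit is hit.

    fetch(item) downloads the article; from_cache(item) returns a cached
    article or None. Returns (number fetched, number taken from cache).
    """
    start = time.time()
    fetched = cached = 0
    for item in items:
        article = from_cache(item)
        if article is not None:
            if cached >= max_cached:
                continue  # cache limit reached, leave the item as-is
            cached += 1
        else:
            if fetched >= max_fetch or time.time() - start > max_time:
                continue  # fetch or time limit reached
            article = fetch(item)
            fetched += 1
        item['content'] = article
    return fetched, cached
```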
pictuga
2a71fe07f2
Improve Cache code
Removed _new flag. Slightly more stable and cleaner.
2013-05-15 17:48:39 +02:00
pictuga
bf647ba5f8
Make Fill return True when it has done something useful
2013-05-15 17:38:52 +02:00
pictuga
9694a31052
Add 'feedurl' argument to Fill
Was needed for commit f3c2c34
2013-05-15 17:36:00 +02:00
pictuga
8e2aab55e7
Check url before looking for provided content
Also use the recently defined lenHTML() function
2013-05-15 17:32:42 +02:00
pictuga
85e40cde4e
Check article length is big enough
Avoids replacing rather useful descriptions with an empty string
2013-05-15 17:24:27 +02:00
pictuga
222b1369e5
Support for relative urls in feed
2013-05-15 17:13:57 +02:00
pictuga
d88719c87f
Use urlparse library to check feed urls
2013-05-15 17:12:59 +02:00
pictuga
1506a5c0cd
Fix string output in XMLMap
2013-05-05 16:04:42 +02:00
pictuga
adebe23232
Better logging when running as Liferea hook
2013-05-05 15:33:46 +02:00
pictuga
32514941b4
Try to improve support for bogus xml feed
2013-05-05 15:32:57 +02:00
pictuga
b34ecb8ad3
Fix cache crash when an entry has an empty value
2013-05-05 15:32:05 +02:00
pictuga
e518f2cced
Better timeout error handling
For older versions of Python
2013-05-05 15:31:11 +02:00
pictuga
03501edccd
Add/fix extra modes
'progress' mode now works on Chrome. 'cache' mode only relies on cache to load faster.
2013-05-05 15:30:06 +02:00
pictuga
65090870ac
Remove temp debug print statement
2013-05-05 15:28:32 +02:00
pictuga
e77278dda9
Remove leftover SERVER var from source code
2013-05-01 19:31:24 +02:00
pictuga
949582ba19
Add progress view.
2013-05-01 17:57:09 +02:00
pictuga
5ee5dbf359
Cache http errors to save time.
2013-05-01 17:56:03 +02:00
pictuga
2f1ae1ce91
Use less suspicious user-agents.
2013-05-01 17:54:17 +02:00
pictuga
0a97a2a2b5
Support for combined feedsportal and feedburner.
2013-05-01 17:43:43 +02:00
pictuga
93b098ab11
Added http timeout.
2013-04-30 19:54:32 +02:00
pictuga
9f175994c6
Fix regex implementation.
2013-04-30 19:51:29 +02:00
pictuga
93f971896b
Improved feedsportal url recognition.
2013-04-28 10:10:58 +02:00
pictuga
fa7cd957df
Save Cache when it's new.
So as to avoid crashes on first fetch.
2013-04-23 00:24:41 +02:00
pictuga
ca90d082c3
Library import list made cleaner.
2013-04-23 00:04:44 +02:00
pictuga
1480bd7af4
Auto-detection of server-mode, better caching.
The SERVER variable is no longer needed. The RSS .xml file is now cached for a very short time, so as to make loading faster and hopefully reduce bans a little. Use a more common User-Agent to try to cut down on bans. Added the ability to test whether a key is in the Cache.
2013-04-23 00:00:07 +02:00
pictuga
a616c96e32
Removed another unused var.
2013-04-22 23:58:20 +02:00
pictuga
f95c5dcf0d
Fixed caching.
2013-04-22 22:56:38 +02:00
pictuga
83d0dcce4d
Delete unused var declaration.
2013-04-22 22:56:21 +02:00
pictuga
2d05653190
Better detection of feedsportal, extra url logging.
2013-04-19 11:44:25 +02:00
pictuga
8ce9812dfd
Meta redirects are now supported.
2013-04-19 11:43:47 +02:00
pictuga
80ba60d295
Better detection of feeds with content provided.
2013-04-19 11:42:54 +02:00
pictuga
d2b74819b4
Improved caching.
No longer writes every time a value is added, since that could cause issues if two instances of the script were run at the same time. It now only writes when the Cache object is no longer in use (i.e. garbage collected).
2013-04-19 11:40:35 +02:00
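A sketch of a cache that defers writing to disk until the object is garbage collected, assuming a JSON file as storage (the format morss actually used is not stated here). It also supports `key in cache`. Relying on __del__ is CPython-flavoured: reference counting usually collects the object as soon as its last reference is dropped, but that is not guaranteed at interpreter shutdown.

```python
import json
import os


class Cache:
    """Dict-like cache written to disk once, when the object is collected."""

    def __init__(self, path):
        self.path = path
        self._data = {}
        if os.path.exists(path):
            with open(path) as f:
                self._data = json.load(f)

    def __contains__(self, key):
        return key in self._data

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # Deliberately no disk write here; per the commit, writing on every
        # set could cause issues with two instances running at once.
        self._data[key] = value

    def save(self):
        with open(self.path, 'w') as f:
            json.dump(self._data, f)

    def __del__(self):
        # Runs when the object is garbage collected
        self.save()
```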
pictuga
4abf7b699c
Use readability to fetch article content.
Makes the whole "xpath rules" thing useless. Almost any feed is now supported. The CSS Liferea stylesheets are also unneeded now, since readability cleans up html code in a more efficient way. README was updated.
2013-04-19 11:37:43 +02:00
pictuga
17db2584da
Fixed caching.
For scary reasons, the re-used cache was deleted every time. This is now fixed. Loading is now *really* fast.
2013-04-16 16:13:42 +02:00
pictuga
5a74babf24
Improved logging on server.
2013-04-16 16:13:14 +02:00
pictuga
7b1c32eac2
Added support for 404 redirect.
e.g. http://domain.com/bbc.co.uk/feed.xml will redirect to http://domain.com/morss.py/bbc.co.uk/feed.xml and work.
2013-04-16 16:11:34 +02:00
pictuga
af8879049f
Another huge commit.
Now uses OOP where it fits. Atom feeds are supported, but no real tests were made. Unix globbing is now possible for urls. Caching is done in a cleaner way. Feedburner links are also replaced. HTML is cleaned in a more efficient way. The code is now much cleaner, using lxml.objectify and a small wrapper to access Atom feeds as if they were RSS feeds (and much faster than feedparser). README has been updated.
2013-04-15 18:51:55 +02:00
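A sketch of an RSS-style view over Atom entries. It uses the standard library's xml.etree.ElementTree rather than lxml.objectify, maps only title, link and description, and `Item`/`items` are hypothetical names rather than the wrapper from this commit.

```python
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'


def _atom_link(entry):
    # Atom stores the URL in an href attribute; rel defaults to "alternate"
    for link in entry.findall(ATOM + 'link'):
        if link.get('rel', 'alternate') == 'alternate':
            return link.get('href')
    return None


class Item:
    """Expose RSS-style attributes (title, link, description) for RSS or Atom items."""

    def __init__(self, element):
        self.element = element
        self.is_atom = element.tag == ATOM + 'entry'

    @property
    def title(self):
        return self.element.findtext(ATOM + 'title' if self.is_atom else 'title')

    @property
    def link(self):
        return _atom_link(self.element) if self.is_atom else self.element.findtext('link')

    @property
    def description(self):
        return self.element.findtext(ATOM + 'summary' if self.is_atom else 'description')


def items(xml_text):
    """Return Item wrappers for every entry of an Atom feed or item of an RSS feed."""
    root = ET.fromstring(xml_text)
    if root.tag == ATOM + 'feed':
        return [Item(e) for e in root.findall(ATOM + 'entry')]
    return [Item(e) for e in root.iter('item')]
```

Calling code can then read `item.title`, `item.link` and `item.description` without caring which format the feed uses.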
pictuga
d6e6d61199
Bypass feedsportal.
2013-04-04 19:29:22 +02:00
pictuga
851dacdfbc
Renamed to .py.
2013-04-04 18:17:12 +02:00