Another huge commit.

Now uses OOP where it fits. Atom feeds are supported, but have not been properly tested. Unix glob patterns can now be used in feed URLs. Caching is done in a cleaner way. Feedburner links are also replaced. HTML is cleaned more efficiently. The code is much cleaner, using lxml.objectify and a small wrapper that accesses Atom feeds as if they were RSS feeds (which is much faster than feedparser). The README has been updated.
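The message above mentions a small wrapper that lets Atom feeds be read as if they were RSS feeds. Below is a minimal sketch of that idea, not morss's actual code: it swaps lxml.objectify for the standard library's `xml.etree.ElementTree`, the names `Item` and `parse_items` are hypothetical, and the field mapping (RSS `description` read from Atom `summary` or `content`, RSS `link` read from the `href` of the Atom "alternate" link) is an assumption.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace in ElementTree's "{uri}tag" notation
ATOM = '{http://www.w3.org/2005/Atom}'


class Item(object):
    """Hypothetical wrapper exposing RSS <item> and Atom <entry> the same way."""

    def __init__(self, node, is_atom):
        self.node = node
        self.is_atom = is_atom

    def _tag(self, name):
        # Atom children live in the Atom namespace, RSS 2.0 children in none
        return ATOM + name if self.is_atom else name

    @property
    def title(self):
        return self.node.findtext(self._tag('title'))

    @property
    def description(self):
        if not self.is_atom:
            return self.node.findtext('description')
        # Assumed mapping: RSS <description> ~ Atom <summary>, else <content>
        return (self.node.findtext(ATOM + 'summary')
                or self.node.findtext(ATOM + 'content'))

    @property
    def link(self):
        if not self.is_atom:
            return self.node.findtext('link')
        # Atom puts the URL in an href attribute; a link without rel is "alternate"
        for link in self.node.findall(ATOM + 'link'):
            if link.get('rel', 'alternate') == 'alternate':
                return link.get('href')
        return None


def parse_items(xml_text):
    """Return Item wrappers for an RSS 2.0 or Atom document."""
    root = ET.fromstring(xml_text)
    if root.tag == ATOM + 'feed':
        return [Item(entry, True) for entry in root.findall(ATOM + 'entry')]
    return [Item(item, False) for item in root.findall('channel/item')]
```

Calling code can then read `item.title`, `item.link` and `item.description` without checking which format the feed uses.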
2013-04-15 18:51:55 +02:00
parent a098b7e104
commit af8879049f
4 changed files with 339 additions and 184 deletions

@@ -3,6 +3,8 @@
This tool's goal is to get full-text RSS feeds out of stripped RSS feeds, which are common on the internet. Indeed, most newspapers only include a short description in their RSS feeds, which makes those feeds rather useless. This tool intends to fix that problem.
This tool opens the links from the RSS feed, downloads the full article from the newspaper's website and puts it back into the RSS feed.
morss also has experimental support for Atom feeds.
##(xpath) Rules
To find the article content on the newspaper's website, morss needs to know where to look. The default target is the first `<h1>` element, since that is common practice, or an `<article>` element, for HTML5-compliant websites.
@@ -19,9 +21,12 @@ Here, xpath rules stored in the `rules` file. (The name of the file can be chang
Fancy name (description)(useless but not optional)
http://example.com/path/to/the/rss/feed.xml
http://example.co.uk/other/*/path/with/wildcard/*.xml
//super/accurate[@xpath='expression']/..
-Works like a charm with Tiny TinyRSS (<http://tt-rss.org/redmine/projects/tt-rss/wiki>).
+As shown in the example, multiple URLs can be specified for a single rule, so that feeds from different locations on the website's server can be matched (for example, with or without "www."). Moreover, feed URLs can be *NIX glob-style patterns, so as to match any feed from a website.
+Works like a charm with Tiny Tiny RSS (<http://tt-rss.org/redmine/projects/tt-rss/wiki>).
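The rule lookup described above (one name, several feed URLs that may be glob patterns, then an xpath) can be sketched with the standard `fnmatch` module. This is a minimal sketch, not morss's implementation: the blank line separating two rules and the function names `parse_rules` and `find_xpath` are assumptions; only the layout of a single rule comes from the example.

```python
import fnmatch


def parse_rules(text):
    """Parse hypothetical rule blocks: a name, one or more feed URLs, an xpath.

    Rules are assumed to be separated by blank lines.
    """
    rules, block = [], []
    for line in text.splitlines() + ['']:  # trailing '' flushes the last block
        line = line.strip()
        if line:
            block.append(line)
            continue
        if len(block) >= 3:  # name, at least one URL, xpath
            rules.append({'name': block[0],
                          'urls': block[1:-1],
                          'xpath': block[-1]})
        block = []
    return rules


def find_xpath(rules, feed_url):
    """Return the xpath of the first rule with a URL pattern matching feed_url."""
    for rule in rules:
        for pattern in rule['urls']:
            # fnmatchcase: case-sensitive, no OS-specific path normalisation
            if fnmatch.fnmatchcase(feed_url, pattern):
                return rule['xpath']
    return None
```

With `fnmatch`, `*` also matches `/`, so a pattern such as `http://example.com/*` covers every feed on that host, unlike shell globbing where `*` stops at a path separator.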
###As a newsreader hook
@@ -51,4 +56,4 @@ Unwanted HTML elements are also stripped from the article. By default, elements
---
GPL3 licence.
-Python **2.6** required (not 3).
+Python **2.6**+ required (not 3).