Now uses OOP where it fits. Atom feeds are supported, but have not really been tested. Unix-style globbing is now possible for URLs. Caching is done in a cleaner way. Feedburner links are also replaced. HTML is cleaned more efficiently. The code is now much cleaner, using lxml.objectify and a small wrapper to access Atom feeds as if they were RSS feeds (and it is much faster than feedparser). README has been updated.
38 lines
707 B
Plaintext
TehranTimes
http://www.tehrantimes.com/*
http://tehrantimes.com/*
//div[@class='article-indent']

FranceInfo
http://www.franceinfo.fr/rss*
//h2[@class='chapo']/..

Les Echos
http://rss.feedsportal.com/c/499/f/413829/index.rss
http://syndication.lesechos.fr/rss/*
//h1/../..

Spiegel
http://www.spiegel.de/schlagzeilen/*
//div[@id='spArticleSection']

Le Soir
http://www.lesoir.be/feed/*
//div[@class='article-content']

Stack Overflow
http://stackoverflow.com/feeds/*
//*[@id='question']

Daily Telegraph
http://www.telegraph.co.uk/*
//*[@id='mainBodyArea']

Cracked.com
http://feeds.feedburner.com/CrackedRSS
//div[@class='content']|//section[@class='body']

TheOnion
http://feeds.theonion.com/*
//article
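
Each entry above appears to be a blank-line-separated block: a site name, one or more URL patterns (with Unix-style globbing), and a final XPath expression selecting the article body. The following is only a minimal sketch of how such a file could be read and matched, not the project's actual loader. The names parse_rules and find_rule, the dict layout, and the sample URLs are assumptions made for illustration. Matching uses fnmatchcase from the Python standard library.

```python
import re
from fnmatch import fnmatchcase

# Two entries copied from the rules file, used here as sample input.
RULES_TEXT = """\
FranceInfo
http://www.franceinfo.fr/rss*
//h2[@class='chapo']/..

Les Echos
http://rss.feedsportal.com/c/499/f/413829/index.rss
http://syndication.lesechos.fr/rss/*
//h1/../..
"""


def parse_rules(text):
    """Split the text into blank-line-separated blocks (hypothetical helper).

    Assumed layout: first line is the name, last line is the XPath,
    and every line in between is a URL glob pattern.
    """
    rules = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 3:
            continue  # skip blocks that lack a name, a pattern, or an XPath
        rules.append({
            "name": lines[0],
            "patterns": lines[1:-1],
            "xpath": lines[-1],
        })
    return rules


def find_rule(rules, url):
    """Return the first rule with a pattern matching url, or None."""
    for rule in rules:
        if any(fnmatchcase(url, pattern) for pattern in rule["patterns"]):
            return rule
    return None


rules = parse_rules(RULES_TEXT)
match = find_rule(rules, "http://syndication.lesechos.fr/rss/rss_une.xml")
print(match["name"], match["xpath"])
```

fnmatchcase is used rather than fnmatch so that matching stays case-sensitive and does not depend on the platform's path normalisation. Note that in these patterns `*` also matches `/`, so a single trailing `*` covers any deeper path.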