Commit Graph

978 Commits (7885ab48dfc04311110997a53bc0a8542333c5e8)
 

Author SHA1 Message Date
pictuga af8879049f Another huge commit.
Now uses OOP where it fits. Atom feeds are supported, but have not been properly tested. Unix globbing is now possible for URLs. Caching is done in a cleaner way. Feedburner links are also replaced. HTML is cleaned more efficiently. Code is now much cleaner, using lxml.objectify and a small wrapper to access Atom feeds as if they were RSS feeds (and much faster than feedparser). README has been updated.
2013-04-15 18:51:55 +02:00
pictuga a098b7e104 Add .htaccess to display cache files as RSS feeds. 2013-04-05 17:05:26 +02:00
pictuga 5898879c8e Added .htaccess to enable script execution. 2013-04-05 17:02:42 +02:00
pictuga d6e6d61199 Bypass feedsportal. 2013-04-04 19:29:22 +02:00
pictuga ad25516e34 Mention deleteTags in README. 2013-04-04 18:31:26 +02:00
pictuga 851dacdfbc Renamed to .py. 2013-04-04 18:17:12 +02:00
pictuga 6783bbf992 Improved shebang. 2013-04-04 17:56:37 +02:00
pictuga 82084c2c75 Move to OOP.
This is a huge commit. The whole code is ported to Object-Oriented Programming. This makes the code cleaner, which became necessary to deal with all the different cases, for example with encoding detection. Encoding detection now works better, and uses 3 different methods. HTML pages with an xml declaration are now supported. Feed URLs with parameters (e.g. "index.php?option=par") are also supported. Cache is now smarter: it no longer grows indefinitely, because only in-use pages are kept in the cache. Caching is now mandatory. urllib (not urllib2) is no longer needed. Solved a possible crash with log function (when passing list of str with non-unicode encoding).
README is also updated.
2013-04-04 17:43:30 +02:00
pictuga c21af6d9a8 Added lesoir.be rule 2013-04-03 10:22:07 +02:00
pictuga 05b5bc7783 Catch extra errors (timeout). 2013-03-29 20:06:31 +01:00
pictuga f734fb2623 Added quick licence information. 2013-03-29 20:05:53 +01:00
pictuga 6f6c5fbaad Faster xml cleaning 2013-03-01 14:26:51 +01:00
pictuga e305f387ab Hopefully fixed encoding issues
with the dirtiest trick out there...
2013-02-27 15:12:32 +01:00
pictuga 0eaa1b3ab9 Added lesoir.be css rules 2013-02-27 15:12:17 +01:00
pictuga 682ab253b0 Typo in README 2013-02-25 21:56:16 +01:00
pictuga 217ff0fd8f Use better markdown syntax for default xpath rule 2013-02-25 21:55:17 +01:00
pictuga 27b0fbaf01 Mention default xpath in README 2013-02-25 21:54:04 +01:00
pictuga be17f0c78f Updated README to markdown 2013-02-25 21:49:38 +01:00
pictuga 9a1b2a8490 Updated README since caching is now implemented 2013-02-25 21:38:56 +01:00
pictuga ed8a45875c Default to "//h1/.." since most websites use it
because it is said to be good for SEO. Debug now requires env variable "DEBUG" to be set to something other than "".
2013-02-25 21:36:02 +01:00
pictuga 253bc27f17 Hide <noscript> warnings 2013-02-25 21:35:46 +01:00
pictuga d39604c453 Support for cookies added
NYT needs them
2013-02-25 20:53:59 +01:00
pictuga d6179a734f Clearer debug info 2013-02-25 20:53:22 +01:00
pictuga eb63ce3f4f Handle more errors 2013-02-25 18:32:23 +01:00
pictuga b63f91a151 Added cache, easier debug 2013-02-25 18:01:59 +01:00
pictuga 7dfe92de63 Added README 2013-02-25 16:40:51 +01:00
pictuga 2a146f1a36 Links to rss feeds in rules list 2013-02-25 16:18:52 +01:00
pictuga 51fe6ce81b First commit 2013-02-25 15:50:32 +01:00
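
Commit ed8a45875c describes two behaviours: a default XPath rule of "//h1/.." (the parent element of the page's first-level heading) and debug output that is enabled only when the "DEBUG" environment variable is set to something other than "". The following is a minimal sketch of both ideas, not the project's actual code. It assumes the standard library's xml.etree.ElementTree rather than lxml, so the rule is written in its relative form ".//h1/..", and the names DEFAULT_RULE, debug_enabled, extract_article and PAGE are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical constant: ElementTree only accepts relative paths on an
# element, so "//h1/.." from the commit message becomes ".//h1/..".
DEFAULT_RULE = ".//h1/.."


def debug_enabled(environ=os.environ):
    """Return True only when DEBUG is set to something other than ""."""
    return environ.get("DEBUG", "") != ""


def extract_article(page, rule=DEFAULT_RULE):
    """Parse a well-formed page and return the first element matching rule.

    With the default rule this is the parent of the first <h1>, which is
    assumed here to be the container holding the article body.
    """
    root = ET.fromstring(page)
    return root.find(rule)


# Hypothetical sample page, kept well-formed so ElementTree can parse it.
PAGE = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<div id="article"><h1>Title</h1><p>Body text.</p></div>
</body></html>"""

node = extract_article(PAGE)
if debug_enabled():
    print("matched container:", node.get("id"))
```

Real-world HTML is rarely well-formed XML, so a tolerant parser would be needed in practice; the sketch only illustrates how the rule selects the heading's container.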