114 Commits (0f33db248a06819f71a97045dd2f833d823f3071)

Author SHA1 Message Date
pictuga 0f33db248a Add license info in each file 2020-08-26 20:08:22 +02:00
pictuga 3190d1ec5a feeds: remove useless if(len) before loop 2020-06-02 13:57:45 +02:00
pictuga 4ccc0dafcd Basic help for sub-lib interactive use 2020-05-26 19:34:20 +02:00
pictuga 2fe3e0b8ee feeds: clean up other stylesheets before putting ours 2020-05-26 19:26:36 +02:00
pictuga 22005065e8 Use etree.tostring 'method' arg
Gives appropriately formatted html code.
Some pages might otherwise be rendered as blank.
2020-05-13 11:44:34 +02:00
pictuga c27c38f7c7 crawler: return dict instead of tuple 2020-04-28 22:29:07 +02:00
pictuga a1dc96cb50 feeds: remove mimetype from function call as no longer used 2020-04-28 22:07:25 +02:00
pictuga 818cdaaa9b Make it possible to call sub-libs in non interactive mode
Run `python -m morss.feeds http://lemonde.fr` and so on
2020-04-27 18:00:14 +02:00
pictuga 2806c64326 Make it possible to directly run sub-libs (feeds, crawler, readabilite)
Run `python -im morss.feeds http://website.sample/rss.xml` and so on
2020-04-27 17:19:31 +02:00
pictuga 59ef5af9e2 feeds: fix bug when deleting attr in html 2020-04-24 22:12:05 +02:00
pictuga 325a373e3e feeds: add SyntaxError catch 2020-04-20 16:15:15 +02:00
pictuga 4ce3c7cb32 Small code clean ups 2020-04-19 12:50:05 +02:00
pictuga 7375adce33 sheet.xsl: fix & improve 2020-04-15 23:34:28 +02:00
pictuga fe82b19c91 Merge .xsl & html template
Turns out they somehow serve a similar purpose
2020-04-15 22:30:45 +02:00
pictuga 8e5e8d24a4 Timezone fixes 2020-04-10 20:33:59 +02:00
pictuga 9e7b9d95ee feeds: properly use html template 2020-04-09 20:00:51 +02:00
pictuga 987a719c4e feeds: try all parsers regardless of contenttype
Turns out some websites send the wrong contenttype (json for html, html for xml, etc.)
2020-04-09 19:17:51 +02:00
pictuga 3c7f512583 feeds: handle several errors 2020-04-09 19:09:10 +02:00
pictuga b0f80c6d3c morss: fix csv output encoding 2020-04-09 19:05:50 +02:00
pictuga f3d1f92b39 Detect encoding every time 2020-04-07 10:38:36 +02:00
pictuga a09831415f feeds: fix bug when mimetype matches nothing 2020-04-06 18:53:07 +02:00
pictuga aad2398e69 feeds: turns out lxml.etree doesn't have drop_tag 2020-04-05 21:50:38 +02:00
pictuga 568e7d7dd2 feeds: make BS's output bytes for lxml's sake 2020-04-05 20:46:04 +02:00
pictuga 40c69f17d2 feeds: parse html with BS
More robust & to make it consistent with :getpage
2020-04-05 16:12:41 +02:00
pictuga 4d785820d9 feeds: ignore provided stylesheets and add ours
Provided sheets usually create errors. Ours is (hopefully) more informative for users not familiar with RSS feeds
2020-03-20 15:32:44 +01:00
pictuga 6a01fc439e feeds: better handle "empty" datetime 2020-03-20 12:30:42 +01:00
pictuga dd2651061f feeds & morss: clean up comments/empty lines 2020-03-20 12:25:48 +01:00
pictuga 912c323c40 feeds: make function output more consistent
e.g. setters return nothing, getters return sth relevant or None (i.e. no empty strings)
2020-03-20 12:23:15 +01:00
pictuga 5705a0be17 feeds: fix delete/rmv code 2020-03-20 12:22:07 +01:00
pictuga 4735ffba45 feeds: fix .convert auto-convert
To fix inheritance loophole
2020-03-20 12:20:41 +01:00
pictuga 08e39f5631 feeds: give simpler name to helper functions 2020-03-20 12:20:15 +01:00
pictuga 765a43511e feeds: remove unused import 2020-03-20 12:19:08 +01:00
pictuga 5865af64f9 Fix indent output for html/xml 2020-03-20 12:18:13 +01:00
pictuga d12d44a500 Remove unused hash-bangs
Leftovers from debugging
2020-03-19 15:06:28 +01:00
pictuga ee8c57c1fc feeds: avoid convert to self 2020-03-19 12:54:04 +01:00
pictuga bda51b0fc7 feeds & morss: many encoding/tostring fixes 2020-03-19 12:53:25 +01:00
pictuga c09b457168 feeds: fix .dic code
Bug introduced recently…
2020-03-19 11:36:20 +01:00
pictuga b47e40246c feeds: clean up html code handling 2020-03-19 11:35:51 +01:00
pictuga 9cf933723f feeds: clean up time handling
Includes a shameful fix on @property
2020-03-19 11:35:02 +01:00
pictuga f48961a7e4 feeds: small code cleanups 2020-03-19 10:13:22 +01:00
pictuga d3fe51cea5 feeds: remove duplicated code 2020-03-19 09:49:52 +01:00
pictuga 449bc3c695 feeds: fix handling of html code 2020-03-19 09:48:53 +01:00
pictuga 13ea52ef80 feeds: add .torss() 2020-03-19 09:47:58 +01:00
pictuga aa2b56b266 feeds: small code cleanups 2020-03-19 09:47:17 +01:00
pictuga 4a70aa9dfa feeds: auto-parse() 2020-03-18 16:34:40 +01:00
pictuga c2f85da94a feeds: add html support, adapt .tohtml() 2020-03-18 16:33:10 +01:00
pictuga e3528a8f36 feeds: use FeedJSON for .tojson()
Clean up related code
2020-03-18 16:31:36 +01:00
pictuga 2dd9ae202d feeds: add 'mode' to Parsers 2020-03-18 16:24:08 +01:00
pictuga 8e3d32f24c feeds: rename 'ruleset' into 'default_ruleset'
Better reflects its use
2020-03-18 16:22:03 +01:00
pictuga 6ce616106b feeds: disable 'multi' ruleset
RSS ruleset has been cut into 4 rulesets
2020-03-18 16:20:42 +01:00
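
Note on commit 22005065e8 (Use etree.tostring 'method' arg): the message says the 'method' argument gives appropriately formatted html and that some pages might otherwise render as blank. A likely reason, stated here as an assumption rather than taken from the commit, is that XML serialisation collapses empty elements such as `<script>` into self-closing tags, which HTML parsers do not treat as closed. The sketch below illustrates the difference using the standard library's `xml.etree.ElementTree` as a stand-in for the etree module the project uses; the page fragment and the file name `a.js` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: an empty <script> element followed by a <br>.
root = ET.fromstring('<div><script src="a.js"></script><br/></div>')

# XML serialisation collapses empty elements into self-closing tags.
as_xml = ET.tostring(root, encoding="unicode", method="xml")

# HTML serialisation keeps an explicit </script> and writes <br> as a void element.
as_html = ET.tostring(root, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

With the XML method the `<script>` element is written as a self-closing tag, so a browser parsing the result as HTML may treat everything after it as script content. The HTML method keeps the explicit closing tag.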