Commit Graph

130 Commits (ea2ebedfcb5a03a1848c0895e05625db8e5be953)

Author SHA1 Message Date
pictuga 32645548c2 pytest: first batch with test_feeds
continuous-integration/drone/push Build is failing
And multiple related fixes
2022-01-31 08:32:34 +01:00
pictuga b2b04691d6 Ability to pass custom data_files location 2022-01-25 22:36:34 +01:00
pictuga 32d9bc9d9d feeds: proceed with conversion when rules do not match
continuous-integration/drone/push Build is failing
2022-01-24 09:34:57 +00:00
pictuga e88a823ada feeds: better handle rulesets without a 'mode' specified
continuous-integration/drone/push Build is failing
2022-01-19 13:08:33 +01:00
pictuga c8669002e4 feeds: exotic xpath in html as well
continuous-integration/drone/push Build is passing
2022-01-17 14:22:48 +00:00
pictuga c524e54d2d feeds: support some exotic xpath rules returning a single string
continuous-integration/drone/push Build is passing
2022-01-17 13:59:58 +00:00
pictuga 51f1d330a4 Fn to access data_files & pkg files
continuous-integration/drone Build is running
continuous-integration/drone/push Build is passing
2021-12-05 12:09:01 +01:00
pictuga 5473b77416 Post-clean up isort
continuous-integration/drone/push Build is passing
2021-09-21 08:11:04 +02:00
pictuga 0b3e6d7749 Apply isort 2021-09-21 08:04:23 +02:00
pictuga d5942fe5a7 feeds: fix issues when mode not explicitly set in ruleset 2021-08-29 00:20:29 +02:00
pictuga da5442a1dc feedify: support any type (json, xml, html) 2021-08-29 00:17:28 +02:00
pictuga e37c8346d0 feeds: add fallback for time parser 2021-04-22 21:57:16 +02:00
pictuga 3a1d564992 feeds: fix time zone handling 2021-04-22 21:51:00 +02:00
pictuga 2514fabd38 Replace memory-leak-prone Uniq with @uniq_wrapper 2020-10-03 19:43:55 +02:00
pictuga 8cb7002fe6 feeds: make it possible to append empty items
And return the newly appended items, to make it easy to edit them
2020-10-03 16:56:07 +02:00
pictuga 6966e03bef Clean up itemClass code
To avoid globals()
2020-10-03 16:25:29 +02:00
pictuga 0f33db248a Add license info in each file 2020-08-26 20:08:22 +02:00
pictuga 3190d1ec5a feeds: remove useless if(len) before loop 2020-06-02 13:57:45 +02:00
pictuga 4ccc0dafcd Basic help for sub-lib interactive use 2020-05-26 19:34:20 +02:00
pictuga 2fe3e0b8ee feeds: clean up other stylesheets before putting ours 2020-05-26 19:26:36 +02:00
pictuga 22005065e8 Use etree.tostring 'method' arg
Gives appropriately formatted html code.
Some pages might otherwise be rendered as blank.
2020-05-13 11:44:34 +02:00
pictuga c27c38f7c7 crawler: return dict instead of tuple 2020-04-28 22:29:07 +02:00
pictuga a1dc96cb50 feeds: remove mimetype from function call as no longer used 2020-04-28 22:07:25 +02:00
pictuga 818cdaaa9b Make it possible to call sub-libs in non interactive mode
Run `python -m morss.feeds http://lemonde.fr` and so on
2020-04-27 18:00:14 +02:00
pictuga 2806c64326 Make it possible to directly run sub-libs (feeds, crawler, readabilite)
Run `python -im morss.feeds http://website.sample/rss.xml` and so on
2020-04-27 17:19:31 +02:00
pictuga 59ef5af9e2 feeds: fix bug when deleting attr in html 2020-04-24 22:12:05 +02:00
pictuga 325a373e3e feeds: add SyntaxError catch 2020-04-20 16:15:15 +02:00
pictuga 4ce3c7cb32 Small code clean ups 2020-04-19 12:50:05 +02:00
pictuga 7375adce33 sheet.xsl: fix & improve 2020-04-15 23:34:28 +02:00
pictuga fe82b19c91 Merge .xsl & html template
Turns out they somehow serve a similar purpose
2020-04-15 22:30:45 +02:00
pictuga 8e5e8d24a4 Timezone fixes 2020-04-10 20:33:59 +02:00
pictuga 9e7b9d95ee feeds: properly use html template 2020-04-09 20:00:51 +02:00
pictuga 987a719c4e feeds: try all parsers regardless of contenttype
Turns out some websites send the wrong contenttype (json for html, html for xml, etc.)
2020-04-09 19:17:51 +02:00
pictuga 3c7f512583 feeds: handle several errors 2020-04-09 19:09:10 +02:00
pictuga b0f80c6d3c morss: fix csv output encoding 2020-04-09 19:05:50 +02:00
pictuga f3d1f92b39 Detect encoding every time 2020-04-07 10:38:36 +02:00
pictuga a09831415f feeds: fix bug when mimetype matches nothing 2020-04-06 18:53:07 +02:00
pictuga aad2398e69 feeds: turns out lxml.etree doesn't have drop_tag 2020-04-05 21:50:38 +02:00
pictuga 568e7d7dd2 feeds: make BS's output bytes for lxml's sake 2020-04-05 20:46:04 +02:00
pictuga 40c69f17d2 feeds: parse html with BS
More robust & to make it consistent with :getpage
2020-04-05 16:12:41 +02:00
pictuga 4d785820d9 feeds: ignore provided stylesheets and add ours
Provided sheets usually create errors. Ours is (hopefully) more informative for users not familiar with RSS feeds
2020-03-20 15:32:44 +01:00
pictuga 6a01fc439e feeds: better handle "empty" datetime 2020-03-20 12:30:42 +01:00
pictuga dd2651061f feeds & morss: clean up comments/empty lines 2020-03-20 12:25:48 +01:00
pictuga 912c323c40 feeds: make function output more consistent
e.g. setters return nothing, getters return something relevant or None (i.e. no empty strings)
2020-03-20 12:23:15 +01:00
pictuga 5705a0be17 feeds: fix delete/rmv code 2020-03-20 12:22:07 +01:00
pictuga 4735ffba45 feeds: fix .convert auto-convert
To fix inheritance loophole
2020-03-20 12:20:41 +01:00
pictuga 08e39f5631 feeds: give simpler name to helper functions 2020-03-20 12:20:15 +01:00
pictuga 765a43511e feeds: remove unused import 2020-03-20 12:19:08 +01:00
pictuga 5865af64f9 Fix indent output for html/xml 2020-03-20 12:18:13 +01:00
pictuga d12d44a500 Remove unused hash-bangs
Leftovers from debugging
2020-03-19 15:06:28 +01:00