574 Commits (d3b623482d1b64b6e17c821000d35fb330140f60)

Author SHA1 Message Date
pictuga 32645548c2 pytest: first batch with test_feeds
And multiple related fixes
continuous-integration/drone/push Build is failing Details
2022-01-31 08:32:34 +01:00
pictuga d6b90448f3 crawler: improve handling of non-ascii urls 2022-01-30 23:27:49 +01:00
pictuga da81edc651 log to stderr
continuous-integration/drone/push Build is failing Details
2022-01-26 07:57:57 +01:00
pictuga 4f2895f931 cli: update `--help`
continuous-integration/drone/push Build is failing Details
2022-01-25 22:36:57 +01:00
pictuga b2b04691d6 Ability to pass custom data_files location 2022-01-25 22:36:34 +01:00
pictuga bfaf7b0fac feeds: clean up default `item_link`
To be supported by feeds' `_rule_parse`
continuous-integration/drone/push Build is failing Details
2022-01-24 16:16:14 +00:00
pictuga 32d9bc9d9d feeds: proceed with conversion when rules do not match
continuous-integration/drone/push Build is failing Details
2022-01-24 09:34:57 +00:00
pictuga b138f11771 util: support more `data_files` location
continuous-integration/drone/push Build is passing Details
2022-01-23 12:40:18 +01:00
pictuga a01258700d More ordering options
continuous-integration/drone/push Build was killed Details
2022-01-23 12:27:07 +01:00
pictuga 4d6d3c9239 wsgi: limit supported mimetypes & return actual mimetype
continuous-integration/drone/push Build is passing Details
2022-01-23 11:44:07 +01:00
pictuga e81f6b173f readabilite: remove code duplicate 2022-01-23 11:41:32 +01:00
pictuga fe5dbf1ce0 wsgi: reuse mimetype table from crawler 2022-01-22 13:22:39 +01:00
pictuga d05706e056 crawler: fix typo
continuous-integration/drone/push Build was killed Details
2022-01-19 13:41:12 +01:00
pictuga e88a823ada feeds: better handle rulesets without a 'mode' specified
continuous-integration/drone/push Build is failing Details
2022-01-19 13:08:33 +01:00
pictuga 750850c162 crawler: avoid too many .append() 2022-01-19 13:04:33 +01:00
pictuga c8669002e4 feeds: exotic xpath in html as well
continuous-integration/drone/push Build is passing Details
2022-01-17 14:22:48 +00:00
pictuga c524e54d2d feeds: support some exotic xpath rules returning a single string
continuous-integration/drone/push Build is passing Details
2022-01-17 13:59:58 +00:00
pictuga fb643f5ef1 readabilite: remove unneeded reference to `features` (overridden by `builder`)
continuous-integration/drone/push Build is passing Details
2022-01-03 18:01:12 +00:00
pictuga dbdca910d8 readabilite: fix new parser code & drop PIs
continuous-integration/drone/push Build was killed Details
2022-01-03 17:51:49 +00:00
pictuga 9eb19fac04 readabilite: use custom html parser within bs4's lxml parser
Solves the following obscure error:
ValueError: Invalid PI name 'b'xml''
continuous-integration/drone/push Build is passing Details
2022-01-03 16:26:17 +00:00
pictuga d424e394d1 readabilite: use lxml bs4 parser for speed
continuous-integration/drone/push Build is passing Details
2022-01-01 14:52:48 +01:00
pictuga 3f92787b38 readabilite: limit html comments related issues
continuous-integration/drone/push Build is passing Details
2022-01-01 13:58:42 +01:00
pictuga afc31eb6e9 readabilite: avoid double parsing of html
continuous-integration/drone/push Build is passing Details
2022-01-01 12:51:30 +01:00
pictuga 87d2fe772d wsgi: fix py2 compatibility 2022-01-01 12:35:41 +01:00
pictuga 917aa0fbc5 crawler: do not re-save cached response
Otherwise cache never gets invalidated!
continuous-integration/drone/push Build is passing Details
2021-12-31 19:28:11 +01:00
pictuga d17b9a2f27 Fix typo in DISKCACHE_DIR var name
continuous-integration/drone/push Build was killed Details
2021-12-23 12:02:24 +01:00
pictuga 368e4683d6 util: clean paths code
continuous-integration/drone/push Build was killed Details
2021-12-16 08:53:18 +00:00
pictuga 7cdcbd23e1 wsgi: fix another typo
continuous-integration/drone/push Build is passing Details
2021-12-14 12:06:08 +00:00
pictuga 25f283da1f wsgi: fix bug following the removal of the loop
continuous-integration/drone/push Build was killed Details
2021-12-14 11:56:55 +00:00
pictuga 727d14e539 wsgi: use data_files helper
continuous-integration/drone/push Build was killed Details
2021-12-14 11:47:10 +00:00
pictuga 3392ae3973 util: try one more path for data_files
continuous-integration/drone/push Build is passing Details
2021-12-14 11:10:26 +00:00
pictuga 51f1d330a4 Fn to access data_files & pkg files
continuous-integration/drone/push Build is passing Details
2021-12-05 12:09:01 +01:00
pictuga eb47aac6f1 morss: respect timeout settings in all cases
Special treatment of feed fetch not justified and not documented
continuous-integration/drone/push Build is failing Details
2021-11-25 22:13:38 +01:00
pictuga eca546b890 Change HTTP error code to 404
To tell them apart from 'true' 500 errors
continuous-integration/drone/push Build is failing Details
2021-11-25 21:34:46 +01:00
pictuga d8cc07223e readabilite: fix bug when nothing above threshold
continuous-integration/drone/push Build is failing Details
2021-11-23 20:53:00 +01:00
pictuga 765e0ba728 Pass py error msg in http headers
continuous-integration/drone/push Build is passing Details
2021-11-22 23:22:13 +01:00
pictuga 6ec3fb47d1 readabilite: .strip() first to save time
continuous-integration/drone/push Build is passing Details
2021-11-15 21:54:07 +01:00
pictuga 1083f3ffbc crawler: make sure to use HTTPMessage
continuous-integration/drone/push Build is passing Details
2021-11-11 10:21:48 +01:00
pictuga 7eeb1d696c crawler: clean up code
continuous-integration/drone/push Build is passing Details
2021-11-10 23:25:03 +01:00
pictuga e42df98f83 crawler: fix regression brought with 44a6b2591
continuous-integration/drone/push Build is passing Details
2021-11-10 23:08:31 +01:00
pictuga cb21871c35 crawler: clean up caching code
continuous-integration/drone/push Build is passing Details
2021-11-08 22:02:23 +01:00
pictuga c71cf5d5ce caching: fix diskcache implementation 2021-11-08 21:57:43 +01:00
pictuga 44a6b2591d crawler: cleaner http header object import 2021-11-07 19:44:36 +01:00
pictuga a890536601 morss: comment code a bit 2021-11-07 18:26:07 +01:00
pictuga 8de309f2d4 caching: add diskcache backend 2021-11-07 18:15:20 +01:00
pictuga cbf7b3f77b caching: simplify sqlite code 2021-11-07 18:14:18 +01:00
pictuga d023ec8d73 Change default port to 8000 2021-10-19 22:19:59 +02:00
pictuga 5473b77416 Post-clean up isort
continuous-integration/drone/push Build is passing Details
2021-09-21 08:11:04 +02:00
pictuga 0365232a73 readabilite: custom xpath for article detection
continuous-integration/drone/push Build is failing Details
2021-09-21 08:04:45 +02:00
pictuga a523518ae8 cache: avoid name collision 2021-09-21 08:04:45 +02:00