pictuga
|
2afea497a3
|
readabilite: br2p use "node" instead of "item"
Otherwise it is confused with RSS items
|
2017-07-17 00:06:39 +02:00 |
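The log does not show what br2p does internally. The sketch below assumes it turns runs of content separated by `<br>` into `<p>` nodes. It uses the standard-library `xml.etree.ElementTree` rather than the parser readabilite relies on, and its body is hypothetical even though the function name mirrors the commit:

```python
import xml.etree.ElementTree as ET


def br2p(root):
    """Hypothetical sketch: split the <br>-separated content of each node into <p> nodes."""
    # Snapshot the tree first so the edits below do not disturb the loop
    for node in list(root.iter()):
        if not any(child.tag == 'br' for child in node):
            continue

        paragraphs = []
        current = ET.Element('p')
        current.text = node.text  # text before the first child

        for child in list(node):
            if child.tag == 'br':
                # A <br> closes the current paragraph; its tail starts the next one
                paragraphs.append(current)
                current = ET.Element('p')
                current.text = child.tail
            else:
                current.append(child)  # the child keeps its own tail text
        paragraphs.append(current)

        # Replace the node's content with the non-empty paragraphs only
        node.text = None
        for child in list(node):
            node.remove(child)
        node.extend([p for p in paragraphs if (p.text and p.text.strip()) or len(p)])

    return root


root = ET.fromstring('<div>one<br/>two<br/><b>three</b></div>')
print(ET.tostring(br2p(root), encoding='unicode'))
# <div><p>one</p><p>two</p><p><b>three</b></p></div>
```

Empty paragraphs (from consecutive `<br>` tags) are dropped, which is a design choice of this sketch.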
pictuga
|
843dc97fbf
|
readabilite: change scoring algorithm
Use 3 groups of keywords instead
|
2017-07-17 00:01:44 +02:00 |
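The commit only says that scoring now uses three groups of keywords. A minimal sketch, assuming the groups are "good", "bad" and a black list (as in the black-list commits further down) matched against a node's class/id string. The keyword lists, the weight of 5 and the name `score_attrs` are all hypothetical:

```python
import re

# Hypothetical keyword groups: the real lists in readabilite are not shown in this log
GOOD = ['article', 'body', 'content', 'entry', 'main', 'post', 'story', 'text']
BAD = ['comment', 'footer', 'header', 'menu', 'nav', 'related', 'share', 'sidebar']
BLACK = ['ad', 'about', 'author', 'social', 'sponsor']


def _words(words):
    # Match whole words only, so that e.g. "ad" does not match inside "header"
    return re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b', re.IGNORECASE)


GOOD_RE, BAD_RE, BLACK_RE = _words(GOOD), _words(BAD), _words(BLACK)


def score_attrs(attrs, weight=5):
    """Score a node from its class/id string; None means the node should be dropped."""
    if BLACK_RE.search(attrs):
        return None
    return weight * (len(GOOD_RE.findall(attrs)) - len(BAD_RE.findall(attrs)))


print(score_attrs('post-content main'))  # 15
print(score_attrs('sidebar'))            # -5
print(score_attrs('author-box'))         # None
```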
pictuga
|
3ca6ed5bb0
|
readabilite: add author/about to black list
|
2017-03-24 22:02:41 -10:00 |
pictuga
|
4aa25bf3d8
|
readabilite: clean_html before scoring
Surprisingly efficient
|
2017-03-24 21:50:46 -10:00 |
pictuga
|
bfefa8d599
|
readabilite: add tags to black list
|
2017-03-24 21:50:26 -10:00 |
pictuga
|
91da0f36dc
|
readabilite: comment the clean_html function
|
2017-03-24 21:50:01 -10:00 |
pictuga
|
67889a1d14
|
readabilite: drop useless tags
This extra cluster actually jams the algorithm
|
2017-03-24 21:49:14 -10:00 |
pictuga
|
d6882e0a6a
|
readabilite: (try to) improve detection
Kinda hopeless
|
2017-03-19 02:00:31 -10:00 |
pictuga
|
79a8ada9f4
|
readabilite: add tags to score
|
2017-03-19 01:57:54 -10:00 |
pictuga
|
4a5150e030
|
readabilite: fix iter while iterating
|
2017-03-19 01:56:33 -10:00 |
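The commit does not say which loop was affected. A common form of this bug in ElementTree/lxml code is removing children from the same list that is being iterated over. The usual fix, iterating over a snapshot, is sketched below; `drop_tags` is a hypothetical helper and, unlike a real cleaner, it does not preserve tail text:

```python
import xml.etree.ElementTree as ET


def drop_tags(root, tag):
    """Remove every <tag> node under root without skipping any (tails are discarded)."""
    for parent in list(root.iter()):
        # Iterate over a copy: removing from the live child list while looping
        # over it shifts the remaining children and can skip some of them
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return root


root = drop_tags(ET.fromstring('<div><br/><br/><br/><p>x</p></div>'), 'br')
print([child.tag for child in root])  # ['p']
```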
pictuga
|
e65c88abf8
|
readabilite: fix re.match
|
2017-03-19 01:55:40 -10:00 |
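The log does not say what was wrong with `re.match`. One frequent pitfall when matching keywords against class or id values, offered here only as a plausible illustration, is that `re.match` is anchored at the start of the string whereas `re.search` is not:

```python
import re

pattern = re.compile(r'comment|sidebar', re.IGNORECASE)
attrs = 'main-sidebar'

# re.match only tries at the very start of the string...
print(pattern.match(attrs))           # None
# ...while re.search scans the whole string
print(pattern.search(attrs).group())  # sidebar
```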
pictuga
|
367f86987d
|
readabilite: spread score to all ancestors
Instead of just parents and grandparents
|
2017-03-18 22:24:38 -10:00 |
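The original Arc90 Readability script added a node's score to its parent and half of it to its grandparent. A minimal sketch of spreading the score to all ancestors instead, assuming a decay of score divided by distance (the decay and the name `spread_score` are hypothetical, not necessarily what readabilite uses):

```python
import xml.etree.ElementTree as ET


def spread_score(root, node, score, scores):
    """Add score/distance to every ancestor of node (hypothetical decay)."""
    # ElementTree has no parent pointers, so build a child -> parent map
    parent_map = {child: parent for parent in root.iter() for child in parent}
    level = 1
    parent = parent_map.get(node)
    while parent is not None:
        # parent gets score/1, grandparent score/2, great-grandparent score/3, ...
        scores[parent] = scores.get(parent, 0) + score / level
        parent = parent_map.get(parent)
        level += 1
    return scores


root = ET.fromstring('<html><body><div><p>text</p></div></body></html>')
scores = spread_score(root, root.find('.//p'), 6, {})
print({el.tag: s for el, s in scores.items()})  # {'div': 6.0, 'body': 3.0, 'html': 2.0}
```

Rebuilding the parent map on every call keeps the sketch short; a real implementation would build it once per document.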
Florian Muenchbach
|
993ac638a3
|
Added override for auto-detected character encoding of parsed pages.
|
2017-03-08 18:45:20 -10:00 |
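The commit only states that an override for the auto-detected encoding was added. A minimal sketch of the idea, where `decode_page` and its parameters are hypothetical: a user-supplied encoding, when present, wins over the detected one.

```python
def decode_page(data, detected, override=None):
    """Decode raw page bytes; a user-supplied encoding takes priority over detection."""
    encoding = override or detected
    # errors='replace' keeps going on bytes that do not fit the chosen encoding
    return data.decode(encoding, errors='replace')


raw = 'café'.encode('latin-1')
print(decode_page(raw, 'utf-8', override='latin-1'))  # café
```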
pictuga
|
3fc89d5359
|
readabilite: improve score for <p>
Helps a lot with BBC and Le Monde. Might backfire on other websites, though.
|
2017-03-01 18:02:45 -10:00 |
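The log does not show the new `<p>` scoring. A hypothetical sketch loosely modelled on the classic Readability heuristic (ignore very short paragraphs, then add a base point, points for commas and points for length) gives an idea of the kind of rule involved:

```python
def paragraph_score(text):
    """Readability-style score for one <p>; not necessarily what readabilite uses."""
    if len(text) < 25:
        return 0  # too short to be body text
    score = 1                          # base point for being a paragraph
    score += text.count(',')           # one point per comma
    score += min(len(text) // 100, 3)  # one point per 100 characters, capped at 3
    return score


print(paragraph_score('short'))    # 0
print(paragraph_score('x' * 500))  # 4
```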
pictuga
|
e0f533ca31
|
readabilite: test replacing <br/> with div
|
2017-02-25 18:16:15 -10:00 |
pictuga
|
c6c113b8a8
|
readabilite: add a function to clean up the HTML code
|
2017-02-25 18:15:33 -10:00 |
pictuga
|
58d9f65735
|
readabilite: explain the use of .tail
|
2017-02-25 18:14:13 -10:00 |
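In ElementTree and lxml, `.tail` holds the text that follows a node's closing tag, so removing a node naively also deletes that text. A minimal sketch of a cleaner that drops black-listed tags but keeps their tail text. The black list and the name `strip_tags` are hypothetical, not the readabilite clean_html function:

```python
import xml.etree.ElementTree as ET

# Hypothetical black list; the real one in readabilite is not shown in this log
TAGS_BLACK_LIST = {'script', 'style', 'form', 'iframe'}


def strip_tags(root, black_list=TAGS_BLACK_LIST):
    """Remove black-listed nodes (with their content) but keep the text that follows them."""
    parent_map = {child: parent for parent in root.iter() for child in parent}
    for node in list(root.iter()):
        if node.tag not in black_list or node not in parent_map:
            continue
        parent = parent_map[node]
        # .tail belongs to the parent's content, so move it before removing the node
        if node.tail:
            index = list(parent).index(node)
            if index > 0:
                previous = parent[index - 1]
                previous.tail = (previous.tail or '') + node.tail
            else:
                parent.text = (parent.text or '') + node.tail
        parent.remove(node)
    return root


root = ET.fromstring('<div>a<script>x()</script>b<p>c</p><style>s</style>d</div>')
print(ET.tostring(strip_tags(root), encoding='unicode'))  # <div>ab<p>c</p>d</div>
```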
pictuga
|
a5aec8c7a6
|
readability: add more keywords to the filter list
Also fixed indentation
|
2017-02-25 18:13:15 -10:00 |
pictuga
|
e71fc967ce
|
readabilite: move "good" tags to a variable (list)
So that the list can be reused later
|
2017-02-25 18:07:28 -10:00 |
pictuga
|
b14381f575
|
Use internal readability fork
Much simpler, doesn't clean the HTML, probably less effective, but much faster
|
2016-05-31 02:50:03 +02:00 |