pictuga
|
ad9bf946ec
|
crawler: use chardet again
Useful fallback when no encoding is specified. It was dropped in commit 245ba99, most probably by accident.
|
2017-03-08 11:37:12 -10:00 |
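A minimal sketch of the kind of fallback chain this commit describes, not the actual code from crawler.py. The function name `detect_encoding`, the regular expression, and the 1000-byte sniffing window are assumptions made for illustration. `chardet` is a third-party package, so the sketch treats it as optional and falls back to utf-8 when nothing else yields an answer.

```python
import re

try:
    import chardet  # third-party; treated as optional in this sketch
except ImportError:
    chardet = None

# Hypothetical pattern: matches charset=... in a header or a <meta> tag
CHARSET_RE = re.compile(r'charset=["\']?([\w-]+)', re.I)


def detect_encoding(data, content_type=None):
    """Guess the encoding of `data` (bytes); never returns None."""
    # 1. charset declared in the Content-Type header
    if content_type:
        match = CHARSET_RE.search(content_type)
        if match:
            return match.group(1).lower()

    # 2. charset declared near the top of the document itself
    match = CHARSET_RE.search(data[:1000].decode('ascii', 'ignore'))
    if match:
        return match.group(1).lower()

    # 3. statistical guess, only if chardet is installed
    if chardet is not None:
        guess = chardet.detect(data[:1000])['encoding']
        if guess:
            return guess.lower()

    # 4. last resort: a usable default rather than nothing
    return 'utf-8'
```

The explicit declarations are checked first because they are cheap and usually right; the statistical guess only runs when the server and the document are both silent.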
pictuga
|
026903ce73
|
crawler: change http header after uncompressing
Change content-encoding to "identity"
|
2017-02-25 18:10:43 -10:00 |
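The idea behind this commit can be sketched with a `urllib` response processor. This is an illustration under assumptions, not the handler from crawler.py: the class name `GZipHandler` and the way the response is rebuilt are hypothetical, and only gzip is handled. Once the body has been decompressed, the header is rewritten so that later handlers do not try to decompress it again.

```python
import gzip
from email.message import Message
from io import BytesIO
from urllib.request import BaseHandler
from urllib.response import addinfourl


class GZipHandler(BaseHandler):
    """Hypothetical handler: asks for gzip and transparently decompresses."""

    def http_request(self, req):
        req.add_unredirected_header('Accept-Encoding', 'gzip')
        return req

    def http_response(self, req, resp):
        if resp.headers.get('Content-Encoding', '').lower() != 'gzip':
            return resp

        data = gzip.decompress(resp.read())

        # The body is no longer compressed, so the header must say so
        del resp.headers['Content-Encoding']
        resp.headers['Content-Encoding'] = 'identity'

        new = addinfourl(BytesIO(data), resp.headers, resp.url, resp.code)
        new.msg = getattr(resp, 'msg', None)
        return new

    https_request = http_request
    https_response = http_response


# Offline demonstration with a hand-built response (no network involved)
headers = Message()
headers['Content-Encoding'] = 'gzip'
raw = addinfourl(BytesIO(gzip.compress(b'hello')), headers,
                 'http://example.com/', 200)
out = GZipHandler().http_response(None, raw)
body = out.read()
```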
pictuga
|
8a1c00abf0
|
Fix typo in Python version check
|
2015-08-28 19:29:09 +02:00 |
Massimo Vannucci
|
8656e53b84
|
Correct Python version check
|
2015-08-05 23:36:11 +02:00 |
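The commit does not say what was wrong with the check. As a general illustration only, a version check is usually safest when it compares tuples from `sys.version_info` rather than version strings. The helper name `is_py3` is hypothetical.

```python
import sys


def is_py3(version_info=sys.version_info):
    """Return True for Python 3.0 or later (tuple comparison, not strings)."""
    return tuple(version_info[:2]) >= (3, 0)
```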
pictuga
|
931fd53da6
|
Fix 304-cache handling
Ensures that the cached request is also processed by the other handlers (GZip, etc.)
|
2015-05-04 22:25:26 +08:00 |
pictuga
|
131ba09207
|
Change :cache mode behavior
Makes the underlying code much cleaner
|
2015-04-07 09:38:22 +08:00 |
pictuga
|
32aa96afa7
|
Cache HTTP content using a custom Handler
Much cleaner than the previous approach.
|
2015-04-06 23:26:12 +08:00 |
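A rough sketch of what caching through a custom `urllib` handler can look like, assuming an in-memory dict keyed by URL and ETag-based revalidation. The class name `CacheHandler`, the storage layout, and the `_rebuild` helper are assumptions for illustration; the actual handler in crawler.py may differ substantially.

```python
from email.message import Message
from io import BytesIO
from urllib.request import BaseHandler, Request
from urllib.response import addinfourl


class CacheHandler(BaseHandler):
    """Hypothetical cache: url -> (headers, body), revalidated via ETag."""

    def __init__(self, cache=None):
        self.cache = {} if cache is None else cache

    def http_request(self, req):
        entry = self.cache.get(req.full_url)
        if entry:
            etag = entry[0].get('ETag')
            if etag:
                # Ask the server to answer 304 if nothing changed
                req.add_unredirected_header('If-None-Match', etag)
        return req

    def http_response(self, req, resp):
        if resp.code != 200:
            return resp
        body = resp.read()
        self.cache[req.full_url] = (resp.headers, body)
        # The original stream is consumed, so hand back a fresh one
        return self._rebuild(req, resp.headers, body)

    def http_error_304(self, req, fp, code, msg, headers):
        # Not modified: replay the stored copy as a normal 200 response
        cached_headers, body = self.cache[req.full_url]
        return self._rebuild(req, cached_headers, body)

    def _rebuild(self, req, headers, body):
        new = addinfourl(BytesIO(body), headers, req.full_url, 200)
        new.msg = 'OK'
        return new

    https_request = http_request
    https_response = http_response


# Offline demonstration: store a response, then replay it on a 304
handler = CacheHandler()
headers = Message()
headers['ETag'] = '"abc"'
req = Request('http://example.com/')
first = handler.http_response(
    req, addinfourl(BytesIO(b'payload'), headers, req.full_url, 200))
first_body = first.read()

revalidation = handler.http_request(Request('http://example.com/'))
replayed_body = handler.http_error_304(
    revalidation, None, 304, 'Not Modified', Message()).read()
```

Whether other processors (decompression, for example) also see the replayed response depends on handler ordering within the opener, which this sketch does not address.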
pictuga
|
1b4fc88ad0
|
Replace MetaRedirect handler with two cleaner ones
One for <meta http-equiv> and one for HTTP 'refresh' header
|
2015-04-06 23:03:17 +08:00 |
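The two redirect sources named in this commit can be sketched as follows. The regular expressions, function names, and the `HTTPRefreshHandler` class are assumptions for illustration, not the code from crawler.py. The handler turns a `Refresh` header into a regular 302 so that the standard redirect handling of `urllib` takes over.

```python
import re
from urllib.request import BaseHandler

# Hypothetical patterns for "5; url=http://..." and <meta http-equiv="refresh">
REFRESH_RE = re.compile(r'^\s*\d+\s*;\s*url\s*=\s*["\']?([^"\'\s]+)', re.I)
META_RE = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]+content="([^"]+)"', re.I)


def refresh_target(value):
    """Extract the URL from a Refresh value, or None if there is none."""
    match = REFRESH_RE.match(value)
    return match.group(1) if match else None


def meta_refresh_target(html):
    """Extract the URL from a <meta http-equiv="refresh"> tag, if any."""
    match = META_RE.search(html)
    return refresh_target(match.group(1)) if match else None


class HTTPRefreshHandler(BaseHandler):
    """Hypothetical handler for the HTTP 'Refresh' response header."""

    def http_response(self, req, resp):
        target = refresh_target(resp.headers.get('Refresh', ''))
        if resp.code == 200 and target:
            # Re-dispatch as a 302 so the stock redirect handler follows it
            resp.headers['Location'] = target
            return self.parent.error(
                'http', req, resp, 302, 'Found', resp.headers)
        return resp

    https_response = http_response
```

A handler for the `<meta>` variant would follow the same pattern, except that it has to read the body and then rebuild the response, since the stream can only be consumed once.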
pictuga
|
f2fe4fc364
|
Drop HTTPS SSL certificate verification
It breaks everything with Python 3. Verification is now built into recent Python versions (2.7.9 and roughly 3.4).
|
2015-04-06 22:54:59 +08:00 |
pictuga
|
29d9e4702f
|
Force encoding detection to return utf-8 rather than nothing
|
2015-03-24 23:22:56 +08:00 |
pictuga
|
656b29e0ef
|
2to3: using unicode/str to please py3
|
2015-03-11 01:05:02 +08:00 |
pictuga
|
cbeb01e555
|
2to3: fix urllib header retrieval
|
2015-03-11 01:03:16 +08:00 |
pictuga
|
2f542005d1
|
2to3: urllib host
|
2015-03-03 00:59:00 +08:00 |
pictuga
|
dbb3883516
|
2to3: urllib mimetype
|
2015-03-03 00:55:58 +08:00 |
pictuga
|
7bd448789d
|
2to3: first attempt to fix strings
|
2015-02-26 00:50:23 +08:00 |
pictuga
|
a0f2e0d995
|
2to3: crawler.py improve except
|
2015-02-25 18:07:09 +08:00 |
pictuga
|
6a06b742f9
|
2to3: crawler.py port try as
|
2015-02-25 18:03:54 +08:00 |
pictuga
|
c2d85e2bf9
|
2to3: crawler.py port httplib
|
2015-02-25 18:02:29 +08:00 |
pictuga
|
4f224888d8
|
2to3: crawler.py port urllib2 and StringIO
|
2015-02-25 17:53:36 +08:00 |
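The module renames touched by this series of 2to3 commits (`urllib2`, `httplib`, `StringIO`) are commonly bridged with a guarded import. The snippet below is a generic sketch of that pattern, not the exact import block from crawler.py; which names the file actually imports is an assumption.

```python
try:
    # Python 2 module names
    from urllib2 import BaseHandler, build_opener
    from httplib import HTTPException
    from StringIO import StringIO
except ImportError:
    # Python 3 equivalents
    from urllib.request import BaseHandler, build_opener
    from http.client import HTTPException
    from io import StringIO

# Works the same way under either interpreter
buffer = StringIO(u'abc')
content = buffer.read()
```

Code that handles raw response bodies under Python 3 generally needs `io.BytesIO` instead, since `StringIO` only accepts text there.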
pictuga
|
27cf8f6498
|
2to3: (iter)items to list
|
2015-02-25 12:02:53 +08:00 |
pictuga
|
8131ea2244
|
HTTPS SSL certificate validation
Specific error message added
|
2014-11-19 11:59:59 +01:00 |
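For comparison, certificate validation with a dedicated error message can be sketched with the current standard library. This is not the code this commit added: it relies on `ssl.create_default_context`, and the `fetch` helper and the wording of the error message are hypothetical.

```python
import ssl
from urllib.error import URLError
from urllib.request import HTTPSHandler, build_opener

# The default context verifies certificates and hostnames
context = ssl.create_default_context()
opener = build_opener(HTTPSHandler(context=context))


def fetch(url, timeout=10):
    """Hypothetical helper: fetch a URL, reporting certificate problems clearly."""
    try:
        return opener.open(url, timeout=timeout).read()
    except URLError as error:
        if isinstance(error.reason, ssl.SSLError):
            # Replace the generic failure with a specific message
            raise RuntimeError(
                'SSL certificate validation failed for %s' % url)
        raise
```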
pictuga
|
1b26c5f0e3
|
Split SimpleDownload in a lot of Handlers
Cleaner code, easier to edit, more flexibility. Paves the way for SSL certificate validation.
Still have to clean up the code of AcceptHeadersHandler.
|
2014-11-19 11:57:40 +01:00 |