7 Commits

Author SHA1 Message Date
00ab822ea8 Fix deprecated comparison structs 2019-01-17 21:19:57 +01:00
0883ac6423 Fix whitespace and indentation, per pep8 2019-01-17 20:47:43 +01:00
87237f6536 Tabs, meet your new overlords: spaces
In a quest to reach pep8, use spaces to indent rather than tabs.
2019-01-17 15:35:39 +01:00
4ce8184e65 Explicitly exclude urls with .. in search crawling
Per-site exclusion rules were configured for this, but their regexp was
slightly wrong. In any case, we should never crawl urls like this unless
they are normalized, so for now just add them to the hardcoded exclusion
rules.
2017-11-08 12:04:36 -05:00
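
A minimal sketch of the kind of hardcoded exclusion 4ce8184e65 describes. It is not the crawler's actual code: the function names are hypothetical, and it assumes the rule is applied to ".." as a whole path segment rather than as a substring.

```python
from urllib.parse import urlsplit


def has_dot_dot_segment(url):
    """Return True if any path segment of the URL is exactly '..'."""
    return ".." in urlsplit(url).path.split("/")


def filter_crawlable(urls):
    """Drop URLs with a '..' segment; keep the rest in their original order."""
    return [u for u in urls if not has_dot_dot_segment(u)]
```

Checking whole segments leaves a path such as /a..b/ crawlable, which a plain substring test would reject.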
d9744cba44 First stab at supporting https for generic sites
Previously only the main website search supported it, which was less
than great for community sites that are now https only.
2017-04-02 16:47:02 +02:00
7edb14284d Teach search crawler about internal sitemap
We only support it for our main website, which uses a sitemap, so
implement it only for that provider. And always probe
sitemap_internal.xml, since we don't even try to access any external
sites on it.
2017-03-23 16:31:39 +01:00
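
The sitemap handling in 7edb14284d could be sketched roughly as follows. This is an illustration only: the function names are hypothetical, and the name of the regular sitemap (sitemap.xml) is an assumption. Only sitemap_internal.xml is named in the commit message.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_to_probe(base_url):
    """List the sitemap URLs to fetch for a site (sitemap.xml is assumed)."""
    base = base_url.rstrip("/")
    return [base + "/sitemap.xml", base + "/sitemap_internal.xml"]


def sitemap_urls(xml_text):
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
```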
b8a2015be2 New set of web search crawlers and infrastructure
Replaces the old search code with something that's not quite as much
spaghetti (i.e. not evolved over too much time), and more stable (actual
error handling instead of random crashes).

Crawlers are now also multithreaded to deal with higher latency to some
sites.
2012-01-21 15:27:06 +01:00
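
A minimal sketch of the two ideas in b8a2015be2, multithreaded fetching and per-URL error handling. It assumes a thread pool from the standard library, which may not be how the crawler does it. The crawl_all function and its fetch parameter are hypothetical and stand in for whatever actually downloads a page.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_all(urls, fetch, max_workers=4):
    """Fetch every URL on a thread pool and return (results, errors)."""
    results, errors = {}, {}

    def worker(url):
        # Catch failures per URL so one bad page cannot crash the whole crawl.
        try:
            return url, fetch(url), None
        except Exception as exc:
            return url, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url, body, exc in pool.map(worker, urls):
            if exc is None:
                results[url] = body
            else:
                errors[url] = exc
    return results, errors
```

Threads help here because the work is dominated by waiting on slow sites, so several requests can be in flight at once.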