* include/libwget.h.in: Add function wget_list_getnext().
* libwget/list.c: Add function wget_list_getnext().
* libwget/robots.c: Fix memory leak.
* src/host.c (host_remove_job): Clean up the queue after downloading
and parsing robots.txt.
* src/job.h (struct JOB): Add flag 'requested_by_user'.
* src/wget.c (add_url_to_queue): Set 'requested_by_user'.
(add_url): Fix the check for disallowed paths.
* tests/Makefile.am: Add test 'test-robots'.
* tests/test-robots.c: New test to verify robots.txt functionality.
Special handling for automatic robots.txt jobs
==============================================
With --recursive and --span-hosts, a document from hostA may contain links
to hostB. All these links can go into the hostB queue before hostB's
robots.txt has been downloaded and parsed. To avoid downloading 'disallowed'
documents, the hostB queue has to be cleaned up right after robots.txt has
been downloaded and parsed. Any links that were explicitly requested by the
user are still downloaded.
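
The sketch below only illustrates that cleanup step; it is not the code in
src/host.c. The job_t type and the helper names are hypothetical, and the
robots.txt matching is reduced to a caller-supplied 'disallowed' predicate.

  #include <stdlib.h>

  /* Hypothetical, simplified job type -- the real struct JOB in src/job.h
   * carries much more state; only the fields needed for the idea are kept. */
  typedef struct job_st {
      const char *path;           /* URL path of the queued download */
      int requested_by_user;      /* explicitly given on the command line */
      struct job_st *next;
  } job_t;

  /* Walk a host's queue once robots.txt has been parsed and unlink every
   * job whose path is disallowed, unless the user asked for it explicitly. */
  static void cleanup_queue_after_robots(job_t **queue,
          int (*disallowed)(const char *path))
  {
      job_t **cur = queue;

      while (*cur) {
          job_t *job = *cur;

          if (!job->requested_by_user && disallowed(job->path)) {
              *cur = job->next;   /* drop the disallowed job */
              free(job);          /* assumes heap-allocated jobs */
          } else {
              cur = &job->next;   /* keep it and advance */
          }
      }
  }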
* include/libwget.h.in: Update prototype of wget_robots_parse().
* libwget/robots.c (wget_robots_parse): Add client name as parameter.
* src/wget.c: Call wget_robots_parse() with PACKAGE_NAME as client name.
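
A minimal usage sketch of the changed call. It assumes the updated prototype
is wget_robots_t *wget_robots_parse(const char *data, const char *client);
the return type and the literal client name are assumptions made only to
keep the snippet standalone (in src/wget.c the client name is PACKAGE_NAME).

  #include <libwget.h>   /* installed header generated from libwget.h.in */

  /* 'body' is the downloaded robots.txt content; the client name is what
   * gets matched against the User-agent records. "wget" stands in for
   * PACKAGE_NAME here. */
  static wget_robots_t *parse_robots(const char *body)
  {
      return wget_robots_parse(body, "wget");
  }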
* examples/getstream.c, examples/print_css_urls2.c, libwget/cookie.c,
libwget/css_url.c, libwget/encoding.c, libwget/hsts.c,
libwget/html_url.c, libwget/http.c, libwget/metalink.c,
libwget/ocsp.c, libwget/robots.c, libwget/ssl_gnutls.c,
src/options.c, src/wget.c, tests/stringmap_perf.c,
tests/test-wget-1.c: Replace strndup() by wget_strmemdup().
strndup() performs an additional strlen() on the input string. This is
normally not needed and just wastes CPU cycles. All calls to strndup() can
be replaced by wget_strmemdup(), which simply allocates len+1 bytes, copies
the data with memcpy() and terminates it with a 0 byte.
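
For illustration, a minimal sketch of that pattern, not the libwget source;
the exact wget_strmemdup() signature is assumed, so a local name is used.

  #include <stdlib.h>
  #include <string.h>

  /* When the length is already known, there is no need for the extra
   * length scan that strndup() does: allocate len+1 bytes, memcpy()
   * the data and append the terminating 0 byte. */
  static void *strmemdup_sketch(const void *mem, size_t len)
  {
      char *copy = malloc(len + 1);

      if (copy) {
          memcpy(copy, mem, len);
          copy[len] = 0;        /* 0-terminate for use as a C string */
      }
      return copy;
  }

  /* Usage: where strndup(p, len) would rescan up to len bytes, a call like
   * strmemdup_sketch(p, len) copies the already-known length directly. */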