* include/libwget.h.in: Add function wget_list_getnext().
* libwget/list.c: Add function wget_list_getnext().
* libwget/robots.c: Fix memory leak.
* src/host.c (host_remove_job): Clean up the queue after downloading
and parsing robots.txt.
* src/job.h (struct JOB): Add flag 'requested_by_user'.
* src/wget.c (add_url_to_queue): Set 'requested_by_user'.
(add_url): Fix the check for disallowed paths.
* tests/Makefile.am: Add test 'test-robots'.
* tests/test-robots.c: New test to verify robots.txt functionality.
Special handling for automatic robots.txt jobs
==============================================
With --recursive and --span-hosts, a document from hostA may contain links
to hostB. All these links can go into the hostB queue before hostB's
robots.txt has been downloaded and parsed. To avoid downloading 'disallowed'
documents, the hostB queue has to be cleaned up right after robots.txt has
been downloaded and parsed. Any links that were explicitly requested by the
user are still downloaded.
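
The sketch below only illustrates that cleanup step; it is not the code in
src/host.c. The job_t type and the helper names are hypothetical, and the
robots.txt matching is reduced to a caller-supplied 'disallowed' predicate.

  #include <stdlib.h>

  /* Hypothetical, simplified job type -- the real struct JOB in src/job.h
   * carries much more state; only the fields needed for the idea are kept. */
  typedef struct job_st {
      const char *path;           /* URL path of the queued download */
      int requested_by_user;      /* explicitly given on the command line */
      struct job_st *next;
  } job_t;

  /* Walk a host's queue once robots.txt has been parsed and unlink every
   * job whose path is disallowed, unless the user asked for it explicitly. */
  static void cleanup_queue_after_robots(job_t **queue,
          int (*disallowed)(const char *path))
  {
      job_t **cur = queue;

      while (*cur) {
          job_t *job = *cur;

          if (!job->requested_by_user && disallowed(job->path)) {
              *cur = job->next;   /* drop the disallowed job */
              free(job);          /* assumes heap-allocated jobs */
          } else {
              cur = &job->next;   /* keep it and advance */
          }
      }
  }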
* include/libwget.h.in: Update prototype of wget_robots_parse().
* libwget/robots.c (wget_robots_parse): Add client name as parameter.
* src/wget.c: Call wget_robots_parse() with PACKAGE_NAME as client name.
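
A minimal usage sketch of the changed call. It assumes the updated prototype
is wget_robots_t *wget_robots_parse(const char *data, const char *client);
the return type and the literal client name are assumptions made only to
keep the snippet standalone (in src/wget.c the client name is PACKAGE_NAME).

  #include <libwget.h>   /* installed header generated from libwget.h.in */

  /* 'body' is the downloaded robots.txt content; the client name is what
   * gets matched against the User-agent records. "wget" stands in for
   * PACKAGE_NAME here. */
  static wget_robots_t *parse_robots(const char *body)
  {
      return wget_robots_parse(body, "wget");
  }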
* examples/getstream.c, examples/print_css_urls2.c, libwget/cookie.c,
libwget/css_url.c, libwget/encoding.c, libwget/hsts.c,
libwget/html_url.c, libwget/http.c, libwget/metalink.c,
libwget/ocsp.c, libwget/robots.c, libwget/ssl_gnutls.c,
src/options.c, src/wget.c, tests/stringmap_perf.c,
tests/test-wget-1.c: Replace strndup() by wget_strmemdup().
strndup() performs an additional strlen() on the input string. This is
normally not needed and just wastes CPU cycles. All calls to strndup() can
be replaced by wget_strmemdup(), which simply allocates len+1 bytes, copies
the data with memcpy() and terminates it with a 0 byte.
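
For illustration, a minimal sketch of that pattern, not the libwget source;
the exact wget_strmemdup() signature is assumed, so a local name is used.

  #include <stdlib.h>
  #include <string.h>

  /* When the length is already known, there is no need for the extra
   * length scan that strndup() does: allocate len+1 bytes, memcpy()
   * the data and append the terminating 0 byte. */
  static void *strmemdup_sketch(const void *mem, size_t len)
  {
      char *copy = malloc(len + 1);

      if (copy) {
          memcpy(copy, mem, len);
          copy[len] = 0;        /* 0-terminate for use as a C string */
      }
      return copy;
  }

  /* Usage: where strndup(p, len) would rescan up to len bytes, a call like
   * strmemdup_sketch(p, len) copies the already-known length directly. */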