16 Commits

SHA1 Message Date
fed4ae32c8 Rename ROBOTS to wget_robots_t
* fuzz/libwget_robots_parse_fuzzer.c: Rename ROBOTS to wget_robots_t
* include/wget/wget.h: Likewise
* libwget/robots.c: Likewise
* src/wget_host.h: Likewise
* unit-tests/test.c: Likewise
2017-10-22 20:54:42 +02:00
787c775e00 Replace ROBOTS_PATH by wget_string_t
* include/wget/wget.h: Remove struct ROBOTS_PATH
* libwget/robots.c: Replace ROBOTS_PATH by wget_string_t
* src/host.c: Likewise
* src/wget.c: Likewise
* unit-tests/test.c: Likewise
2017-10-22 20:50:04 +02:00
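With ROBOTS_PATH replaced by wget_string_t, a Disallow path can be kept as a pointer plus a length instead of a separately allocated copy. The sketch below shows that idea with prefix matching. It makes three assumptions:

* The `path_string` struct and its `p`/`len` members are hypothetical. They are modelled on a (pointer, length) string type, not copied from include/wget/wget.h.
* The `path_is_disallowed()` helper is also hypothetical.
* It does plain prefix matching only, with no wildcards and no Allow lines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a (pointer, length) string type such as
 * wget_string_t; the member names are an assumption for illustration. */
typedef struct {
	const char *p;  /* start of the path, not necessarily NUL-terminated */
	size_t len;     /* number of bytes that belong to the path */
} path_string;

/* Return true if 'path' starts with any of the 'n' disallowed prefixes.
 * Plain prefix matching only: no wildcards, no Allow lines. */
static bool path_is_disallowed(const char *path, const path_string *disallow, size_t n)
{
	size_t path_len = strlen(path);

	for (size_t i = 0; i < n; i++) {
		/* An empty Disallow value disallows nothing. */
		if (disallow[i].len == 0)
			continue;

		/* memcmp() reads exactly len bytes, so p needs no terminator. */
		if (disallow[i].len <= path_len && memcmp(path, disallow[i].p, disallow[i].len) == 0)
			return true;
	}

	return false;
}
```

Because only `len` bytes are compared, an entry may point into the middle of the original robots.txt buffer without copying it.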
c6ee3d79ad Fix syntax-check 'sc_prohibit_have_config_h'
* cfg.mk: Remove sc_prohibit_have_config_h from local-checks-to-skip
* libwget/*.c: Include <config.h> unconditionally
* src/*.c: Likewise
* tests/*.c: Likewise
2017-04-30 22:01:34 +02:00
58f1274808 Do not include unused strings.h
* examples/getstream.c: Remove #include <strings.h>
* libwget/css.c: Likewise
* libwget/encoding.c: Likewise
* libwget/hashfile.c: Likewise
* libwget/metalink.c: Likewise
* libwget/net.c: Likewise
* libwget/robots.c: Likewise
* src/wget.c: Likewise
2017-04-29 20:24:19 +02:00
a07a8c0825 Fix illegal memory access when parsing null-terminated string 2017-04-04 12:29:45 +02:00
ec396c577f Fix URLs to HTTPS where possible 2017-02-28 15:31:30 +01:00
bfcd65c12b Use typedefs for function pointer arguments
* include/wget/wget.h: Add typedefs
* libwget/cookie.c: Use typedefs
* libwget/css.c: Likewise
* libwget/css_url.c: Likewise
* libwget/decompressor.c: Likewise
* libwget/hashmap.c: Likewise
* libwget/hsts.c: Likewise
* libwget/http.c: Likewise
* libwget/init.c: Likewise
* libwget/io.c: Likewise
* libwget/list.c: Likewise
* libwget/logger.c: Likewise
* libwget/metalink.c: Likewise
* libwget/net.c: Likewise
* libwget/netrc.c: Likewise
* libwget/ocsp.c: Likewise
* libwget/private.h: Likewise
* libwget/robots.c: Likewise
* libwget/stringmap.c: Likewise
* libwget/tls_session.c: Likewise
* libwget/vector.c: Likewise
* libwget/xml.c: Likewise
* src/blacklist.c: Likewise
* src/host.c: Likewise
* src/options.c: Likewise
* src/wget.c: Likewise
* tests/stringmap_perf.c: Likewise
* tests/test.c: Likewise
2017-01-23 14:43:17 +01:00
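A typedef for a function pointer argument keeps prototypes short and gives the callback signature one name to document. The sketch below is a minimal illustration of the pattern. The names `item_compare_fn`, `find_item()` and `compare_strings()` are hypothetical and are not the typedefs added to include/wget/wget.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical typedef for a comparison callback. Without it, every
 * prototype taking the callback would spell out
 * int (*cmp)(const void *, const void *). */
typedef int (*item_compare_fn)(const void *key, const void *elem);

/* Linear search: return the index of the first element for which
 * cmp(key, element) == 0, or -1 if no element matches. */
static int find_item(const void *key, const void *base, size_t nmemb,
	size_t size, item_compare_fn cmp)
{
	const char *elem = base;

	for (size_t i = 0; i < nmemb; i++, elem += size) {
		if (cmp(key, elem) == 0)
			return (int) i;
	}

	return -1;
}

/* Callback matching item_compare_fn: both arguments point to
 * 'const char *' entries, compared with strcmp(). */
static int compare_strings(const void *key, const void *elem)
{
	return strcmp(*(const char *const *) key, *(const char *const *) elem);
}
```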
9568b0c87e Rename include/libwget.h to include/wget/wget.h+wgetver.h 2016-09-30 09:47:32 +02:00
36b095fd64 Fix Robots Exclusion Standard
* include/libwget.h.in: Add function wget_list_getnext().
* libwget/list.c: Add function wget_list_getnext().
* libwget/robots.c: Fix memory leak.
* src/host.c (host_remove_job): Cleanup queue after downloading and
  scanning robots.txt.
* src/job.h (struct JOB): Add flag 'requested_by_user'.
* src/wget.c (add_url_to_queue): Set 'requested_by_user',
  (add_url): Fix checking for disallowed paths.
* tests/Makefile.am: Add test 'test-robots'.
* tests/test-robots.c: New test to prove robots functionality.

Special handling for automatic robots.txt jobs
==============================================
With --recursive and --span-hosts, a document from hostA may contain links to
hostB. All of these links might be added to the hostB queue before hostB's
robots.txt has been downloaded and parsed. To avoid downloading 'disallowed'
documents, the queue for hostB has to be cleaned up right after robots.txt has
been downloaded and parsed. Links that have been explicitly requested by the
user are still downloaded.
2016-09-19 15:23:48 +02:00
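The cleanup step described above can be sketched as a single pass over hostB's queue. The pass drops every job whose path is disallowed, unless the job was requested by the user. The sketch simplifies several things, and Wget2's real JOB structure and queue API differ:

* The `struct job` layout is simplified.
* The queue is a plain array.
* `is_disallowed()` (plain prefix matching) and `cleanup_queue()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified job: only the fields the cleanup needs. */
struct job {
	const char *path;        /* URL path on the host, e.g. "/private/x.html" */
	bool requested_by_user;  /* true for URLs given explicitly by the user */
};

/* Simplified Disallow check: plain prefix match against a
 * NULL-terminated list of paths from robots.txt. */
static bool is_disallowed(const char *path, const char *const *disallow)
{
	for (size_t i = 0; disallow[i]; i++) {
		size_t len = strlen(disallow[i]);

		if (len && strncmp(path, disallow[i], len) == 0)
			return true;
	}

	return false;
}

/* Remove disallowed jobs from 'queue' in place, keeping the order of the
 * remaining jobs. Jobs requested by the user are always kept.
 * Returns the new number of jobs in the queue. */
static size_t cleanup_queue(struct job *queue, size_t njobs, const char *const *disallow)
{
	size_t kept = 0;

	for (size_t i = 0; i < njobs; i++) {
		if (queue[i].requested_by_user || !is_disallowed(queue[i].path, disallow))
			queue[kept++] = queue[i];
	}

	return kept;
}
```

Compacting in place avoids a second allocation. The `requested_by_user` flag is what keeps explicitly requested URLs in the queue.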
ef7c5bba6d Add docs for robots.txt parsing
* libwget/robots.c: Add Doxygen docs
* docs/Makefile.am: Add rule for man page
2016-04-25 21:14:52 +02:00
7f96ff9ed8 Do not use hard-coded user-agent in robots library code
* include/libwget.h.in: Update prototype of wget_robots_parse()
* libwget/robots.c (wget_robots_parse): Add client name as parameter
* src/wget.c: Call wget_robots_parse() with PACKAGE_NAME as client name
2016-03-29 19:20:01 +02:00
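Passing the client name lets the parser select robots.txt records by User-agent instead of assuming one fixed user-agent. The sketch below is a hypothetical, simplified `count_disallow_rules()`. It counts the non-empty Disallow lines in records whose User-agent is `*` or matches the client name. It is not the libwget implementation. It ignores Allow lines and wildcards, and it does not prefer a client-specific record over the `*` record.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Lowercase ASCII letters only, independent of the current locale. */
static int ascii_lower(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Compare n bytes, ignoring ASCII case. */
static bool ascii_ieq(const char *a, const char *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (ascii_lower((unsigned char) a[i]) != ascii_lower((unsigned char) b[i]))
			return false;
	}
	return true;
}

/* If the line [line, line+len) starts with "<name>:", store the value with
 * surrounding blanks removed in *val / *val_len and return true. */
static bool get_field(const char *line, size_t len, const char *name,
	const char **val, size_t *val_len)
{
	size_t n = strlen(name);

	if (len <= n || line[n] != ':' || !ascii_ieq(line, name, n))
		return false;

	line += n + 1;
	len -= n + 1;
	while (len && (*line == ' ' || *line == '\t')) {
		line++;
		len--;
	}
	while (len && (line[len - 1] == ' ' || line[len - 1] == '\t' || line[len - 1] == '\r'))
		len--;

	*val = line;
	*val_len = len;
	return true;
}

/* Count the non-empty Disallow rules in records whose User-agent is "*" or
 * equals 'client' (ASCII case-insensitive). Consecutive User-agent lines
 * form one record; a User-agent line after a rule starts a new record. */
static size_t count_disallow_rules(const char *robots, const char *client)
{
	size_t client_len = strlen(client), count = 0;
	bool applies = false;  /* does the current record apply to 'client'? */
	bool in_rules = false; /* has the current record had a rule line yet? */

	while (*robots) {
		const char *eol = strchr(robots, '\n');
		size_t len = eol ? (size_t) (eol - robots) : strlen(robots);
		const char *val;
		size_t val_len;

		if (get_field(robots, len, "User-agent", &val, &val_len)) {
			if (in_rules)
				applies = in_rules = false;
			if ((val_len == 1 && *val == '*')
				|| (val_len == client_len && ascii_ieq(val, client, client_len)))
				applies = true;
		} else if (get_field(robots, len, "Disallow", &val, &val_len)) {
			in_rules = true;
			if (applies && val_len > 0)
				count++;
		}

		robots += len;         /* skip the line ... */
		if (*robots == '\n')   /* ... and its line terminator */
			robots++;
	}

	return count;
}
```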
dd3c2f63b2 Updated copyright year for all relevant files 2016-01-25 13:06:21 +01:00
1e8c3848d0 Call wget_str(n)casecmp_ascii instead of str(n)casecmp
* examples/print_css_urls2.c, libwget/cookie.c, libwget/css.c,
  libwget/html_url.c, libwget/http.c, libwget/iri.c, libwget/metalink.c,
  libwget/net.c, libwget/robots.c, libwget/ssl_gnutls.c,
  libwget/stringmap.c, libwget/xml.c, src/job.c, src/options.c, src/wget.c:
  Call wget_str(n)casecmp_ascii instead of str(n)casecmp
2016-01-11 16:26:25 +01:00
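POSIX specifies strcasecmp() and strncasecmp() only for the POSIX locale; in other locales their results are unspecified. Protocol tokens such as HTTP header names and robots.txt field names, however, are plain ASCII. The sketch below shows a minimal ASCII-only comparison in the spirit of wget_strcasecmp_ascii(). The names `ascii_strcasecmp()` and `ascii_strncasecmp()` are hypothetical, and the real function's handling of NULL arguments is not reproduced.

```c
#include <stddef.h>

/* Map 'A'..'Z' to 'a'..'z' and leave every other byte unchanged,
 * independent of the current locale. */
static int ascii_tolower(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Compare two NUL-terminated strings, ignoring ASCII case only.
 * Returns <0, 0 or >0 like strcmp(). */
static int ascii_strcasecmp(const char *s1, const char *s2)
{
	const unsigned char *a = (const unsigned char *) s1;
	const unsigned char *b = (const unsigned char *) s2;
	int diff;

	while ((diff = ascii_tolower(*a) - ascii_tolower(*b)) == 0 && *a) {
		a++;
		b++;
	}

	return diff;
}

/* Same as ascii_strcasecmp(), but compares at most n bytes. */
static int ascii_strncasecmp(const char *s1, const char *s2, size_t n)
{
	const unsigned char *a = (const unsigned char *) s1;
	const unsigned char *b = (const unsigned char *) s2;

	for (; n; n--, a++, b++) {
		int diff = ascii_tolower(*a) - ascii_tolower(*b);

		if (diff || !*a)
			return diff;
	}

	return 0;
}
```

Bytes outside 'A'..'Z' are compared unchanged, so non-ASCII bytes never compare equal to each other merely because of the current locale.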
18bdc20576 Replaced strndup() by wget_strmemdup()
* examples/getstream.c, examples/print_css_urls2.c, libwget/cookie.c,
  libwget/css_url.c, libwget/encoding.c, libwget/hsts.c,
  libwget/html_url.c, libwget/http.c, libwget/metalink.c,
  libwget/ocsp.c, libwget/robots.c, libwget/ssl_gnutls.c,
  src/options.c, src/wget.c, tests/stringmap_perf.c,
  tests/test-wget-1.c:
  Replace strndup() by wget_strmemdup()

strndup() makes an additional strnlen() pass over the input string.
This is normally not needed, since the caller already knows the length,
and thus just consumes CPU cycles. All calls to strndup() could be
replaced by wget_strmemdup(), which allocates len+1 bytes, calls
memcpy() and terminates the copy with a 0 byte.
2015-11-10 10:55:30 +01:00
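The behaviour described above (allocate len+1 bytes, memcpy(), terminate with a 0 byte) can be sketched as follows. `memdup_str()` is a hypothetical name, not libwget's code. Unlike strndup(), it does not scan the input for a terminating NUL, so the caller must guarantee that `len` bytes are readable.

```c
#include <stdlib.h>
#include <string.h>

/* Copy exactly 'len' bytes from 'm' into a new NUL-terminated string.
 * Returns NULL if 'm' is NULL or the allocation fails.
 * The caller must free() the result. */
static char *memdup_str(const void *m, size_t len)
{
	char *ret;

	if (!m)
		return NULL;

	if (!(ret = malloc(len + 1)))
		return NULL;

	memcpy(ret, m, len);  /* no length scan of the input */
	ret[len] = 0;         /* terminate the copy */

	return ret;
}
```

A typical use is cutting a value out of a larger buffer whose offset and length are already known, e.g. the path after "Disallow: " in a robots.txt line.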
0f8e49128a Transfer copyright to Free Software Foundation, Inc. 2015-09-22 11:50:06 +02:00
c6b0e461a1 Transform Mget into Wget 2015-09-19 22:54:38 +02:00