c6ee3d79ad
Fix syntax-check 'sc_prohibit_have_config_h'
...
* cfg.mk: Remove sc_prohibit_have_config_h from local-checks-to-skip
* libwget/*.c: Include <config.h> unconditionally
* src/*.c: Likewise
* tests/*.c: Likewise
2017-04-30 22:01:34 +02:00
ec396c577f
Fix URLs to HTTPS where possible
2017-02-28 15:31:30 +01:00
bfcd65c12b
Use typedefs for function pointer arguments
...
* include/wget/wget.h: Add typedefs
* libwget/cookie.c: Use typedefs
* libwget/css.c: Likewise
* libwget/css_url.c: Likewise
* libwget/decompressor.c: Likewise
* libwget/hashmap.c: Likewise
* libwget/hsts.c: Likewise
* libwget/http.c: Likewise
* libwget/init.c: Likewise
* libwget/io.c: Likewise
* libwget/list.c: Likewise
* libwget/logger.c: Likewise
* libwget/metalink.c: Likewise
* libwget/net.c: Likewise
* libwget/netrc.c: Likewise
* libwget/ocsp.c: Likewise
* libwget/private.h: Likewise
* libwget/robots.c: Likewise
* libwget/stringmap.c: Likewise
* libwget/tls_session.c: Likewise
* libwget/vector.c: Likewise
* libwget/xml.c: Likewise
* src/blacklist.c: Likewise
* src/host.c: Likewise
* src/options.c: Likewise
* src/wget.c: Likewise
* tests/stringmap_perf.c: Likewise
* tests/test.c: Likewise
2017-01-23 14:43:17 +01:00
dfc4b53eae
Add typedef for wget_list_browse() callback function
...
* include/wget/wget.h: Add typedef wget_list_browse_cb_t
* libwget/list.c: Use wget_list_browse_cb_t
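
A minimal sketch of what a typedef'd browse callback looks like. The names
browse_cb_t, struct node, list_browse() and sum_cb() are hypothetical and only
illustrate the pattern; they are not the actual libwget declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: a non-zero return value stops the traversal. */
typedef int (*browse_cb_t)(void *context, void *elem);

/* Hypothetical singly linked list node, for illustration only. */
struct node {
	struct node *next;
	void *data;
};

/* Calls 'cb' for each element. Returns the first non-zero callback result, else 0. */
static int list_browse(const struct node *head, browse_cb_t cb, void *context)
{
	for (const struct node *p = head; p; p = p->next) {
		int rc = cb(context, p->data);

		if (rc)
			return rc;
	}

	return 0;
}

/* Example callback: adds each int element to the int that 'context' points to. */
static int sum_cb(void *context, void *elem)
{
	*(int *)context += *(int *)elem;
	return 0;
}
```

With the typedef, the prototype reads list_browse(head, cb, context) instead of
spelling out int (*cb)(void *, void *) in every declaration that takes the callback.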
2017-01-19 17:11:58 +01:00
9568b0c87e
Rename include/libwget.h to include/wget/wget.h+wgetver.h
2016-09-30 09:47:32 +02:00
36b095fd64
Fix Robots Exclusion Standard
...
* include/libwget.h.in: Add function wget_list_getnext().
* libwget/list.c: Add function wget_list_getnext().
* libwget/robots.c: Fix memory leak.
* src/host.c (host_remove_job): Clean up queue after downloading and
scanning robots.txt.
* src/job.h (struct JOB): Add flag 'requested_by_user'.
* src/wget.c (add_url_to_queue): Set 'requested_by_user',
(add_url): Fix checking for disallowed paths.
* tests/Makefile.am: Add test 'test-robots'.
* tests/test-robots.c: New test to prove robots functionality.
Special handling for automatic robots.txt jobs
==============================================
With --recursive and --span-hosts, a document from hostA may contain links to hostB.
All of these links might enter the hostB queue before robots.txt has been downloaded
and parsed. To avoid downloading 'disallowed' documents, the queue for hostB has to be
cleaned up right after robots.txt is downloaded and parsed.
Any links that have been explicitly requested by the user are still downloaded.
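
The cleanup step could be sketched roughly as follows. This is an illustration
under assumptions, not the code in src/host.c: struct job, is_disallowed() and
prune_queue() are hypothetical, the queue is a plain array, and robots.txt rules
are reduced to simple path prefixes. Only the flag name 'requested_by_user' is
taken from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified job record. */
struct job {
	const char *path;
	bool requested_by_user;
};

/* Returns true if 'path' starts with any of the 'n' disallowed prefixes. */
static bool is_disallowed(const char *path, const char *const *disallow, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (strncmp(path, disallow[i], strlen(disallow[i])) == 0)
			return true;
	}

	return false;
}

/* Compacts 'queue' in place after robots.txt has been parsed.
 * Jobs the user asked for are always kept; other jobs are kept only
 * if their path is not disallowed. Returns the new queue length. */
static size_t prune_queue(struct job *queue, size_t len,
	const char *const *disallow, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < len; i++) {
		if (queue[i].requested_by_user || !is_disallowed(queue[i].path, disallow, n))
			queue[kept++] = queue[i];
	}

	return kept;
}
```

For a queue holding /index.html, /private/a.html and a user-requested
/private/b.html, with /private/ disallowed, only /private/a.html is dropped.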
2016-09-19 15:23:48 +02:00
79dd277c12
Refactoring to separate send, receive and HTTP state machine
...
* examples/websequencediagram.c (main): Call wget_http_request_set_body()
and wget_http_send_request() instead of wget_http_send_request_with_body().
* include/libwget.h.in: Add WGET_HTTP_USER_DATA, wget_get_timemillis(),
extend wget_thread_cond_wait(), add body, user_data, body_length to
wget_http_request_t, remove wget_http_send_request_with_body(),
add wget_http_request_get_int(), wget_http_request_set_ptr(),
wget_http_request_get_ptr(), wget_http_request_set_body().
* libwget/http.c: Add wget_http_request_get_int(),
wget_http_request_set_ptr(), wget_http_request_get_ptr(),
wget_http_request_set_body(),
remove wget_http_send_request_with_body(),
(wget_http_request_to_buffer): add body to request buffer.
* libwget/http_highlevel.c (wget_http_get): Replace
wget_http_send_request_with_body()
* libwget/iri.c: Use c-ctype.h instead of ctype.h
(wget_iri_parse): Allow any number of / after scheme:
(wget_iri_parse): Catch URIs without / after scheme:
* libwget/list.c (wget_list_browse): Small code rearrangement
* libwget/metalink.c (_add_mirror): Check mirror.iri for NULL
* libwget/ssl_gnutls.c (send_ocsp_request): Replace
wget_http_send_request_with_body()
* libwget/thread.c (wget_thread_cond_signal): Add timeout param
* libwget/utils.c: New function wget_get_timemillis()
* src/blacklist.c: Include wget.h instead of log.h
* src/blacklist.h: Fix indentation
* src/host.c: Add queueing stuff
* src/host.h: Reflect changes in host.c
* src/job.c: Remove queueing stuff
* src/job.h: Reflect changes in job.c
* src/log.c: Sync stdout/stderr to correct output order
* src/log.h: Remove shortcuts of print functions
* src/wget.c: Remove download_part() and http_get().
Add http_send_request(), http_receive_response(), try_connection(),
establish_connection(), add_statistics(), process_response_header().
Amend downloader_thread() to reflect the changes.
* src/wget.h: Add shortcut defines for print functions.
* tests/libtest.c (_http_server_thread): Fix compiler warning,
fix debug message.
New function _write_msg() to print server messages yellow.
(wget_test): Add -d to wget command line.
* tests/test-metalink.c (main): Add tests for V3 and V4 metalink
files read from command line (-i, --force-metalink)
* tests/test.c (test_iri_parse): Add test for slash-less mailto: URI
2016-07-11 14:53:36 +02:00
a582d324d6
Fix / silence some Coverity findings
2016-02-07 12:48:41 +01:00
dd3c2f63b2
Updated copyright year for all relevant files
2016-01-25 13:06:21 +01:00
b751e343b7
Remove NULL check in list code
...
* libwget/list.c (wget_list_remove): Remove the NULL check on the first param.
Since the function is declared with the first param being NON-NULL, gcc will
silently remove the check anyway. Clang even complains about the check.
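
A small sketch of the situation, assuming the GCC/Clang 'nonnull' function
attribute is what marks the parameter. NONNULL_ARG1, struct node and
list_length() are hypothetical names for illustration, not the libwget code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node, for illustration only. */
struct node {
	struct node *next;
	int value;
};

/* With GCC or Clang, promise that argument 1 is never NULL. */
#if defined(__GNUC__)
#  define NONNULL_ARG1 __attribute__((nonnull(1)))
#else
#  define NONNULL_ARG1
#endif

static int list_length(struct node **list) NONNULL_ARG1;

/* Counts the nodes reachable from *list. */
static int list_length(struct node **list)
{
	/* No 'if (!list) return 0;' here: the declaration above already states
	 * that 'list' is non-NULL, so the compiler may treat such a check as
	 * always false and drop it, or warn about it. */
	int n = 0;

	for (struct node *p = *list; p; p = p->next)
		n++;

	return n;
}
```

The consequence is that callers are responsible for never passing NULL; the
function itself no longer guards against it.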
2016-01-18 10:40:10 +01:00
f7ba3e5431
Use doxygen instead of gtk-doc
...
* Removed all gtk-doc files and references.
* Added doxygen files and rules
2016-01-17 20:29:58 +01:00
0f8e49128a
Transfer copyright to Free Software Foundation, Inc.
2015-09-22 11:50:06 +02:00
c6b0e461a1
Transform Mget into Wget
2015-09-19 22:54:38 +02:00