83 Commits

b89af6d0b8 Introduce XML function to decode entities
* include/wget/wget.h: Add wget_xml_decode_entities_inline().
* libwget/iri.c (iri_unescape_inline): Remove decoding of XML entities.
* libwget/xml.c: Add wget_xml_decode_entities_inline().
* src/wget.c (normalize_uri): Call wget_xml_decode_entities_inline().
2022-09-02 19:51:52 +02:00
ed80255d82 Remove redundant memcpy() calls and refactor
* libwget/iri.c: Remove redundant memcpy() calls
* libwget/iri.c: Refactor
* libwget/dns_cache.c: Refactor
2022-06-26 13:08:10 +02:00
27d606baf3 * libwget/iri.c (wget_iri_get_escaped_resource): Fix escaping space in query part
Examples and explanation in https://gitlab.com/gnuwget/wget/-/issues/10
2022-05-21 19:58:22 +02:00
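The entry does not state the exact escaping rule; the linked issue has the examples. As a rough illustration under assumed rules, a query escaper that percent-encodes space, control characters and non-ASCII bytes might look like this. `escape_query()` and the chosen byte set are hypothetical, not wget2's implementation.

```c
#include <string.h>

/* Hypothetical sketch: copy a query string into dst, percent-encoding
 * space, control characters and non-ASCII bytes as %XX (uppercase hex).
 * Returns 0 on success, -1 if dst (dstsize bytes) is too small. */
static int escape_query(const char *src, char *dst, size_t dstsize)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	if (dstsize == 0)
		return -1;

	for (; *src; src++) {
		unsigned char c = (unsigned char) *src;

		if (c <= 0x20 || c >= 0x7F) {
			/* need room for "%XX" plus the final NUL */
			if (n + 3 >= dstsize)
				return -1;
			dst[n++] = '%';
			dst[n++] = hex[c >> 4];
			dst[n++] = hex[c & 0x0F];
		} else {
			if (n + 1 >= dstsize)
				return -1;
			dst[n++] = (char) c;
		}
	}
	dst[n] = '\0';
	return 0;
}
```

Space becomes `%20` rather than `+` here because `+` only means space in `application/x-www-form-urlencoded` data, whereas `%20` is unambiguous in any query.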
3dc7f71098 Update copyright year 2022-02-25 17:46:43 +01:00
13516ee20f * libwget/iri.c (iri_unescape_inline): Avoid harmless integer overflow 2021-12-05 18:05:11 +01:00
01a6fe02e3 Fix request path escaping
* include/wget/wget.h: Remove function wget_iri_isunreserved_path().
* libwget/iri.c: Remove function wget_iri_isunreserved_path().
  (iri_ctype): Extend array with unreserved characters.
  (wget_iri_isunreserved): Simplify code.
  (wget_iri_escape): Use macro iri_isunreserved instead of wget_iri_isunreserved().
  (wget_iri_escape_path): Add RFC links to function comment,
  Fix check whether char needs percent-encoding or not,
  Use macro iri_isunreserved instead of wget_iri_isunreserved().
2021-05-16 14:29:26 +02:00
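RFC 3986 defines the unreserved set as ALPHA, DIGIT, `-`, `.`, `_` and `~`; inside a path segment, sub-delims, `:` and `@` may also appear unencoded. The sketch below shows a table-driven check in the spirit of the `iri_ctype` array and `iri_isunreserved` macro mentioned above. The table layout, flag names and `escape_path()` are assumptions, and the lazy table initialization is not thread-safe.

```c
#include <string.h>

enum {
	CT_UNRESERVED = 1 << 0, /* ALPHA / DIGIT / "-" / "." / "_" / "~" */
	CT_SUBDELIM   = 1 << 1, /* "!" "$" "&" "'" "(" ")" "*" "+" "," ";" "=" */
};

static unsigned char ctype_tab[256];

/* Fill the table on first use (not thread-safe; real code would
 * rather use a static initializer). */
static void init_ctype(void)
{
	static int ready;
	const char *p;
	int c;

	if (ready)
		return;
	for (c = 'a'; c <= 'z'; c++)
		ctype_tab[c] |= CT_UNRESERVED;
	for (c = 'A'; c <= 'Z'; c++)
		ctype_tab[c] |= CT_UNRESERVED;
	for (c = '0'; c <= '9'; c++)
		ctype_tab[c] |= CT_UNRESERVED;
	for (p = "-._~"; *p; p++)
		ctype_tab[(unsigned char) *p] |= CT_UNRESERVED;
	for (p = "!$&'()*+,;="; *p; p++)
		ctype_tab[(unsigned char) *p] |= CT_SUBDELIM;
	ready = 1;
}

#define is_unreserved(c) (init_ctype(), ctype_tab[(unsigned char) (c)] & CT_UNRESERVED)

/* Percent-encode every byte not allowed literally in a path:
 * pchar = unreserved / sub-delims / ":" / "@", plus "/" as separator.
 * Returns 0 on success, -1 if dst is too small. */
static int escape_path(const char *src, char *dst, size_t dstsize)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	init_ctype();
	if (dstsize == 0)
		return -1;

	for (; *src; src++) {
		unsigned char c = (unsigned char) *src;

		if (ctype_tab[c] || c == ':' || c == '@' || c == '/') {
			if (n + 1 >= dstsize)
				return -1;
			dst[n++] = (char) c;
		} else {
			if (n + 3 >= dstsize)
				return -1;
			dst[n++] = '%';
			dst[n++] = hex[c >> 4];
			dst[n++] = hex[c & 0x0F];
		}
	}
	dst[n] = '\0';
	return 0;
}
```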
22162f82b2 Update copyright year 2021-01-22 21:58:38 +01:00
cb20de8e01 Remove CR and LF from URLs
* libwget/iri.c (iri_unescape_inline): Remove CR/LF from URL.
* tests/test-base.c: Add CR+LF into URL.

References
https://gitlab.com/gnuwget/wget2/-/issues/522
2020-04-22 13:49:08 +02:00
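A minimal sketch of in-place percent-decoding that also drops raw CR and LF bytes, assuming that is the behaviour the entry describes. `unescape_inline()` and `hexval()` are hypothetical helpers, not the code in `libwget/iri.c`.

```c
#include <string.h>

/* Value of a hex digit, or -1 (also for the terminating NUL). */
static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Hypothetical sketch: decode %XX escapes in place and drop raw CR/LF.
 * Malformed escapes such as "%zz" or a trailing "%4" are kept verbatim.
 * Returns 1 if the string changed, 0 otherwise. */
static int unescape_inline(char *s)
{
	char *w = s;
	int changed = 0;

	while (*s) {
		if (*s == '\r' || *s == '\n') {
			s++;
			changed = 1;
			continue;
		}
		if (*s == '%') {
			int hi = hexval((unsigned char) s[1]);
			/* only look at s[2] if s[1] was a hex digit (so not the NUL) */
			int lo = hi < 0 ? -1 : hexval((unsigned char) s[2]);

			if (hi >= 0 && lo >= 0) {
				/* hi and lo are at most 15, so this cannot overflow */
				*w++ = (char) (unsigned char) ((hi << 4) | lo);
				s += 3;
				changed = 1;
				continue;
			}
		}
		*w++ = *s++;
	}
	*w = '\0';
	return changed;
}
```

The entry does not say whether a decoded `%0D` or `%0A` should also be removed; this sketch keeps it.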
59d9ecd9c0 * Update copyright year to 2020 2020-01-10 00:33:02 +01:00
849c95fdf7 * libwget/iri.c (wget_iri_parse_base): Fix implicit conversion -1 to size_t 2020-01-06 12:32:47 +01:00
6d47c6b0db Fix --cut-file-get-vars creating dirs on single file download
* tests/test-cut-get-vars.c: Amend to reproduce issue #490.
* src/blacklist.c (get_local_filename_real): Generate basename
  instead of filename, without query.
* include/wget/wget.h: Add flag WGET_IRI_WITH_QUERY.
  Add param 'flags' to wget_iri_get_basename.
* libwget/iri.c: Add param 'flags' to wget_iri_get_basename.
  Implement flag WGET_IRI_WITH_QUERY.
* fuzz/libwget_iri_fuzzer.c (test): Use WGET_IRI_WITH_QUERY.

Closes #490
2019-12-07 17:29:48 +01:00
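The flag-controlled basename behaviour can be illustrated roughly as follows. `get_basename()` and `SKETCH_WITH_QUERY` are hypothetical stand-ins; the real `wget_iri_get_basename()` signature and the value of `WGET_IRI_WITH_QUERY` are not shown in this log.

```c
#include <string.h>

#define SKETCH_WITH_QUERY 1 /* hypothetical flag value */

/* Copy the last path segment of path (plus "?query" if the flag is set
 * and query is non-NULL) into buf. Returns buf, or NULL if it won't fit. */
static char *get_basename(const char *path, const char *query, int flags,
	char *buf, size_t bufsize)
{
	const char *slash = strrchr(path, '/');
	const char *base = slash ? slash + 1 : path;
	size_t blen = strlen(base);
	/* +1 for the '?' separator */
	size_t qlen = (flags & SKETCH_WITH_QUERY) && query ? strlen(query) + 1 : 0;

	if (blen + qlen + 1 > bufsize)
		return NULL;
	memcpy(buf, base, blen);
	if (qlen) {
		buf[blen] = '?';
		memcpy(buf + blen + 1, query, qlen - 1);
	}
	buf[blen + qlen] = '\0';
	return buf;
}
```

Without the flag, only the final segment is returned, so no query string (and therefore no directory-like component) leaks into a single-file download name.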
f3e1d61a25 Rename wget_iri_get_filename -> wget_iri_get_basename
* fuzz/libwget_iri_fuzzer.c: Likewise.
* include/wget/wget.h: Likewise.
* libwget/iri.c: Likewise.
* src/blacklist.c: Likewise.
2019-12-07 15:40:53 +01:00
2ec7c7e290 Deduplicate code for creating robots.txt jobs
* src/wget.c (queue_url_from_local): Remove duplicated code,
  (queue_url_from_remote): Likewise,
  (set_file_metadata): Amend calling wget_iri_get_connection_part()
* include/wget/wget.h (wget_iri_parse_base): Add 'const' to 'base' param,
  (wget_iri_get_connection_part): Add 'const' to 'iri' param,
  add second param 'buf'
* fuzz/libwget_iri_fuzzer.c (test): Amend calling wget_iri_get_connection_part()
* libwget/iri.c (wget_iri_get_connection_part): Add param 'buf',
  add 'const' to 'iri' param, amend docs,
  (wget_iri_relative_to_abs): Add 'const' to 'base' param,
  amend calling wget_iri_get_connection_part(),
  (wget_iri_parse_base): Add 'const' to 'iri' param
* src/host.c (host_add_robotstxt_job): Add code removed in wget.c
* src/wget_host.h: Don't include wget_blacklist.h,
  amend signature of host_add_robotstxt_job()
* src/wget_job.h (struct JOB): Add 'const' to 'blacklist_entry'
2019-10-26 18:48:33 +02:00
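Passing a caller-owned buffer makes the function reentrant and lets `iri` stay `const`. A hedged sketch of that pattern, assuming the connection part is `scheme://host[:port]` with the default port omitted. `struct sketch_iri` and `connection_part()` are hypothetical, and IPv6 literal brackets are not handled.

```c
#include <stdio.h>
#include <string.h>

struct sketch_iri {
	const char *scheme;   /* "http" or "https" */
	const char *host;
	unsigned short port;  /* 0 = use the scheme's default */
};

/* Write "scheme://host[:port]" into buf, omitting the port when it is
 * the scheme's default. Returns buf, or NULL on truncation. */
static char *connection_part(const struct sketch_iri *iri, char *buf, size_t bufsize)
{
	unsigned short def = strcmp(iri->scheme, "https") == 0 ? 443 : 80;
	int n;

	if (iri->port && iri->port != def)
		n = snprintf(buf, bufsize, "%s://%s:%hu", iri->scheme, iri->host, iri->port);
	else
		n = snprintf(buf, bufsize, "%s://%s", iri->scheme, iri->host);

	return (n < 0 || (size_t) n >= bufsize) ? NULL : buf;
}
```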
04f7c28278 * libwget/iri.c: Improve C99 compliance 2019-09-23 21:17:38 +02:00
4b9edbad2d Update Copyright statements to be compatible with update-copyright module 2019-09-10 23:41:56 +02:00
2dcc5da12d * libwget/iri.c (wget_iri_set_scheme): Fix amending https port 2019-08-21 21:46:21 +02:00
02deaa140d * libwget/iri.c (wget_iri_parse): Avoid casts to suppress warning 2019-08-17 17:42:22 +02:00
c048d7ee68 * libwget/iri.c (wget_iri_parse): Return NULL on unknown scheme 2019-08-17 17:01:51 +02:00
b1c3af367f * libwget/iri.c (wget_iri_parse): Set scheme to -1 if unknown 2019-08-17 16:10:03 +02:00
1f635966bd * libwget/iri.c: Cast enums to avoid warnings with clang 2019-08-14 10:26:38 +02:00
65ec4901b4 Remove global wget_iri_schemes[], add wget_iri_scheme enum
* include/wget/wget.h: Remove global wget_iri_schemes[],
  add wget_iri_scheme enum
* libwget/iri.c: Add struct iri_scheme,
  remove wget_iri_schemes and iri_ports,
  new function wget_iri_scheme_get_name(),
  fix code
* examples/check_url_types.c: Use comparison instead of wget_strcasecmp
* fuzz/libwget_iri_fuzzer.c: Use WGET_IRI_SCHEME_HTTPS instead of string
* libwget/http.c: Use wget_iri_scheme_get_name()
* libwget/http.h: Change scheme from string to wget_iri_scheme
* libwget/http_parse.c: Fix wget_http_get_scheme()
* src/blacklist.c: Fix hash_iri()
* src/host.c: Fix _host_hash()
* src/options.c: Use WGET_IRI_SCHEME_* instead of string
* src/stats_server.c: Use wget_iri_scheme for scheme member
* src/wget.c: Fix code
* src/wget_host.h: Use wget_iri_scheme for scheme member
* unit-tests/test.c: Fix tests
2019-08-13 16:55:53 +02:00
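Replacing a global string array with an enum lets the enum index a table of per-scheme data. A minimal sketch under that assumption; all `sk_*` names are hypothetical, not the actual `wget_iri_scheme` definitions. The `-1` result for unknown names mirrors the later entries about unknown schemes.

```c
#include <ctype.h>
#include <string.h>

typedef enum {
	SK_SCHEME_HTTP,
	SK_SCHEME_HTTPS,
} sk_scheme;

/* One row per enum value: name and default port. */
static const struct {
	const char *name;
	unsigned short port;
} sk_schemes[] = {
	[SK_SCHEME_HTTP]  = { "http",  80 },
	[SK_SCHEME_HTTPS] = { "https", 443 },
};

#define SK_NSCHEMES (sizeof(sk_schemes) / sizeof(sk_schemes[0]))

/* Map an enum value back to its name; NULL for out-of-range values. */
static const char *sk_scheme_get_name(sk_scheme scheme)
{
	if ((unsigned) scheme < SK_NSCHEMES)
		return sk_schemes[scheme].name;
	return NULL;
}

/* Case-insensitive comparison of len bytes. */
static int sk_name_eq(const char *a, const char *b, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
			return 0;
	}
	return 1;
}

/* Look up the first len bytes of name; returns the enum value or -1. */
static int sk_scheme_from_name(const char *name, size_t len)
{
	for (size_t i = 0; i < SK_NSCHEMES; i++) {
		if (strlen(sk_schemes[i].name) == len && sk_name_eq(sk_schemes[i].name, name, len))
			return (int) i;
	}
	return -1;
}
```

Comparing enum values is cheaper and less error-prone than the `wget_strcasecmp()` string comparisons the entry replaces.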
12f1a03693 Changed G_GNUC_WGET prefix to WGET_GCC
* include/wget/wget.h: Changed G_GNUC_WGET prefix to WGET_GCC
* */*.[ch]: Likewise
2019-08-08 17:13:24 +02:00
b6e9bcc7de Allow -1 as len parameter to wget_iri_relative_to_abs()
* libwget/http_highlevel.c (wget_http_get): Use -1 as len
* libwget/iri.c (wget_iri_parse_base): Likewise
* src/wget.c (process_response_header): Likewise
* unit-tests/test.c (test_iri_relative_to_absolute): Likewise
2019-08-08 15:46:11 +02:00
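The `-1` length convention lets callers pass a NUL-terminated string without measuring it first. A small sketch of a function using it; `join_relative()` is hypothetical and only concatenates, so it is not an RFC 3986 reference resolver (no `..` handling, no absolute references).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical: write base_dir followed by the first len bytes of rel
 * into buf. A negative len means rel is NUL-terminated (strlen is used).
 * Returns buf, or NULL if the result does not fit. */
static char *join_relative(const char *base_dir, const char *rel, ssize_t len,
	char *buf, size_t bufsize)
{
	int n;

	if (len < 0)
		len = (ssize_t) strlen(rel);
	n = snprintf(buf, bufsize, "%s%.*s", base_dir, (int) len, rel);
	return (n < 0 || (size_t) n >= bufsize) ? NULL : buf;
}
```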
50be5af992 Rename wget_iri_t -> wget_iri
* include/wget/wget.h: Rename wget_iri_t -> wget_iri
* examples/*.c: Likewise
* fuzz/*.c: Likewise
* libwget/*.c: Likewise
* src/*.c: Likewise
* tests/*.c: Likewise
* unit-tests/*.c: Likewise
2019-07-18 13:10:27 +02:00
db50c6a801 Rename wget_buffer_t -> wget_buffer
* include/wget/wget.h: Rename wget_buffer_t -> wget_buffer
* examples/*.c: Likewise
* fuzz/libwget_iri_fuzzer.c: Likewise
* libwget/*.c: Likewise
* src/*.c: Likewise
* tests/*.c: Likewise
* unit-tests/*.c: Likewise
2019-07-18 12:30:57 +02:00
97975c4c2d * libwget/iri.c (wget_iri_parse,wget_iri_clone): Check result of malloc() 2019-06-20 16:34:37 +02:00
90b9be9462 * libwget/iri.c (_iri_unescape_inline): Fix integer overflow (harmless) 2019-04-25 21:27:13 +02:00
6e5c820cf7 Fix heap buffer overflow introduced in previous commit
* libwget/iri.c (_iri_unescape_inline): Add missing 'continue'

Found by OSS-FUZZ (Issue 14428)
2019-04-24 09:40:20 +02:00
8df8100af9 Add basic HTML entity decoding
* libwget/iri.c (_iri_unescape_inline): Decode basic HTML entities
* tests/test-base.c: Add test for &amp;, &#ddd; and &#xHH;

This only handles the basic HTML entities mentioned in RFC 1866.
It closes GitLab issue 44, though it does not fully handle entities;
it is unclear whether anyone will ever request that.

Fixes #44
2019-04-23 11:33:49 +02:00
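For illustration, an in-place decoder for a few named entities plus decimal and hexadecimal numeric references could look like this. The named-entity list, the restriction to ASCII code points and `decode_entities_inline()` are assumptions of this sketch; the real code may handle a different set. Every entity is longer than the byte it decodes to, so reading and writing through the same buffer is safe.

```c
#include <string.h>

/* Hypothetical sketch: decode a few named entities and numeric character
 * references (&#ddd; and &#xHH;) in place. Only code points 1..127 are
 * decoded, so every result fits in one byte; anything else is kept. */
static void decode_entities_inline(char *s)
{
	static const struct { const char *name; char c; } named[] = {
		{ "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' }, { "&quot;", '"' },
	};
	char *w = s;

	while (*s) {
		if (*s == '&') {
			size_t i;
			int done = 0;

			for (i = 0; i < sizeof(named) / sizeof(named[0]) && !done; i++) {
				size_t n = strlen(named[i].name);

				if (!strncmp(s, named[i].name, n)) {
					*w++ = named[i].c;
					s += n;
					done = 1;
				}
			}
			if (done)
				continue;

			if (s[1] == '#') {
				char *p = s + 2;
				int hex = (*p == 'x' || *p == 'X');
				unsigned long v = 0;
				int digits = 0;

				if (hex)
					p++;
				for (;; p++, digits++) {
					int d;

					if (*p >= '0' && *p <= '9')
						d = *p - '0';
					else if (hex && *p >= 'a' && *p <= 'f')
						d = *p - 'a' + 10;
					else if (hex && *p >= 'A' && *p <= 'F')
						d = *p - 'A' + 10;
					else
						break;
					if (v < 0x110000) /* stop growing: avoids overflow */
						v = v * (hex ? 16 : 10) + (unsigned long) d;
				}
				if (digits && *p == ';' && v >= 1 && v <= 127) {
					*w++ = (char) v;
					s = p + 1;
					continue;
				}
			}
		}
		*w++ = *s++;
	}
	*w = '\0';
}
```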
cdb3600791 Update copyright to 2019 2019-01-02 12:42:53 +01:00
3178560d7b Fix misspellings 2018-12-27 19:22:27 -02:00
3eb90ca3d4 Revert "Fix -k/--convert-links"
This reverts commit 89ff57ee93.

The commit fixed one bug but introduced a regression elsewhere.
See https://gitlab.com/gnuwget/wget2/issues/415#note_124641970
2018-12-13 11:26:57 +01:00
89ff57ee93 Fix -k/--convert-links
When different URLs map to the same local file, -k converts that
file several times and the file becomes corrupted.

This commit fixes it and adds a test case.

Reported-by: rgpublic on GitLab.com
Fixes #411
2018-10-31 15:47:13 +01:00
3688ffb941 Update copyright to 2018 2018-04-30 20:52:11 +02:00
332b689925 Remove _GL_INLINE from functions
* libwget/hashmap.c: Likewise
* libwget/iri.c: Likewise
* libwget/ssl_gnutls.c: Likewise
* libwget/utils.c: Likewise

GCC 4.1.3 on NetBSD 5.1 reports an error when _GL_INLINE is given more
than once as a function attribute.
From the gnulib manual: "C code ordinarily should not use inline. Typically it
is better to let the compiler figure out whether to inline, as compilers are
pretty good about optimization nowadays. In this sense, inline is like
register, another keyword that is typically no longer needed"
2018-04-30 20:07:49 +02:00
863338764a Don't recognize %3A as colon in relative URLs
* include/wget/wget.h: New function wget_iri_unescape_url_inline()
* libwget/iri.c: Likewise
* src/wget.c (_normalize_uri): Use wget_iri_unescape_url_inline()
* examples/check_url_types.c: Likewise
* tests/test-base.c: Add test case
* unit-tests/test.c (test_iri_relative_to_absolute): Add test case

Relative URLs like href='foo%3A/' were detected and parsed
as absolute URLs with the (unknown) scheme 'foo:'.
This commit fixes it.
2018-04-13 19:23:11 +02:00
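RFC 3986 (section 3.1) only allows letters, digits, `+`, `-` and `.` in a scheme, so `%` can never be part of one. Checking for a scheme on the still-escaped reference, before any percent-decoding, keeps `foo%3A/` relative. A minimal sketch; `has_scheme()` is a hypothetical helper, not wget2's parser.

```c
#include <ctype.h>

/* Return 1 if s starts with an RFC 3986 scheme followed by ':':
 * scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) */
static int has_scheme(const char *s)
{
	if (!isalpha((unsigned char) *s))
		return 0;
	for (s++; *s; s++) {
		if (*s == ':')
			return 1;
		if (!isalnum((unsigned char) *s) && *s != '+' && *s != '-' && *s != '.')
			return 0;
	}
	return 0;
}
```

Run on `foo%3A/` it stops at `%` and reports no scheme; decoding first would turn the input into `foo:/`, which it would accept.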
3509c10df3 * libwget/iri.c (wget_iri_get_escaped_resource): Escape space in query 2018-03-01 11:14:27 +01:00
7590af60fc Add options --default-http-port and --default-https-port
* docs/wget2.md: Add docs for the two options
* include/wget/wget.h: Prototype for wget_iri_set_defaultport()
* libwget/iri.c: New function wget_iri_set_defaultport()
* src/options.c: New options --default-http-port and --default-https-port
* src/wget_options.h (struct config): Add default_http_port and default_https_port
2018-01-30 12:32:25 +01:00
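A sketch of how per-scheme default ports could be made configurable, assuming a global table consulted whenever a URL has no explicit port. `set_default_port()`, `effective_port()` and the `SK_*` constants are hypothetical and do not reproduce the `wget_iri_set_defaultport()` signature.

```c
enum { SK_HTTP, SK_HTTPS, SK_NSCHEMES };

/* Built-in defaults, overridable at runtime (e.g. from a command-line option). */
static unsigned short default_port[SK_NSCHEMES] = { 80, 443 };

/* Returns 0 on success, -1 for an unknown scheme or port 0. */
static int set_default_port(int scheme, unsigned short port)
{
	if (scheme < 0 || scheme >= SK_NSCHEMES || port == 0)
		return -1;
	default_port[scheme] = port;
	return 0;
}

/* An explicit port in the URL always wins over the default. */
static unsigned short effective_port(int scheme, unsigned short explicit_port)
{
	return explicit_port ? explicit_port : default_port[scheme];
}
```

Because the table is global, the override should be applied once at startup, before any URLs are parsed.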
944522111e Add http:// to URIs without scheme
* libwget/iri.c (wget_iri_parse): Add http:// to uri without scheme
* include/wget/wget.h (wget_iri_t): Add member msize

This commit also fixes a buffer overflow in wget_iri_set_scheme in
combination with wget_iri_clone.
2018-01-30 12:32:24 +01:00
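A hedged sketch of prepending `http://`, assuming a scheme is recognized only when followed by `://` so that `example.com:8080/` is not mistaken for a scheme. The helper names are hypothetical. One consequence of this assumption is that `mailto:user@example.com` would also get `http://` prepended.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* True if s begins with "scheme://" (RFC 3986 scheme characters). */
static int starts_with_scheme(const char *s)
{
	if (!isalpha((unsigned char) *s))
		return 0;
	while (isalnum((unsigned char) *s) || *s == '+' || *s == '-' || *s == '.')
		s++;
	return s[0] == ':' && s[1] == '/' && s[2] == '/';
}

/* Return a malloc'ed copy of uri, prefixed with "http://" if it has no
 * scheme; NULL on allocation failure. The caller frees the result. */
static char *add_default_scheme(const char *uri)
{
	static const char prefix[] = "http://";
	size_t plen = starts_with_scheme(uri) ? 0 : sizeof(prefix) - 1;
	size_t len = strlen(uri);
	char *out = malloc(plen + len + 1);

	if (!out)
		return NULL;
	memcpy(out, prefix, plen);
	memcpy(out + plen, uri, len + 1); /* includes the NUL */
	return out;
}
```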
6de7792091 Simplify code for --https-enforce
* libwget/iri.c (wget_iri_set_scheme): Cleanup
* src/host.c (host_add_robotstxt_job): Add param http_fallback
* src/wget_host.h: Likewise
* src/job.c (job_init): Add param http_fallback
* src/wget_job.h: Likewise
* src/wget.c: Remove http_fallback_urls and http_fallback_urls_mutex,
  (add_url_to_queue): Set http_fallback for new jobs,
  (add_url): Likewise
2018-01-30 12:32:24 +01:00
3049c6be0a * libwget/iri.c: Don't percent encode the query portion of an IRI 2018-01-04 16:14:07 +01:00
bc0aaaee3c Fix some more URLs from http:// to https://
* .travis.sh: Likewise
* .travis_setup.sh: Likewise
* COPYING: Likewise
* COPYING.LESSER: Likewise
* bootstrap: Likewise
* include/wget/wget.h: Likewise
* include/wget/wgetver.h.in: Likewise
* libwget/atom_url.c: Likewise
* libwget/iri.c: Likewise
2017-12-12 11:06:25 +01:00
b0579db097 Fix several issues found by cppcheck
* libwget/bar.c (_wget_bar_st): Remove unused member screen_width
* libwget/css.c: Reduce scope of variables
* libwget/encoding.c: Likewise
* libwget/hpkp.c: Likewise
* libwget/http.c: Likewise
* libwget/http_parse.c: Likewise
* libwget/iri.c: Likewise
* libwget/net.c: Likewise
* libwget/sitemap_url.c: Likewise
* libwget/ssl_gnutls.c: Likewise
* libwget/vector.c: Likewise
* src/dl.c: Likewise
2017-12-06 11:15:08 +01:00
f64a41ac14 Add wget_strscpy()
* include/wget/wget.h: Add wget_strscpy()
* libwget/strscpy.c: New file
* libwget/Makefile.am: Add strscpy.c
* libwget/html_url.c: Use wget_strscpy() instead of wget_strlcpy()
* libwget/http.c: Likewise
* libwget/iri.c: Likewise
* libwget/metalink.c: Likewise
* libwget/ssl_gnutls.c: Likewise
* libwget/test_linking.c: Likewise
* src/job.c: Likewise
* src/plugin.c: Likewise
* src/wget.c: Likewise
2017-10-13 10:57:55 +02:00
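`strscpy()` originates in the Linux kernel: it always NUL-terminates (for a non-zero size) and reports truncation instead of returning `strlen(src)` like `strlcpy()`, so it never reads more than `size` bytes of the source. A sketch with those semantics; `sk_strscpy()` and its `-1` truncation result are assumptions, not necessarily the exact contract of `wget_strscpy()`.

```c
#include <string.h>
#include <sys/types.h>

/* Copy at most size-1 bytes of src into dst and NUL-terminate.
 * Returns the number of bytes copied, or -1 if src was truncated
 * (or size is 0). Reads at most size bytes of src. */
static ssize_t sk_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -1;
	for (i = 0; i < size - 1 && src[i]; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return src[i] ? -1 : (ssize_t) i;
}
```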
98f168a7aa * libwget/iri.c (wget_iri_relative_to_abs): Fix null pointer dereference 2017-10-07 15:17:57 +02:00
6948deae02 Use type bool instead of char
* include/wget/wget.h: Use type bool instead of char
* libwget/bar.c: Likewise
* libwget/base64.c: Likewise
* libwget/cookie.c: Likewise
* libwget/encoding.c: Likewise
* libwget/hsts.c: Likewise
* libwget/http.h: Likewise
* libwget/http_highlevel.c: Likewise
* libwget/http_parse.c: Likewise
* libwget/ip.c: Likewise
* libwget/iri.c: Likewise
* libwget/logger.c: Likewise
* libwget/net.h: Likewise
* libwget/ocsp.c: Likewise
* libwget/tls_session.c: Likewise
* libwget/vector.c: Likewise
* src/plugin.c: Likewise
* src/wget_host.h: Likewise
* src/wget_job.h: Likewise
* src/wget_plugin.h: Likewise
* tests/test-plugin-dummy.c: Likewise
2017-09-22 13:08:49 +00:00
9c376c742f Fix bug with combination of host-directories and cut-dirs
* libwget/iri.c (wget_iri_get_path): Change API so that a path separator
  is added _only_ if one does not already exist at the end of the buffer.
  This should be a non-breaking change.
* src/wget.c (get_local_filename): Append a "/" at the end of the
  hostname, always.
2017-09-16 10:30:11 +02:00
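The changed separator rule amounts to: append `/` only when the buffer does not already end with one. A minimal sketch; `append_path()` is hypothetical, and leaving an empty buffer without a leading separator is an assumption.

```c
#include <string.h>

/* Append component to buf (capacity size), inserting '/' only if buf is
 * non-empty and does not already end with one.
 * Returns 0 on success, -1 if the result would not fit. */
static int append_path(char *buf, size_t size, const char *component)
{
	size_t len = strlen(buf);
	size_t need_sep = len > 0 && buf[len - 1] != '/';
	size_t clen = strlen(component);

	if (len + need_sep + clen + 1 > size)
		return -1;
	if (need_sep)
		buf[len++] = '/';
	memcpy(buf + len, component, clen + 1); /* includes the NUL */
	return 0;
}
```

With the hostname now always ending in `/`, appending the path this way no longer produces a doubled separator when `--cut-dirs` leaves the path empty.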
13e552441e Hide internal structures/defines from doxygen
* include/wget/wget.h: Use _ as prefix for internals
* libwget/atom_url.c: Likewise
* libwget/cookie.c: Likewise
* libwget/iri.c: Likewise
* libwget/net.c: Likewise
* libwget/rss_url.c: Likewise
* libwget/sitemap_url.c: Likewise
* src/wget_options.h: Likewise
2017-08-04 12:21:16 +02:00
6c0ac1f324 Fix more clang warnings
* libwget/css_tokenizer.lex: Skip -Wsuggest-attribute=pure for clang
* libwget/iri.c (wget_iri_parse): Explicit cast
* src/dl.c: Use 'int' for vector size
* src/options.c: Likewise,
  remove 'return' after call to exit(),
  (set_long_option): Rename local var 'env' to 'path'
* src/plugin.c: Make global vars 'static',
  Use 'int' for vector size
2017-07-28 11:56:43 +02:00
230aec0971 * libwget/iri.c (iri_ports): Remove redundant const 2017-07-26 17:58:31 +05:30