* include/wget/wget.h: Remove function wget_iri_isunreserved_path().
* libwget/iri.c: Remove function wget_iri_isunreserved_path().
(iri_ctype): Extend array with unreserved characters.
(wget_iri_isunreserved): Simplify code.
(wget_iri_escape): Use macro iri_isunreserved instead of wget_iri_isunreserved().
(wget_iri_escape_path): Add RFC links to function comment,
Fix check whether char needs percent-encoding or not,
Use macro iri_isunreserved instead of wget_iri_isunreserved().
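A minimal sketch of the table-driven idea behind iri_ctype and the iri_isunreserved macro. It assumes one bit per character class in a 256-entry lookup table. The names ctype_table, is_unreserved and escape_path are hypothetical and are not libwget's API. The character classes follow RFC 3986: unreserved, sub-delims, and the extra ':' and '@' allowed in pchar, with '/' kept as the segment separator.

```c
/* Hypothetical class bits; libwget's real iri_ctype layout may differ. */
enum {
	CT_UNRESERVED = 1 << 0, /* RFC 3986 unreserved */
	CT_SUBDELIM   = 1 << 1, /* RFC 3986 sub-delims */
};

static unsigned char ctype_table[256];

/* Fill the lookup table once (not thread-safe; a real table would be a static initializer). */
static void ctype_init(void)
{
	static int done;
	const char *p;

	if (done)
		return;
	for (int c = 0; c < 256; c++)
		if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
			ctype_table[c] |= CT_UNRESERVED;
	for (p = "-._~"; *p; p++)
		ctype_table[(unsigned char)*p] |= CT_UNRESERVED;
	for (p = "!$&'()*+,;="; *p; p++)
		ctype_table[(unsigned char)*p] |= CT_SUBDELIM;
	done = 1;
}

#define is_unreserved(c) (ctype_table[(unsigned char)(c)] & CT_UNRESERVED)
#define is_subdelim(c)   (ctype_table[(unsigned char)(c)] & CT_SUBDELIM)

/* Copy src to dst, percent-encoding every byte not allowed in a path
 * (unreserved, sub-delims, ':', '@', and the '/' separator are kept).
 * dst must hold at least 3 * strlen(src) + 1 bytes. */
static void escape_path(const char *src, char *dst)
{
	static const char hex[] = "0123456789ABCDEF";

	ctype_init();
	for (; *src; src++) {
		unsigned char c = (unsigned char)*src;

		if (is_unreserved(c) || is_subdelim(c) || c == ':' || c == '@' || c == '/') {
			*dst++ = (char)c;
		} else {
			*dst++ = '%';
			*dst++ = hex[c >> 4];
			*dst++ = hex[c & 0x0F];
		}
	}
	*dst = '\0';
}
```

With this table, escaping "/a b/?x" keeps the slashes and yields "/a%20b/%3Fx"; '?' must be encoded because in a path it would otherwise start the query.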
* include/wget/wget.h: Remove global wget_iri_schemes[],
add wget_iri_scheme enum
* libwget/iri.c: Add struct iri_scheme,
remove wget_iri_schemes and iri_ports,
new function wget_iri_scheme_get_name(),
fix code
* examples/check_url_types.c: Use comparison instead of wget_strcasecmp
* fuzz/libwget_iri_fuzzer.c: Use WGET_IRI_SCHEME_HTTPS instead of string
* libwget/http.c: Use wget_iri_scheme_get_name()
* libwget/http.h: Change scheme from string to wget_iri_scheme
* libwget/http_parse.c: Fix wget_http_get_scheme()
* src/blacklist.c: Fix hash_iri()
* src/host.c: Fix _host_hash()
* src/options.c: Use WGET_IRI_SCHEME_* instead of string
* src/stats_server.c: Use wget_iri_scheme for scheme member
* src/wget.c: Fix code
* src/wget_host.h: Use wget_iri_scheme for scheme member
* unit-tests/test.c: Fix tests
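The move from scheme strings to an enum can be sketched as a table indexed by the enum value, where each entry holds the scheme name and its default port. This is a hypothetical illustration. scheme_t, struct scheme_info and scheme_get_name() only mirror the idea of wget_iri_scheme, struct iri_scheme and wget_iri_scheme_get_name(), and their real definitions may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum; libwget's wget_iri_scheme may use other names/values. */
typedef enum {
	SCHEME_HTTP,
	SCHEME_HTTPS,
} scheme_t;

/* One entry per scheme, indexed by the enum value. */
struct scheme_info {
	const char *name;
	uint16_t default_port;
};

static const struct scheme_info schemes[] = {
	[SCHEME_HTTP]  = { "http",  80 },
	[SCHEME_HTTPS] = { "https", 443 },
};

/* Map an enum value back to its name, e.g. for building request URLs or logging. */
static const char *scheme_get_name(scheme_t scheme)
{
	if ((size_t)scheme < sizeof(schemes) / sizeof(schemes[0]))
		return schemes[scheme].name;
	return NULL;
}

static uint16_t scheme_get_default_port(scheme_t scheme)
{
	if ((size_t)scheme < sizeof(schemes) / sizeof(schemes[0]))
		return schemes[scheme].default_port;
	return 0;
}
```

Callers can then compare enum values such as scheme == SCHEME_HTTPS instead of calling wget_strcasecmp() on strings, and hash functions can mix in a small integer instead of hashing a string.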
* libwget/iri.c (_iri_unescape_inline): Decode basic HTML entities
* tests/test-base.c: Add test for &amp;, &#ddd; and &#xHH;
This only handles the basic HTML entities mentioned in RFC 1866.
This commit closes GitLab issue 44, although it does not implement
full entity handling; it is unclear whether anyone will ever request that.
Fixes #44
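The three cases the test covers can be decoded in place with a single read/write pointer pass. This is a sketch, not the code in _iri_unescape_inline. The function name is hypothetical, and the sketch only decodes numeric references in the ASCII range 1 to 127, leaving anything else (including named entities other than &amp;) untouched.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Decode "&amp;", "&#ddd;" and "&#xHH;" in place (hypothetical helper).
 * Numeric references are decoded only for values 1..127; anything else,
 * including other named entities, is copied unchanged. */
static void unescape_entities_inline(char *s)
{
	char *dst = s; /* write pointer never overtakes the read pointer */

	while (*s) {
		if (*s == '&') {
			if (!strncmp(s, "&amp;", 5)) {
				*dst++ = '&';
				s += 5;
				continue;
			}
			if (s[1] == '#') {
				const char *p = s + 2;
				int base = 10;

				if (*p == 'x' || *p == 'X') {
					base = 16;
					p++;
				}
				/* Require a digit so strtol() cannot skip spaces or accept a sign. */
				if (base == 10 ? isdigit((unsigned char)*p) : isxdigit((unsigned char)*p)) {
					char *end;
					long val = strtol(p, &end, base);

					if (*end == ';' && val > 0 && val < 128) {
						*dst++ = (char)val;
						s = end + 1;
						continue;
					}
				}
			}
		}
		*dst++ = *s++;
	}
	*dst = '\0';
}
```

Decoding in place works because every recognized entity is longer than the single byte it produces, so the output never grows.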
When different URLs map to the same local file, -k converts that
file several times and corrupts it.
This commit fixes the problem and adds a test case.
Reported-by: rgpublic on GitLab.com
Fixes #411
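One possible guard, assuming -k conversion is keyed by the local filename, is to remember which files were already converted and skip repeats. This is not necessarily how the commit solves the problem. converted_set and its functions are hypothetical, and the linear search would normally be replaced by a hash set for large crawls.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry of local files already processed by -k. */
struct converted_set {
	char **names;
	size_t count, capacity;
};

/* Returns true if fname was newly added (convert it now), false if it was
 * already converted. Also returns false on allocation failure. */
static bool converted_set_add(struct converted_set *set, const char *fname)
{
	for (size_t i = 0; i < set->count; i++)
		if (!strcmp(set->names[i], fname))
			return false;

	if (set->count == set->capacity) {
		size_t newcap = set->capacity ? set->capacity * 2 : 8;
		char **tmp = realloc(set->names, newcap * sizeof(*tmp));

		if (!tmp)
			return false;
		set->names = tmp;
		set->capacity = newcap;
	}

	char *copy = malloc(strlen(fname) + 1);
	if (!copy)
		return false;
	strcpy(copy, fname);
	set->names[set->count++] = copy;
	return true;
}

static void converted_set_free(struct converted_set *set)
{
	for (size_t i = 0; i < set->count; i++)
		free(set->names[i]);
	free(set->names);
	set->names = NULL;
	set->count = set->capacity = 0;
}
```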
* libwget/hashmap.c: Likewise
* libwget/iri.c: Likewise
* libwget/ssl_gnutls.c: Likewise
* libwget/utils.c: Likewise
GCC 4.1.3 on NetBSD 5.1 reports an error when _GL_INLINE is given twice
as a function attribute.
From the gnulib manual: "C code ordinarily should not use inline. Typically it
is better to let the compiler figure out whether to inline, as compilers are
pretty good about optimization nowadays. In this sense, inline is like
register, another keyword that is typically no longer needed".
* include/wget/wget.h: New function wget_iri_unescape_url_inline()
* libwget/iri.c: Likewise
* src/wget.c (_normalize_uri): Use wget_iri_unescape_url_inline()
* examples/check_url_types.c: Likewise
* tests/test-base.c: Add test case
* unit-tests/test.c (test_iri_relative_to_absolute): Add test case
Relative URLs such as href='foo%3A/' were detected and parsed
as absolute URLs with the (unknown) scheme 'foo:'.
This commit fixes that.
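The misdetection may come from percent-decoding the reference before looking for a scheme. Assuming that, the fix is to test the RFC 3986 scheme syntax, ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":", on the raw reference. In the sketch below, which uses a hypothetical has_scheme() helper, 'foo%3A/' contains '%' before any ':' and therefore has no scheme, while 'foo:/' does.

```c
#include <ctype.h>
#include <stdbool.h>

/* RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":".
 * Checked on the raw, still percent-encoded reference (hypothetical helper).
 * Uses <ctype.h>, so results assume the "C" locale. */
static bool has_scheme(const char *ref)
{
	if (!isalpha((unsigned char)*ref))
		return false;
	for (ref++; *ref; ref++) {
		unsigned char c = (unsigned char)*ref;

		if (c == ':')
			return true;
		if (!isalnum(c) && c != '+' && c != '-' && c != '.')
			return false;
	}
	return false;
}
```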
* docs/wget2.md: Add docs for the two options
* include/wget/wget.h: Prototype for wget_iri_set_defaultport()
* libwget/iri.c: New function wget_iri_set_defaultport()
* src/options.c: New options --default-http-port and --default-https-port
* src/wget_options.h (struct config): Add default_http_port and default_https_port
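A sketch of how configurable default ports could be stored and applied, assuming one slot per scheme and a port value parsed from the option argument. set_default_port() and effective_port() are hypothetical. The actual signature of wget_iri_set_defaultport() is not shown here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-scheme defaults that --default-http-port and
 * --default-https-port could override; not libwget's actual code. */
typedef enum { SCHEME_HTTP, SCHEME_HTTPS, SCHEME_COUNT } scheme_t;

static uint16_t default_ports[SCHEME_COUNT] = { 80, 443 };

/* Parse an option value and accept only 1..65535. Returns 0 on success, -1 on error. */
static int set_default_port(scheme_t scheme, const char *value)
{
	char *end;
	long port;

	if ((unsigned)scheme >= SCHEME_COUNT || !value || !*value)
		return -1;
	port = strtol(value, &end, 10);
	if (*end || port < 1 || port > 65535)
		return -1;
	default_ports[scheme] = (uint16_t)port;
	return 0;
}

/* Port to connect to: the explicit port from the URL (non-zero), else the scheme default. */
static uint16_t effective_port(scheme_t scheme, uint16_t url_port)
{
	return url_port ? url_port : default_ports[scheme];
}
```

An explicit port in the URL always wins; the configured default only applies when the URL omits one.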
* libwget/iri.c (wget_iri_parse): Prepend http:// to URIs without a scheme
* include/wget/wget.h (wget_iri_t): Add member msize
This commit also fixes a buffer overflow in wget_iri_set_scheme in
combination with wget_iri_clone.
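A minimal sketch of prepending http:// to a URI that lacks a scheme. It assumes a scheme is only recognized when followed by "://", which avoids treating "example.com:8080/" as having the scheme "example.com". The real check in wget_iri_parse may differ, and add_default_scheme() is a hypothetical name.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* True if uri starts with "scheme://". Stricter than RFC 3986's "scheme:" so that
 * "example.com:8080/" is not mistaken for a URI with scheme "example.com". */
static int starts_with_scheme(const char *uri)
{
	const char *p = uri;

	if (!isalpha((unsigned char)*p))
		return 0;
	while (isalnum((unsigned char)*p) || *p == '+' || *p == '-' || *p == '.')
		p++;
	return !strncmp(p, "://", 3);
}

/* Return a newly allocated copy of uri, prefixed with "http://" if it has no scheme.
 * The caller frees the result. Returns NULL on allocation failure. */
static char *add_default_scheme(const char *uri)
{
	const char *prefix = starts_with_scheme(uri) ? "" : "http://";
	size_t plen = strlen(prefix), ulen = strlen(uri);
	char *out = malloc(plen + ulen + 1);

	if (!out)
		return NULL;
	memcpy(out, prefix, plen);
	memcpy(out + plen, uri, ulen + 1); /* includes the terminating NUL */
	return out;
}
```

The result is sized exactly for prefix plus URI, which is the kind of bookkeeping a separate allocated-size member such as msize helps keep straight when a parsed IRI is later cloned or modified.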
* libwget/iri.c (wget_iri_get_path): Change API so that a path separator
is added _only_ if one does not already exist at the end of the buffer.
This should be a non-breaking change.
* src/wget.c (get_local_filename): Always append a "/" to the
hostname.
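The new separator rule can be sketched on a plain, size-bounded buffer. A '/' is inserted only if the buffer is non-empty and does not already end with one, so a hostname that always ends in "/" is not followed by a second slash. path_append() is hypothetical, and its empty-buffer behavior is an assumption.

```c
#include <string.h>

/* Append seg to buf (capacity size bytes), inserting '/' only if buf is
 * non-empty and does not already end with '/'.
 * Returns 0 on success, -1 if the result would not fit (buf is left unchanged). */
static int path_append(char *buf, size_t size, const char *seg)
{
	size_t len = strlen(buf);
	size_t need_sep = (len > 0 && buf[len - 1] != '/') ? 1 : 0;
	size_t seglen = strlen(seg);

	if (len + need_sep + seglen + 1 > size)
		return -1;
	if (need_sep)
		buf[len++] = '/';
	memcpy(buf + len, seg, seglen + 1); /* copies the terminating NUL */
	return 0;
}
```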
* libwget/css_tokenizer.lex: Skip -Wsuggest-attribute=pure for clang
* libwget/iri.c (wget_iri_parse): Explicit cast
* src/dl.c: Use 'int' for vector size
* src/options.c: Likewise,
remove 'return' after call to exit(),
(set_long_option): Rename local var 'env' to 'path'
* src/plugin.c: Make global vars 'static',
Use 'int' for vector size