Merged
Changes from 1 commit
164 commits
80cb760
implement URLPattern skeleton
anonrig Nov 30, 2024
fc4b193
use correct value for clang-format
anonrig Nov 30, 2024
cd9df5e
fix build errors
anonrig Nov 30, 2024
09c3f42
create url_pattern-inl.h
anonrig Nov 30, 2024
6656757
add canonicalize methods
anonrig Dec 4, 2024
686af7d
add ada::parse_url_pattern function
anonrig Dec 4, 2024
3c08805
add more comments
anonrig Dec 4, 2024
66da95c
implement getters
anonrig Dec 4, 2024
8377009
add has_regexp_groups()
anonrig Dec 4, 2024
61f4b67
start implementing tokenizer & tokenize
anonrig Dec 4, 2024
fe3af13
add initial parser_url_pattern method
anonrig Dec 4, 2024
1262f8c
add todos and remove redundant qualifiers
anonrig Dec 4, 2024
20691e3
implement escape pattern
anonrig Dec 4, 2024
eed6d80
add CompileComponentOptions
anonrig Dec 7, 2024
f153e00
minor fixes for add-url-pattern (#800)
lemire Dec 8, 2024
7dbf175
rename commits
anonrig Dec 8, 2024
5f36b46
add more parse_url_pattern
anonrig Dec 9, 2024
4619c9d
rename url_pattern class
anonrig Dec 9, 2024
5cdb6db
complete parse_url_pattern implementation
anonrig Dec 9, 2024
5f28a37
add `_component` suffix to components
anonrig Dec 9, 2024
20d7529
remove unnecessary void
anonrig Dec 9, 2024
16aace3
implement generate regular expression methods
anonrig Dec 10, 2024
14d217c
continue working on parser
anonrig Dec 10, 2024
4a31b3f
fix build error
anonrig Dec 12, 2024
f10d3b2
implement constructor string parser
anonrig Dec 12, 2024
4c20080
implement all of tokenizer's functions
anonrig Dec 12, 2024
74f72fd
fix build errors
anonrig Dec 12, 2024
969c87a
fix warnings
anonrig Dec 12, 2024
6276ce8
complete tokenizer
anonrig Dec 12, 2024
5f02b24
implement escape_regexp_string
anonrig Dec 12, 2024
f55e3f5
implement generate_pattern_string
anonrig Dec 12, 2024
cc73e97
fix compiler warnings
anonrig Dec 12, 2024
ca60161
semi-implement match
anonrig Dec 13, 2024
1a47532
complete one more todo
anonrig Dec 13, 2024
a67fe01
simplify create_component_match_result
anonrig Dec 13, 2024
37dc747
simplify
anonrig Dec 13, 2024
d4d843d
use correct inputs for match/exec/test
anonrig Dec 13, 2024
5c212d7
rename wpt_tests to wpt_url_tests
anonrig Dec 13, 2024
d33f228
add wpt_urlpattern_tests skeleton
anonrig Dec 13, 2024
530deb4
add first test
anonrig Dec 14, 2024
f1e04ce
Build fixes (#801)
lemire Dec 14, 2024
a10ba16
fix 2 bugs
anonrig Dec 14, 2024
b67580d
fix linter issues
anonrig Dec 14, 2024
42d6c32
fix 2 more bugs
anonrig Dec 14, 2024
8d8acb2
more progress on missing features
anonrig Dec 16, 2024
ac0817e
move url_pattern_helpers to separate file
anonrig Dec 16, 2024
8523594
fix build errors
anonrig Dec 16, 2024
096e159
use url_pattern_encoding_callback
anonrig Dec 17, 2024
f711faf
fix url pattern constructor error
anonrig Dec 17, 2024
fc5b020
fix more issues
anonrig Dec 17, 2024
690a14a
add initial version of wpt test runner
anonrig Dec 17, 2024
4f1dc9b
simplify json logic (#802)
lemire Dec 17, 2024
21109d1
add fuzzer
anonrig Dec 17, 2024
8929462
removing the reset
lemire Dec 17, 2024
6759d37
update ada idna
anonrig Dec 18, 2024
e8897d9
use ada idna method for valid name code point
anonrig Dec 18, 2024
4e96bbe
fix add part implementation
anonrig Dec 18, 2024
43c806d
fix invalid access errors
anonrig Dec 18, 2024
029e17f
implement tests correctly
anonrig Dec 18, 2024
c4c373b
improve test runner
anonrig Dec 18, 2024
4b3f34d
add url_pattern_init to_string() method
anonrig Dec 19, 2024
8d4994c
update WPT tests
anonrig Dec 19, 2024
5e6f934
fix last remaining todo
anonrig Dec 19, 2024
71468e2
simplify test runner
anonrig Dec 19, 2024
6a4c9a5
minor fixes
lemire Dec 19, 2024
fd6d1d4
some reworking
lemire Dec 19, 2024
7dca1de
make sure to skip invalid tests
anonrig Dec 19, 2024
6d38085
remove std::ranges::iota due to clang
anonrig Dec 20, 2024
abb2af0
add more fuzzing coverage
anonrig Dec 20, 2024
a0df533
try to fix windows issues
anonrig Dec 20, 2024
aeb4699
remove unnecessary copy
anonrig Dec 20, 2024
1eeab05
start testing the validity of the correct responses
anonrig Dec 20, 2024
208c2ff
fix couple of bugs
anonrig Dec 20, 2024
664ed1c
fix invalid ascii checks
anonrig Dec 20, 2024
60c4015
make pattern generation more verbose
anonrig Dec 20, 2024
5e989f0
fix regex error
anonrig Dec 20, 2024
5539349
remove semicolon due to -Werror,-Wextra-semi
anonrig Dec 20, 2024
04252cd
guarding regex call (#805)
lemire Dec 20, 2024
3eac233
add more logging
anonrig Dec 23, 2024
3f7536c
change ada_idna to char32_t
anonrig Dec 23, 2024
602a565
remove try/catch
anonrig Dec 23, 2024
fc3e76e
make canonicalize_ methods more flexible
anonrig Dec 23, 2024
9407a49
fix change_state
anonrig Dec 23, 2024
6d8e960
fix invalid substr call
anonrig Dec 23, 2024
67fb323
fix generate_pattern_string impl
anonrig Dec 23, 2024
dbd003d
fix more small issues
anonrig Dec 23, 2024
8619179
improve url_pattern_init::process
anonrig Dec 23, 2024
a4f0c42
correctly computing the next code point (#808)
lemire Dec 23, 2024
099fb43
adding checks
lemire Dec 23, 2024
049dd11
use std string view to avoid copy
anonrig Dec 23, 2024
6b29fed
use next_index instead of index
anonrig Dec 23, 2024
61f45be
highlight the error message
anonrig Dec 23, 2024
d2bcf67
better decoding
lemire Dec 23, 2024
e997a28
I think that the test is in error (#810)
lemire Dec 23, 2024
6e96857
remove invalid WPT test data
anonrig Dec 24, 2024
188e171
remove invalid assertion
anonrig Dec 24, 2024
5682bf1
fix ipv6 address canonicalize
anonrig Dec 24, 2024
67f9708
fix canonicalize_ipv6_hostname
anonrig Dec 24, 2024
681bf67
simplify test runner
anonrig Dec 24, 2024
3304dd0
fix test runner
anonrig Dec 24, 2024
40f85e3
add a todo
anonrig Dec 24, 2024
fdb044e
remove invalid test case
anonrig Dec 24, 2024
8ee26f4
add tests for expected object
anonrig Dec 24, 2024
7f4acf2
fix hostname tests
anonrig Dec 25, 2024
505f526
complete match implementation
anonrig Dec 25, 2024
6f284c4
fix empty component tests
anonrig Dec 26, 2024
d928625
revert some wpt changes
anonrig Dec 26, 2024
64c6968
add some optional result logging (#812)
lemire Dec 26, 2024
8090940
lint
lemire Dec 26, 2024
f204a8c
fixing logging
lemire Dec 26, 2024
d7b92eb
removing diagram printout
Dec 27, 2024
fc884cb
fix asan build errors
anonrig Dec 28, 2024
77f44d3
simpler version of the yagiz/add-url-pattern branch (#815)
lemire Dec 28, 2024
ab71fa0
simplify implementation
anonrig Dec 29, 2024
ca66004
improve url_pattern_part emplace_back calls
anonrig Dec 29, 2024
b2d9e70
fix url_pattern_component constructor
anonrig Dec 31, 2024
baeafc6
remove the usage of ada.h inside src
anonrig Dec 31, 2024
487582d
move all helper methods to url_pattern.cpp
anonrig Dec 31, 2024
ffee76c
fix urlpatterntestdata.json
anonrig Dec 31, 2024
0100006
fix build errors
anonrig Dec 31, 2024
757683b
add missing check
anonrig Dec 31, 2024
8dc937e
more tests (#817)
lemire Dec 31, 2024
53ba80f
fix assertion error
anonrig Dec 31, 2024
edbf6c0
don't move function calls
anonrig Dec 31, 2024
5f74dd3
fix token reference asan error
anonrig Dec 31, 2024
a5580c7
another test (#818)
lemire Dec 31, 2024
db7acf9
simplify parser and tests
anonrig Jan 1, 2025
c60c2dc
remove unnecessary duplicate_name method
anonrig Jan 1, 2025
385f554
convert Token to class
anonrig Jan 1, 2025
dab41f6
minor cleanups
anonrig Jan 1, 2025
cf69585
remove invalid std::move
anonrig Jan 1, 2025
bd9655d
simplify parser
anonrig Jan 1, 2025
393f515
remove invalid pathname WPT
anonrig Jan 1, 2025
64f66c6
leave some todos for WPT
anonrig Jan 1, 2025
1f563d4
complete inputs parsing
anonrig Jan 1, 2025
52c33b5
removed duplicated code
anonrig Jan 3, 2025
6ae710b
merge error enums
anonrig Jan 3, 2025
1b59155
fix a boolean operation
anonrig Jan 3, 2025
dd20066
update urlpatterntestdata.json
anonrig Jan 3, 2025
528027c
remove unnecessary assertions
anonrig Jan 3, 2025
65fe0b6
removing GLIBCXX debug
Jan 3, 2025
613d60d
updating macos ci
Jan 3, 2025
943f0aa
indent
Jan 3, 2025
9bb11ad
keeping only static
Jan 3, 2025
5b1de58
improve wpt runner
anonrig Jan 3, 2025
1ec8ea0
fix match
anonrig Jan 3, 2025
c858831
add assertions for object return
anonrig Jan 4, 2025
36a7b72
check __cpp_lib_format
lemire Jan 4, 2025
ff2bf00
adding version header (#824)
lemire Jan 4, 2025
0feb9a6
fix match related bugs
anonrig Jan 5, 2025
57accd5
fix port canonicalize
anonrig Jan 5, 2025
67f9988
fix port setting caused by url parser bug
anonrig Jan 5, 2025
9deaa41
add temporary check for special schemes
anonrig Jan 5, 2025
d47ca13
revert opaque host change
anonrig Jan 6, 2025
14e6c53
fix match when input needs to be parsed
anonrig Jan 6, 2025
6f2838f
fix match hash and search prefix
anonrig Jan 6, 2025
89c8bea
fix internal assertion
anonrig Jan 6, 2025
e3f4fe2
improve wpt test runner
anonrig Jan 6, 2025
8b8d5e6
improve regexp matching
anonrig Jan 7, 2025
a47d8c5
fix wpt testrunner
anonrig Jan 7, 2025
b620b09
fix test implementation
anonrig Jan 7, 2025
36a9097
add half-working match_result
anonrig Jan 7, 2025
87def0a
improve regex matching
anonrig Jan 8, 2025
61728b2
remove invalid WPT test
anonrig Jan 8, 2025
semi-implement match
anonrig committed Jan 8, 2025
commit ca60161bf552b8c75519b720c3a24a21e3e8ef92
25 changes: 25 additions & 0 deletions include/ada/url_pattern-inl.h
@@ -27,6 +27,31 @@ url_pattern_component::get_group_name_list() const noexcept ada_lifetime_bound {
 return group_name_list;
 }

+inline url_pattern_component_result
+url_pattern_component::create_component_match_result(
+std::string_view input, const std::vector<std::string>& exec_result) {
+// Let result be a new URLPatternComponentResult.
+auto result = url_pattern_component_result{};
+// Set result["input"] to input.
+result.input = std::string(input);
+// Let groups be a record<USVString, (USVString or undefined)>.
+result.groups = {};
+// Let index be 1.
+size_t index = 1;
+// While index is less than Get(execResult, "length"):
+while (index < exec_result.size()) {
+// Let name be component’s group name list[index − 1].
+auto name = group_name_list[index - 1];
+// Let value be Get(execResult, ToString(index)).
+auto value = exec_result.at(index);
+// Set groups[name] to value.
+result.groups.insert({name, value});
+// Increment index by 1.
+index++;
+}
+return result;
+}
+
 inline std::string_view url_pattern::get_protocol() const ada_lifetime_bound {
 // Return this's associated URL pattern's protocol component's pattern string.
 return protocol_component.get_pattern();
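create_component_match_result pairs each capture at index i (starting at 1, because index 0 of an exec result is the whole match) with entry i - 1 of the component's group name list. A minimal standalone sketch of that mapping, assuming std::regex with its default ECMAScript grammar as a stand-in for RegExpBuiltinExec; `component_result`, `exec_like`, and the `^/books/([^/]+?)$` regex are hypothetical illustrations, not ada's API:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical mirror of url_pattern_component_result.
struct component_result {
  std::string input;
  std::unordered_map<std::string, std::string> groups;
};

// Stand-in for RegExpBuiltinExec: requires `pattern` to match the whole of
// `input` and copies every capture into a vector (index 0 = full match).
// Returns an empty vector when there is no match.
std::vector<std::string> exec_like(const std::regex& pattern,
                                   const std::string& input) {
  std::smatch m;
  std::vector<std::string> out;
  if (!std::regex_match(input, m, pattern)) {
    return out;
  }
  for (const auto& sub : m) {
    out.push_back(sub.str());  // an unmatched group yields ""
  }
  return out;
}

// Pairs capture `index` (index >= 1) with group_names[index - 1], following
// the spec's "create a component match result" steps. Uses at() so a name
// list shorter than the capture list throws instead of reading out of bounds.
component_result create_component_match_result(
    const std::string& input, const std::vector<std::string>& exec_result,
    const std::vector<std::string>& group_names) {
  component_result result{input, {}};
  for (std::size_t index = 1; index < exec_result.size(); ++index) {
    result.groups.insert({group_names.at(index - 1), exec_result[index]});
  }
  return result;
}
```

Because the exec result here is a std::vector<std::string>, an optional group that did not participate in the match comes back as an empty string; the spec models that case as undefined, so a std::optional<std::string> value type would be one way to preserve the distinction.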
28 changes: 16 additions & 12 deletions include/ada/url_pattern.h
@@ -164,6 +164,16 @@ struct url_pattern_compile_component_options {
 static url_pattern_compile_component_options PATHNAME;
 };

+// A struct providing the URLPattern matching results for a single
+// URL component. The URLPatternComponentResult is only ever used
+// as a member attribute of a URLPatternResult struct. The
+// URLPatternComponentResult API is defined as part of the URLPattern
+// specification.
+struct url_pattern_component_result {
+std::string input;
+std::unordered_map<std::string, std::string> groups;
+};
+
 class url_pattern_component {
 public:
 url_pattern_component() = default;
@@ -184,6 +194,10 @@ class url_pattern_component {
 std::string_view input, F encoding_callback,
 url_pattern_compile_component_options& options);

+// @see https://urlpattern.spec.whatwg.org/#create-a-component-match-result
+url_pattern_component_result create_component_match_result(
+std::string_view input, const std::vector<std::string>& exec_result);
+
 std::string_view get_pattern() const noexcept ada_lifetime_bound;
 std::string_view get_regexp() const noexcept ada_lifetime_bound;
 const std::vector<std::string>& get_group_name_list() const noexcept
@@ -201,16 +215,6 @@ class url_pattern_component {
 bool has_regexp_groups_ = false;
 };

-// A struct providing the URLPattern matching results for a single
-// URL component. The URLPatternComponentResult is only ever used
-// as a member attribute of a URLPatternResult struct. The
-// URLPatternComponentResult API is defined as part of the URLPattern
-// specification.
-struct url_pattern_component_result {
-std::string input;
-std::unordered_map<std::string, std::string> groups;
-};
-
 using url_pattern_input = std::variant<std::string_view, url_pattern_init>;

 // A struct providing the URLPattern matching results for all
@@ -245,15 +249,15 @@ class url_pattern {
 std::optional<url_pattern_options> options);

 // @see https://urlpattern.spec.whatwg.org/#dom-urlpattern-exec
-tl::expected<url_pattern_result, url_pattern_errors> exec(
+tl::expected<std::optional<url_pattern_result>, url_pattern_errors> exec(
 std::variant<url_pattern_init, url_aggregator> input,
 std::string_view* base_url);
 // @see https://urlpattern.spec.whatwg.org/#dom-urlpattern-test
 bool test(std::variant<url_pattern_init, url_aggregator> input,
 std::string_view* base_url);

 // @see https://urlpattern.spec.whatwg.org/#url-pattern-match
-tl::expected<url_pattern_result, url_pattern_errors> match(
+tl::expected<std::optional<url_pattern_result>, url_pattern_errors> match(
 std::variant<url_pattern_init, url_aggregator> input,
 std::string_view* base_url_string);

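With exec() and match() now returning tl::expected<std::optional<url_pattern_result>, url_pattern_errors>, the outer has_value() only reports that no error occurred; whether the input matched lives in the inner optional, which is why test() now returns result->has_value(). A sketch of the same two-layer check, using std::variant as a stand-in for tl::expected so it compiles with the standard library alone; `pattern_error`, `match_result`, `match_outcome`, and `test_outcome` are hypothetical names:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// Hypothetical stand-ins for url_pattern_errors and url_pattern_result.
enum class pattern_error { invalid_input };

struct match_result {
  std::string pathname;
};

// Stand-in for tl::expected<std::optional<match_result>, pattern_error>:
// either an error, or a successful call whose payload may still be
// "no match" (std::nullopt).
using match_outcome = std::variant<pattern_error, std::optional<match_result>>;

// Looks through both layers: only a successful call that produced a value
// counts as a match. An error and an empty optional both yield false.
bool test_outcome(const match_outcome& outcome) {
  if (const auto* value = std::get_if<std::optional<match_result>>(&outcome)) {
    return value->has_value();
  }
  return false;
}
```

Keeping the old `return result.has_value();` with the new return type would report true for a successful call that found no match.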
203 changes: 192 additions & 11 deletions src/url_pattern.cpp
@@ -1605,9 +1605,9 @@ bool protocol_component_matches_special_scheme(std::string_view input) {

 // TODO: This function argument should be url_pattern_input but the spec is
 // vague.
-tl::expected<url_pattern_result, url_pattern_errors> url_pattern::exec(
-std::variant<url_pattern_init, url_aggregator> input,
-std::string_view* base_url = nullptr) {
+tl::expected<std::optional<url_pattern_result>, url_pattern_errors>
+url_pattern::exec(std::variant<url_pattern_init, url_aggregator> input,
+std::string_view* base_url = nullptr) {
 // Return the result of match given this's associated URL pattern, input, and
 // baseURL if given.
 return match(input, base_url);
@@ -1621,15 +1621,16 @@ bool url_pattern::test(std::variant<url_pattern_init, url_aggregator> input,
 // Implement a fast path just like `can_parse()` in ada_url.
 // Let result be the result of match given this's associated URL pattern,
 // input, and baseURL if given.
-auto result = match(input, base_url);
 // If result is null, return false.
-// Return true.
-return result.has_value();
+if (auto result = match(input, base_url); result.has_value()) {
+return result->has_value();
+}
+return false;
 }

-tl::expected<url_pattern_result, url_pattern_errors> url_pattern::match(
-std::variant<url_pattern_init, url_aggregator> input,
-std::string_view* base_url_string) {
+tl::expected<std::optional<url_pattern_result>, url_pattern_errors>
+url_pattern::match(std::variant<url_pattern_init, url_aggregator> input,
+std::string_view* base_url_string) {
 std::string protocol{};
 std::string username{};
 std::string password{};
@@ -1657,10 +1658,190 @@ tl::expected<url_pattern_result, url_pattern_errors> url_pattern::match(
auto apply_result = url_pattern_init::process(
std::get<url_pattern_init>(input), "url", protocol, username, password,
hostname, port, pathname, search, hash);

// Set protocol to applyResult["protocol"].
ADA_ASSERT_TRUE(apply_result->protocol.has_value());
protocol = apply_result->protocol.value();

// Set username to applyResult["username"].
ADA_ASSERT_TRUE(apply_result->username.has_value());
username = apply_result->username.value();

// Set password to applyResult["password"].
ADA_ASSERT_TRUE(apply_result->password.has_value());
password = apply_result->password.value();

// Set hostname to applyResult["hostname"].
ADA_ASSERT_TRUE(apply_result->hostname.has_value());
hostname = apply_result->hostname.value();

// Set port to applyResult["port"].
ADA_ASSERT_TRUE(apply_result->port.has_value());
port = apply_result->port.value();

// Set pathname to applyResult["pathname"].
ADA_ASSERT_TRUE(apply_result->pathname.has_value());
pathname = apply_result->pathname.value();

// Set search to applyResult["search"].
ADA_ASSERT_TRUE(apply_result->search.has_value());
search = apply_result->search.value();

// Set hash to applyResult["hash"].
ADA_ASSERT_TRUE(apply_result->hash.has_value());
hash = apply_result->hash.value();
} else {
// Let url be input.
auto url = std::get<url_aggregator>(input);

// Let baseURL be null.
result<url_aggregator> base_url;

// If input is a USVString:
// TODO: Implement this.
if (true) {
// If baseURLString was given, then:
if (base_url_string) {
// Let baseURL be the result of parsing baseURLString.
base_url = ada::parse<url_aggregator>(*base_url_string, nullptr);

// If baseURL is failure, return null.
if (!base_url) {
return std::nullopt;
}

// Append baseURLString to inputs.
inputs.emplace_back(*base_url);
}

url_aggregator* base_url_value =
base_url.has_value() ? &*base_url : nullptr;

// Set url to the result of parsing input given baseURL.
auto parsed_url =
ada::parse<url_aggregator>(url.get_href(), base_url_value);

// If url is failure, return null.
if (!parsed_url) {
return std::nullopt;
}

url = parsed_url.value();
}

// Set protocol to url’s scheme.
protocol = url.get_protocol();
// Set username to url’s username.
username = url.get_username();
// Set password to url’s password.
password = url.get_password();
// Set hostname to url’s host, serialized, or the empty string if the value
// is null.
hostname = url.get_hostname();
// Set port to url’s port, serialized, or the empty string if the value is
// null.
port = url.get_port();
// Set pathname to the result of URL path serializing url.
pathname = url.get_pathname();
// Set search to url’s query or the empty string if the value is null.
search = url.get_search();
// Set hash to url’s fragment or the empty string if the value is null.
hash = url.get_hash();
}

// TODO: Implement this
return {};
// TODO: Make this function pluggable using a parameter.
// Let protocolExecResult be RegExpBuiltinExec(urlPattern’s protocol
// component's regular expression, protocol). auto protocol_exec_result =
// RegExpBuiltinExec(url_pattern.protocol.get_regexp(), protocol);

// Let usernameExecResult be RegExpBuiltinExec(urlPattern’s username
// component's regular expression, username). auto username_exec_result =
// RegExpBuiltinExec(url_pattern.username.get_regexp(), username);

// Let passwordExecResult be RegExpBuiltinExec(urlPattern’s password
// component's regular expression, password). auto password_exec_result =
// RegExpBuiltinExec(url_pattern.password.get_regexp(), password);

// Let hostnameExecResult be RegExpBuiltinExec(urlPattern’s hostname
// component's regular expression, hostname). auto hostname_exec_result =
// RegExpBuiltinExec(url_pattern.hostname.get_regexp(), hostname);

// Let portExecResult be RegExpBuiltinExec(urlPattern’s port component's
// regular expression, port). auto port_exec_result =
// RegExpBuiltinExec(url_pattern.port.get_regexp(), port);

// Let pathnameExecResult be RegExpBuiltinExec(urlPattern’s pathname
// component's regular expression, pathname). auto pathname_exec_result =
// RegExpBuiltinExec(url_pattern.pathname.get_regexp(), pathname);

// Let searchExecResult be RegExpBuiltinExec(urlPattern’s search component's
// regular expression, search). auto search_exec_result =
// RegExpBuiltinExec(url_pattern.search.get_regexp(), search);

// Let hashExecResult be RegExpBuiltinExec(urlPattern’s hash component's
// regular expression, hash). auto hash_exec_result =
// RegExpBuiltinExec(url_pattern.hash.get_regexp(), hash);

// If protocolExecResult, usernameExecResult, passwordExecResult,
// hostnameExecResult, portExecResult, pathnameExecResult, searchExecResult,
// or hashExecResult are null then return null. if
// (!protocol_exec_result.has_value() || !username_exec_result.has_value() ||
// !password_exec_result.has_value() || !hostname_exec_result.has_value() ||
// !port_exec_result.has_value() || !pathname_exec_result.has_value() ||
// !search_exec_result.has_value() || !hash_exec_result.has_value()) {
// return tl::unexpected(url_pattern_errors::null);
// }

// Let result be a new URLPatternResult.
auto result = url_pattern_result{};
// Set result["inputs"] to inputs.
// result.inputs = std::move(inputs);
// Set result["protocol"] to the result of creating a component match result
// given urlPattern’s protocol component, protocol, and protocolExecResult.
// result.protocol =
// protocol_component.create_component_match_result(protocol,
// protocol_exec_result.value());

// Set result["username"] to the result of creating a component match result
// given urlPattern’s username component, username, and usernameExecResult.
// result.username =
// username_component.create_component_match_result(username,
// username_exec_result.value());

// Set result["password"] to the result of creating a component match result
// given urlPattern’s password component, password, and passwordExecResult.
// result.password =
// password_component.create_component_match_result(password,
// password_exec_result.value());

// Set result["hostname"] to the result of creating a component match result
// given urlPattern’s hostname component, hostname, and hostnameExecResult.
// result.hostname =
// hostname_component.create_component_match_result(hostname,
// hostname_exec_result.value());

// Set result["port"] to the result of creating a component match result given
// urlPattern’s port component, port, and portExecResult. result.port =
// port_component.create_component_match_result(port,
// port_exec_result.value());

// Set result["pathname"] to the result of creating a component match result
// given urlPattern’s pathname component, pathname, and pathnameExecResult.
// result.pathname =
// pathname_component.create_component_match_result(pathname,
// pathname_exec_result.value());

// Set result["search"] to the result of creating a component match result
// given urlPattern’s search component, search, and searchExecResult.
// result.search = search_component.create_component_match_result(search,
// search_exec_result.value());

// Set result["hash"] to the result of creating a component match result given
// urlPattern’s hash component, hash, and hashExecResult. result.hash =
// hash_component.create_component_match_result(hash,
// hash_exec_result.value());

return result;
}

} // namespace ada
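The commented-out tail of match() follows the spec: run each component's regular expression against the corresponding string, return null if any of the eight fails, and otherwise build one component match result per component. Under the new return type, that null maps naturally to std::nullopt rather than to an error. A minimal sketch of the control flow, assuming std::regex as the engine (the TODO in match() mentions making this step pluggable) and hypothetical `compiled_pattern`, `url_parts`, and `match_all` names in place of url_pattern and the parsed URL components:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical layout: indices 0..7 are protocol, username, password,
// hostname, port, pathname, search, hash.
constexpr std::size_t kComponentCount = 8;

struct compiled_pattern {
  std::array<std::regex, kComponentCount> regexps;  // one regex per component
};

struct url_parts {
  std::array<std::string, kComponentCount> values;  // one string per component
};

using captures = std::vector<std::string>;  // index 0 = whole match

// Returns every component's captures when all eight components match,
// or std::nullopt as soon as one of them does not.
std::optional<std::array<captures, kComponentCount>> match_all(
    const compiled_pattern& pattern, const url_parts& parts) {
  std::array<captures, kComponentCount> results;
  for (std::size_t i = 0; i < kComponentCount; ++i) {
    std::smatch m;
    // regex_match requires the whole string to match, mirroring the
    // ^...$ anchoring of URLPattern's generated regular expressions.
    if (!std::regex_match(parts.values[i], m, pattern.regexps[i])) {
      return std::nullopt;  // any failing component means "no match"
    }
    for (const auto& sub : m) {
      results[i].push_back(sub.str());
    }
  }
  return results;
}
```

Each per-component capture vector could then be handed, together with that component's group name list, to a create_component_match_result-style helper to fill in the final result.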