Merged
Changes from 1 commit
164 commits
80cb760
implement URLPattern skeleton
anonrig Nov 30, 2024
fc4b193
use correct value for clang-format
anonrig Nov 30, 2024
cd9df5e
fix build errors
anonrig Nov 30, 2024
09c3f42
create url_pattern-inl.h
anonrig Nov 30, 2024
6656757
add canonicalize methods
anonrig Dec 4, 2024
686af7d
add ada::parse_url_pattern function
anonrig Dec 4, 2024
3c08805
add more comments
anonrig Dec 4, 2024
66da95c
implement getters
anonrig Dec 4, 2024
8377009
add has_regexp_groups()
anonrig Dec 4, 2024
61f4b67
start implementing tokenizer & tokenize
anonrig Dec 4, 2024
fe3af13
add initial parser_url_pattern method
anonrig Dec 4, 2024
1262f8c
add todos and remove redundant qualifiers
anonrig Dec 4, 2024
20691e3
implement escape pattern
anonrig Dec 4, 2024
eed6d80
add CompileComponentOptions
anonrig Dec 7, 2024
f153e00
minor fixes for add-url-pattern (#800)
lemire Dec 8, 2024
7dbf175
rename commits
anonrig Dec 8, 2024
5f36b46
add more parse_url_pattern
anonrig Dec 9, 2024
4619c9d
rename url_pattern class
anonrig Dec 9, 2024
5cdb6db
complete parse_url_pattern implementation
anonrig Dec 9, 2024
5f28a37
add `_component` suffix to components
anonrig Dec 9, 2024
20d7529
remove unnecessary void
anonrig Dec 9, 2024
16aace3
implement generate regular expression methods
anonrig Dec 10, 2024
14d217c
continue working on parser
anonrig Dec 10, 2024
4a31b3f
fix build error
anonrig Dec 12, 2024
f10d3b2
implement constructor string parser
anonrig Dec 12, 2024
4c20080
implement all of tokenizer's functions
anonrig Dec 12, 2024
74f72fd
fix build errors
anonrig Dec 12, 2024
969c87a
fix warnings
anonrig Dec 12, 2024
6276ce8
complete tokenizer
anonrig Dec 12, 2024
5f02b24
implement escape_regexp_string
anonrig Dec 12, 2024
f55e3f5
implement generate_pattern_string
anonrig Dec 12, 2024
cc73e97
fix compiler warnings
anonrig Dec 12, 2024
ca60161
semi-implement match
anonrig Dec 13, 2024
1a47532
complete one more todo
anonrig Dec 13, 2024
a67fe01
simplify create_component_match_result
anonrig Dec 13, 2024
37dc747
simplify
anonrig Dec 13, 2024
d4d843d
use correct inputs for match/exec/test
anonrig Dec 13, 2024
5c212d7
rename wpt_tests to wpt_url_tests
anonrig Dec 13, 2024
d33f228
add wpt_urlpattern_tests skeleton
anonrig Dec 13, 2024
530deb4
add first test
anonrig Dec 14, 2024
f1e04ce
Build fixes (#801)
lemire Dec 14, 2024
a10ba16
fix 2 bugs
anonrig Dec 14, 2024
b67580d
fix linter issues
anonrig Dec 14, 2024
42d6c32
fix 2 more bugs
anonrig Dec 14, 2024
8d8acb2
more progress on missing features
anonrig Dec 16, 2024
ac0817e
move url_pattern_helpers to separate file
anonrig Dec 16, 2024
8523594
fix build errors
anonrig Dec 16, 2024
096e159
use url_pattern_encoding_callback
anonrig Dec 17, 2024
f711faf
fix url pattern constructor error
anonrig Dec 17, 2024
fc5b020
fix more issues
anonrig Dec 17, 2024
690a14a
add initial version of wpt test runner
anonrig Dec 17, 2024
4f1dc9b
simplify json logic (#802)
lemire Dec 17, 2024
21109d1
add fuzzer
anonrig Dec 17, 2024
8929462
removing the reset
lemire Dec 17, 2024
6759d37
update ada idna
anonrig Dec 18, 2024
e8897d9
use ada idna method for valid name code point
anonrig Dec 18, 2024
4e96bbe
fix add part implementation
anonrig Dec 18, 2024
43c806d
fix invalid access errors
anonrig Dec 18, 2024
029e17f
implement tests correctly
anonrig Dec 18, 2024
c4c373b
improve test runner
anonrig Dec 18, 2024
4b3f34d
add url_pattern_init to_string() method
anonrig Dec 19, 2024
8d4994c
update WPT tests
anonrig Dec 19, 2024
5e6f934
fix last remaining todo
anonrig Dec 19, 2024
71468e2
simplify test runner
anonrig Dec 19, 2024
6a4c9a5
minor fixes
lemire Dec 19, 2024
fd6d1d4
some reworking
lemire Dec 19, 2024
7dca1de
make sure to skip invalid tests
anonrig Dec 19, 2024
6d38085
remove std::ranges::iota due to clang
anonrig Dec 20, 2024
abb2af0
add more fuzzing coverage
anonrig Dec 20, 2024
a0df533
try to fix windows issues
anonrig Dec 20, 2024
aeb4699
remove unnecessary copy
anonrig Dec 20, 2024
1eeab05
start testing the validity of the correct responses
anonrig Dec 20, 2024
208c2ff
fix couple of bugs
anonrig Dec 20, 2024
664ed1c
fix invalid ascii checks
anonrig Dec 20, 2024
60c4015
make pattern generation more verbose
anonrig Dec 20, 2024
5e989f0
fix regex error
anonrig Dec 20, 2024
5539349
remove semicolon due to -Werror,-Wextra-semi
anonrig Dec 20, 2024
04252cd
guarding regex call (#805)
lemire Dec 20, 2024
3eac233
add more logging
anonrig Dec 23, 2024
3f7536c
change ada_idna to char32_t
anonrig Dec 23, 2024
602a565
remove try/catch
anonrig Dec 23, 2024
fc3e76e
make canonicalize_ methods more flexible
anonrig Dec 23, 2024
9407a49
fix change_state
anonrig Dec 23, 2024
6d8e960
fix invalid substr call
anonrig Dec 23, 2024
67fb323
fix generate_pattern_string impl
anonrig Dec 23, 2024
dbd003d
fix more small issues
anonrig Dec 23, 2024
8619179
improve url_pattern_init::process
anonrig Dec 23, 2024
a4f0c42
correctly computing the next code point (#808)
lemire Dec 23, 2024
099fb43
adding checks
lemire Dec 23, 2024
049dd11
use std string view to avoid copy
anonrig Dec 23, 2024
6b29fed
use next_index instead of index
anonrig Dec 23, 2024
61f45be
highlight the error message
anonrig Dec 23, 2024
d2bcf67
better decoding
lemire Dec 23, 2024
e997a28
I think that the test is in error (#810)
lemire Dec 23, 2024
6e96857
remove invalid WPT test data
anonrig Dec 24, 2024
188e171
remove invalid assertion
anonrig Dec 24, 2024
5682bf1
fix ipv6 address canonicalize
anonrig Dec 24, 2024
67f9708
fix canonicalize_ipv6_hostname
anonrig Dec 24, 2024
681bf67
simplify test runner
anonrig Dec 24, 2024
3304dd0
fix test runner
anonrig Dec 24, 2024
40f85e3
add a todo
anonrig Dec 24, 2024
fdb044e
remove invalid test case
anonrig Dec 24, 2024
8ee26f4
add tests for expected object
anonrig Dec 24, 2024
7f4acf2
fix hostname tests
anonrig Dec 25, 2024
505f526
complete match implementation
anonrig Dec 25, 2024
6f284c4
fix empty component tests
anonrig Dec 26, 2024
d928625
revert some wpt changes
anonrig Dec 26, 2024
64c6968
add some optional result logging (#812)
lemire Dec 26, 2024
8090940
lint
lemire Dec 26, 2024
f204a8c
fixing logging
lemire Dec 26, 2024
d7b92eb
removing diagram printout
Dec 27, 2024
fc884cb
fix asan build errors
anonrig Dec 28, 2024
77f44d3
simpler version of the yagiz/add-url-pattern branch (#815)
lemire Dec 28, 2024
ab71fa0
simplify implementation
anonrig Dec 29, 2024
ca66004
improve url_pattern_part emplace_back calls
anonrig Dec 29, 2024
b2d9e70
fix url_pattern_component constructor
anonrig Dec 31, 2024
baeafc6
remove the usage of ada.h inside src
anonrig Dec 31, 2024
487582d
move all helper methods to url_pattern.cpp
anonrig Dec 31, 2024
ffee76c
fix urlpatterntestdata.json
anonrig Dec 31, 2024
0100006
fix build errors
anonrig Dec 31, 2024
757683b
add missing check
anonrig Dec 31, 2024
8dc937e
more tests (#817)
lemire Dec 31, 2024
53ba80f
fix assertion error
anonrig Dec 31, 2024
edbf6c0
don't move function calls
anonrig Dec 31, 2024
5f74dd3
fix token reference asan error
anonrig Dec 31, 2024
a5580c7
another test (#818)
lemire Dec 31, 2024
db7acf9
simplify parser and tests
anonrig Jan 1, 2025
c60c2dc
remove unnecessary duplicate_name method
anonrig Jan 1, 2025
385f554
convert Token to class
anonrig Jan 1, 2025
dab41f6
minor cleanups
anonrig Jan 1, 2025
cf69585
remove invalid std::move
anonrig Jan 1, 2025
bd9655d
simplify parser
anonrig Jan 1, 2025
393f515
remove invalid pathname WPT
anonrig Jan 1, 2025
64f66c6
leave some todos for WPT
anonrig Jan 1, 2025
1f563d4
complete inputs parsing
anonrig Jan 1, 2025
52c33b5
removed duplicated code
anonrig Jan 3, 2025
6ae710b
merge error enums
anonrig Jan 3, 2025
1b59155
fix a boolean operation
anonrig Jan 3, 2025
dd20066
update urlpatterntestdata.json
anonrig Jan 3, 2025
528027c
remove unnecessary assertions
anonrig Jan 3, 2025
65fe0b6
removing GLIBCXX debug
Jan 3, 2025
613d60d
updating macos ci
Jan 3, 2025
943f0aa
indent
Jan 3, 2025
9bb11ad
keeping only static
Jan 3, 2025
5b1de58
improve wpt runner
anonrig Jan 3, 2025
1ec8ea0
fix match
anonrig Jan 3, 2025
c858831
add assertions for object return
anonrig Jan 4, 2025
36a7b72
check __cpp_lib_format
lemire Jan 4, 2025
ff2bf00
adding version header (#824)
lemire Jan 4, 2025
0feb9a6
fix match related bugs
anonrig Jan 5, 2025
57accd5
fix port canonicalize
anonrig Jan 5, 2025
67f9988
fix port setting caused by url parser bug
anonrig Jan 5, 2025
9deaa41
add temporary check for special schemes
anonrig Jan 5, 2025
d47ca13
revert opaque host change
anonrig Jan 6, 2025
14e6c53
fix match when input needs to be parsed
anonrig Jan 6, 2025
6f2838f
fix match hash and search prefix
anonrig Jan 6, 2025
89c8bea
fix internal assertion
anonrig Jan 6, 2025
e3f4fe2
improve wpt test runner
anonrig Jan 6, 2025
8b8d5e6
improve regexp matching
anonrig Jan 7, 2025
a47d8c5
fix wpt testrunner
anonrig Jan 7, 2025
b620b09
fix test implementation
anonrig Jan 7, 2025
36a9097
add half-working match_result
anonrig Jan 7, 2025
87def0a
improve regex matching
anonrig Jan 8, 2025
61728b2
remove invalid WPT test
anonrig Jan 8, 2025
implement generate_pattern_string
anonrig committed Jan 8, 2025
commit f55e3f58ddde2992aff3582437958ceeafe45bff
2 changes: 1 addition & 1 deletion include/ada/url_pattern-inl.h
@@ -411,7 +411,7 @@ Tokenizer::process_tokenizing_error(size_t next_position,
 }

 // @see https://urlpattern.spec.whatwg.org/#is-a-valid-name-code-point
-inline bool Tokenizer::is_valid_name_code_point(char cp, bool first) {
+inline bool is_valid_name_code_point(char cp, bool first) {
   // If first is true return the result of checking if code point is contained
   // in the IdentifierStart set of code points. Otherwise return the result of
   // checking if code point is contained in the IdentifierPart set of code
8 changes: 4 additions & 4 deletions include/ada/url_pattern.h
@@ -100,7 +100,7 @@ struct url_pattern_init {

 enum class url_pattern_part_type : uint8_t {
   // The part represents a simple fixed text string.
-  FIXED_TEST,
+  FIXED_TEXT,
   // The part represents a matching group with a custom regular expression.
   REGEXP,
   // The part represents a matching group that matches code points up to the
@@ -361,9 +361,6 @@ class Tokenizer {
   tl::expected<void, url_pattern_errors> process_tokenizing_error(
       size_t next_position, size_t value_position);

-  // @see https://urlpattern.spec.whatwg.org/#is-a-valid-name-code-point
-  bool is_valid_name_code_point(char code_point, bool first);
-
   // has an associated input, a pattern string, initially the empty string.
   std::string input{};
   // has an associated policy, a tokenize policy, initially "strict".
@@ -573,6 +570,9 @@ std::string convert_modifier_to_string(url_pattern_part_modifier modifier);
 std::string generate_segment_wildcard_regexp(
     url_pattern_compile_component_options options);

+// @see https://urlpattern.spec.whatwg.org/#is-a-valid-name-code-point
+bool is_valid_name_code_point(char code_point, bool first);
+
 } // namespace url_pattern_helpers

 } // namespace ada
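The helper moved above implements the spec's "is a valid name code point" check: the first code point of a name must be in IdentifierStart, later ones in IdentifierPart. As a rough illustration only, the sketch below covers just the ASCII subset of those sets. The name `is_valid_name_code_point_ascii` is hypothetical and not part of ada; a full check also has to handle non-ASCII identifier code points.

```cpp
#include <cassert>

// Hypothetical ASCII-only approximation of "is a valid name code point".
// ASCII subset of IdentifierStart: letters, '$' and '_'.
// ASCII subset of IdentifierPart: IdentifierStart plus the digits '0'..'9'.
// Non-ASCII identifier code points are deliberately not handled here.
constexpr bool is_valid_name_code_point_ascii(char cp, bool first) {
  const bool is_letter = (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
  const bool is_start = is_letter || cp == '$' || cp == '_';
  if (first) return is_start;
  const bool is_digit = cp >= '0' && cp <= '9';
  return is_start || is_digit;
}
```

With this sketch a digit is rejected as the first code point of a name but accepted afterwards, which is the distinction the `first` flag exists for.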
189 changes: 182 additions & 7 deletions src/url_pattern.cpp
@@ -1,6 +1,7 @@
 #include "ada.h"

 #include <optional>
+#include <ranges>
 #include <regex>
 #include <string>

@@ -926,7 +927,7 @@ std::vector<Token> tokenize(std::string_view input, token_policy policy) {
         bool first_code_point = name_position == name_start;
         // Let valid code point be the result of running is a valid name code
         // point given tokenizer’s code point and first code point.
-        auto valid_code_point = tokenizer.is_valid_name_code_point(
+        auto valid_code_point = is_valid_name_code_point(
             tokenizer.code_point.at(0), first_code_point);
         // If valid code point is false break.
         if (!valid_code_point) break;
@@ -1154,7 +1155,7 @@ std::string escape_regexp_string(std::string_view input) {
   for (const auto& c : input) {
     // TODO: Optimize this even further
     if (should_escape_regexp_char(c)) {
-      result.append("\\" + c);
+      result.append(std::string("\\") + c);
     } else {
       result.push_back(c);
     }
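The one-line fix in this hunk matters because `"\\" + c` does not concatenate: the string literal decays to a `const char*` and the value of `c` is added to that pointer, so the old call read from an unrelated (usually out-of-bounds) address. A minimal sketch of an escape loop that avoids the temporary `std::string` altogether follows. The names `needs_escape` and `escape_regexp_sketch` are hypothetical stand-ins, and the character set is assumed from the spec's "escape a regexp string" step rather than taken from ada's `should_escape_regexp_char`.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for should_escape_regexp_char; the character set is
// an assumption based on the spec's "escape a regexp string" algorithm.
constexpr bool needs_escape(char c) {
  constexpr std::string_view specials = ".+*?^${}()[]|/\\";
  return specials.find(c) != std::string_view::npos;
}

// Escapes regexp metacharacters by pushing a backslash before each one.
// push_back on chars sidesteps the `"\\" + c` pointer-arithmetic pitfall.
std::string escape_regexp_sketch(std::string_view input) {
  std::string result;
  result.reserve(input.size() * 2);
  for (char c : input) {
    if (needs_escape(c)) result.push_back('\\');
    result.push_back(c);
  }
  return result;
}
```

Pushing the two characters separately also goes some way toward the "optimize this even further" TODO, since no per-character temporary is built.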
@@ -1208,10 +1209,184 @@ std::vector<url_pattern_part> parse_pattern_string(
 std::string generate_pattern_string(
     std::vector<url_pattern_part>& part_list,
     url_pattern_compile_component_options& options) {
-  (void)part_list;
-  (void)options;
-  // TODO: Implement this
-  return {};
+  // Let result be the empty string.
+  std::string result{};
+  // Let index list be the result of getting the indices for part list.
+  // For each index of index list:
+  for (size_t index : std::views::iota(size_t{0}, part_list.size())) {
+    // Let part be part list[index].
+    auto part = part_list[index];
+    // Let previous part be part list[index - 1] if index is greater than 0,
+    // otherwise let it be null.
+    // TODO: Optimization opportunity. Find a way to avoid making a copy here.
+    std::optional<url_pattern_part> previous_part =
+        index == 0 ? std::nullopt : std::optional(part_list.at(index - 1));
+    // Let next part be part list[index + 1] if index is less than index list’s
+    // size - 1, otherwise let it be null.
+    std::optional<url_pattern_part> next_part =
+        index < part_list.size() - 1 ? std::optional(part_list.at(index + 1))
+                                     : std::nullopt;
+    // If part’s type is "fixed-text" then:
+    if (part.type == url_pattern_part_type::FIXED_TEXT) {
+      // If part’s modifier is "none" then:
+      if (part.modifier == url_pattern_part_modifier::NONE) {
+        // Append the result of running escape a pattern string given part’s
+        // value to the end of result.
+        result.append(escape_pattern(part.value));
+        continue;
+      }
+      // Append "{" to the end of result.
+      result += "{";
+      // Append the result of running escape a pattern string given part’s value
+      // to the end of result.
+      result.append(escape_pattern(part.value));
+      // Append "}" to the end of result.
+      result += "}";
+      // Append the result of running convert a modifier to a string given
+      // part’s modifier to the end of result.
+      result.append(convert_modifier_to_string(part.modifier));
+      continue;
+    }
+    // Let custom name be true if part’s name[0] is not an ASCII digit;
+    // otherwise false.
+    // TODO: Optimization opportunity: Find a way to directly check
+    // is_ascii_digit.
+    bool custom_name = part.name[0] < '0' || part.name[0] > '9';
+    // Let needs grouping be true if at least one of the following are true,
+    // otherwise let it be false:
+    // - part’s suffix is not the empty string.
+    // - part’s prefix is not the empty string and is not options’s prefix code
+    // point.
+    // TODO: part.prefix is a string, but options.prefix is a char. Which one is
+    // true?
+    bool needs_grouping =
+        !part.suffix.empty() ||
+        (!part.prefix.empty() && part.prefix[0] != options.prefix);
+
+    // If all of the following are true:
+    // - needs grouping is false; and
+    // - custom name is true; and
+    // - part’s type is "segment-wildcard"; and
+    // - part’s modifier is "none"; and
+    // - next part is not null; and
+    // - next part’s prefix is the empty string; and
+    // - next part’s suffix is the empty string
+    if (!needs_grouping && custom_name &&
+        part.type == url_pattern_part_type::SEGMENT_WILDCARD &&
+        part.modifier == url_pattern_part_modifier::NONE &&
+        next_part.has_value() && next_part->prefix.empty() &&
+        next_part->suffix.empty()) {
+      // If next part’s type is "fixed-text":
+      if (next_part->type == url_pattern_part_type::FIXED_TEXT) {
+        // Set needs grouping to true if the result of running is a valid name
+        // code point given next part’s value's first code point and the boolean
+        // false is true.
+        needs_grouping = is_valid_name_code_point(next_part->value[0], false);
+      } else {
+        // Set needs grouping to true if next part’s name[0] is an ASCII digit.
+        needs_grouping =
+            next_part->name[0] >= '0' && next_part->name[0] <= '9';
+      }
+    }
+
+    // If all of the following are true:
+    // - needs grouping is false; and
+    // - part’s prefix is the empty string; and
+    // - previous part is not null; and
+    // - previous part’s type is "fixed-text"; and
+    // - previous part’s value's last code point is options’s prefix code point.
+    // then set needs grouping to true.
+    if (!needs_grouping && part.prefix.empty() && previous_part.has_value() &&
+        previous_part->type == url_pattern_part_type::FIXED_TEXT &&
+        previous_part->value.at(previous_part->value.size() - 1) ==
+            options.prefix.value()) {
+      needs_grouping = true;
+    }
+
+    // Assert: part’s name is not the empty string or null.
+    ADA_ASSERT_TRUE(!part.name.empty());
+
+    // If needs grouping is true, then append "{" to the end of result.
+    if (needs_grouping) {
+      result.append("{");
+    }
+
+    // Append the result of running escape a pattern string given part’s prefix
+    // to the end of result.
+    result.append(escape_pattern(part.prefix));
+
+    // If custom name is true:
+    if (custom_name) {
+      // Append ":" to the end of result.
+      result.append(":");
+      // Append part’s name to the end of result.
+      result.append(part.name);
+    }
+
+    // If part’s type is "regexp" then:
+    if (part.type == url_pattern_part_type::REGEXP) {
+      // Append "(" to the end of result.
+      result.append("(");
+      // Append part’s value to the end of result.
+      result.append(part.value);
+      // Append ")" to the end of result.
+      result.append(")");
+    } else if (part.type == url_pattern_part_type::SEGMENT_WILDCARD && !custom_name) {
+      // Otherwise if part’s type is "segment-wildcard" and custom name is
+      // false: Append "(" to the end of result.
+      result.append("(");
+      // Append the result of running generate a segment wildcard regexp given
+      // options to the end of result.
+      result.append(generate_segment_wildcard_regexp(options));
+      // Append ")" to the end of result.
+      result.append(")");
+    } else if (part.type == url_pattern_part_type::FULL_WILDCARD) {
+      // Otherwise if part’s type is "full-wildcard":
+      // If custom name is false and one of the following is true:
+      // - previous part is null; or
+      // - previous part’s type is "fixed-text"; or
+      // - previous part’s modifier is not "none"; or
+      // - needs grouping is true; or
+      // - part’s prefix is not the empty string
+      // - then append "*" to the end of result.
+      if (!custom_name &&
+          (!previous_part.has_value() ||
+           previous_part->type == url_pattern_part_type::FIXED_TEXT ||
+           previous_part->modifier != url_pattern_part_modifier::NONE ||
+           needs_grouping || !part.prefix.empty())) {
+        result.append("*");
+      } else {
+        // Append "(" to the end of result.
+        // Append full wildcard regexp value to the end of result.
+        // Append ")" to the end of result.
+        result.append("(.*)");
+      }
+    }
+
+    // If all of the following are true:
+    // - part’s type is "segment-wildcard"; and
+    // - custom name is true; and
+    // - part’s suffix is not the empty string; and
+    // - The result of running is a valid name code point given part’s suffix's
+    // first code point and the boolean false is true then append U+005C (\) to
+    // the end of result.
+    if (part.type == url_pattern_part_type::SEGMENT_WILDCARD && custom_name &&
+        !part.suffix.empty() &&
+        is_valid_name_code_point(part.suffix[0], false)) {
+      result.append("\\");
+    }
+
+    // Append the result of running escape a pattern string given part’s suffix
+    // to the end of result.
+    result.append(escape_pattern(part.suffix));
+    // If needs grouping is true, then append "}" to the end of result.
+    if (needs_grouping) result.append("}");
+    // Append the result of running convert a modifier to a string given part’s
+    // modifier to the end of result.
+    result.append(convert_modifier_to_string(part.modifier));
+  }
+  // Return result.
+  return result;
 }

 } // namespace url_pattern_helpers
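The loop in this hunk copies the previous and next parts into `std::optional` values on every iteration, which its own TODO flags as an optimization opportunity. One possible way to avoid the copies, sketched here under assumptions, is to pass neighbours as pointers and let `nullptr` stand in for the spec's "null". `part_sketch`, `for_each_with_neighbors` and `describe_neighbors` are hypothetical names for illustration and do not exist in ada. A plain index loop is used, so the sketch also does not depend on `std::views::iota`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for url_pattern_part.
struct part_sketch {
  std::string value;
};

// Calls visit(part, previous, next) for every part. The neighbours are passed
// as pointers into the vector, so nothing is copied; nullptr means "null".
template <typename Visitor>
void for_each_with_neighbors(const std::vector<part_sketch>& parts,
                             Visitor&& visit) {
  for (std::size_t i = 0; i < parts.size(); ++i) {
    const part_sketch* previous = i > 0 ? &parts[i - 1] : nullptr;
    const part_sketch* next = i + 1 < parts.size() ? &parts[i + 1] : nullptr;
    visit(parts[i], previous, next);
  }
}

// Small demonstration: renders "previous<current>next " for each part,
// using "-" where a neighbour is missing.
std::string describe_neighbors(const std::vector<part_sketch>& parts) {
  std::string out;
  for_each_with_neighbors(
      parts, [&out](const part_sketch& part, const part_sketch* previous,
                    const part_sketch* next) {
        out += previous ? previous->value : std::string("-");
        out += '<';
        out += part.value;
        out += '>';
        out += next ? next->value : std::string("-");
        out += ' ';
      });
  return out;
}
```

The condition `i + 1 < parts.size()` also avoids the unsigned subtraction in `size() - 1`, which would wrap around if the check were ever evaluated on an empty list.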
@@ -1275,7 +1450,7 @@ generate_regular_expression_and_name_list(
   // For each part of part list:
   for (const url_pattern_part& part : part_list) {
     // If part's type is "fixed-text":
-    if (part.type == url_pattern_part_type::FIXED_TEST) {
+    if (part.type == url_pattern_part_type::FIXED_TEXT) {
       // If part's modifier is "none"
       if (part.modifier == url_pattern_part_modifier::NONE) {
         // Append the result of running escape a regexp string given part's