Merged
Changes from 1 commit
Commits
114 commits
303a82c
fix
ArthurZucker Oct 3, 2023
cbf179a
Merge branch 'main' of github.com:huggingface/transformers into fix-main
ArthurZucker Oct 3, 2023
01e18db
last attempt
ArthurZucker Oct 3, 2023
08a560a
current work
ArthurZucker Oct 4, 2023
23c9513
fix forward compatibility
ArthurZucker Oct 4, 2023
0ae13ed
save all special tokens
ArthurZucker Oct 5, 2023
d887f68
Merge branch 'fix-main' of github.com:ArthurZucker/transformers into …
ArthurZucker Oct 5, 2023
72ff80e
current state
ArthurZucker Oct 5, 2023
b7b7d13
revert additional changes
ArthurZucker Oct 5, 2023
36d5303
updates
ArthurZucker Oct 5, 2023
ae93856
remove tokenizer.model
ArthurZucker Oct 5, 2023
88ea352
add a test and the fix
ArthurZucker Oct 5, 2023
ca98fbd
nit
ArthurZucker Oct 5, 2023
3c22fbb
revert one more break
ArthurZucker Oct 5, 2023
dc93d5e
fix typefield issue
ArthurZucker Oct 5, 2023
00997e9
quality
ArthurZucker Oct 5, 2023
6143634
more tests
ArthurZucker Oct 5, 2023
907591f
fix fields for FC
ArthurZucker Oct 5, 2023
5df5a83
Merge branch 'fix-main' of github.com:ArthurZucker/transformers into …
ArthurZucker Oct 5, 2023
66ecb9e
Merge branch 'fix-main' of github.com:ArthurZucker/transformers into …
ArthurZucker Oct 5, 2023
0e7bd61
more nits?
ArthurZucker Oct 5, 2023
381a0ec
Merge branch 'fix-main' of github.com:ArthurZucker/transformers into …
ArthurZucker Oct 6, 2023
bf75334
new additional changes
ArthurZucker Oct 6, 2023
fafbbed
how
ArthurZucker Oct 6, 2023
c6de7b2
some updates
ArthurZucker Oct 6, 2023
9a6e750
simplify all
ArthurZucker Oct 7, 2023
8c4ec2c
more nits
ArthurZucker Oct 7, 2023
621ebae
revert some things to original
ArthurZucker Oct 7, 2023
6a6095e
nice
ArthurZucker Oct 7, 2023
e0e5dea
nits
ArthurZucker Oct 7, 2023
92c7754
a small hack
ArthurZucker Oct 7, 2023
9fbbafe
more nits
ArthurZucker Oct 7, 2023
25e2df9
ahhaha
ArthurZucker Oct 7, 2023
2b18cc2
Merge branch 'main' of github.com:huggingface/transformers into fix-main
ArthurZucker Oct 7, 2023
078c94e
fixup
ArthurZucker Oct 7, 2023
ef1e598
update
ArthurZucker Oct 9, 2023
9bf12a8
make test run on ci
ArthurZucker Oct 11, 2023
e6d0381
use subtesting
ArthurZucker Oct 11, 2023
112e4b1
update
ArthurZucker Oct 11, 2023
f794a91
Update .circleci/create_circleci_config.py
ArthurZucker Oct 11, 2023
65aa232
updates
ArthurZucker Oct 11, 2023
8ea095b
Merge branch 'fix-main' of github.com:ArthurZucker/transformers into …
ArthurZucker Oct 11, 2023
efc5e7b
fixup
ArthurZucker Oct 11, 2023
aa569b7
nits
ArthurZucker Oct 11, 2023
5ad55f3
replace typo
ArthurZucker Oct 11, 2023
1c22269
fix the test
ArthurZucker Oct 11, 2023
3b93653
nits
ArthurZucker Oct 11, 2023
a2e977a
Merge branch 'main' of github.com:huggingface/transformers into fix-main
ArthurZucker Oct 11, 2023
1acf2dd
update
ArthurZucker Oct 11, 2023
2dde542
None max dif pls
ArthurZucker Oct 11, 2023
9ebf76e
a partial fix
ArthurZucker Oct 11, 2023
6d2c00e
had to revert one thing
ArthurZucker Oct 11, 2023
e4bcb5e
test the fast
ArthurZucker Oct 11, 2023
3d4bffd
updates
ArthurZucker Oct 11, 2023
8bcb345
fixup
ArthurZucker Oct 11, 2023
d9e5fad
and more nits
ArthurZucker Oct 11, 2023
fc34148
more fixes
ArthurZucker Oct 12, 2023
8389094
update
ArthurZucker Oct 12, 2023
78f1ac4
Oupsy :eye:
ArthurZucker Oct 12, 2023
62eb816
Merge branch 'main' of github.com:huggingface/transformers into fix-main
ArthurZucker Oct 12, 2023
5c1ae9c
nits
ArthurZucker Oct 12, 2023
df8ab6f
fix marian
ArthurZucker Oct 12, 2023
677fcb2
on our way to heaven
ArthurZucker Oct 12, 2023
5a3407e
Update src/transformers/models/t5/tokenization_t5.py
ArthurZucker Oct 12, 2023
856a43d
fixup
ArthurZucker Oct 12, 2023
a3cb498
Update src/transformers/tokenization_utils_fast.py
ArthurZucker Oct 12, 2023
62cf2d0
Update src/transformers/tokenization_utils_base.py
ArthurZucker Oct 12, 2023
fe8bba0
fix phobert
ArthurZucker Oct 13, 2023
be68fc2
skip some things, test more
ArthurZucker Oct 13, 2023
814d978
nits
ArthurZucker Oct 13, 2023
f969713
fixup
ArthurZucker Oct 13, 2023
56b0619
fix deberta
ArthurZucker Oct 13, 2023
f2a5447
update
ArthurZucker Oct 13, 2023
5d7bdab
update
ArthurZucker Oct 13, 2023
49dd8b2
more updates
ArthurZucker Oct 13, 2023
3a03c77
skip one test
ArthurZucker Oct 13, 2023
707a688
more updates
ArthurZucker Oct 13, 2023
bbfc382
fix camembert
ArthurZucker Oct 13, 2023
b6b8aed
can't test this one
ArthurZucker Oct 13, 2023
dac7b89
more good fixes
ArthurZucker Oct 14, 2023
b4ca44e
kind of a major update
ArthurZucker Oct 14, 2023
5245825
fixup
ArthurZucker Oct 14, 2023
0724ebf
more fixups
ArthurZucker Oct 14, 2023
066854a
fix pegasus and mpnet
ArthurZucker Oct 15, 2023
f646ab8
remove skipped tests
ArthurZucker Oct 15, 2023
53e2390
fix phoneme tokenizer if self.verbose
ArthurZucker Oct 15, 2023
e0a967f
fix individual models
ArthurZucker Oct 15, 2023
a353871
update common tests
ArthurZucker Oct 15, 2023
fbc4c4f
update testing files
ArthurZucker Oct 15, 2023
64a6bc4
all over again
ArthurZucker Oct 15, 2023
4219b32
nits
ArthurZucker Oct 15, 2023
48b937a
skip test for markup lm
ArthurZucker Oct 15, 2023
d1a4537
fixups
ArthurZucker Oct 15, 2023
60173aa
fix order of addition in fast by sorting the added tokens decoder
ArthurZucker Oct 16, 2023
8402602
proper defaults for deberta
ArthurZucker Oct 16, 2023
d782bbd
correct default for fnet
ArthurZucker Oct 16, 2023
05ab2c2
nits on add tokens, string initialized to special if special
ArthurZucker Oct 16, 2023
bd6c5a5
skip irrelevant herbert tests
ArthurZucker Oct 16, 2023
8a267d3
main fixes
ArthurZucker Oct 16, 2023
7bda15e
update test added_tokens_serialization
ArthurZucker Oct 16, 2023
ac75cd3
the fix for bart like models and class instanciating
ArthurZucker Oct 16, 2023
640885e
update bart
ArthurZucker Oct 16, 2023
45801c0
nit!
ArthurZucker Oct 16, 2023
14c576f
update idefix test
ArthurZucker Oct 16, 2023
2a78cf9
fix whisper!
ArthurZucker Oct 16, 2023
6f28584
some fixup
ArthurZucker Oct 16, 2023
c12656b
fixups
ArthurZucker Oct 16, 2023
8f8c3f1
revert some of the wrong chanegs
ArthurZucker Oct 16, 2023
de51ef7
fixup
ArthurZucker Oct 16, 2023
0f0a3fe
fixup
ArthurZucker Oct 16, 2023
4b693b9
Merge branch 'main' of github.com:huggingface/transformers into fix-main
ArthurZucker Oct 18, 2023
4b82043
skip marian
ArthurZucker Oct 18, 2023
340df3d
skip the correct tests
ArthurZucker Oct 18, 2023
f9fb43d
skip for tf and flax as well
ArthurZucker Oct 18, 2023
fix
ArthurZucker committed Oct 3, 2023
commit 303a82cc97582b47d152cc2afd18625448d51ff6
6 changes: 1 addition & 5 deletions src/transformers/tokenization_utils_base.py
@@ -852,8 +852,6 @@ def __init__(self, verbose=True, **kwargs):
continue
if key in self.SPECIAL_TOKENS_ATTRIBUTES:
if key == "additional_special_tokens":
- # TODO THIS IS NASTY! Will always reset tokens to default rstrip and lstrip because self.set_attr on strings
- # will not check the addedtokens decoder. WILL FIX TOMORROW
assert isinstance(value, (list, tuple)), f"Value {value} is not a list or tuple"
assert all(
isinstance(t, (str, AddedToken)) for t in value
@@ -2204,8 +2202,6 @@ def _from_pretrained(
if str(token) in additional_special_tokens:
# at this point the token is in `additional_special_tokens` as an str, let's add the AddedToken info
additional_special_tokens.remove(str(token))
- if token.special and token not in additional_special_tokens:
- additional_special_tokens.append(token)
else:
raise ValueError(
f"Found a {token.__class__} in the saved `added_tokens_decoder`, should be a dictionary."
@@ -2438,7 +2434,7 @@ def save_pretrained(

# Sanitize AddedTokens in special_tokens_map

- # kept for forward compatibility, will be removed in transoformers 5
+ # kept for forward compatibility, will be removed in transoformers 5. Adding typefield
write_dict = self.convert_added_tokens(self.special_tokens_map_extended, add_type_field=True)
with open(special_tokens_map_file, "w", encoding="utf-8") as f:
out_str = json.dumps(write_dict, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
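
Note on the last hunk: the snippet below is a minimal, standalone sketch of what writing a special tokens map "with a type field" could look like. It does not import `transformers`. The `Token` dataclass and the `to_serializable` helper are hypothetical stand-ins for `AddedToken` and `convert_added_tokens`, and the `"__type"` key name is an assumption rather than something shown in this diff. Only the `json.dumps(...)` call mirrors the line above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for AddedToken (not the library class)."""

    content: str
    lstrip: bool = False
    rstrip: bool = False
    special: bool = True


def to_serializable(obj, add_type_field=True):
    """Recursively turn Token objects into plain dicts.

    Hypothetical helper: when add_type_field is True, each converted token
    gets a marker key so a loader can tell it apart from an ordinary dict.
    The key name "__type" is an assumption.
    """
    if isinstance(obj, Token):
        out = asdict(obj)
        if add_type_field:
            out["__type"] = "AddedToken"
        return out
    if isinstance(obj, (list, tuple)):
        return [to_serializable(item, add_type_field) for item in obj]
    if isinstance(obj, dict):
        return {key: to_serializable(value, add_type_field) for key, value in obj.items()}
    # Plain strings and other JSON-friendly values pass through unchanged.
    return obj


# A special tokens map that mixes token objects and plain strings.
special_tokens_map = {
    "eos_token": Token("</s>"),
    "additional_special_tokens": [Token("<extra_0>", rstrip=True), "<extra_1>"],
}

write_dict = to_serializable(special_tokens_map, add_type_field=True)
# Same serialization call as in the hunk above.
out_str = json.dumps(write_dict, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
print(out_str)
```

In this sketch the marker lets a reader of the JSON file rebuild token objects with their `lstrip`/`rstrip` settings, while plain strings are written as-is.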