ElectraTokenizer #1357

pranavvp16 · 2023-12-08T05:18:59Z

I have added the `ElectraTokenizer`. Notebook demonstrating that the tokens match.

mattdangerw · 2023-12-09T00:39:31Z

Please check the tests! Looks like some legitimate failures.

mattdangerw · 2023-12-09T00:40:00Z

I'll be out next week, so tagged @tirthasheshpatel and @nkovela1 to take a look.

tirthasheshpatel · 2023-12-15T04:10:20Z

keras_nlp/models/electra/electra_tokenizer.py

+
+
+@keras_nlp_export("keras_nlp.models.ElectraTokenizer")
+class ElectraTokenizer(WordPieceTokenizer):


Since we are exporting this class, can you add an example on how to use this? Something like:

    import keras_nlp

    vocab = ["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"]
    vocab += ["The", "quick", "brown", "fox", "jumped", "."]

    # Instantiate the tokenizer.
    tokenizer = keras_nlp.models.ElectraTokenizer(vocabulary=vocab)

    # Unbatched input.
    tokenizer("The quick brown fox jumped.")

    # Batched input.
    tokenizer(["The quick brown fox jumped.", "The fox slept."])

    # Detokenization.
    tokenizer.detokenize(tokenizer("The quick brown fox jumped."))
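To make the lookup in the example above concrete, here is a rough plain-Python sketch of greedy longest-match-first WordPiece matching over the same toy vocabulary. It is illustrative only and is not the `keras_nlp` implementation: the helper name `wordpiece_ids`, the regex pre-splitting, and the `##` continuation prefix are assumptions made for this sketch.

```python
import re

# Same toy vocabulary as in the suggested docstring example.
vocab = ["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"]
vocab += ["The", "quick", "brown", "fox", "jumped", "."]
token_to_id = {token: i for i, token in enumerate(vocab)}


def wordpiece_ids(text, token_to_id, unk_token="[UNK]", suffix="##"):
    """Hypothetical helper: greedy longest-match-first WordPiece lookup."""
    ids = []
    # Assumed pre-splitting: runs of word characters, or single punctuation marks.
    for word in re.findall(r"\w+|[^\w\s]", text):
        start, pieces = 0, []
        while start < len(word):
            end = len(word)
            match = None
            # Try the longest remaining substring first, then shrink it.
            while end > start:
                piece = word[start:end]
                if start > 0:
                    # Non-initial pieces carry the continuation prefix.
                    piece = suffix + piece
                if piece in token_to_id:
                    match = piece
                    break
                end -= 1
            if match is None:
                # No piece matches: the whole word maps to the unknown token.
                pieces = [unk_token]
                break
            pieces.append(match)
            start = end
        ids.extend(token_to_id[p] for p in pieces)
    return ids


print(wordpiece_ids("The quick brown fox jumped.", token_to_id))
print(wordpiece_ids("The fox slept.", token_to_id))
```

In this sketch every word of the first sentence is in the vocabulary, while "slept" in the second sentence has no matching piece and falls back to `[UNK]`.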

tirthasheshpatel

LGTM except one minor comment. Thanks for the PR @pranavvp16!

tirthasheshpatel · 2023-12-16T06:34:49Z

Merged, thanks @pranavvp16!

pranavvp16 and others added 10 commits October 29, 2023 19:17

Added ElectraBackbone

f812c39

Merge branch 'keras-team:master' into electra

879020a

Added backbone tests for ELECTRA

c2aa9bd

Fix config

79df89f

Add model import to __init__

7bc3697

add electra tokenizer

b7bcfcf

add tests for tokenizer

8d9dd15

add __init__ file

273075a

add tokenizer and backbone to models __init__

bfbf648

Merge branch 'master' into electra

a79deb1

mattdangerw requested review from nkovela1 and tirthasheshpatel December 9, 2023 00:39

pranavvp16 added 2 commits December 9, 2023 12:15

Fix Failing tokenization test

538d938

Merge remote-tracking branch 'origin/electra' into electra

eb8baa5

tirthasheshpatel added the kokoro:force-run (Runs Tests on GPU) label Dec 15, 2023

kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label Dec 15, 2023

tirthasheshpatel reviewed Dec 15, 2023

tirthasheshpatel approved these changes Dec 15, 2023

pranavvp16 and others added 2 commits December 16, 2023 11:05

Merge branch 'keras-team:master' into electra

b3f81d5

Add example on usage of the tokenizer with custom vocabulary

47c9119

tirthasheshpatel merged commit f78276f into keras-team:master Dec 16, 2023
