
Conversation

mattdangerw
Member

Draft of weight conversion script for T5 1.1 weights.
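
For reference, a common starting point for such a script is to enumerate the source checkpoint's weights before mapping them onto KerasNLP layers. A minimal sketch, assuming the HuggingFace `google/t5-v1_1-base` checkpoint as the conversion source (the actual script in this PR may differ):

```python
from transformers import T5ForConditionalGeneration

# Load the source T5 1.1 checkpoint (assumption: converting from the
# HuggingFace port rather than the original T5X checkpoint).
hf_model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-base")

# List every source weight name and shape; the conversion script's job is
# to map each of these onto the matching KerasNLP variable and assign it.
for name, param in hf_model.named_parameters():
    print(name, tuple(param.shape))
```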

Still need to figure out...

  • A better approach for approximate GELU (a sketch of the standard tanh approximation follows this list).
  • How to store the weights for the language model output layer.
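
For context, T5 1.1's feedforward blocks use a gated GELU, and the open question above is how to express the tanh-based approximation cleanly. A minimal sketch of that approximation (the standard formulation, not necessarily what this PR ends up using):

```python
import numpy as np

def gelu_approx(x):
    # Tanh approximation of GELU:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (
        1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * np.power(x, 3)))
    )
```

In TensorFlow, `tf.keras.activations.gelu(x, approximate=True)` computes the same thing.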

@dathudeptrai

@mattdangerw any update?

@mattdangerw
Member Author

@dathudeptrai thanks for the ping. Should be updates on this shortly!

Basically, we were focused on a 0.5 release (just out a few days ago), with generation utils and simple decoder models.

But that's done, so now we will be full speed ahead on landing T5 and BART, to seed a seq2seq offering.

@abuelnasr0
Contributor

abuelnasr0 commented May 13, 2023

Can I contribute to shipping this model?
There are changes I want to make to the T5Tokenizer. There are no changes in this PR for the tokenizer, so there will be no conflicts.

  1. The tokenizer only checks that the pad_token exists. I will make it check for pad_token, end_token, and unk_token as well.
  2. This is written in the code:
# T5 uses the same start token as end token, i.e., "</s>".

But actually T5 uses the pad_token as the start token for the decoder input; see the HuggingFace documentation for the decoder_input_ids argument. A sketch of this shift-right behavior follows this list.
  3. Add an extra_ids argument that adds extra IDs at the end of the vocabulary, for use as sentinels during T5 training.
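
A minimal sketch of points 2 and 3, assuming integer label arrays and the standard 100 T5 sentinel tokens (illustrative only, not this PR's implementation):

```python
import numpy as np

def shift_right(labels, pad_token_id):
    # Build decoder_input_ids by shifting the labels one step to the right
    # and prepending the pad token, which T5 uses as the decoder start token.
    decoder_input_ids = np.roll(labels, 1, axis=-1)
    decoder_input_ids[..., 0] = pad_token_id
    return decoder_input_ids

# Point 3: sentinel tokens appended at the end of the vocabulary.
extra_ids = 100
sentinels = [f"<extra_id_{i}>" for i in range(extra_ids)]
```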

mattdangerw added a commit to mattdangerw/keras-hub that referenced this pull request Jun 16, 2023
This was noticed on keras-team#900, but we should probably get the fix into the
forward pass without waiting on checkpoints.
@mattdangerw mentioned this pull request Jun 16, 2023
mattdangerw added a commit that referenced this pull request Jun 21, 2023
This was noticed on #900, but we should probably get the fix into the
forward pass without waiting on checkpoints.
@mattdangerw
Member Author

This is finally coming in on #1277
