
Conversation

SamanehSaadat
Member

@SamanehSaadat SamanehSaadat commented Apr 17, 2024

This PR adds a basic model card for Hugging Face upload, which was discussed in #1529.

Text classification example:

Text generation model example:

@SamanehSaadat SamanehSaadat requested a review from fchollet April 17, 2024 20:36
@SamanehSaadat
Member Author

Hi @Wauplin !

I'm not sure why I can't add you as a reviewer here, but please let me know if you have any feedback on this PR.

if config["class_name"].endswith("Backbone")
else config["class_name"]
)
markdown_content += f"This is a `{model_name}` model and has been uploaded using the KerasNLP library.\n"
Collaborator

remove "and has been"

Member Author

Done!

if config["class_name"].endswith("Backbone")
else config["class_name"]
)
markdown_content += f"This is a `{model_name}` model and has been uploaded using the KerasNLP library.\n"
Collaborator

You may want to add a link to the relevant keras.io docs page for the model.

Member Author

Added a link to model pages like this.

markdown_content += "\n"
markdown_content += (
"This model card has been generated automatically and should be completed "
"by the model author. See https://huggingface.co/docs/hub/model-cards.\n"
Collaborator

Should this be a markdown link instead?

Member Author

Changed it to a markdown link.

if config["class_name"].endswith("Backbone")
else config["class_name"]
)
model_link = f"https://keras.io/api/keras_nlp/models/{model_name.lower()}"
Collaborator

Is model_name.lower() always correct? Can you try it on all Model instances found in the namespace?

Member Author

Right! It doesn't work for some models!
I was trying to figure out if I can programmatically convert the model name to a model path, e.g. converting CamelCase to snake_case, but names like XLMRoberta -> xlm_roberta make it complicated! I could create a manual dictionary mapping from model to path based on this, which seems to be our source of truth! But then KerasNLP and keras.io won't stay automatically synced if we change one!
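The hybrid approach described above can be sketched as follows: a generic CamelCase-to-snake_case conversion, plus a small override table for names like XLMRoberta that the generic rule gets wrong. The override entries and the function name here are illustrative assumptions, not the PR's actual implementation.

```python
import re

# Hypothetical overrides for names where plain CamelCase splitting fails
# (e.g. runs of capitals in acronyms like XLMRoberta or GPT2).
SPECIAL_CASE_PATHS = {
    "XLMRoberta": "xlm_roberta",
    "GPT2": "gpt2",
}

def model_name_to_path(name):
    """Convert a model class name to a keras.io-style path segment."""
    if name in SPECIAL_CASE_PATHS:
        return SPECIAL_CASE_PATHS[name]
    # Insert an underscore before each capital that starts a new word.
    snake = re.sub(r"(?<!^)(?=[A-Z][a-z])", "_", name)
    return snake.lower()
```

The downside, as noted above, is that the override table has to be kept in sync with keras.io by hand.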

Member Author

Fixed this issue. I believe it should now work for all models.

Contributor

@Wauplin Wauplin left a comment

Thanks @SamanehSaadat! Left a couple of comments; otherwise the model card already looks great!

else task_config["class_name"]
)
markdown_content += (
f"This model is related to a `{task_type}` task.\n\n"
Contributor

Is there any chance we can bind this task_type to a task from https://huggingface.co/tasks?

If yes, that would be awesome to also set it as a pipeline_tag in the YAML part of the model card. This way the models would be recognized as such and therefore searchable on the Hub (for instance on https://huggingface.co/models?pipeline_tag=text-classification). Even if we can't assign a pipeline_tag in every case (because of uncertainty), having it for a subset of the models would already be nice. If some models support multiple tasks, you can set the main one as the pipeline_tag and list the secondary ones as tags.
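For illustration, the YAML front matter being suggested could be assembled like this. The `library_name` value and the helper name are assumptions for the sketch, not what the PR necessarily emits.

```python
# Sketch: build YAML front matter with a pipeline_tag (and optional
# secondary task tags) to prepend to the generated model card.
def build_card_header(pipeline_tag, extra_tags=()):
    lines = [
        "---",
        "library_name: keras-nlp",  # assumed library identifier
        f"pipeline_tag: {pipeline_tag}",
    ]
    if extra_tags:
        lines.append("tags:")
        lines.extend(f"- {tag}" for tag in extra_tags)
    lines.append("---")
    return "\n".join(lines) + "\n"
```

For example, `build_card_header("text-generation", ["summarization"])` marks text generation as the main task and summarization as a secondary tag.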

Contributor

You can list all supported tasks like this:

curl -s https://huggingface.co/api/tasks | jq -r 'keys[]'

which outputs:

audio-classification
audio-to-audio
automatic-speech-recognition
depth-estimation
document-question-answering
feature-extraction
fill-mask
image-classification
image-feature-extraction
image-segmentation
image-to-3d
image-to-image
image-to-text
mask-generation
object-detection
question-answering
reinforcement-learning
sentence-similarity
summarization
table-question-answering
tabular-classification
tabular-regression
text-classification
text-generation
text-to-3d
text-to-image
text-to-speech
text-to-video
token-classification
translation
unconditional-image-generation
video-classification
visual-question-answering
zero-shot-classification
zero-shot-image-classification
zero-shot-object-detection

Contributor

If not possible, then let's keep it as it is now.

Member Author

We only capture the high-level task type in our configs, i.e. classification vs. generation. If a user picks a text generation model and trains it on a text summarization dataset to make it a text summarization model, I'm not sure if they can record that anywhere. If the high-level task type sounds good to you, I can add that.
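The high-level mapping being proposed could look like the sketch below. The class-name suffixes are assumptions about what appears in the task config; returning `None` for anything else leaves the pipeline_tag unset rather than guessing.

```python
# Hypothetical mapping from a task config's class-name suffix to a
# Hugging Face Hub pipeline tag; only the two high-level KerasNLP task
# types (generation and classification) are captured.
def task_type_to_pipeline_tag(class_name):
    if class_name.endswith("CausalLM"):
        return "text-generation"
    if class_name.endswith("Classifier"):
        return "text-classification"
    # Unknown task type: omit the pipeline_tag rather than guess.
    return None
```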

Member Author

Added high-level task as pipeline_tag.

Contributor

Awesome! Having text-generation or text-classification is already a very nice thing. If the user retrains on a summarization dataset, then it is their responsibility to update the model card correctly (in my opinion).

"Please install with `pip install huggingface_hub`."
)
hf_handle = uri.removeprefix(HF_PREFIX)
create_model_card(preset)
Contributor

With the current implementation, if the create_repo call fails (e.g. if the user is not logged in), the local model card is not deleted. To be sure the model card is correctly deleted, I would move the create_model_card call to just before upload_folder and encapsulate it in a try/finally.

So doing something like this:

try:
    huggingface_hub.upload_folder(
        repo_id=repo_url.repo_id, folder_path=preset
    )
finally:
    # Clean up the preset directory in case user attempts to upload the
    # preset directory into Kaggle hub as well.
    delete_model_card(preset)

Member Author

You're right! Thanks for catching that! Done!

markdown_content += "\n"
markdown_content += (
"This model card has been generated automatically and should be completed "
"by the model author. See [Model Cards documentation]"
Contributor

Nice one nudging the author to complete the card :)

Member Author

@SamanehSaadat SamanehSaadat left a comment

Thanks for the review, @Wauplin!


Collaborator

@fchollet fchollet left a comment

LGTM, thanks!

Comment on lines 461 to 468
try:
huggingface_hub.upload_folder(
repo_id=repo_url.repo_id, folder_path=preset
)
finally:
# Clean up the preset directory in case user attempts to upload the
# preset directory into Kaggle hub as well.
delete_model_card(preset)
Contributor

Sorry, last thing I forgot to mention. If the user pushes a model to an existing repo, we don't want to generate a model card, as it might overwrite content the user has already written manually. So I would implement logic like this:

  1. create the repo with exists_ok=True
  2. check if file_exists
  3. generate model card (if doesn't exist)
  4. upload folder
  5. delete local model card (if generated in 3.)
Suggested change
try:
    huggingface_hub.upload_folder(
        repo_id=repo_url.repo_id, folder_path=preset
    )
finally:
    # Clean up the preset directory in case user attempts to upload the
    # preset directory into Kaggle hub as well.
    delete_model_card(preset)
has_model_card = huggingface_hub.file_exists(
    repo_id=repo_url.repo_id, filename=README_FILE
)
if not has_model_card:
    # Remote repo does not contain a model card => create one
    create_model_card(preset)
try:
    huggingface_hub.upload_folder(
        repo_id=repo_url.repo_id, folder_path=preset
    )
finally:
    if not has_model_card:
        # Clean up the preset directory in case user attempts to upload the
        # preset directory into Kaggle hub as well.
        delete_model_card(preset)

(and do not forget to remove the create_model_card call above L449)

Member Author

Good point! Updated the code to only create the model card if it doesn't exist.

Contributor

@Wauplin Wauplin left a comment

Looks good to me! Thanks for iterating on it @SamanehSaadat 🤗

@SamanehSaadat
Member Author

Thanks for the review and feedback @Wauplin!

@SamanehSaadat SamanehSaadat merged commit aec862a into keras-team:master Apr 22, 2024