mobilebert model card update
Reshan Gomis authored and stevhliu committed Apr 4, 2025
commit 883564613e4bd77deb15dbeadb218181285b4766
97 changes: 70 additions & 27 deletions docs/source/en/model_doc/mobilebert.md
@@ -14,43 +14,86 @@ rendered properly in your Markdown viewer.

-->

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
</div>
</div>

# MobileBERT

[MobileBERT](https://github.com/google-research/google-research/tree/master/mobilebert) is a lightweight and efficient variant of BERT, specifically designed for resource-limited devices such as mobile phones. It retains the bidirectional transformer architecture of BERT but significantly reduces model size and inference latency while maintaining strong performance on NLP tasks.

The model was proposed in [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. It is a bidirectional transformer based on BERT that is compressed and accelerated using several approaches.

> [!TIP]
> Click on the MobileBERT models in the right sidebar for more examples of how to apply MobileBERT to different language tasks.

The abstract from the paper is the following:

*Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds
of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot
be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating
the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied to
various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while
equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks.
To train MobileBERT, we first train a specially designed teacher model, an inverted-bottleneck incorporated BERT_LARGE
model. Then, we conduct knowledge transfer from this teacher to MobileBERT. Empirical studies show that MobileBERT is
4.3x smaller and 5.5x faster than BERT_BASE while achieving competitive results on well-known benchmarks. On the
natural language inference tasks of GLUE, MobileBERT achieves a GLUE score of 77.7 (0.6 lower than BERT_BASE), and 62 ms
latency on a Pixel 4 phone. On the SQuAD v1.1/v2.0 question answering task, MobileBERT achieves a dev F1 score of
90.0/79.2 (1.5/2.1 higher than BERT_BASE).*
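
The bottleneck architecture and size reduction described in the abstract can be checked directly from the checkpoint configuration and parameter count. The snippet below is a minimal sketch assuming the `google/mobilebert-uncased` and `google-bert/bert-base-uncased` checkpoints; the values in the comments are the expected defaults from the paper and [`MobileBertConfig`], shown for illustration rather than guaranteed outputs.

```py
from transformers import AutoConfig, AutoModel

# Inspect the bottleneck hyperparameters of the pretrained checkpoint
config = AutoConfig.from_pretrained("google/mobilebert-uncased")
print(config.num_hidden_layers)      # expected 24 (deeper than BERT_BASE's 12)
print(config.hidden_size)            # expected 512 (narrower than BERT_BASE's 768)
print(config.intra_bottleneck_size)  # expected 128, the per-layer bottleneck width

# Rough size comparison with BERT_BASE (downloads both checkpoints)
def count_params(model):
    return sum(p.numel() for p in model.parameters())

mobilebert = AutoModel.from_pretrained("google/mobilebert-uncased")
bert_base = AutoModel.from_pretrained("google-bert/bert-base-uncased")
print(f"MobileBERT: {count_params(mobilebert) / 1e6:.0f}M parameters")
print(f"BERT_BASE: {count_params(bert_base) / 1e6:.0f}M parameters")
```
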
This model was contributed by [vshampor](https://huggingface.co/vshampor). The original code can be found [here](https://github.com/google-research/google-research/tree/master/mobilebert).

## Usage tips

- MobileBERT is 4.3× smaller and 5.5× faster than BERT_BASE.
- MobileBERT is a thin version of BERT_LARGE, featuring bottleneck layers that reduce computational complexity while preserving model expressiveness.
- The model is deeper than BERT_BASE but with narrower layers, making it more parameter-efficient.
- Achieves competitive performance on standard NLP benchmarks:
  * GLUE score: 77.7 (only 0.6 lower than BERT_BASE).
  * SQuAD v1.1/v2.0: F1 scores of 90.0/79.2 (outperforming BERT_BASE by 1.5/2.1 points).
- MobileBERT uses absolute position embeddings, so it's usually advised to pad the inputs on the right rather than the left (see the padding sketch after this list).
- MobileBERT is similar to BERT and relies on the masked language modeling (MLM) objective. It is therefore efficient at predicting masked tokens and at NLU in general, but it is not optimal for text generation. Models trained with a causal language modeling (CLM) objective are better suited for that.
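
Because the position embeddings are absolute, right padding keeps the real tokens at their original positions when batching sequences of different lengths. The snippet below is a minimal sketch of right-padded batching; the example sentences are made up for illustration.

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
tokenizer.padding_side = "right"  # pad after the real tokens, not before them

batch = tokenizer(
    ["MobileBERT runs on resource-limited devices.", "It is small and fast."],
    padding=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)
print(batch["attention_mask"])  # zeros mark the padded positions on the right
```
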
The example below demonstrates how to predict the [MASK] token with [`Pipeline`], [`AutoModel`], and from the command line.

<hfoptions id="usage">
<hfoption id="Pipeline">

```py
from transformers import pipeline

# Load MobileBERT for masked token prediction
mask_filler = pipeline("fill-mask", model="google/mobilebert-uncased")

# Example: Predict the masked word
results = mask_filler("The capital of France is [MASK].")
print(results[0]["sequence"]) # Output: "The capital of France is paris."
```
</hfoption>
<hfoption id="AutoModel">

```py
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
model = AutoModelForMaskedLM.from_pretrained("google/mobilebert-uncased")

# Example input with a [MASK] token
text = "I want to [MASK] a coffee."
inputs = tokenizer(text, return_tensors="pt")

# Find the position of the [MASK] token
mask_token_index = torch.where(inputs["input_ids"][0] == tokenizer.mask_token_id)[0]

# Predict the masked token
outputs = model(**inputs)
logits = outputs.logits
mask_logits = logits[0, mask_token_index, :]
predicted_token_id = torch.argmax(mask_logits, dim=-1)

# Decode the prediction
predicted_token = tokenizer.decode(predicted_token_id)
print(f"Predicted word: {predicted_token}") # Output: "buy" or "order"
```

</hfoption>
<hfoption id="transformers-cli">

```bash
python -c "from transformers import pipeline; print(pipeline('fill-mask', model='google/mobilebert-uncased')('Artificial intelligence will [MASK] the world.')[0]['sequence'])"
```

</hfoption>
</hfoptions>


## Resources