39 changes: 39 additions & 0 deletions docs/jobs/_toctree.yml
@@ -0,0 +1,39 @@
- local: index
title: Hugging Face Jobs

- title: Overview
sections:
- local: index
title: Hugging Face Jobs
- local: quickstart
title: Quickstart
- local: pricing
title: Pricing and Billing

- title: Tutorials
sections:
- title: Training
sections:
- local: training1
title: Training Tuto 1
- title: Inference
sections:
- local: inference1
title: Inference Tuto 1
- title: Data
sections:
- local: data1
title: Data Tuto 1

- title: Guides
sections:
- local: manage
title: Manage Jobs
- local: configuration
title: Configuration
- local: frameworks
title: Framework Setups
- local: schedule
title: Schedule Jobs
- local: webhooks
title: Webhook Automation
Empty file added docs/jobs/configuration.md
Empty file.
Empty file added docs/jobs/data1.md
Empty file.
Empty file added docs/jobs/frameworks.md
Empty file.
39 changes: 39 additions & 0 deletions docs/jobs/index.md
@@ -0,0 +1,39 @@
# Hugging Face Jobs

Run compute jobs on Hugging Face infrastructure with a familiar UV & Docker-like interface!

<div class="-mt-3 grid grid-cols-2 rounded-xl border lg:grid-cols-4"><div class="border-r p-4 max-lg:border-b"><h3 class="flex items-center gap-1.5 font-semibold"><svg class="text-green-500 flex-none" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 13 13"><path d="M5.22433 7.95134L3.91933 6.64634C3.80933 6.53634 3.67433 6.48134 3.51433 6.48134C3.35433 6.48134 3.21433 6.54134 3.09433 6.66134C2.98433 6.77134 2.92933 6.91134 2.92933 7.08134C2.92933 7.25134 2.98433 7.39134 3.09433 7.50134L4.80433 9.21134C4.91433 9.32134 5.05433 9.37634 5.22433 9.37634C5.39433 9.37634 5.53433 9.32134 5.64433 9.21134L9.04933 5.80634C9.15933 5.69634 9.21433 5.56134 9.21433 5.40134C9.21433 5.24134 9.15433 5.10134 9.03433 4.98134C8.92433 4.87134 8.78433 4.81634 8.61433 4.81634C8.44433 4.81634 8.30433 4.87134 8.19433 4.98134L5.22433 7.95134ZM6.06433 12.8713C5.23433 12.8713 4.45433 12.7137 3.72433 12.3985C2.99433 12.0837 2.35933 11.6563 1.81933 11.1163C1.27933 10.5763 0.851931 9.94134 0.537131 9.21134C0.221931 8.48134 0.0643311 7.70134 0.0643311 6.87134C0.0643311 6.04134 0.221931 5.26134 0.537131 4.53134C0.851931 3.80134 1.27933 3.16634 1.81933 2.62634C2.35933 2.08634 2.99433 1.65874 3.72433 1.34354C4.45433 1.02874 5.23433 0.871338 6.06433 0.871338C6.89433 0.871338 7.67433 1.02874 8.40433 1.34354C9.13433 1.65874 9.76933 2.08634 10.3093 2.62634C10.8493 3.16634 11.2767 3.80134 11.5915 4.53134C11.9067 5.26134 12.0643 6.04134 12.0643 6.87134C12.0643 7.70134 11.9067 8.48134 11.5915 9.21134C11.2767 9.94134 10.8493 10.5763 10.3093 11.1163C9.76933 11.6563 9.13433 12.0837 8.40433 12.3985C7.67433 12.7137 6.89433 12.8713 6.06433 12.8713Z" fill="currentColor"></path></svg>UV & Docker-like CLI</h3> <p class="font-mono text-xs text-gray-600">uv,run,ps,logs,inspect</p></div> <div class="p-4 dark:border-gray-900 max-lg:border-b lg:border-r"><h3 class="flex 
items-center gap-1.5 font-semibold"><svg class="text-green-500 flex-none" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 13 13"><path d="M5.22433 7.95134L3.91933 6.64634C3.80933 6.53634 3.67433 6.48134 3.51433 6.48134C3.35433 6.48134 3.21433 6.54134 3.09433 6.66134C2.98433 6.77134 2.92933 6.91134 2.92933 7.08134C2.92933 7.25134 2.98433 7.39134 3.09433 7.50134L4.80433 9.21134C4.91433 9.32134 5.05433 9.37634 5.22433 9.37634C5.39433 9.37634 5.53433 9.32134 5.64433 9.21134L9.04933 5.80634C9.15933 5.69634 9.21433 5.56134 9.21433 5.40134C9.21433 5.24134 9.15433 5.10134 9.03433 4.98134C8.92433 4.87134 8.78433 4.81634 8.61433 4.81634C8.44433 4.81634 8.30433 4.87134 8.19433 4.98134L5.22433 7.95134ZM6.06433 12.8713C5.23433 12.8713 4.45433 12.7137 3.72433 12.3985C2.99433 12.0837 2.35933 11.6563 1.81933 11.1163C1.27933 10.5763 0.851931 9.94134 0.537131 9.21134C0.221931 8.48134 0.0643311 7.70134 0.0643311 6.87134C0.0643311 6.04134 0.221931 5.26134 0.537131 4.53134C0.851931 3.80134 1.27933 3.16634 1.81933 2.62634C2.35933 2.08634 2.99433 1.65874 3.72433 1.34354C4.45433 1.02874 5.23433 0.871338 6.06433 0.871338C6.89433 0.871338 7.67433 1.02874 8.40433 1.34354C9.13433 1.65874 9.76933 2.08634 10.3093 2.62634C10.8493 3.16634 11.2767 3.80134 11.5915 4.53134C11.9067 5.26134 12.0643 6.04134 12.0643 6.87134C12.0643 7.70134 11.9067 8.48134 11.5915 9.21134C11.2767 9.94134 10.8493 10.5763 10.3093 11.1163C9.76933 11.6563 9.13433 12.0837 8.40433 12.3985C7.67433 12.7137 6.89433 12.8713 6.06433 12.8713Z" fill="currentColor"></path></svg>Any Hardware</h3> <p class="text-sm text-gray-600">CPUs to A100s &amp; TPUs</p></div> <div class="border-r p-4"><h3 class="flex items-center gap-1.5 font-semibold"><svg class="text-green-500 flex-none" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" 
focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 13 13"><path d="M5.22433 7.95134L3.91933 6.64634C3.80933 6.53634 3.67433 6.48134 3.51433 6.48134C3.35433 6.48134 3.21433 6.54134 3.09433 6.66134C2.98433 6.77134 2.92933 6.91134 2.92933 7.08134C2.92933 7.25134 2.98433 7.39134 3.09433 7.50134L4.80433 9.21134C4.91433 9.32134 5.05433 9.37634 5.22433 9.37634C5.39433 9.37634 5.53433 9.32134 5.64433 9.21134L9.04933 5.80634C9.15933 5.69634 9.21433 5.56134 9.21433 5.40134C9.21433 5.24134 9.15433 5.10134 9.03433 4.98134C8.92433 4.87134 8.78433 4.81634 8.61433 4.81634C8.44433 4.81634 8.30433 4.87134 8.19433 4.98134L5.22433 7.95134ZM6.06433 12.8713C5.23433 12.8713 4.45433 12.7137 3.72433 12.3985C2.99433 12.0837 2.35933 11.6563 1.81933 11.1163C1.27933 10.5763 0.851931 9.94134 0.537131 9.21134C0.221931 8.48134 0.0643311 7.70134 0.0643311 6.87134C0.0643311 6.04134 0.221931 5.26134 0.537131 4.53134C0.851931 3.80134 1.27933 3.16634 1.81933 2.62634C2.35933 2.08634 2.99433 1.65874 3.72433 1.34354C4.45433 1.02874 5.23433 0.871338 6.06433 0.871338C6.89433 0.871338 7.67433 1.02874 8.40433 1.34354C9.13433 1.65874 9.76933 2.08634 10.3093 2.62634C10.8493 3.16634 11.2767 3.80134 11.5915 4.53134C11.9067 5.26134 12.0643 6.04134 12.0643 6.87134C12.0643 7.70134 11.9067 8.48134 11.5915 9.21134C11.2767 9.94134 10.8493 10.5763 10.3093 11.1163C9.76933 11.6563 9.13433 12.0837 8.40433 12.3985C7.67433 12.7137 6.89433 12.8713 6.06433 12.8713Z" fill="currentColor"></path></svg>Run Anything</h3> <p class="text-sm text-gray-600">UV, Docker, HF Spaces &amp; more</p></div> <div class="p-4"><h3 class="flex items-center gap-1.5 font-semibold"><svg class="text-green-500 flex-none" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 13 13"><path d="M5.22433 7.95134L3.91933 6.64634C3.80933 6.53634 3.67433 6.48134 
3.51433 6.48134C3.35433 6.48134 3.21433 6.54134 3.09433 6.66134C2.98433 6.77134 2.92933 6.91134 2.92933 7.08134C2.92933 7.25134 2.98433 7.39134 3.09433 7.50134L4.80433 9.21134C4.91433 9.32134 5.05433 9.37634 5.22433 9.37634C5.39433 9.37634 5.53433 9.32134 5.64433 9.21134L9.04933 5.80634C9.15933 5.69634 9.21433 5.56134 9.21433 5.40134C9.21433 5.24134 9.15433 5.10134 9.03433 4.98134C8.92433 4.87134 8.78433 4.81634 8.61433 4.81634C8.44433 4.81634 8.30433 4.87134 8.19433 4.98134L5.22433 7.95134ZM6.06433 12.8713C5.23433 12.8713 4.45433 12.7137 3.72433 12.3985C2.99433 12.0837 2.35933 11.6563 1.81933 11.1163C1.27933 10.5763 0.851931 9.94134 0.537131 9.21134C0.221931 8.48134 0.0643311 7.70134 0.0643311 6.87134C0.0643311 6.04134 0.221931 5.26134 0.537131 4.53134C0.851931 3.80134 1.27933 3.16634 1.81933 2.62634C2.35933 2.08634 2.99433 1.65874 3.72433 1.34354C4.45433 1.02874 5.23433 0.871338 6.06433 0.871338C6.89433 0.871338 7.67433 1.02874 8.40433 1.34354C9.13433 1.65874 9.76933 2.08634 10.3093 2.62634C10.8493 3.16634 11.2767 3.80134 11.5915 4.53134C11.9067 5.26134 12.0643 6.04134 12.0643 6.87134C12.0643 7.70134 11.9067 8.48134 11.5915 9.21134C11.2767 9.94134 10.8493 10.5763 10.3093 11.1163C9.76933 11.6563 9.13433 12.0837 8.40433 12.3985C7.67433 12.7137 6.89433 12.8713 6.06433 12.8713Z" fill="currentColor"></path></svg>Pay-as-you-go</h3> <p class="text-sm text-gray-600">Pay only for seconds used</p></div></div>

The Hugging Face Hub provides compute for AI and data workflows via Jobs.

Jobs run on Hugging Face infrastructure and aim to give AI builders, data engineers, developers, and AI agents easy access to cloud infrastructure for their workloads. They are ideal for fine-tuning AI models and running inference on GPUs, but also for data ingestion and processing.

A job is defined by a command to run (e.g. a UV or Python command), a hardware flavor (CPU, GPU, TPU), and optionally a Docker image from Hugging Face Spaces or Docker Hub. Many jobs can run in parallel, which is useful for hyperparameter tuning, parallel inference, or data processing.
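For illustration, a hyperparameter sweep can be expressed as one Job per value, all running in parallel. The sketch below only builds the command strings; `train.py` and its `--learning-rate` flag are hypothetical script arguments, not `hf jobs` options:

```python
# Build one `hf jobs` command per hyperparameter value; each command
# starts an independent Job, so all of them can run in parallel.
learning_rates = [1e-5, 2e-5, 5e-5]

commands = [
    f"hf jobs uv run --flavor a100-large --with trl train.py --learning-rate {lr}"
    for lr in learning_rates
]

for cmd in commands:
    print(cmd)
```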

## Run Jobs from anywhere

There are multiple tools you can use to run jobs:

* the `hf` Command Line Interface (see the [CLI installation steps](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) and the [Jobs CLI documentation](https://huggingface.co/docs/huggingface_hub/guides/cli#hf-jobs) for more information)
* the `huggingface_hub` Python client (see the [`huggingface_hub` Jobs documentation](https://huggingface.co/docs/huggingface_hub/guides/jobs) for more information)
* the Jobs HTTP API (see the [Jobs HTTP API documentation](./http) for more information)

## Run any workload

The `hf` Jobs CLI and the `huggingface_hub` Python client offer a UV-like interface to run Python workloads. UV installs the required Python dependencies and runs the Python script in a single command. Dependencies may also be declared inline in a self-contained UV script, in which case nothing but the script itself needs to be passed to run the Job.

```diff
- uv run <script.py>
+ hf jobs uv run <script.py>
```
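For example, a self-contained UV script declares its dependencies in an inline metadata block (PEP 723) at the top of the file. The dependency list below is only illustrative; the script body is ordinary Python:

```python
# /// script
# requires-python = ">=3.10"
# dependencies = ["trl"]
# ///
# UV reads the metadata block above and installs the listed dependencies
# before running the script body (this toy body uses none of them).

def greeting(where: str) -> str:
    return f"Hello from {where}!"

if __name__ == "__main__":
    print(greeting("the cloud"))
```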

More generally, Hugging Face Jobs supports any workload defined by a Docker image and a command. Jobs offers a Docker-like interface where you specify an image from Hugging Face Spaces or Docker Hub together with the command to run. Docker makes it possible to package ready-to-use environments as images, shared by the community or custom made, so you can choose or build an image based on what your workload needs (e.g. Python, torch, vLLM) and run any command. This is more advanced than using UV but offers more flexibility.

```diff
- docker run <image> <command>
+ hf jobs run <image> <command>
```

## Automate Jobs

Trigger Jobs automatically on a schedule or with webhooks. With a schedule, you can run Jobs every X minutes, hours, days, weeks, or months. Scheduled Jobs use the `cron` syntax, like `"*/5 * * * *"` for "every 5 minutes", or aliases like `"@hourly"`, `"@daily"`, `"@weekly"`, or `"@monthly"`. With webhooks, Jobs can run whenever something is updated on Hugging Face. For example, you can configure a webhook to trigger on every model update under a given account, and retrieve the updated model from the webhook payload inside the Job.
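As a quick reference, the standard five-field `cron` form is minute, hour, day of month, month, day of week:

```
*/5 * * * *   # every 5 minutes
0 * * * *     # every hour, on the hour (same as "@hourly")
0 0 * * *     # every day at midnight (same as "@daily")
0 0 * * 1     # every Monday at midnight
```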
Empty file added docs/jobs/inference1.md
Empty file.
Empty file added docs/jobs/manage.md
Empty file.
29 changes: 29 additions & 0 deletions docs/jobs/pricing.md
@@ -0,0 +1,29 @@
# Pricing and Billing

Billing on Jobs is based on hardware usage and is computed by the minute: you are charged for every minute the Job runs on the requested hardware.
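As a sketch of the per-minute arithmetic, assuming here that any started minute is charged in full (actual rates depend on the hardware flavor and are shown on your billing page):

```python
import math

def billed_minutes(runtime_seconds: int) -> int:
    """Assumption for this sketch: a started minute is charged in full."""
    return math.ceil(runtime_seconds / 60)

# A job that ran 8 minutes 20 seconds is billed for 9 minutes:
print(billed_minutes(8 * 60 + 20))  # -> 9
```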

During a Job's lifecycle, it is only billed while it is Starting or Running. This means there is no cost during the build phase.

If a running Job starts to fail, it is automatically suspended and billing stops.

Jobs have a default timeout of 30 minutes. You can change this by setting a custom `timeout` when creating the Job, for example in the CLI:

```bash
hf jobs run --timeout 3h ...
```

You can view your current billing information for Jobs on your [Billing](https://huggingface.co/settings/billing) page, under the "Compute Usage" section:

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/billing.png"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/billing-dark.png"/>
</div>

To stop billing for a Job, you can cancel it:

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/cancel-jobs.png"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/cancel-jobs-dark.png"/>
</div>

Additional information about billing can be found in the dedicated Hub-wide section.
145 changes: 145 additions & 0 deletions docs/jobs/quickstart.md
@@ -0,0 +1,145 @@
# Quickstart

In this guide, you will run a Job to fine-tune an open source model on Hugging Face infrastructure in only a few minutes. Make sure you are logged in to Hugging Face and have access to your [Jobs page](https://huggingface.co/settings/jobs).

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/jobs-page.png"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/jobs-page-dark.png"/>
</div>

## Getting started

1. Install the CLI using the install script:

```bash
>>> curl -LsSf https://hf.co/cli/install.sh | bash
```

or using Homebrew:

```bash
>>> brew install huggingface-cli
```

or using uv:

```bash
>>> uv tool install hf
```

2. Log in to your Hugging Face account:

```bash
>>> hf auth login
```

3. Create your first jobs using the `hf jobs` command:

Run a UV command or script

```bash
>>> hf jobs uv run python -c 'print("Hello from the cloud!")'
Job started with ID: 693aef401a39f67af5a41c0e
View at: https://huggingface.co/jobs/lhoestq/693aef401a39f67af5a41c0e
Hello from the cloud!
```

```bash
>>> hf jobs uv run path/to/script.py
```

Run a Docker command

```bash
>>> hf jobs run ubuntu echo 'Hello from the cloud!'
Job started with ID: 693aee76c67c9f186cfe233e
View at: https://huggingface.co/jobs/lhoestq/693aee76c67c9f186cfe233e
Hello from the cloud!
```

4. Check your first jobs

The job logs appear in your terminal, but you can also see them on your Jobs page. Open the job page to see its information, status, and logs:

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/first-job-page.png"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/firts-job-page-dark.png"/>
</div>


## The training script

Here is a simple training script that fine-tunes a base model into a conversational model using Supervised Fine-Tuning (SFT). It uses the [TRL](https://huggingface.co/docs/trl/en/index) library to train the [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) model on the [trl-lib/Capybara](https://huggingface.co/datasets/trl-lib/Capybara) dataset, then pushes the resulting model to your Hugging Face account under the name `"Qwen2.5-0.5B-SFT"`:

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")
trainer = SFTTrainer(
model="Qwen/Qwen2.5-0.5B",
train_dataset=dataset,
)
trainer.train()
trainer.push_to_hub("Qwen2.5-0.5B-SFT")
```

Save this script as `train.py`; you can now run it with UV on Hugging Face Jobs.

## Run the training job

`hf jobs` takes several arguments: select the hardware with `--flavor`, and pass environment variables with `--env` and secrets with `--secrets`. Here we use the A100 large GPU flavor with `--flavor a100-large` and pass your Hugging Face token as a secret with `--secrets HF_TOKEN` so the job can push the resulting model to your account.

Moreover, UV accepts the `--with` argument to declare Python dependencies, so we use `--with trl` to make the `trl` library available.

You can now run the final command which looks like this:

```bash
hf jobs uv run \
--flavor a100-large \
--with trl \
--secrets HF_TOKEN \
train.py
```

The logs appear in your terminal, and you can safely Ctrl+C to stop streaming them; the job will keep running.

```
...
Downloaded nvidia-cudnn-cu12
Downloaded torch
Installed 66 packages in 233ms
Generating train split: 100%|██████████| 15806/15806 [00:00<00:00, 76686.50 examples/s]
Generating test split: 100%|██████████| 200/200 [00:00<00:00, 43880.36 examples/s]
Tokenizing train dataset: 100%|██████████| 15806/15806 [00:41<00:00, 384.97 examples/s]
Truncating train dataset: 100%|██████████| 15806/15806 [00:00<00:00, 212272.92 examples/s]
The model is already on multiple devices. Skipping the move to device specified in `args`.
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
{'loss': 1.7357, 'grad_norm': 4.8733229637146, 'learning_rate': 1.9969635627530365e-05, 'entropy': 1.7238958358764649, 'num_tokens': 59528.0, 'mean_token_accuracy': 0.6124177813529968, 'epoch': 0.01}
{'loss': 1.6239, 'grad_norm': 6.200186729431152, 'learning_rate': 1.9935897435897437e-05, 'entropy': 1.644005584716797, 'num_tokens': 115219.0, 'mean_token_accuracy': 0.6259662985801697, 'epoch': 0.01}
{'loss': 1.4449, 'grad_norm': 6.167325496673584, 'learning_rate': 1.990215924426451e-05, 'entropy': 1.5156117916107177, 'num_tokens': 171787.0, 'mean_token_accuracy': 0.6586395859718323, 'epoch': 0.02}
{'loss': 1.6023, 'grad_norm': 5.133708953857422, 'learning_rate': 1.986842105263158e-05, 'entropy': 1.6885507702827454, 'num_tokens': 226067.0, 'mean_token_accuracy': 0.6271904468536377, 'epoch': 0.02}
```

Follow the Job's progress on its job page on Hugging Face:


<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/trl-sft-job-page.png"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/trl-sft-job-page-dark.png"/>
</div>

Once the job is done, find your model on your account:

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/trl-sft-model-page.png"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/jobs/trl-sft-model-page-dark.png"/>
</div>

Congrats! You just ran your first Job to fine-tune an open source model 🔥

Feel free to try out your model locally and evaluate it, e.g. with [transformers](https://huggingface.co/docs/transformers), by clicking on "Use this model", or deploy it to [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) in one click using the "Deploy" button.
Empty file added docs/jobs/schedule.md
Empty file.
Empty file added docs/jobs/training1.md
Empty file.
Empty file added docs/jobs/webhooks.md
Empty file.