| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Instruction tuning | [`SFTTrainer`] | Fine-tuning Google Gemma LLMs using ChatML format with QLoRA | [Philipp Schmid](https://huggingface.co/philschmid) | [Link](https://www.philschmid.de/fine-tune-google-gemma) | [Open In Colab](https://colab.research.google.com/github/philschmid/deep-learning-pytorch-huggingface/blob/main/training/gemma-lora-example.ipynb) |
| Structured Generation | [`SFTTrainer`] | Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT | [Mohammadreza Esmaeilian](https://huggingface.co/Mohammadreza) | [Link](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format) | [Open In Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format.ipynb) |
| Preference Optimization | [`DPOTrainer`] | Align Mistral-7b using Direct Preference Optimization for human preference alignment | [Maxime Labonne](https://huggingface.co/mlabonne) | [Link](https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html) | [Open In Colab](https://colab.research.google.com/github/mlabonne/llm-course/blob/main/Fine_tune_a_Mistral_7b_model_with_DPO.ipynb) |
| Preference Optimization | [`ORPOTrainer`] | Fine-tuning Llama 3 with ORPO combining instruction tuning and preference alignment | [Maxime Labonne](https://huggingface.co/mlabonne) | [Link](https://mlabonne.github.io/blog/posts/2024-04-19_Fine_tune_Llama_3_with_ORPO.html) | [Open In Colab](https://colab.research.google.com/drive/1eHNWg9gnaXErdAa8_mcvjMupbSS6rDvi) |
| Reinforcement Learning | [`GRPOTrainer`] | Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial | [Philipp Schmid](https://huggingface.co/philschmid) | [Link](https://www.philschmid.de/mini-deepseek-r1) | [Open In Colab](https://colab.research.google.com/github/philschmid/deep-learning-pytorch-huggingface/blob/main/training/mini-deepseek-r1-aha-grpo.ipynb) |
| Instruction tuning | [`SFTTrainer`] | How to fine-tune open LLMs in 2025 with Hugging Face | [Philipp Schmid](https://huggingface.co/philschmid) | [Link](https://www.philschmid.de/fine-tune-llms-in-2025) | [Open In Colab](https://colab.research.google.com/github/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fine-tune-llms-in-2025.ipynb) |
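Most of the language-model tutorials above build on [`SFTTrainer`]. For orientation before opening the linked notebooks, here is a minimal sketch of the basic API; the model and dataset ids are placeholders, not taken from any tutorial above, and the notebooks layer QLoRA, chat templates, and task-specific preprocessing on top:

```python
# Minimal SFTTrainer sketch (placeholder model/dataset ids; the linked
# tutorials add QLoRA, chat templates, and task-specific preprocessing).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any conversational dataset in a format SFTTrainer understands.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # model id string or a preloaded PreTrainedModel
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-output"),
)
trainer.train()
```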
| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Visual QA | [`SFTTrainer`] | Fine-tuning Qwen2-VL-7B for visual question answering on ChartQA dataset | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/fine_tuning_vlm_trl) | [Open In Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_vlm_trl.ipynb) |
| Visual QA | [`SFTTrainer`] | Fine-tuning SmolVLM with TRL on a consumer GPU | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/fine_tuning_smol_vlm_sft_trl) | [Open In Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_smol_vlm_sft_trl.ipynb) |
| SEO Description | [`SFTTrainer`] | Fine-tuning Qwen2-VL-7B for generating SEO-friendly descriptions from images | [Philipp Schmid](https://huggingface.co/philschmid) | [Link](https://www.philschmid.de/fine-tune-multimodal-llms-with-trl) | [Open In Colab](https://colab.research.google.com/github/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fine-tune-multimodal-llms-with-trl.ipynb) |
| Visual QA | [`DPOTrainer`] | PaliGemma 🤝 Direct Preference Optimization | [Merve Noyan](https://huggingface.co/merve) | [Link](https://github.com/merveenoyan/smol-vision/blob/main/PaliGemma_DPO.ipynb) | [Open In Colab](https://colab.research.google.com/github/merveenoyan/smol-vision/blob/main/PaliGemma_DPO.ipynb) |
| Visual QA | [`DPOTrainer`] | Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU | [Sergio Paniego](https://huggingface.co/sergiopaniego) | [Link](https://huggingface.co/learn/cookbook/fine_tuning_vlm_dpo_smolvlm_instruct) | [Open In Colab](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_vlm_dpo_smolvlm_instruct.ipynb) |
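The last two rows use [`DPOTrainer`] for preference tuning; the linked VLM notebooks add the vision-specific processors and collators. As a minimal text-only sketch of the core DPO API (model and dataset ids are placeholders; recent TRL versions name the tokenizer argument `processing_class`):

```python
# Minimal text-only DPOTrainer sketch (placeholder model/dataset ids;
# the VLM notebooks above add image processors and custom collators).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A preference dataset with "chosen"/"rejected" response pairs.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-output"),
    train_dataset=dataset,
    processing_class=tokenizer,  # called `tokenizer` in older TRL releases
)
trainer.train()
```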