Three minor updates to the Fine-Tuning Scheduler tutorial to better support the release of Lightning 1.9.x (supported by Fine-Tuning Scheduler 0.4.x)
speediedan committed Jan 25, 2023
commit aa838bbdbe0d4b0c8e056e937a2ca9b5a016f661
9 changes: 4 additions & 5 deletions lightning_examples/finetuning-scheduler/.meta.yml
@@ -1,20 +1,19 @@
title: Fine-Tuning Scheduler
author: "[Dan Dale](https://github.com/speediedan)"
created: 2021-11-29
-updated: 2022-09-20
+updated: 2023-01-24
license: CC BY-SA
build: 0
tags:
- Fine-Tuning
description: |
This notebook introduces the [Fine-Tuning Scheduler](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) extension
-and demonstrates the use of it to fine-tune a small foundational model on the
+and demonstrates the use of it to fine-tune a small foundation model on the
[RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of
[SuperGLUE](https://super.gluebenchmark.com/) with iterative early-stopping defined according to a user-specified
schedule. It uses Hugging Face's ``datasets`` and ``transformers`` libraries to retrieve the relevant benchmark data
-and foundational model weights. The required dependencies are installed via the finetuning-scheduler ``[examples]`` extra.
+and foundation model weights. The required dependencies are installed via the finetuning-scheduler ``[examples]`` extra.
requirements:
-- finetuning-scheduler[examples]>=0.3.0
-- datasets<2.8.0 # todo: AttributeError: module 'datasets.arrow_dataset' has no attribute 'Batch'
+- finetuning-scheduler[examples]>=0.4.0
accelerator:
- GPU
45 changes: 23 additions & 22 deletions lightning_examples/finetuning-scheduler/finetuning-scheduler.py
@@ -21,8 +21,8 @@
# <div style="display:inline" id="a1">
#
# Fundamentally, [Fine-Tuning Scheduler](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) enables
-# scheduled, multi-phase, fine-tuning of foundational models. Gradual unfreezing (i.e. thawing) can help maximize
-# foundational model knowledge retention while allowing (typically upper layers of) the model to
+# scheduled, multi-phase, fine-tuning of foundation models. Gradual unfreezing (i.e. thawing) can help maximize
+# foundation model knowledge retention while allowing (typically upper layers of) the model to
# optimally adapt to new tasks during transfer learning [1, 2, 3](#f1)
#
# </div>
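As a minimal sketch of how this looks in code (assuming only the public ``FinetuningScheduler`` callback API and a user-supplied ``LightningModule``; none of this is taken verbatim from the tutorial), the callback is simply attached to a ``Trainer``:

```python
import pytorch_lightning as pl
from finetuning_scheduler import FinetuningScheduler

# With no ft_schedule provided, FinetuningScheduler generates a default schedule that
# gradually unfreezes (thaws) parameter groups in reverse depth order, one phase at a time.
trainer = pl.Trainer(callbacks=[FinetuningScheduler()])
# trainer.fit(my_lightning_module, datamodule=my_datamodule)  # hypothetical user objects
```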
@@ -111,7 +111,7 @@
#
#
#
-# The end-to-end example in this notebook ([Scheduled Fine-Tuning For SuperGLUE](#superglue)) uses [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) in explicit mode to fine-tune a small foundational model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/).
+# The end-to-end example in this notebook ([Scheduled Fine-Tuning For SuperGLUE](#superglue)) uses [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) in explicit mode to fine-tune a small foundation model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/).
# Please see the [official Fine-Tuning Scheduler documentation](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) if you are interested in a similar [CLI-based example](https://finetuning-scheduler.readthedocs.io/en/stable/index.html#example-scheduled-fine-tuning-for-superglue) using the LightningCLI.
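To make "explicit mode" more concrete before the full example below, here is a hedged sketch of a user-defined schedule passed via the ``ft_schedule`` parameter; the phase layout and parameter-name patterns are invented placeholders, not the schedule used later in this notebook:

```python
import pytorch_lightning as pl
from finetuning_scheduler import FinetuningScheduler

# Hypothetical three-phase schedule: phase 0 thaws only the task head, later phases
# thaw progressively deeper layers. Parameter-name patterns are placeholders.
example_schedule = """
0:
  params:
  - model.classifier.*
1:
  params:
  - model.pooler.*
2:
  params:
  - model.encoder.*
"""
with open("example_ft_schedule.yaml", "w") as f:
    f.write(example_schedule)

trainer = pl.Trainer(callbacks=[FinetuningScheduler(ft_schedule="example_ft_schedule.yaml")])
```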

# %% [markdown]
@@ -147,23 +147,24 @@
#
# **Note:** Currently, [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) supports the following strategy types:
#
-# - ``DP``
-# - ``DDP``
-# - ``DDP_FORK`` (and its aliases e.g. ``ddp_notebook``)
-# - ``DDP_SPAWN``
-# - ``DDP_SHARDED``
-# - ``DDP_SHARDED_SPAWN``
+# - ``ddp`` (and alias ``ddp_find_unused_parameters_false``)
+# - ``fsdp_native`` (and alias ``fsdp_native_full_shard_offload``)
+# - ``ddp_spawn`` (and aliases ``ddp_fork``, ``ddp_notebook``)
+# - ``dp``
+# - ``ddp_sharded`` (deprecated, to be removed in 2.0)
+# - ``ddp_sharded_spawn`` (deprecated, to be removed in 2.0)
#
# Custom or officially unsupported strategies can be used by setting [FinetuningScheduler.allow_untested](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html?highlight=allow_untested#finetuning_scheduler.fts.FinetuningScheduler.params.allow_untested) to ``True``.
-# Note that most currently unsupported strategies are so because they require varying degrees of modification to be compatible (e.g. ``deepspeed`` requires an ``add_param_group`` method, ``tpu_spawn`` an override of the current broadcast method to include python objects)
+# Note that most currently unsupported strategies are so because they require varying degrees of modification to be compatible. For example, ``deepspeed`` will require a [StrategyAdapter](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.strategy_adapters.html#finetuning_scheduler.strategy_adapters.StrategyAdapter) to be written (similar to the one for ``FSDP``, [FSDPStrategyAdapter](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.strategy_adapters.html#finetuning_scheduler.strategy_adapters.FSDPStrategyAdapter)) before support can be added (PRs welcome!),
+# while ``tpu_spawn`` would require an override of the current broadcast method to include python objects.
# </div>
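For readers who want to see what that opt-in looks like, a minimal hedged sketch (the custom strategy object is hypothetical; only ``allow_untested`` itself is a documented parameter):

```python
import pytorch_lightning as pl
from finetuning_scheduler import FinetuningScheduler

# A strategy from the supported list above can be combined with the callback directly.
trainer = pl.Trainer(strategy="ddp_notebook", callbacks=[FinetuningScheduler()])

# A custom or officially untested strategy must be explicitly opted into; compatibility
# is then the user's responsibility.
# trainer = pl.Trainer(strategy=my_custom_strategy,  # hypothetical strategy instance
#                      callbacks=[FinetuningScheduler(allow_untested=True)])
```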

# %% [markdown]
# <div id="superglue"></div>
#
# ## Scheduled Fine-Tuning For SuperGLUE
#
-# The following example demonstrates the use of [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) to fine-tune a small foundational model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/). Iterative early-stopping will be applied according to a user-specified schedule.
+# The following example demonstrates the use of [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) to fine-tune a small foundation model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/). Iterative early-stopping will be applied according to a user-specified schedule.
#

# %%
@@ -180,7 +181,7 @@
import pytorch_lightning as pl
import torch
from datasets import logging as datasets_logging
-from lightning_lite.accelerators.cuda import is_cuda_available
+from lightning_fabric.accelerators.cuda import is_cuda_available
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
from pytorch_lightning.loggers.tensorboard import TensorBoardLogger
from pytorch_lightning.utilities import rank_zero_warn
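The import change above reflects the ``lightning_lite`` to ``lightning_fabric`` rename that accompanied Lightning 1.9. If a notebook needs to tolerate both versions, a guarded import along these lines (a sketch, not part of the tutorial) would work:

```python
# Version-tolerant import of the helper used above; the module was renamed between
# Lightning 1.8.x and 1.9.x.
try:
    from lightning_fabric.accelerators.cuda import is_cuda_available  # Lightning >= 1.9
except ImportError:
    from lightning_lite.accelerators.cuda import is_cuda_available  # Lightning 1.8.x
```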
@@ -251,8 +252,8 @@ def __init__(
model_name_or_path (str):
Can be either:
- A string, the ``model id`` of a pretrained model hosted inside a model repo on huggingface.co.
-Valid model ids can be located at the root-level, like ``bert-base-uncased``, or namespaced under
-a user or organization name, like ``dbmdz/bert-base-german-cased``.
+Valid model ids can be located at the root-level, like ``bert-base-uncased``, or namespaced
+under a user or organization name, like ``dbmdz/bert-base-german-cased``.
- A path to a ``directory`` containing model weights saved using
:meth:`~transformers.PreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
task_name (str, optional): Name of the SuperGLUE task to execute. This module supports 'rte' or 'boolq'.
@@ -261,7 +262,7 @@ def __init__(
train_batch_size (int, optional): Training batch size. Defaults to 16.
eval_batch_size (int, optional): Batch size to use for validation and testing splits. Defaults to 16.
tokenizers_parallelism (bool, optional): Whether to use parallelism in the tokenizer. Defaults to True.
-\**dataloader_kwargs: Arguments passed when initializing the dataloader
+\**dataloader_kwargs: Arguments passed when initializing the dataloader.
"""
super().__init__()
task_name = task_name if task_name in TASK_NUM_LABELS.keys() else DEFAULT_TASK
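As a hedged usage sketch of the keyword arguments documented above (the datamodule class name is assumed for illustration; extra keywords such as ``num_workers`` and ``pin_memory`` are simply forwarded to the dataloaders via ``**dataloader_kwargs``):

```python
# Class name assumed for illustration; keyword arguments mirror the docstring above.
dm = RteBoolqDataModule(
    model_name_or_path="bert-base-uncased",
    task_name="rte",
    train_batch_size=16,
    eval_batch_size=16,
    tokenizers_parallelism=True,
    num_workers=2,    # forwarded to the DataLoaders via **dataloader_kwargs
    pin_memory=True,  # likewise forwarded
)
```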
@@ -300,7 +301,7 @@ def train_dataloader(self):
def val_dataloader(self):
return DataLoader(self.dataset["validation"], batch_size=self.hparams.eval_batch_size, **self.dataloader_kwargs)

-def _convert_to_features(self, example_batch: datasets.arrow_dataset.Batch) -> BatchEncoding:
+def _convert_to_features(self, example_batch: datasets.arrow_dataset.LazyDict) -> BatchEncoding:
"""Convert raw text examples to a :class:`~transformers.tokenization_utils_base.BatchEncoding` container
(derived from python dict) of features that includes helpful methods for translating between word/character
space and token space.
@@ -309,7 +310,7 @@ def _convert_to_features(self, example_batch: datasets.arrow_dataset.Batch) -> B
example_batch ([type]): The set of examples to convert to token space.

Returns:
-``BatchEncoding``: A batch of encoded examples (note default tokenizer batch_size=1000)
+``BatchEncoding``: A batch of encoded examples (note default tokenizer batch_size=1000).
"""
text_pairs = list(zip(example_batch[self.text_fields[0]], example_batch[self.text_fields[1]]))
# Tokenize the text/text pairs
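For orientation, the tokenization step this hunk truncates produces the ``BatchEncoding`` discussed in the docstring; a standalone sketch (model name, lengths, and padding choices are illustrative only):

```python
from transformers import AutoTokenizer

# Illustrative only: encode premise/hypothesis pairs into a BatchEncoding, mirroring
# the text_pairs zip shown above.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_pairs = [("A premise sentence.", "A hypothesis sentence.")]
encoding = tokenizer.batch_encode_plus(text_pairs, max_length=128, padding="max_length", truncation=True)
print(type(encoding))  # <class 'transformers.tokenization_utils_base.BatchEncoding'>
```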
@@ -323,8 +324,8 @@ def _convert_to_features(self, example_batch: datasets.arrow_dataset.Batch) -> B

# %%
class RteBoolqModule(pl.LightningModule):
"""A ``LightningModule`` that can be used to fine-tune a foundational model on either the RTE or BoolQ
SuperGLUE tasks using Hugging Face implementations of a given model and the `SuperGLUE Hugging Face dataset."""
"""A ``LightningModule`` that can be used to fine-tune a foundation model on either the RTE or BoolQ SuperGLUE
tasks using Hugging Face implementations of a given model and the `SuperGLUE Hugging Face dataset."""

def __init__(
self,
@@ -337,9 +338,9 @@ def __init__(
):
"""
Args:
-model_name_or_path (str): Path to pretrained model or identifier from https://huggingface.co/models
+model_name_or_path (str): Path to pretrained model or identifier from https://huggingface.co/models.
optimizer_init (Dict[str, Any]): The desired optimizer configuration.
-lr_scheduler_init (Dict[str, Any]): The desired learning rate scheduler config
+lr_scheduler_init (Dict[str, Any]): The desired learning rate scheduler config.
model_cfg (Optional[Dict[str, Any]], optional): Defines overrides of the default model config. Defaults to
``None``.
task_name (str, optional): The SuperGLUE task to execute, one of ``'rte'``, ``'boolq'``. Defaults to "rte".
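A hedged sketch of how the ``optimizer_init``/``lr_scheduler_init`` dictionaries documented above might be populated; the keys must match whatever optimizer and LR scheduler ``configure_optimizers`` ultimately instantiates, so treat these values purely as placeholders:

```python
# Placeholder configuration dicts; key names depend on the optimizer/scheduler chosen.
optimizer_init = {"lr": 1e-05, "eps": 1e-07, "weight_decay": 1e-05}
lr_scheduler_init = {"T_max": 20}  # e.g. for torch.optim.lr_scheduler.CosineAnnealingLR

module = RteBoolqModule(
    model_name_or_path="bert-base-uncased",
    optimizer_init=optimizer_init,
    lr_scheduler_init=lr_scheduler_init,
    task_name="rte",
)
```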
@@ -495,7 +496,7 @@ def configure_optimizers(self):
# Though other optimizers can arguably yield some marginal advantage contingent on the context,
# the Adam optimizer (and the [AdamW version](https://pytorch.org/docs/stable/_modules/torch/optim/adamw.html#AdamW) which
# implements decoupled weight decay) remains robust to hyperparameter choices and is commonly used for fine-tuning
-# foundational language models. See [(Sivaprasad et al., 2020)](#f2) and [(Mosbach, Andriushchenko & Klakow, 2020)](#f3) for theoretical and systematic empirical justifications of Adam and its use in fine-tuning
+# foundation language models. See [(Sivaprasad et al., 2020)](#f2) and [(Mosbach, Andriushchenko & Klakow, 2020)](#f3) for theoretical and systematic empirical justifications of Adam and its use in fine-tuning
# large transformer-based language models. The values used here have some justification
# in the referenced literature but have been largely empirically determined and, while a good
# starting point, could be further tuned.
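As a concrete illustration of the decoupled weight decay mentioned above, AdamW is available directly from ``torch.optim``; the hyperparameter values here are placeholders rather than the tuned settings used in the full example:

```python
import torch

# Minimal sketch: AdamW decouples weight decay from the adaptive gradient update,
# rather than folding it into the loss as L2 regularization the way plain Adam does.
params = [torch.nn.Parameter(torch.randn(2, 2))]  # stand-in for model.parameters()
optimizer = torch.optim.AdamW(params, lr=1e-05, eps=1e-07, weight_decay=1e-05)
```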