23 changes: 23 additions & 0 deletions docs/fundamentals/art-client.mdx
@@ -66,6 +66,29 @@ backend = SkyPilotBackend.initialize_cluster(
await model.register(backend)
```

### Initializing from an existing SFT LoRA

If you've already fine-tuned a model with SFT using a LoRA adapter (e.g., Unsloth/PEFT) and have a standard Hugging Face–style adapter directory, you can start RL training from those weights by passing the adapter directory path as `base_model` when creating your `TrainableModel`.

Why start from SFT weights?

- Warm-start from task-aligned weights to reduce the number of RL steps and overall GPU cost.
- Stabilize early training, especially for small models (1B–8B) that may otherwise receive near-zero rewards at the start of RL.

```python
import art

model = art.TrainableModel(
name="agent-001",
project="my-agentic-task",
# Point to the local SFT LoRA adapter directory
# (e.g., contains adapter_config.json and adapter_model.bin/safetensors)
base_model="/path/to/my_sft_lora_adapter",
)
```

ART will load the adapter as the initial checkpoint and proceed with RL updates from there.
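Before training, it can help to confirm the directory actually contains the standard adapter files. The helper below is a hypothetical sanity check, not part of the ART API; it assumes the usual PEFT layout of `adapter_config.json` plus `adapter_model.safetensors` (or the older `adapter_model.bin`):

```python
import os


def looks_like_lora_adapter(path: str) -> bool:
    """Heuristic check that `path` is an HF/PEFT-style LoRA adapter directory."""
    has_config = os.path.isfile(os.path.join(path, "adapter_config.json"))
    has_weights = any(
        os.path.isfile(os.path.join(path, name))
        for name in ("adapter_model.safetensors", "adapter_model.bin")
    )
    return has_config and has_weights
```

Running this check before constructing the `TrainableModel` can surface a mistyped path early, rather than at backend registration time.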

You're now ready to start training your agent.

## Running inference
16 changes: 16 additions & 0 deletions docs/getting-started/faq.mdx
@@ -14,6 +14,22 @@ By allowing an LLM to make multiple attempts at accomplishing a task and scoring

</Accordion>

<Accordion title="Can I start RL from an existing SFT LoRA adapter?">
Yes. If you have a standard Hugging Face–style LoRA adapter directory (e.g., produced by Unsloth/PEFT), pass the adapter folder path as the `base_model` when creating your `TrainableModel`.

```python
import art

model = art.TrainableModel(
name="agent-001",
project="my-agentic-task",
base_model="/path/to/my_sft_lora_adapter", # HF-style adapter dir
)
```

ART will load the adapter as the initial checkpoint and proceed with RL updates from there.
</Accordion>

<Accordion title="How does ART work under the hood?">
This flow chart shows a highly simplified flow of how ART optimizes your agent. Your code is responsible for actually running the agent in the environment it will operate in, as well as scoring the trajectory (deciding whether the agent did a good job or not). ART is then able to take those trajectories and scores and use them to iteratively train your agent and improve performance.
