diff --git a/docs/fundamentals/art-client.mdx b/docs/fundamentals/art-client.mdx
index 72840d7b..807177f1 100644
--- a/docs/fundamentals/art-client.mdx
+++ b/docs/fundamentals/art-client.mdx
@@ -66,6 +66,29 @@ backend = SkyPilotBackend.initialize_cluster(
await model.register(backend)
```
+### Initializing from an existing SFT LoRA adapter
+
+If you've already fine-tuned a model with SFT using a LoRA adapter (e.g., via Unsloth or PEFT) and have a standard Hugging Face–style adapter directory, you can start RL training from those weights by passing the adapter directory's path as `base_model` when creating your `TrainableModel`.
+
+Why start from an SFT adapter?
+
+- Warm-start from task-aligned weights to reduce the number of RL steps and overall GPU cost.
+- Stabilize early training, especially for small models (1B–8B), which may see near-zero rewards at the start of RL.
+
+```python
+import art
+
+model = art.TrainableModel(
+ name="agent-001",
+ project="my-agentic-task",
+ # Point to the local SFT LoRA adapter directory
+ # (e.g., contains adapter_config.json and adapter_model.bin/safetensors)
+ base_model="/path/to/my_sft_lora_adapter",
+)
+```
+
+ART will load the adapter as the initial checkpoint and proceed with RL updates from there.
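+
+If you still need to produce such a directory, here is a minimal sketch of saving a LoRA adapter with PEFT after your own SFT loop. The base model name and LoRA settings below are illustrative only, not something ART requires:
+
+```python
+from peft import LoraConfig, get_peft_model
+from transformers import AutoModelForCausalLM
+
+# Illustrative base model and LoRA settings; substitute your own
+base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
+peft_model = get_peft_model(
+    base,
+    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]),
+)
+
+# ... your SFT training loop goes here ...
+
+# Writes adapter_config.json and adapter_model.safetensors to the directory
+peft_model.save_pretrained("/path/to/my_sft_lora_adapter")
+```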
+
You're now ready to start training your agent.
## Running inference
diff --git a/docs/getting-started/faq.mdx b/docs/getting-started/faq.mdx
index 87b554c5..54c4d846 100644
--- a/docs/getting-started/faq.mdx
+++ b/docs/getting-started/faq.mdx
@@ -14,6 +14,22 @@ By allowing an LLM to make multiple attempts at accomplishing a task and scoring
+
+  Yes. If you have a standard Hugging Face–style LoRA adapter directory (e.g., one produced by Unsloth or PEFT), pass the adapter folder's path as `base_model` when creating your `TrainableModel`.
+
+```python
+import art
+
+model = art.TrainableModel(
+ name="agent-001",
+ project="my-agentic-task",
+ base_model="/path/to/my_sft_lora_adapter", # HF-style adapter dir
+)
+```
+
+ART will load the adapter as the initial checkpoint and proceed with RL updates from there.
+
+
This flow chart shows a highly simplified flow of how ART optimizes your agent. Your code is responsible for actually running the agent in the environment it will operate in, as well as scoring the trajectory (deciding whether the agent did a good job or not). ART is then able to take those trajectories and scores and use them to iteratively train your agent and improve performance.