Yuxiang Lu<sup>1,2</sup>, Zhe Liu<sup>1</sup>, Xianzhe Fan<sup>1</sup>, Zhenya Yang<sup>1</sup>, Jinghua Hou<sup>1</sup>, Junyi Li<sup>1</sup>, Kaixin Ding<sup>1</sup>, Hengshuang Zhao<sup>1</sup>

<sup>1</sup>The University of Hong Kong  <sup>2</sup>ACE Robotics
This repository provides the official implementation of FASTER, together with an unofficial implementation of Training-time RTC, built on top of openpi.
Real-time reaction in VLAs is constrained not only by inference latency, but also by how action chunks are generated and executed. FASTER introduces a new paradigm for fast action sampling under asynchronous execution. By compressing the sampling process for immediate reaction into a single step, FASTER achieves 10x acceleration over the conventional constant schedule when reacting to new observations.
Demo video: `demo.mp4`
- Apr 30 2026: Code released.
- Mar 19 2026: Paper released.
Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient: it forces the system to complete all sampling steps before any movement can start, forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction by tenfold (e.g., from 10 steps to a single step in $\pi_{0.5}$).
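To unpack the reaction-time claim (this reading is ours, not a quote from the paper): suppose an environmental change arrives at a uniformly random instant within an execution cycle of $s$ control steps. The system can only react after the remainder of that cycle plus the TTFA, giving

$$T_\text{react} \sim \text{TTFA} + \mathcal{U}\!\left(0,\; s \cdot \Delta t_\text{ctrl}\right), \qquad \mathbb{E}\left[T_\text{react}\right] = \text{TTFA} + \frac{s \cdot \Delta t_\text{ctrl}}{2},$$

where $\Delta t_\text{ctrl}$ is the control period. Reducing TTFA (FASTER's focus) shifts the whole distribution down, while reducing $s$ narrows it.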
This repository is built on top of openpi (jax version). We strongly recommend getting familiar with the original openpi workflow before moving on.
FASTER follows the same requirements as openpi. To run the models in this repository, you will need an NVIDIA GPU with at least the following specifications. These estimates assume a single GPU, but you can also use multiple GPUs with model parallelism to reduce per-GPU memory requirements by configuring `fsdp_devices` in the training config (see the sketch after the table). Please note that the current training script does not yet support multi-node training.
| Mode | Memory Required | Example GPU |
|---|---|---|
| Inference | > 8 GB | RTX 4090 |
| Fine-Tuning (LoRA) | > 22.5 GB | RTX 4090 |
| Fine-Tuning (Full) | > 70 GB | A100 (80GB) / H100 |
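If a single GPU does not have enough memory, model parallelism can be enabled through `fsdp_devices`. A minimal sketch, assuming the upstream openpi `get_config` API and frozen-dataclass configs:

```python
import dataclasses

from openpi.training import config as _config  # upstream openpi config module

# Load an existing config and shard its parameters across 2 local GPUs.
# `fsdp_devices` must evenly divide the number of visible devices.
base = _config.get_config("pi05_faster_agilex")
sharded = dataclasses.replace(base, fsdp_devices=2)
```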
This repository has been tested with full fine-tuning of the $\pi_{0.5}$ model.
When cloning this repo, make sure to update submodules:
```bash
git clone --recurse-submodules https://github.com/innovator-zero/FASTER.git

# Or if you already cloned the repo:
git submodule update --init --recursive
```

We use uv to manage Python dependencies. See the uv installation instructions to set it up. Once uv is installed, run the following to set up the environment:

```bash
GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
```

Note: `GIT_LFS_SKIP_SMUDGE=1` is needed to pull LeRobot (dataset v2.1, commit 0cf8648) as a dependency.
We provide a detailed guide on how to train and deploy FASTER on the AgileX Piper platform. If you have already adapted openpi to your robot platform, including data processing and mapping, running FASTER should be straightforward.
For simulation benchmarks, you can refer to LIBERO and CALVIN guides.
FASTER follows the standard fine-tuning pipeline in openpi. The main difference lies in using the Horizon-Aware Schedule (HAS) instead of the conventional constant schedule.
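For intuition, here is an illustrative sketch of what a horizon-aware schedule can look like. The assumptions are ours: global flow time runs from 1 (noise) to 0 (data) as in openpi's sampling loop, the hit time of action index 0 is `u0`, hit times decay across the chunk by a power law in `alpha`, and each action's local time is a linear rescale of global time. The exact HAS formulation is defined in the paper and in this repository's sampling code.

```python
import numpy as np

def has_local_times(u, horizon=50, u0=0.9, alpha=0.6):
    """Map a global flow time u in [0, 1] to per-action local times.

    Illustrative only: assumes the hit time (the global time at which an
    action reaches local time 0, i.e., is fully denoised) decays from u0 at
    index 0 down to 0 at the last index via a power law in `alpha`.
    """
    idx = np.arange(horizon)
    u_hit = u0 * (1.0 - idx / (horizon - 1)) ** alpha   # assumed spacing
    # Local time runs 1 -> 0 while global time runs 1 -> u_hit, then clamps at 0.
    return np.clip((u - u_hit) / (1.0 - u_hit), 0.0, 1.0)

# After one global step of size 1/10 (u = 0.9), action 0 is already at local
# time 0, while later actions are still being denoised:
print(has_local_times(0.9)[:4])
```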
We use LeRobot dataset v2.1 as the data loader. If you use AgileX Piper robotic arms or Cobot Magic systems, we strongly recommend our piper-aio toolkit. It covers the full robot-learning loop: hardware setup, teleoperated data collection, data replay, LeRobot dataset conversion, and policy inference.
The data-processing config for AgileX Piper data is provided in AgilexInputs, AgilexOutputs, and LeRobotAgilexDataConfig.
We provide the following example training configs in `config.py`:

- `pi05_agilex`: fine-tuning the $\pi_{0.5}$ model with the constant schedule.
- `pi05_rtc_agilex`: fine-tuning the $\pi_{0.5}$ model with the action conditioning strategy proposed by Training-time RTC.
- `pi05_faster_agilex`: fine-tuning the $\pi_{0.5}$ model with FASTER, using the following hyperparameters:
  - `max_delay`: maximum prefix length $d_\text{max}$, which simulates inference delay during training via the action conditioning strategy. The default is `10`, supporting TTFA (Time to First Action) up to 333.3 ms on a 30 Hz robot.
  - `mix_prob`: mixing probability $p$ in the mixed scheduling strategy. Each action sample uses HAS with probability $p$ and retains the original constant schedule with probability $1 - p$ (see the sketch after this list). The default is `0.5`.
  - `alpha`: HAS hyperparameter controlling how hit times vary across action indices. The default is `0.6`.
  - `u0`: global timestep at which the first action is finalized, set to $(N-1)/N$ for $N$ inference sampling steps. Since $\pi_{0.5}$ uses 10 steps, we set `u0=0.9` by default.
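The mixed scheduling strategy is simple to picture. The following sketch (ours, reusing `has_local_times` from the illustration above as a stand-in for the real HAS sampler) draws one flow time per sample and applies HAS with probability `mix_prob`:

```python
import numpy as np

def sample_train_times(batch_size, horizon, mix_prob=0.5, rng=None):
    """Per-sample choice between HAS and the constant schedule (illustrative)."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(size=(batch_size, 1))                 # one global flow time per sample
    use_has = rng.uniform(size=(batch_size, 1)) < mix_prob
    t_const = np.broadcast_to(u, (batch_size, horizon))   # constant schedule: same t for all actions
    t_has = np.stack([has_local_times(ui, horizon) for ui in u[:, 0]])
    return np.where(use_has, t_has, t_const)
```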
Remember to compute normalization statistics for the training data before launching training. These statistics can be shared across configs with a symlink (`ln -s`):

```bash
uv run scripts/compute_norm_stats.py --config-name pi05_faster_agilex
```

Then you can launch training:

```bash
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_faster_agilex --exp-name=my_experiment
```

Since the robot controller typically requires a different environment or a separate machine, we use a client-server interface for policy deployment. The server handles policy inference while the client runs the robot controller, with communication established through WebSocket. You can spin up the policy server on the same machine or on a LAN-connected workstation. For remote inference, prefer a wired LAN to reduce latency and packet loss.
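For a quick connectivity check, the openpi client API can be called directly. The observation keys and shapes below are placeholders; they must match what `AgilexInputs` expects:

```python
import numpy as np
from openpi_client import websocket_client_policy

# Connect to the policy server started above (openpi's default port is 8000).
client = websocket_client_policy.WebsocketClientPolicy(host="localhost", port=8000)

# Placeholder observation for illustration; adapt keys/shapes to your data config.
obs = {
    "state": np.zeros(14, dtype=np.float32),
    "images": {"cam_high": np.zeros((224, 224, 3), dtype=np.uint8)},
    "prompt": "pick up the cube",
}
action_chunk = client.infer(obs)["actions"]  # shape: (horizon, action_dim)
```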
On the client side, we use the inference scripts provided in our piper-aio toolkit. If you use your own robot platform, you can adapt those scripts to your setup.
This is the standard practice for action-chunking policies and serves as a baseline for VLAs such as $\pi_0$ and $\pi_{0.5}$.
Start the policy server with the constant schedule:
```bash
uv run scripts/serve_policy.py policy:checkpoint --policy.config=pi05_agilex --policy.dir=checkpoints/pi05_agilex/my_experiment/49999
```

Start the robot client in sync mode:

```bash
python inference/infer_sync.py # ... other arguments
```

The robot initiates inference for the next chunk before the current chunk is fully executed. Once the next inference request is triggered, the robot continues executing the ongoing actions in the current chunk. Before the final action is completed, the newly predicted chunk is expected to be available, enabling seamless execution without halting.
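In schematic form, the async loop behaves roughly as follows (our own sketch, not the actual `inference/infer_async.py`; `client` follows the openpi client API shown above):

```python
import threading

def async_loop(client, get_obs, execute_action, exec_horizon, delay):
    """Schematic async execution: overlap inference with chunk execution."""
    chunk = client.infer(get_obs())["actions"]
    while True:
        for i in range(exec_horizon):          # execute the first s valid actions
            execute_action(chunk[i])
        result, done = {}, threading.Event()

        def _infer():
            result["chunk"] = client.infer(get_obs())["actions"]
            done.set()

        threading.Thread(target=_infer, daemon=True).start()
        i = exec_horizon
        while not done.is_set() and i < len(chunk):
            execute_action(chunk[i])           # bridge the inference gap (<= delay steps)
            i += 1
        done.wait()
        # The first `delay` actions of the new chunk cover time already spent
        # executing during inference, so skip them.
        chunk = result["chunk"][delay:]
```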
Start the policy server with HAS (match the `alpha` and `u0` used in training):

```bash
uv run scripts/serve_policy.py \
    --use-custom-sample-kwargs \
    --infer-time-schedule=HAS \
    --alpha=0.6 \
    --u0=0.9 \
    policy:checkpoint \
    --policy.config=pi05_faster_agilex --policy.dir=checkpoints/pi05_faster_agilex/my_experiment/49999
```

Start the robot client in async (rtc) mode:

```bash
python inference/infer_async.py --mode=rtc --delay=4 --exec_horizon=25 # ... other arguments
```

Arguments:
- `--delay`: inference delay $d := \lfloor \Delta t_\text{infer}/\Delta t_\text{ctrl} \rfloor$, determined by the inference latency $\Delta t_\text{infer}$ and the control period $\Delta t_\text{ctrl}$. We recommend setting it one step larger to account for variation in inference time and transmission latency (see the worked example after this list).
- `--exec_horizon`: execution horizon $s$ for the action chunk. The client executes only the first $s$ valid actions (excluding delayed ones), then triggers a new inference request. This value should be greater than or equal to $d$.
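As a worked example with hypothetical numbers (a 30 Hz controller and a measured ~120 ms inference latency):

```python
import math

dt_infer = 0.120    # measured end-to-end inference latency in seconds (hypothetical)
dt_ctrl = 1 / 30    # control period of a 30 Hz robot

d = math.floor(dt_infer / dt_ctrl)  # floor(3.6) = 3
delay = d + 1                       # one step of headroom -> --delay=4
```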
This mode is built on our Streaming Client-Server Interface and achieves the lowest TTFA (Time to First Action). Early actions can be dispatched to the robot client instantly upon completion. While the robot executes these initial movements, the policy server continues refining subsequent actions in parallel and progressively replenishes the clientβs action buffer. It is designed for tasks that require fast reactions, such as playing table tennis.
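Conceptually, the client side of this interface decouples receiving and executing with a buffer, along these lines (our own sketch; `receive_actions` is a hypothetical generator standing in for the streaming connection):

```python
import queue
import threading

def streaming_loop(receive_actions, execute_action):
    """Schematic streaming client: execute actions as the server streams them."""
    buf = queue.Queue()

    def _receiver():
        for action in receive_actions():
            buf.put(action)  # progressively replenish the client's action buffer

    threading.Thread(target=_receiver, daemon=True).start()
    while True:
        execute_action(buf.get())  # earliest actions start the instant they arrive
```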
Start the policy server in streaming mode:
```bash
uv run scripts/serve_policy.py \
    --use-custom-sample-kwargs \
    --infer-time-schedule=HAS \
    --alpha=0.6 \
    --u0=0.9 \
    --streaming \
    --early-stop-actions=4 \
    policy:checkpoint \
    --policy.config=pi05_faster_agilex --policy.dir=checkpoints/pi05_faster_agilex/my_experiment/49999
```

Arguments:

- `--early-stop-actions`: if the policy has already generated this number of valid actions and dispatched them to the client, the remaining action-sampling iterations stop early. This value should be greater than or equal to `exec_horizon`, because the remaining noisy actions will not be executed by the robot and do not need to be generated. The policy server can then prepare for the next inference request.
Start the robot client in streaming mode:
```bash
python inference/infer_async.py --mode=rtc --delay=3 --exec_horizon=4 --streaming # ... other arguments
```

Arguments:
- `--delay`: can be smaller than in async mode because inference latency, as measured by TTFA, is reduced.
- `--exec_horizon`: a smaller $s$ can help highly dynamic tasks by increasing inference frequency and tightening the inference-execution loop. For everyday tasks such as pick-and-place or folding towels, a small $s$ is usually unnecessary and may increase motion jitter.
```bibtex
@article{lu2026faster,
  title={FASTER: Rethinking Real-Time Flow VLAs},
  author={Yuxiang Lu and Zhe Liu and Xianzhe Fan and Zhenya Yang and Jinghua Hou and Junyi Li and Kaixin Ding and Hengshuang Zhao},
  year={2026},
  journal={arXiv preprint arXiv:2603.19199}
}
```
We thank the following repositories for their reference implementations and prior work: