- Release the evaluation code
- Release the dataset
- Release the base and fine-tuned model checkpoints
- Release the fine-tuning code
- Release the multi-dataset sampler and pre-training code
When cloning this repository, remember to initialize the submodules:
```bash
git clone --recurse-submodules git@github.com:ZhiyingDu/HiMoE-VLA.git

# If you've already cloned the project, you can fetch the submodules with:
git submodule update --init --recursive
```
First, install uv using the following command:
```bash
wget -qO- https://astral.sh/uv/install.sh | sh
```
Once uv is installed, create the environment and install all dependencies:
```bash
GIT_LFS_SKIP_SMUDGE=1 uv sync
```
After the environment has been created, replace the relevant packages with our modified versions:
```bash
cp -r third_party/lerobot .venv/lib/python3.11/site-packages/
cp third_party/modeling_gemma.py .venv/lib/python3.11/site-packages/transformers/models/gemma
```
We provide the following pretrained models:
| Model | Description | Download |
|---|---|---|
| Base model | Pretrained on OXE and ALOHA | Download |
| Calvin D | Finetuned on Calvin D Joint Angle | Download |
| Libero 10 | Finetuned on Libero 10 | Download |
| Libero Goal | Finetuned on Libero Goal | Download |
| Libero Object | Finetuned on Libero Object | Download |
| Libero Spatial | Finetuned on Libero Spatial | Download |
We use the LeRobot dataset format, so you should convert your own data into it. We provide example scripts for reference, such as examples/calvin/convert_calvin_data_to_lerobot_joint.py. You can modify it to convert your own data and run the script with:
```bash
uv run examples/calvin/convert_calvin_data_to_lerobot_joint.py --data_dir /path/to/your/calvin_d/data
```
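At a high level, a conversion script iterates over raw episodes and writes each timestep's observations and actions as one frame of a LeRobot dataset. The sketch below shows only that flattening step with hypothetical field names (the real keys depend on your CALVIN recordings and on the LeRobot writer API used in the example script):

```python
import numpy as np

def episode_to_frames(episode):
    """Flatten one raw episode (a dict of aligned, per-step arrays)
    into per-step frame dicts. Field names are illustrative."""
    frames = []
    for t in range(len(episode["actions"])):
        frames.append({
            "observation.image": episode["rgb_static"][t],
            "observation.state": episode["robot_obs"][t],
            "action": episode["actions"][t],
            "task": episode["language_instruction"],
        })
    return frames

# Dummy episode with 3 steps of 7-DoF joint-angle actions.
episode = {
    "rgb_static": np.zeros((3, 200, 200, 3), dtype=np.uint8),
    "robot_obs": np.zeros((3, 15), dtype=np.float32),
    "actions": np.zeros((3, 7), dtype=np.float32),
    "language_instruction": "open the drawer",
}
frames = episode_to_frames(episode)
print(len(frames), frames[0]["action"].shape)  # 3 (7,)
```

Each frame dict would then be appended to the dataset being written; consult the provided example script for the exact schema.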
Here, we use calvin_d_joint as an example. You need to update the following components:
- CalvinInputs and CalvinOutputs: define the data mapping from the CALVIN environment to the model and vice versa; used for both training and inference.
- LeRobotCalvinJointDataConfig: defines how to process raw CALVIN data from the LeRobot dataset for training.
- DatasetConfig: defines dataset_name and data_mask.
- TrainConfig: defines training hyperparameters, the dataset_mixture, and the pretrained model.
- DATASET_MIXTURES: defines the training datasets and their corresponding weights.
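As an illustration of the last component, a mixture entry pairs dataset names with sampling weights. The sketch below uses made-up names and weights, not the repository's actual values; check the training config module for the real structure:

```python
# Hypothetical mixture table: mixture name -> list of (dataset_name, weight).
DATASET_MIXTURES = {
    # Single-dataset mixture, as required for computing norm stats.
    "calvin_d_joint": [
        ("calvin_d_joint", 1.0),
    ],
    # Multi-dataset mixture with relative sampling weights.
    "oxe_aloha_mix": [
        ("oxe_subset", 0.7),
        ("aloha", 0.3),
    ],
}

weights = [w for _, w in DATASET_MIXTURES["oxe_aloha_mix"]]
print(sum(weights))
```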
Note: the data directory is os.path.join(assets_base_dir, repo_id).
After completing the steps above, you need to compute normalization statistics for your own data. Run the script below with the name of your training config:
```bash
uv run scripts/compute_norm_stats.py --config-name calvin_d_joint
```
Note: The dataset_mixture of calvin_d_joint must contain only one dataset.
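Conceptually, this step computes per-dimension statistics (such as mean and standard deviation) over the states and actions of that single dataset, so inputs can be normalized as (x - mean) / std at train and inference time. A minimal NumPy sketch of the idea (the actual script may use different statistics or storage):

```python
import numpy as np

def compute_norm_stats(actions: np.ndarray) -> dict:
    """Per-dimension mean/std over all steps; actions: (num_steps, action_dim)."""
    mean = actions.mean(axis=0)
    std = actions.std(axis=0) + 1e-8  # epsilon avoids division by zero
    return {"mean": mean, "std": std}

actions = np.array([[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]])
stats = compute_norm_stats(actions)
normalized = (actions - stats["mean"]) / stats["std"]
print(stats["mean"])               # per-dimension mean
print(normalized.mean(axis=0))     # approximately zero after normalization
```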
Now, you can run training using the command below:
```bash
accelerate launch scripts/train.py --deepspeed=src/moevla/training/zero2.json --config=calvin_d_joint --exp-name=calvin_d_joint
```
Note: if you want to use wandb, please update the wandb key in line 7 of train.py.
To manage evaluation environments efficiently, we use a server-client setup. First, launch a model server with the command below:
```bash
uv run scripts/serve_policy.py --env CALVIN_D_FINETUNE --port 9000
```
You can then launch a client to query the server. See the CALVIN README for more details.
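In broad strokes, the client packages the current observations and language instruction into a request and gets an action chunk back from the server. The payload builder below is a hypothetical sketch, not the project's wire format; the actual protocol is defined by scripts/serve_policy.py and the CALVIN client code:

```python
import json

def build_inference_request(state, prompt):
    """Assemble a hypothetical JSON request body for the policy server.
    Keys here are illustrative; use the keys the server actually expects."""
    return json.dumps({
        "observation/state": state,
        "prompt": prompt,
    })

payload = build_inference_request([0.0] * 15, "open the drawer")
print(json.loads(payload)["prompt"])
```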
For real-world deployment, you can run inference directly with the snippet below:
```python
from moevla.policies import policy_config as _policy_config
from moevla.training import config as _config

# Specify these parameters.
train_config = ""
dataset_config = ""
checkpoint_dir = ""

policy = _policy_config.create_trained_policy(
    _config.get_training_config(train_config),
    _config.get_dataset_config(dataset_config),
    checkpoint_dir,
    default_prompt=None,
)

# Run inference on a dummy example.
example = {
    "observation/exterior_image_1_left": ...,
    "observation/wrist_image_left": ...,
    ...
    "prompt": "fold clothes",
}
action_chunk = policy.infer(example)["actions"]
```
Although the snippet above can run inference, we still recommend the server-client setup for deployment.
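Either way, the returned action chunk is typically consumed in a receding-horizon loop: execute the first few actions, then re-query the policy with fresh observations. A small sketch of that control loop (chunk length and replan horizon are illustrative, not values from this repository):

```python
import numpy as np

def receding_horizon(policy_fn, get_obs, steps=6, execute_k=2):
    """Execute the first `execute_k` actions of each predicted chunk,
    then re-plan from the latest observation until `steps` actions ran."""
    executed = []
    t = 0
    while t < steps:
        chunk = policy_fn(get_obs())  # shape: (chunk_len, action_dim)
        for action in chunk[:execute_k]:
            executed.append(action)
            t += 1
            if t >= steps:
                break
    return np.stack(executed)

# Stub policy returning a 4-step chunk of 7-DoF actions.
stub_policy = lambda obs: np.zeros((4, 7))
actions = receding_horizon(stub_policy, lambda: {}, steps=6, execute_k=2)
print(actions.shape)  # (6, 7)
```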
We are deeply grateful for the development of openpi, LeRobot and DeepSeekMoE, from which our project draws extensively. We extend our sincere thanks to all contributors to these libraries for their hard work and dedication.
If you find our work useful in your research, please consider citing our paper:
```bibtex
@article{du2025himoe,
  title={HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies},
  author={Du, Zhiying and Liu, Bei and Liang, Yaobo and Shen, Yichao and Cao, Haidong and Zheng, Xiangyu and Feng, Zhiyuan and Wu, Zuxuan and Yang, Jiaolong and Jiang, Yu-Gang},
  journal={arXiv preprint arXiv:2512.05693},
  year={2025}
}
```
