Yixin Wan1,2, Lei Ke1, Wenhao Yu1, Kai-Wei Chang2, Dong Yu1
1Tencent AI, Seattle 2University of California, Los Angeles
MotionEdit is a novel dataset and benchmark for motion-centric image editing. We also propose MotionNFT (Motion-guided Negative-aware FineTuning), a post-training framework with motion alignment rewards to guide models on the motion-centric image editing task.
- [2026/02/20]: 🎉 MotionEdit is accepted to CVPR 2026! See you in Denver! 😄
- [2026/01] We release MotionEdit-Bench and MotionEdit-Train. Enjoy! 😁
- [2025/12/11]: 🤩 We release MotionEdit, a novel dataset and benchmark for motion-centric image editing. Along with the dataset, we propose MotionNFT (Motion-guided Negative-aware FineTuning), a post-training framework with motion alignment rewards to guide models on the motion editing task.
Clone this GitHub repository and switch into its directory.
git clone https://github.com/elainew728/motion-edit.git
cd motion-edit
Create and activate the conda environment with the dependencies required for inference and training.
- Note: some models, such as UltraEdit, require specific diffusers dependencies. Please refer to their official repository to resolve dependencies before running inference.
conda env create -f environment.yml
conda activate motionedit
pip install flash-attn==2.7.4.post1 --no-build-isolation
Finally, configure your own Hugging Face token to access restricted models by replacing YOUR_HF_TOKEN_HERE in inference/run_image_editing.py.
If you just want to edit a single image with our MotionNFT checkpoint, place the original input image and your text prompt (a .txt file with the same file name as the image) inside examples/input_examples/. Then, run examples/run_inference_single.py to run inference on the input image with your prompt.
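For example, with a hypothetical image named my_dancer.jpg, the folder would contain:
- examples/input_examples/
  - my_dancer.jpg
  - my_dancer.txt (the edit instruction, e.g. "The dancer raises both arms above her head.")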
We have prepared 3 input images from our MotionEdit-Bench dataset in the examples/input_examples/ folder. Play around with them by running the following example code:
python examples/run_inference_single.py \
--input_image examples/input_examples/512.jpg \
--output_dir examples/output_examples
The script automatically loads examples/input_examples/512.txt when --prompt is omitted. You can still override the prompt or supply a local LoRA via --prompt/--lora_path if needed.
To run training code, first change your working directory to the train folder:
cd train
This step preprocesses and formats your own data for training. You can safely skip it if you plan to use our MotionEdit-Train dataset.
Please format your training data according to the following structure. Place your {}_metadata.jsonl files under the folder motionedit_data/ in the train/ directory.
Data folder structure:
- motionedit_data/
  - images/
    - YOUR_IMAGE_DATA
    - ...
  - train_metadata.jsonl
  - test_metadata.jsonl
train_metadata.jsonl and test_metadata.jsonl format:
{"prompt": "PROMPT", "image": ["INPUT_IMAGE_PATH", "TARGET_IMAGE_PATH"]}
...
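As a reference, here is a minimal sketch for producing such a metadata file from your own image pairs (the file names and prompt below are hypothetical placeholders):

```python
import json

# Hypothetical entries: each pairs an edit instruction with an
# [input image, target image] pair stored under motionedit_data/images/.
entries = [
    {
        "prompt": "The skater bends her knees and leans into the turn.",
        "image": ["images/0001_input.jpg", "images/0001_target.jpg"],
    },
]

# Write one JSON object per line, matching the {split}_metadata.jsonl format above.
with open("motionedit_data/train_metadata.jsonl", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```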
To set up the vLLM server for the MLLM feedback reward, first configure the path to your local Qwen2.5-VL-32B-Instruct model checkpoint by modifying YOUR_MODEL_PATH in train/reward_server/reward_server.py.
Then, you can start the reward server:
python reward_server/reward_server.py
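Before launching distributed training, you may want to confirm the reward server is reachable from the training machines. Below is a minimal connectivity check; the host and port are placeholders for whatever you will later export as REWARD_SERVER, and it only tests the TCP connection, not the reward API itself.

```python
import socket

host, port = "10.0.0.5", 12341  # placeholders: your reward server address

# Open and close a TCP connection to verify the server is accepting requests.
with socket.create_connection((host, port), timeout=5):
    print(f"Reward server reachable at {host}:{port}")
```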
See train/config/qwen_image_edit_nft.py and train/config/kontext_nft.py for available configurations.
The default setting uses MotionEdit-Train for training. If you want to use your own dataset, set the following in the config file:
config.use_hf_dataset = False
config.dataset = # Your own dataset path
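When switching to your own dataset, it can help to sanity-check the metadata before launching a multi-node run. A minimal sketch, assuming the motionedit_data/ layout above and image paths stored relative to the dataset root:

```python
import json
import os

data_root = "motionedit_data"  # placeholder: the path you set as config.dataset

# Check that every entry has a prompt and that both referenced images exist on disk.
with open(os.path.join(data_root, "train_metadata.jsonl")) as f:
    for i, line in enumerate(f):
        entry = json.loads(line)
        assert entry.get("prompt"), f"entry {i} is missing a prompt"
        for img in entry["image"]:
            assert os.path.exists(os.path.join(data_root, img)), f"missing image: {img}"

print("train_metadata.jsonl looks consistent")
```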
export REWARD_SERVER=[YOUR_REWARD_SERVICE_IP_ADDR]:12341
RANK=[MACHINE_RANK]
MASTER_ADDR=[MASTER_ADDR]
MASTER_PORT=[MASTER_PORT]
accelerate launch --config_file flow_grpo/accelerate_configs/deepspeed_zero2.yaml \
--num_machines 2 --num_processes 16 \
--machine_rank ${RANK} --main_process_ip ${MASTER_ADDR} --main_process_port ${MASTER_PORT} \
scripts/train_nft_qwen_image_edit.py --config config/qwen_image_edit_nft.py:qwen_motion_edit_reward

We have released our MotionEdit-Bench on Hugging Face. In this GitHub repository, we provide code that supports easy inference across open-source image editing models: Qwen-Image-Edit, Flux.1 Kontext [Dev], InstructPix2Pix, HQ-Edit, Step1X-Edit, UltraEdit, MagicBrush, and AnyEdit.
The inference script defaults to using our MotionEdit-Bench, which will be downloaded from Hugging Face. You can specify a cache_dir for storing the cached data.
Additionally, you can construct your own dataset for inference. Please organize all input images into a folder INPUT_FOLDER and create a metadata.jsonl in the same directory. Each entry in metadata.jsonl must contain at least the following two fields:
{
    "file_name": "IMAGE_NAME.EXT",
    "prompt": "PROMPT"
}
Then, load your dataset by:
from datasets import load_dataset
dataset = load_dataset("imagefolder", data_dir=INPUT_FOLDER)
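With the folder organized this way, imagefolder places everything in a single train split and attaches the prompt column from metadata.jsonl to each decoded image. A short sketch of iterating over it (INPUT_FOLDER is your own path):

```python
from datasets import load_dataset

dataset = load_dataset("imagefolder", data_dir="INPUT_FOLDER")

# Each example holds the decoded PIL image plus the extra columns from metadata.jsonl.
for example in dataset["train"]:
    image = example["image"]    # PIL.Image.Image
    prompt = example["prompt"]
    print(image.size, prompt)
```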
Use the following command to run inference on MotionEdit-Bench with our MotionNFT checkpoint, trained on MotionEdit with Qwen-Image-Edit as the base model:
python inference/run_image_editing.py \
-o "./outputs/" \
-m "motionedit" \
--seed 42
Alternatively, our code supports running inference with multiple open-source image editing models. You can run inference with the model of your choice by specifying it in the arguments. For instance, here's a sample script for running inference on Qwen-Image-Edit:
python inference/run_image_editing.py \
-o "./outputs/" \
-m "qwen-image-edit" \
--seed 42
Please consider citing our paper if you find our research useful. We appreciate your recognition!
@article{motionedit,
title={MotionEdit: Benchmarking and Learning Motion-Centric Image Editing},
author={Yixin Wan and Lei Ke and Wenhao Yu and Kai-Wei Chang and Dong Yu},
year={2025},
journal={arXiv preprint arXiv:2512.10284},
}