Stars
A portable identity for Claude Code. Clone it anywhere, and Claude knows you.
[CVPR 2026] Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
Official implementation of V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising
Reinforcing Grounded Video Reasoning via Visual-Perception Prompting
[CVPR 2026] tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
Code for "StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos [CVPR 2026]"
Official implementation of Rethinking Training Dynamics in Scale-wise Autoregressive Generation
Official implementation of Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale
Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)
Dynamic 3D Foundation Model using Causal Transformer. [ICLR 2026]
[ICRA 2026] Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"
Code for the EMNLP 2025 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
[EMNLP 2025 Findings] MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
Code release for the paper "Test-Time Training Done Right"
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
🔥Hierarchical Fine-Grained Image Forgery Detection and Localization (CVPR23 + IJCV24)
Official implementation of EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance
Official Implementation of Diffusion Step Annealing (DiSA) in Autoregressive Image Generation
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation [SIGGRAPH Asia 2025]
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning" [EMNLP 2025 Findings]
