Stars
Code for ICCV 2025 paper — Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts
[CVPR 2024] MemFlow: Optical Flow Estimation and Prediction with Memory
[CVPR'22 Oral] GMFlow: Learning Optical Flow via Global Matching
[ICCV 2025] InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation
[CVPR 2025] GenFusion: Closing the Loop between Reconstruction and Generation via Videos
[NeurIPS 2025] WorldMem: Long-term Consistent World Simulation with Memory
Implementation of paper "Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens"
The official implementation for [NeurIPS 2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
[CVPR 2026] "GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation"
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
A collection of papers on end-to-end (E2E) autonomous driving, VLM/VLA, and hybrid systems, organized by research branch and trend.
Official implementation for “Towards Single-Source Domain Generalized Object Detection via Causal Visual Prompts” (NeurIPS 2025)
Shanghai Jiao Tong University LaTeX Thesis Template
Official Code for Epona: Autoregressive Diffusion World Model for Autonomous Driving (ICCV 2025)
[ICCV 2025] II-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting
[CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.
[CVPR 2024] Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting
The official implementation of the ICML 2024 paper "MemoryLLM: Towards Self-Updatable Large Language Models" and "M+: Extending MemoryLLM with Scalable Long-Term Memory"
Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasoning"
CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine
MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark
[ACL'25 Main] Official Implementation of HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
