Stars
[CVPR 2025 Highlight] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
Devkit and documentation for the NVIDIA Physical AI Autonomous Vehicles Dataset
[CVPR 2026 Highlight] Implementation of "IntrinsicWeather: Controllable Weather Editing in Intrinsic Space".
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
[CVPR 2026] Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models.
[ICLR 2026] WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving
PyTorch Implementation of "PICS: Pairwise Image Compositing with Spatial Interactions", ICLR 2026
DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
The official implementation of the paper “VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction.”
[CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
PyTorch code and models for the DINOv2 self-supervised learning method.
[CVPR 2023] LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
[NeurIPS 2025] Official code for "Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting"
An official implementation of Anchor DETR.
⭐⭐⭐FightingCV Paper Reading, which helps you understand the most advanced research work in an easier way 🍀 🍀 🍀
[CVPR 2022 Oral] Official implementation of DN-DETR
[ECCV'24 & ICLR'25] CityGaussian Series for High-quality Large-Scale Scene Reconstruction with Gaussians
[AAAI 2024] Far3D: Expanding the Horizon for Surround-view 3D Object Detection
An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community to help implement this model!
Object tracking measures in JavaScript (MOTA, IDF1, ...)
A suite of image and video neural tokenizers