-
Ego2Web Public
[CVPR 2026] Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
-
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
-
MEXA Public
[EMNLP 2025 Findings] MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
-
VEGGIE-VidEdit Public
[ICCV 2025] VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
-
CREMA Public
[ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
-
IVA-0 Public
[MM24] Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition
3 Updated Jan 19, 2025 -
SeViLA Public
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
-
MoPRL Public
[TCSVT] Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection
-
LAVIS Public
Forked from salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Python BSD 3-Clause "New" or "Revised" License Updated Sep 27, 2022 -
VGT Public
Forked from sail-sg/VGT
Video Graph Transformer for Video Question Answering (ECCV'22)
Python Apache License 2.0 Updated Aug 6, 2022 -
HOI-Learning-List Public
Forked from DirtyHarryLYL/HOI-Learning-List
A list of Human-Object Interaction Learning.
Updated Jul 19, 2022 -
SlowFast Public
Forked from facebookresearch/SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Python Apache License 2.0 Updated Jun 17, 2022 -
just-ask Public
Forked from antoyang/just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Jupyter Notebook Apache License 2.0 Updated May 13, 2022 -
SJTUThesis Public
Forked from sjtug/SJTUThesis
Shanghai Jiao Tong University XeLaTeX Thesis Template
TeX Apache License 2.0 Updated Apr 30, 2022 -
merlot_reserve Public
Forked from rowanz/merlot_reserve
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
Python MIT License Updated Jan 25, 2022 -
video-swin-transformer-pytorch Public
Forked from haofanwang/video-swin-transformer-pytorch
Video Swin Transformer - PyTorch
Python MIT License Updated Jan 4, 2022 -
ViLT Public
Forked from dandelin/ViLT
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Python Apache License 2.0 Updated Dec 30, 2021 -
detectron2 Public
Forked from facebookresearch/detectron2
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Python Apache License 2.0 Updated Nov 30, 2021 -
Person-Search-with-Natural-Language-Description Public
Forked from ShuangLI59/Person-Search-with-Natural-Language-Description
Person Search with Natural Language Description
Lua Updated Oct 12, 2021 -
grid-feats-vqa Public
Forked from facebookresearch/grid-feats-vqa
Grid features pre-training code for visual question answering
Python Apache License 2.0 Updated Sep 17, 2021 -
seg2vid Public
Forked from STVIR/seg2vid
Video Generation from Single Semantic Label Map
Python Updated Sep 8, 2021 -
arunmallya.github.io Public
Forked from arunmallya/arunmallya.github.io
my public website
JavaScript Updated Jul 23, 2021 -
Research Public
Forked from PaddlePaddle/Research
novel deep learning research works with PaddlePaddle
Python Apache License 2.0 Updated May 26, 2021 -
transformer-time-series-prediction Public
Forked from oliverguhr/transformer-time-series-prediction
proof of concept for a transformer-based time series prediction model
Python MIT License Updated May 4, 2021 -
awesome-vln Public
Forked from daqingliu/awesome-vln
A curated list of research papers in Vision-Language Navigation (VLN)
MIT License Updated May 3, 2021 -
mmf Public
Forked from facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Python Other Updated Apr 2, 2021 -
awesome-anomaly-detection Public
Forked from hoya012/awesome-anomaly-detection
A curated list of awesome anomaly detection resources
Updated Mar 5, 2021 -
AlphaPose Public
Forked from MVIG-SJTU/AlphaPose
Real-Time and Accurate Full-Body Multi-Person Pose Estimation&Tracking System
Python Other Updated Jan 24, 2021

