Huntersxsx
  • SJTU
  • Shanghai, China

Code for the paper: "Sentence Specified Dynamic Video Thumbnail Generation"

Python 34 6 Updated Aug 8, 2019

👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)

Python 74 1 Updated Jan 20, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 18,851 2,413 Updated Mar 20, 2026

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Python 295 13 Updated Jun 13, 2024

Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-contex…

365 23 Updated Mar 19, 2025

Chinese large language models

6,430 557 Updated Nov 30, 2024

This is a collection of our NAS and Vision Transformer work.

Python 1,829 241 Updated Jul 25, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 75,302 15,179 Updated Apr 5, 2026

A state-of-the-art open visual language model | Multimodal pretrained model

Python 6,734 452 Updated May 29, 2024

LLMs interview notes and answers: This repository mainly collects interview questions and reference answers for large language model (LLM) algorithm engineers.

1,312 327 Updated Dec 14, 2023

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Jupyter Notebook 5,852 549 Updated Mar 31, 2026

Python 45 Updated Oct 3, 2023

Source code of our MGPN (SIGIR 2022)

Python 18 1 Updated Jun 8, 2022

Competition knowledge, code, and approaches for data mining, computer vision, natural language processing, and recommender systems

Jupyter Notebook 4,736 1,088 Updated Oct 22, 2025

fast-stable-diffusion + DreamBooth

Python 7,896 1,373 Updated Nov 29, 2025

Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion

Jupyter Notebook 7,742 803 Updated Dec 8, 2022

All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment

Python 19 3 Updated Feb 11, 2025

This is the PyTorch implementation of our paper "RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model"

Python 656 43 Updated Jun 29, 2024

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

Python 72 6 Updated Jan 4, 2026

InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editin…

Python 3,211 235 Updated Aug 20, 2024

Jupyter Notebook 787 75 Updated Aug 7, 2024

[CVPR'23] Universal Instance Perception as Object Discovery and Retrieval

Python 1,280 122 Updated Jul 18, 2023

Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022)

Python 544 42 Updated Mar 24, 2022

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 13,392 894 Updated Dec 17, 2024

Adapting Meta AI's Segment Anything to Downstream Tasks with Adapters and Prompts

Python 1,502 122 Updated Dec 1, 2025

SeqTR: A Simple yet Universal Network for Visual Grounding

Python 144 15 Updated Oct 30, 2024

Papers related to Weakly-supervised Audio-Visual Video Parsing (AVVP) and Audio-Visual Event Localization (AVE)

5 Updated Jun 11, 2024

Papers related to Referring Image Segmentation (RIS)

16 Updated Dec 26, 2023