Self-hinting RL increases the usage rate of hard prompts and improves LLM performance.

Python 20 3 Updated Feb 9, 2026

[ICCV 2025] Code for "SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning"

Python 8 Updated Oct 27, 2025

From Hypothesis to Publication: A Comprehensive Survey of AI-Driven Research Support Systems

18 1 Updated Nov 23, 2025

[CVPR 2026] FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance

Python 45 3 Updated Mar 13, 2026

The official code of FineRMoE.

Python 19 Updated Mar 17, 2026
Jupyter Notebook 17 Updated Mar 14, 2026

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…

Python 13,255 1,284 Updated Mar 20, 2026

HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model

Python 91 4 Updated Jul 17, 2025

[CVPR-26] Official repository of "CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization"

Python 15 Updated Mar 9, 2026

Official Implementation of "Learning Accurate Segmentation Purely from Self-Supervision"

Python 10 Updated Mar 2, 2026

The official repo of FineSure (ACL-2024)

Python 36 9 Updated Jul 8, 2024

[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding

Python 44 Updated Mar 16, 2026

ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Understanding.

Python 49 2 Updated Mar 3, 2026

Efficient Triton Kernels for LLM Training

Python 6,217 502 Updated Mar 20, 2026

D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning

Python 7 Updated Feb 11, 2026

Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Python 26 Updated Feb 11, 2026

https://diadem-captioner.github.io/

Python 4 Updated Jan 31, 2026

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Python 676 27 Updated Oct 25, 2024
Python 83 2 Updated Jun 23, 2025

ChronusOmni: Improving Time Awareness of Omni Large Language Models

Python 13 Updated Jan 18, 2026

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python 6,750 389 Updated Mar 16, 2026

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

Python 26 Updated Jan 23, 2026

[AAAI 2026] OwlCap: A motion-detail balanced video captioning MLLM.

Python 2 1 Updated Dec 23, 2025

The Source Code for OmniVideoBench @ICLR 2026

Python 66 3 Updated Feb 12, 2026

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Visual Reasoning with Qwen2.5-Omni]

Python 76 4 Updated May 18, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 24,793 4,916 Updated Mar 20, 2026

Official repo and evaluation implementation of VSI-Bench

Python 685 43 Updated Aug 5, 2025

Structured Video Comprehension of Real-World Shorts

Python 233 7 Updated Sep 21, 2025

Official repo for paper "HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies"

Python 27 Updated Dec 12, 2025

https://avocado-captioner.github.io/

Python 31 1 Updated Oct 16, 2025