# Video-MMLU

Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark.

Video-MMLU specifically targets videos focused on theorem demonstrations and problem-solving, covering mathematics, physics, and chemistry. These videos deliver dense information through numbers and formulas, posing significant challenges for video LMMs in dynamic OCR recognition and comprehension.

Imagine a classroom where a large multimodal model is the student and Video-MMLU acts as the teacher. Video-MMLU evaluates whether the student can perceive and comprehend multi-discipline lectures, much like a student taking notes and being tested later. For each video, we generate a detailed caption as the standard "notes" to assess the model's visual perception. Additionally, we create 15 questions as a "quiz" to evaluate content reasoning, challenging the model's ability to apply learned knowledge.
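To make the two-part design concrete, one entry of the benchmark could be organized as below. This is a minimal illustrative sketch; the field names (`video_id`, `caption`, `quiz`, etc.) are hypothetical, not the benchmark's actual schema.

```python
# Illustrative sketch of one Video-MMLU entry (hypothetical field names,
# not the benchmark's actual data schema).
entry = {
    "video_id": "physics_0001",             # hypothetical identifier
    "discipline": "physics",                # mathematics / physics / chemistry
    "caption": "The lecturer derives ...",  # detailed "notes" for perception
    "quiz": [                               # 15 QA pairs for content reasoning
        {"question": "What theorem is applied in the derivation?",
         "answer": "The work-energy theorem."},
        # ... 14 more QA pairs
    ],
}

def split_tasks(entry):
    """Separate the perception task (captioning against the reference
    'notes') from the reasoning task (answering the 'quiz')."""
    perception_reference = entry["caption"]
    reasoning_items = entry["quiz"]
    return perception_reference, reasoning_items
```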

News

  • [2025/3/27] Released the Video-MMLU benchmark, as well as the evaluation code on lmms-eval and VLMEvalKit.

Evaluation Pipeline

We evaluate the Video-MMLU benchmark on two open-source multimodal large model evaluation frameworks, lmms-eval and VLMEvalKit, using Qwen2.5-72B-Instruct as the judge model. Because loading the judge model locally consumes substantial GPU memory, we provide two evaluation options: calling the SiliconFlow API (supported only by VLMEvalKit) or loading Qwen2.5-72B locally for post-processing.

For more details, please refer to the Eval Docs.
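The judge-based post-processing described above can be sketched as follows. This is a hypothetical illustration of the prompt construction and score parsing, not the actual lmms-eval / VLMEvalKit implementation (the prompt wording and score scale here are assumptions; see the Eval Docs for the real pipeline).

```python
import re

# Sketch of LLM-as-judge post-processing (hypothetical prompt and parsing,
# not the benchmark's actual implementation).

def build_judge_prompt(question, reference, prediction):
    """Format a grading request for the judge model
    (e.g. Qwen2.5-72B-Instruct)."""
    return (
        "You are grading a student's answer to a lecture quiz question.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Student answer: {prediction}\n"
        "Reply with 'Score: <0-5>' only."
    )

def parse_judge_score(reply):
    """Extract the numeric score from the judge's reply; None if malformed."""
    match = re.search(r"Score:\s*(\d+)", reply)
    return int(match.group(1)) if match else None
```

Whether the judge runs behind SiliconFlow's hosted API or as a locally loaded Qwen2.5-72B, this prompt/parse logic stays the same; only the transport for the judge call differs, which is why both options are offered.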

Leaderboard

We evaluate a total of 96 models across four categories:

  • 3 Proprietary Models, including Gemini-1.5-Flash, GPT-4o, and Claude-3.5-Sonnet,
  • 78 Open-Source LMMs, encompassing state-of-the-art video-specific LMMs and image-based LMMs capable of processing multiple images, with model sizes ranging from 256M to 40B,
  • 9 Token Compression Models, designed especially for visual token reduction,
  • 6 Vision-Blind Baselines.

To submit your model results, please send an email with your logs to enxin.23@intl.zju.edu.cn or open an issue in our repository.

To-Do List

  • Release arXiv version
  • Upload source video, detailed captions and QA pairs
  • Upload lmms-eval code
  • Upload VLMEvalkit code
  • Upload figures_in_paper
  • Upload the frame captions, video captions and the transcripts
  • Upload keyframes
