@llmeval

FDU-NLP LLMEval Team

Popular repositories

  1. LLMEval-1 Public

    [AAAI 2024] LLMEval Phase I dataset — 17 categories, 453 questions, 2186 annotators for Chinese LLM evaluation

    114 stars, 2 forks

  2. LLMEval-2 Public

    [AAAI 2024] LLMEval Phase II dataset — professional domain evaluation across 12 academic disciplines

    71 stars, 4 forks

  3. LLMEval-Fair Public

    [ACL 2026] A large-scale longitudinal study on robust and fair evaluation of LLMs — 200K+ generative questions across 13 disciplines

    37 stars, 2 forks

  4. LLMEval-Med Public

    [EMNLP 2025] A real-world clinical benchmark for medical LLMs with physician validation — 2,996 questions from EHRs

    Python, 27 stars, 1 fork

  5. Llmeval-Gaokao2024-Math Public

    LLM evaluation on 2024 Chinese Gaokao Mathematics — zero-contamination benchmark with dual prompt formats

    19 stars, 1 fork

  6. LLMEval-Logic Public

    LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening (80% public release; 20% private holdout)

    Python, 9 stars
