Hi, thanks for sharing the Baichuan-M2 series models.
I noticed that the table compares Baichuan-M2 Plus with OpenEvidence, GPT-5, and DeepSeek-R1-0528 on several medical exams.
Could you please clarify how OpenEvidence was evaluated in this comparison? Is there an OpenEvidence API available for research use or for reproducing these results?
Thanks again for the great work!
