huggingface gpt oss 120b code and results #18

Open
hpicsk wants to merge 2 commits into lyang36:main from hpicsk:master

Conversation

@hpicsk

@hpicsk hpicsk commented Sep 22, 2025

Solved 5 out of 6 problems using reasoning_effort=low.

Setting reasoning_effort to high gives an OOM error.
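For context, here is a minimal sketch of how reasoning_effort=low can be requested when running gpt-oss-120b through Hugging Face transformers. This is an assumption for illustration, not the code in this PR; the model id, prompt wording, and token budget below are placeholders, and gpt-oss is assumed to read its reasoning effort from the system prompt.

```python
# Hypothetical sketch, not the PR's actual runner: query gpt-oss-120b via
# Hugging Face transformers with low reasoning effort set in the system prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-120b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place weights on the available GPU(s)
)

messages = [
    # gpt-oss takes its reasoning effort from the system prompt.
    {"role": "system", "content": "Reasoning: low"},
    {"role": "user", "content": "<IMO problem statement goes here>"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# A fixed budget; with reasoning_effort=high the chain of thought alone can
# exhaust this budget (or GPU memory) before a final answer is produced.
outputs = model.generate(inputs, max_new_tokens=8192)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```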

@hpicsk hpicsk changed the title gpt oss 120b result with run logs huggingface gpt oss 120b code and results Sep 22, 2025
@yichenhuang
Collaborator

Thank you very much for working on integrating the open-weight gpt-oss model into our pipeline. I have not gone through your log files in detail, but I quickly looked at the log file for Problem 4, and it seems that the model's final answer is just 6, which is wrong. Have you checked whether the model has really "solved" 5 out of 6 problems? Maybe reasoning_effort=low is too low?

@hpicsk
Author

hpicsk commented Sep 22, 2025

> Have you checked whether the model has really "solved" 5 out of 6 problems?

No, I am not able to solve the IMO problems myself, but I will compare the details against other solutions as best I can.

When reasoning_effort is set to high, the model tends to consume all of the tokens in the reasoning process on a single A100 GPU (see the sketch at the end of this comment for one way to detect this). I will try another run with reasoning_effort='medium'.

Thank you for the response.
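As a side note, one way to tell whether a higher-effort run actually produced an answer or simply exhausted its budget mid-reasoning is to look for the harmony "final" channel marker in the raw output. This is only a sketch under that assumption, not a helper present in this PR:

```python
# Hypothetical helper (not part of this PR): gpt-oss writes its user-facing
# answer on the "final" channel of the harmony format. If that marker never
# appears, the run most likely spent its whole token budget in the analysis
# (reasoning) channel. Decode with skip_special_tokens=False to keep markers.
def reached_final_answer(decoded_output: str) -> bool:
    return "<|channel|>final<|message|>" in decoded_output
```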

@hpicsk
Author

hpicsk commented Sep 22, 2025

The solutions are wrong when checked with Gemini 2.5 Pro. I will try medium reasoning effort, and I'll close the thread afterward even if there is no improvement.

@yichenhuang
Collaborator

The model should at least get the final answers correct.

@zhouzihao501

zhouzihao501 commented Oct 22, 2025

> Have you checked whether the model has really "solved" 5 out of 6 problems? -> No. I am not able to solve the IMO problems, but I will compare the details with other solutions as best as I can.
>
> When setting reasoning high, the model tends to consume all the tokens in the reasoning process on single A100 GPU. I will try another run with reasoning effort='medium'.
>
> Thank you for the response.

Hi, so did you solve 4/6 IMO problems using gpt-oss-120B as both the generator and the verifier? That's impressive; could you share more information about it? :)

@bobbercheng

@hpicsk I really appreciate your PR here. However, low reasoning effort only weakens the verification checking, so solutions pass even though none of them are actually correct; low reasoning effort only works for formula derivation. If anyone is interested in using gpt-oss-120b, please check my repo https://github.com/bobbercheng/IMO25. I use BFS + high reasoning together with formula derivation + low reasoning and solved all 6 problems. I made too many changes to create a PR.
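To illustrate the mixed-effort idea described above, the effort level could simply be chosen per pipeline stage when building the system prompt. This is only a sketch; it is not bobbercheng's actual implementation, which lives in the linked repo, and the stage names are made up for illustration.

```python
# Hypothetical illustration of mixing reasoning efforts per pipeline stage;
# not the implementation from the linked repo.
EFFORT_BY_STAGE = {
    "bfs_search": "high",         # exploring and verifying candidate proof steps
    "formula_derivation": "low",  # cheap algebraic follow-up queries
}

def system_prompt_for(stage: str) -> str:
    # gpt-oss reads its reasoning effort from the system prompt.
    return f"Reasoning: {EFFORT_BY_STAGE[stage]}"
```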
