huggingface gpt oss 120b code and results #18

Open
hpicsk wants to merge 2 commits into lyang36:main from hpicsk:master

Conversation

@hpicsk

@hpicsk hpicsk commented Sep 22, 2025

Solved 5 out of 6 problems using reasoning_effort=low.

Setting reasoning_effort to high gives an OOM error.
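For context, here is a minimal sketch of how reasoning_effort=low can be requested when running gpt-oss-120b through Hugging Face transformers. This is an assumption for illustration, not the code in this PR; the model id, prompt wording, and token budget below are placeholders, and gpt-oss is assumed to read its reasoning effort from the system prompt.

```python
# Hypothetical sketch, not the PR's actual runner: query gpt-oss-120b via
# Hugging Face transformers with low reasoning effort set in the system prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-120b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place weights on the available GPU(s)
)

messages = [
    # gpt-oss takes its reasoning effort from the system prompt.
    {"role": "system", "content": "Reasoning: low"},
    {"role": "user", "content": "<IMO problem statement goes here>"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# A fixed budget; with reasoning_effort=high the chain of thought alone can
# exhaust this budget (or GPU memory) before a final answer is produced.
outputs = model.generate(inputs, max_new_tokens=8192)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```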

@hpicsk hpicsk changed the title gpt oss 120b result with run logs huggingface gpt oss 120b code and results Sep 22, 2025
@yichenhuang
Collaborator

Thank you very much for working on integrating the open-weight gpt-oss model into our pipeline. I have not gone through your log files in detail, but I quickly looked at the log file for Problem 4, and it seems that the model's final answer is just 6, which is wrong. Have you checked whether the model has really "solved" 5 out of 6 problems? Maybe reasoning_effort=low is too low?

@hpicsk
Author

hpicsk commented Sep 22, 2025

> Have you checked whether the model has really "solved" 5 out of 6 problems?

No, I am not able to solve the IMO problems myself, but I will compare the details against other solutions as best I can.

When reasoning_effort is set to high, the model tends to consume all of the tokens in the reasoning process on a single A100 GPU (see the sketch at the end of this comment for one way to detect this). I will try another run with reasoning_effort='medium'.

Thank you for the response.
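As a side note, one way to tell whether a higher-effort run actually produced an answer or simply exhausted its budget mid-reasoning is to look for the harmony "final" channel marker in the raw output. This is only a sketch under that assumption, not a helper present in this PR:

```python
# Hypothetical helper (not part of this PR): gpt-oss writes its user-facing
# answer on the "final" channel of the harmony format. If that marker never
# appears, the run most likely spent its whole token budget in the analysis
# (reasoning) channel. Decode with skip_special_tokens=False to keep markers.
def reached_final_answer(decoded_output: str) -> bool:
    return "<|channel|>final<|message|>" in decoded_output
```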

@hpicsk
Author

hpicsk commented Sep 22, 2025

The solutions are wrong when checked with Gemini 2.5 Pro. I will try medium reasoning effort, and I'll close the thread afterward even if there is no improvement.

@yichenhuang
Collaborator

The model should at least get the final answers correct.

@zhouzihao501

zhouzihao501 commented Oct 22, 2025

> Have you checked whether the model has really "solved" 5 out of 6 problems? -> No. I am not able to solve the IMO problems, but I will compare the details with other solutions as best as I can.
>
> When setting reasoning high, the model tends to consume all the tokens in the reasoning process on single A100 GPU. I will try another run with reasoning effort='medium'.
>
> Thank you for the response.

Hi, so did you solve 4/6 IMO problems using gpt-oss-120B as both the generator and the verifier? That's impressive; could you share more information about it? :)

@bobbercheng

@hpicsk I really appreciate your PR here. However, low reasoning effort only weakens the verification checking, so solutions pass even though none of them are actually correct; low reasoning effort only works for formula derivation. If anyone is interested in using gpt-oss-120b, please check my repo https://github.com/bobbercheng/IMO25. I use BFS + high reasoning together with formula derivation + low reasoning and solved all 6 problems. I made too many changes to create a PR.
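To illustrate the mixed-effort idea described above, the effort level could simply be chosen per pipeline stage when building the system prompt. This is only a sketch; it is not bobbercheng's actual implementation, which lives in the linked repo, and the stage names are made up for illustration.

```python
# Hypothetical illustration of mixing reasoning efforts per pipeline stage;
# not the implementation from the linked repo.
EFFORT_BY_STAGE = {
    "bfs_search": "high",         # exploring and verifying candidate proof steps
    "formula_derivation": "low",  # cheap algebraic follow-up queries
}

def system_prompt_for(stage: str) -> str:
    # gpt-oss reads its reasoning effort from the system prompt.
    return f"Reasoning: {EFFORT_BY_STAGE[stage]}"
```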
