huggingface gpt oss 120b code and results #18
Conversation
Thank you very much for working on integrating the open-weight gpt-oss model into our pipeline. I have not gone through your log files in detail; I just quickly looked at the log file for Problem 4. It seems that the model's final answer is only 6, which is wrong. Have you checked whether the model has really "solved" 5 out of 6 problems? Maybe reasoning_effort=low is too low?
> Have you checked whether the model has really "solved" 5 out of 6 problems?

When reasoning is set to high, the model tends to consume all of its tokens during the reasoning process on a single A100 GPU. I will try another run with reasoning_effort='medium'. Thank you for the response.
The solutions are wrong when checked with Gemini 2.5 Pro. I'll close the thread afterward, even if there's no improvement after trying medium reasoning.
The model should at least get the final answers correct. |
Hi, so did you solve 4/6 IMO problems using gpt-oss-120b as both generator and verifier? That's impressive; could you share more information about it? :)
@hpicsk I really appreciate your PR here. However, low reasoning effort weakens the verification checks: solutions pass the verifier even though none of them are actually correct. Low reasoning effort only works for formula derivation. If anyone is interested in using gpt-oss-120b, please check my repo https://github.com/bobbercheng/IMO25. I use BFS with high reasoning for solving plus low reasoning for formula derivation, and solved all 6 problems. I made too many changes to open a PR.
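The per-task split described above (high reasoning effort for search and verification, low effort only for mechanical formula derivation) can be sketched as a small dispatcher. This is a minimal illustration, not code from the linked repo: the function names, task labels, and the assumption that the serving endpoint accepts a `reasoning_effort` field in the request are all hypothetical.

```python
def choose_effort(task: str) -> str:
    """Map a pipeline stage to a gpt-oss reasoning-effort setting.

    High effort for full problem solving and verification passes;
    low effort only for mechanical formula derivation (per the
    strategy described in the comment above). Task labels are
    illustrative, not from the linked repo.
    """
    high_effort_tasks = {"bfs_solve", "verify"}
    return "high" if task in high_effort_tasks else "low"


def build_request(task: str, prompt: str) -> dict:
    """Assemble a chat-style request payload.

    Assumption: the serving layer exposes an OpenAI-compatible
    endpoint that honors a top-level ``reasoning_effort`` field;
    adjust to however your server actually sets reasoning effort.
    """
    return {
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": choose_effort(task),
    }
```

With a dispatcher like this, only the cheap derivation steps run at low effort, so the verifier keeps its full reasoning budget and does not wave incorrect solutions through.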
Solved 5 out of 6 problems using reasoning_effort=low. Setting reasoning to high gives an OOM error.