Configs and updates to gaia eval #304

ollmer · 2025-10-09T17:58:50Z

Use tool calling
Eval Apriel model
Added readme

Description by Korbit AI

What change is being made?

Update the Gaia evaluation configs and flow to use the Apriel 1.5 LLM, switch to function-call style guidance, extend max_turns settings, and wire the LLM endpoint configuration through an environment variable; rename the action class, adjust the Gaia Gym/Benchmark wiring, and add supporting experiment/docs.

Why are these changes being made?

Switch to an Apriel-based evaluation setup with function-call style guidance to improve tool use and reliability; expose the LLM base URL through an environment variable for flexibility; extend max_turns to better control evaluation length and align environment creation with the new config fields; and update the docs to guide setup.
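As an illustration of the "LLM base URL via environment variable" and max_turns changes described above, here is a minimal sketch in Python. It does not reproduce the PR's code. The environment variable name LLM_BASE_URL, the EvalConfig class, the load_eval_config helper, and the localhost fallback URL are all hypothetical. The sketch shows only the general pattern: read the endpoint from the environment, fall back to a default, and carry max_turns as an explicit config field.

```python
import os
from dataclasses import dataclass

# Hypothetical fallback endpoint; the PR does not state the real default.
DEFAULT_BASE_URL = "http://localhost:8000/v1"


@dataclass
class EvalConfig:
    """Hypothetical eval config; field names are illustrative, not the repo's."""
    model_name: str
    base_url: str
    max_turns: int


def load_eval_config(model_name: str, max_turns: int,
                     env_var: str = "LLM_BASE_URL") -> EvalConfig:
    # Read the LLM endpoint from the environment so the same config can target
    # different deployments without editing files; otherwise use the default.
    base_url = os.environ.get(env_var, DEFAULT_BASE_URL)
    # max_turns caps how many agent turns a single task may take.
    if max_turns <= 0:
        raise ValueError("max_turns must be positive")
    return EvalConfig(model_name=model_name, base_url=base_url, max_turns=max_turns)
```

Keeping the endpoint out of the config files means one experiment config can run against a local server or a remote deployment just by changing the environment.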


korbit-ai · 2025-10-09T17:58:55Z

Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment /korbit-review.


configs and updates to run gaia eval with apriel

2939d43
