autocompaction #1407
Conversation
Also fixes this one: #970
Ah no, I still get this: AI_APICallError: prompt token count of 151316 exceeds the limit of 128000. Going to try to fix this up so that if it ever gets this kind of error it does a /compact as well.
@lee101 tbh I don't know if auto-compacting is a good idea in this case; it will randomly break the agent loop, and that's probably not what the user wants to see in the middle of a tool call. And if that's the direction opencode wants to take, 80k seems a bit too low anyway - you effectively kill large context sizes (e.g. for "how does this thing work?" prompts to 1M Gemini).
Yeah, it needs to be adaptable per model really - e.g. if there are only 10k tokens left for a given model, then compact. Just trying another strategy that instead catches the error when a given LLM complains and decides to compact at that stage (see the sketch below).

Shameful plug: I'm also an API provider and have an open source summarization tool (https://text-generator.io/docs, https://github.com/TextGeneratorio/text-generator.io/blob/main/questions/summarization.py) that accepts a max_length, so it gives you a summarization API with some control over where to cut off. It could probably be something you do on device too, since summarization/compaction can take a long time and a lot of tokens but probably doesn't need to be high quality.
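Not part of this PR - just a rough sketch of what the catch-the-error-then-compact strategy could look like. `runWithAutoCompact` and the error matching are invented for illustration, and `summarize` is shown with a simplified argument shape standing in for the session summarization call used elsewhere in this PR.

```ts
// Hypothetical sketch, not opencode's actual API surface.
// `summarize` stands in for the existing session summarization call.
declare function summarize(input: { sessionID: string; providerID: string }): Promise<void>

async function runWithAutoCompact(
  input: { sessionID: string; providerID: string },
  send: () => Promise<void>,
): Promise<void> {
  try {
    await send()
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err)
    // Providers word this differently; match loosely on common phrasings.
    const contextOverflow = /exceeds the limit|context length|too many tokens/i.test(message)
    if (!contextOverflow) throw err
    // Compact the session history, then retry the original request once.
    await summarize({ sessionID: input.sessionID, providerID: input.providerID })
    await send()
  }
}
```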
Regarding 80k specifically - we have info about which model is being used and its limits (thx models.dev), so maybe something like this would make more sense:

diff --git a/packages/opencode/src/session/index.ts b/packages/opencode/src/session/index.ts
index 5c0cf83a..33c7f383 100644
--- a/packages/opencode/src/session/index.ts
+++ b/packages/opencode/src/session/index.ts
@@ -46,7 +46,7 @@ export namespace Session {
const log = Log.create({ service: "session" })
const OUTPUT_TOKEN_MAX = 32_000
- const AUTO_COMPACT_TOKEN_THRESHOLD = 80_000
+ const AUTO_COMPACT_TOKEN_THRESHOLD_PERCENTAGE = 0.8
function estimateTokensFromMessages(messages: { info: MessageV2.Info; parts: MessageV2.Part[] }[]): number {
let totalChars = 0
@@ -633,8 +633,11 @@ export namespace Session {
// auto compact if estimated tokens exceed 80k threshold
const estimatedTokens = estimateTokensFromMessages(msgs)
- if (estimatedTokens > AUTO_COMPACT_TOKEN_THRESHOLD) {
- log.info("auto-compact triggered", { estimatedTokens, threshold: AUTO_COMPACT_TOKEN_THRESHOLD })
+ if (estimatedTokens > model.info.limit.context * AUTO_COMPACT_TOKEN_THRESHOLD_PERCENTAGE) {
+ log.info("auto-compact triggered", {
+ estimatedTokens,
+ threshold: model.info.limit.context * AUTO_COMPACT_TOKEN_THRESHOLD_PERCENTAGE,
+ })
await summarize({
sessionID: input.sessionID,
providerID: input.providerID,
Claude Code does autocompaction. For the compaction prompt, it tells the model to be very specific about what the agent is doing, what the user asked for, and where the agentic flow was in completing that task, then tells the model to move on, and it works fine most of the time. I think implementing it would make sense.
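For illustration only - this wording is invented, not the prompt Claude Code or opencode actually uses - a compaction prompt along the lines described might look like:

```ts
// Illustrative compaction prompt; the wording is a guess, not taken from any tool.
const COMPACTION_PROMPT = `
Summarize this conversation so the agent can resume exactly where it left off.
Be very specific about:
- what the user originally asked for
- what the agent is currently doing and which tool calls it has made
- where the agentic flow is in completing that task and what remains
End with the next concrete step so the agent can move on immediately.
`.trim()
```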
…ion of just the end (compare d41ebb7 to 6a4de40)
  if (part.url && part.url.startsWith("data:")) {
    msgChars += part.url.length * 0.75
  }
}
Vibecoded code warning: I don't really understand this file counting stuff here. I think there really wouldn't be file:/data: type stuff in the prompts, right... it would be something that is passed in semantically and has a set token cost per provider, e.g. images in low_detail take up ~64 tokens in OpenAI etc., so pretty token efficient these days but not part of the prompt.
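The existing heuristic multiplies the data: URL length by 0.75, presumably to undo base64's roughly 4/3 size overhead and recover the raw byte count. A fixed per-image cost per provider, as suggested above, might look roughly like this; the names and numbers are placeholders, and the ~64-token figure is the comment's estimate rather than an official provider spec.

```ts
// Placeholder sketch: charge a flat per-image token cost per provider instead
// of counting data: URL characters. Numbers are rough guesses, not provider specs.
function estimateImageTokens(providerID: string, imageCount: number): number {
  const PER_IMAGE_TOKENS: Record<string, number> = {
    openai: 64, // rough figure for low-detail images, per the comment above
  }
  const fallback = 1_000 // conservative default for unknown providers
  return imageCount * (PER_IMAGE_TOKENS[providerID] ?? fallback)
}
```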
OK, made those improvements. I think there might be some issues with images and such there though. Most likely I'm going to abandon this at the point where I think it's working well enough for my use case, sorry, so someone would have to pick this up from here and fix it up :)
Why has this not been merged?
Noticed this kind of thing in #1092 where prompts quickly get too long; this way it autocompacts if it's over roughly 80k tokens.
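The estimate behind the ~80k figure isn't shown in full here; a minimal sketch of a character-based heuristic, assuming the common ~4 characters per token rule of thumb, would be:

```ts
// Minimal sketch of a character-count token estimate (~4 chars per token for
// English text). The PR's estimateTokensFromMessages also accounts for tool
// output and data: URLs, which this sketch omits.
function estimateTokens(texts: string[]): number {
  const totalChars = texts.reduce((sum, text) => sum + text.length, 0)
  return Math.ceil(totalChars / 4)
}
```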