
Conversation

@lee101 commented Jul 30, 2025

Noticed this kind of thing in #1092, where prompts quickly get too long; with this change it auto-compacts if the conversation is over roughly 80k tokens.
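
Roughly, the idea is something like this (just a sketch, not the exact diff; it assumes a ~4 chars-per-token estimate, and the real change lives in packages/opencode/src/session/index.ts):

// Sketch of the auto-compact check, simplified from the PR.
const AUTO_COMPACT_TOKEN_THRESHOLD = 80_000

function estimateTokens(messages: { text: string }[]): number {
  // rough heuristic: ~4 characters per token
  const totalChars = messages.reduce((sum, m) => sum + m.text.length, 0)
  return Math.ceil(totalChars / 4)
}

async function maybeAutoCompact(messages: { text: string }[], summarize: () => Promise<void>) {
  if (estimateTokens(messages) > AUTO_COMPACT_TOKEN_THRESHOLD) {
    await summarize() // same effect as running /compact
  }
}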

@lee101 (Author) commented Jul 30, 2025

Also fixes this one: #970

@lee101 (Author) commented Jul 30, 2025

Ah no, I still get this: AI_APICallError: prompt token count of 151316 exceeds the limit of 128000

Going to try to fix this up so that if it ever gets this kind of error it does a /compact as well.
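
The rough shape would be something like this (a sketch only; the error-string matching and retry behaviour here are assumptions, not something the provider SDK guarantees):

// Sketch: if the provider rejects the prompt for being too long, compact and retry once.
async function callWithAutoCompact(run: () => Promise<void>, compact: () => Promise<void>) {
  try {
    await run()
  } catch (err) {
    const msg = String((err as Error)?.message ?? err)
    // assumption: match the context-limit error text, e.g. "prompt token count ... exceeds the limit"
    if (msg.includes("exceeds the limit") || msg.toLowerCase().includes("context length")) {
      await compact() // equivalent of running /compact
      await run()     // retry with the compacted history
    } else {
      throw err
    }
  }
}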

@BOTKooper

@lee101 tbh I don't know if auto-compacting is a good idea in this case; it will randomly break the agent loop, and that's probably not what the user wants to see in the middle of a tool call.

And if that's the direction opencode wants to take, 80k seems a bit too low anyway: you effectively kill large context sizes (e.g. for "how does this thing work?" prompts to 1M-context Gemini).

@lee101 (Author) commented Jul 30, 2025

Yeah, it really needs to be adaptable per model: if there's only 10k tokens left for a given model, then compact.
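
For example, something along these lines (a sketch; the 10k floor is just the number above, and the per-model context limit is assumed to be available, e.g. from models.dev):

// Sketch: compact once fewer than ~10k tokens remain for the current model.
const MIN_REMAINING_TOKENS = 10_000

function shouldCompact(estimatedTokens: number, contextLimit: number): boolean {
  return contextLimit - estimatedTokens < MIN_REMAINING_TOKENS
}

// e.g. shouldCompact(estimateTokensFromMessages(msgs), model.info.limit.context)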

Just trying another strategy that catches the error when a given LLM complains and compacts at that stage instead.

Shameless plug: I'm also an API provider and have an open source summarization tool (https://text-generator.io/docs, https://github.com/TextGeneratorio/text-generator.io/blob/main/questions/summarization.py) that accepts a max_length, so it gives you a summarization API with some control over where to cut off. This could probably also be done on device, since summarization/compaction can take a long time and a lot of tokens but probably doesn't need to be high quality.

@BOTKooper

Regarding 80k specifically: we have info about which model is being used and its limits (thx models.dev), so maybe something like this would make more sense:

diff --git a/packages/opencode/src/session/index.ts b/packages/opencode/src/session/index.ts
index 5c0cf83a..33c7f383 100644
--- a/packages/opencode/src/session/index.ts
+++ b/packages/opencode/src/session/index.ts
@@ -46,7 +46,7 @@ export namespace Session {
   const log = Log.create({ service: "session" })
 
   const OUTPUT_TOKEN_MAX = 32_000
-  const AUTO_COMPACT_TOKEN_THRESHOLD = 80_000
+  const AUTO_COMPACT_TOKEN_THRESHOLD_PERCENTAGE = 0.8
 
   function estimateTokensFromMessages(messages: { info: MessageV2.Info; parts: MessageV2.Part[] }[]): number {
     let totalChars = 0
@@ -633,8 +633,11 @@ export namespace Session {
 
     // auto compact if estimated tokens exceed 80k threshold
     const estimatedTokens = estimateTokensFromMessages(msgs)
-    if (estimatedTokens > AUTO_COMPACT_TOKEN_THRESHOLD) {
-      log.info("auto-compact triggered", { estimatedTokens, threshold: AUTO_COMPACT_TOKEN_THRESHOLD })
+    if (estimatedTokens > model.info.limit.context * AUTO_COMPACT_TOKEN_THRESHOLD_PERCENTAGE) {
+      log.info("auto-compact triggered", {
+        estimatedTokens,
+        threshold: model.info.limit.context * AUTO_COMPACT_TOKEN_THRESHOLD_PERCENTAGE,
+      })
       await summarize({
         sessionID: input.sessionID,
         providerID: input.providerID,

@Syazvinski

Claude Code does auto-compaction, and for the compaction prompt it tells the model to be very specific about what the agent is doing, what the user asked for, and where the agentic flow was in completing that task, and then it tells the model to move on. It works fine most of the time. I think implementing it would make sense.
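
Something in that direction could look like this (a hypothetical prompt paraphrasing the behaviour described above, not Claude Code's actual compaction prompt):

// Hypothetical compaction prompt along the lines described above.
const COMPACTION_PROMPT = [
  "Summarize this conversation so the agent can continue in a fresh context.",
  "Be very specific about what the user originally asked for,",
  "what the agent is currently doing and which files and tools it has touched,",
  "and exactly where the agentic flow is in completing that task.",
  "Finish with the next concrete step so the agent can move on immediately.",
].join(" ")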

@lee101 force-pushed the lee101/autocompact branch from d41ebb7 to 6a4de40 on July 30, 2025 at 22:11
      if (part.url && part.url.startsWith("data:")) {
        msgChars += part.url.length * 0.75
      }
    }

@lee101 (Author) commented:

Vibe-coded code warning: I don't really understand this file-counting stuff here. I don't think there would really be file:/data: type content in the prompts, right? It would be something that is passed in separately and has a set token cost per provider, e.g. images in low_detail take up roughly 64 tokens in OpenAI, so pretty token efficient these days, but not part of the prompt text.
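
If those parts aren't really in the prompt text, the estimate could charge a flat per-provider cost for image parts instead of counting data: URL characters. A rough sketch (the 64-token low_detail figure is the one mentioned above; the part shape is illustrative, not opencode's actual MessageV2.Part type):

// Sketch: flat token cost for image parts instead of counting data: URL characters.
const IMAGE_TOKENS_LOW_DETAIL = 64 // e.g. OpenAI low_detail images, per the comment above

type PartLike = { type: string; text?: string; url?: string }

function estimatePartTokens(part: PartLike): number {
  if (part.type === "text" && part.text) return Math.ceil(part.text.length / 4) // ~4 chars/token
  if (part.url?.startsWith("data:")) return IMAGE_TOKENS_LOW_DETAIL
  return 0
}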

@lee101 (Author) commented Jul 30, 2025

OK, made those improvements. I think there might still be some issues with images and such there, though.

Most likely I'm going to abandon this at this point, since I think it's working well enough for my use case, sorry. Someone would have to pick this up from here and fix it up :)

teamgroove pushed a commit to teamgroove/opencode that referenced this pull request Aug 3, 2025
Merged from: sst#1407
Author: @lee101
Auto-merged by opencode-fork integration system
@ansh commented Sep 9, 2025

Why has this not been merged?
