refactor: optimize agent list payload and improve multimodal detection logic #12942
Merged
KevinHuSh merged 2 commits into infiniflow:main on Feb 2, 2026
Since model_type only represents the primary category (e.g., 'chat'), it cannot capture auxiliary capabilities. Switching to 'IMAGE2TEXT' tag detection allows multimodal support for versatile models like gpt-5.2-pro.
The dsl field in the agent list is typically large and causes unnecessary network overhead. DSL is now only fetched in the detail view.
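The effect of dropping dsl from the list response can be illustrated with a small sketch. The actual change is in the Python service's query field list; the TypeScript below only models the resulting payload shape. The record fields other than dsl (id, title, update_time), the toListItem helper, and the sample data are hypothetical and not taken from the RAGFlow code.

```typescript
// Hypothetical full record, as a detail endpoint might return it.
interface CanvasRecord {
  id: string;
  title: string;
  update_time: number;
  dsl: Record<string, unknown>; // the large field the list view does not need
}

// List rows carry everything except the dsl field.
type CanvasListItem = Omit<CanvasRecord, 'dsl'>;

// Explicit projection, analogous to leaving dsl out of the selected query fields.
function toListItem(record: CanvasRecord): CanvasListItem {
  return {
    id: record.id,
    title: record.title,
    update_time: record.update_time,
  };
}

// Made-up sample data: one agent with a sizeable DSL graph.
const records: CanvasRecord[] = [
  {
    id: 'a1',
    title: 'Support agent',
    update_time: 1700000000000,
    dsl: {
      components: Array.from({ length: 200 }, (_, i) => ({
        id: `node-${i}`,
        params: {},
      })),
    },
  },
];

const listItems = records.map(toListItem);

// Compare serialized sizes (in characters) with and without dsl.
const fullSize = JSON.stringify(records).length;
const listSize = JSON.stringify(listItems).length;
console.log({ fullSize, listSize });
```

The list view only needs the lightweight rows; a client that needs the DSL would request the single-record detail endpoint instead.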
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main   #12942   +/-   ##
=======================================
  Coverage   44.46%   44.46%
=======================================
  Files          43       43
  Lines        9266     9266
  Branches      107      107
=======================================
  Hits         4120     4120
  Misses       5127     5127
  Partials       19       19
Description
This PR focuses on API performance optimization and refining the model capability detection logic in the Agent/Canvas module.
1. Performance Optimization (Backend)
Removed cls.model.dsl from query fields in UserCanvasService.get_by_tenant_ids.
The dsl object is large and unnecessary for the Agent list view. Excluding it reduces the payload size of the /v1/canvas/list API, leading to faster serialization and reduced network latency.
The full DSL remains available through the /v1/canvas/get/<id> endpoint used in the detail view.
2. Multimodal Detection Refinement (Frontend)
Replaced model_type === LlmModelType.Image2text with tags?.includes('IMAGE2TEXT').
model_type defines the primary role of a model (e.g., chat). However, many advanced chat models are also vision-capable. Since model_type is a single-value field, it cannot represent these multiple capabilities.
Using the tags field (which supports multiple attributes) to check for IMAGE2TEXT ensures that models like gpt-5.2-pro correctly display multimodal input options.
Type of Change
Main Changes
api/db/services/canvas_service.py: Optimized DB query by excluding heavy DSL fields.
web/src/pages/agent/form/agent-form/index.tsx: Enhanced capability detection using the tags system.
Verification
Chat models with the IMAGE2TEXT tag now correctly enable the multimodal input UI.
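The difference between the old model_type check and the new tag check can be sketched as follows. This is a minimal illustration, not the code from index.tsx: the ModelInfo interface, the two helper functions, the enum values, and the sample tag values are assumptions, and tags is modeled as a string array here even though the real field may have a different shape.

```typescript
// Hypothetical enum mirroring the LlmModelType name used in the PR.
enum LlmModelType {
  Chat = 'chat',
  Image2text = 'image2text',
}

// Hypothetical model descriptor; tags is assumed to be an optional string array.
interface ModelInfo {
  name: string;
  model_type: LlmModelType; // single value: the primary category only
  tags?: string[];          // multiple capability tags
}

// Old check: only models whose primary category is image2text qualify.
function isMultimodalByType(model: ModelInfo): boolean {
  return model.model_type === LlmModelType.Image2text;
}

// New check: any model carrying the IMAGE2TEXT tag qualifies.
function isMultimodalByTags(model: ModelInfo): boolean {
  return model.tags?.includes('IMAGE2TEXT') ?? false;
}

// Illustrative models; the tag values are made up for this sketch.
const visionChat: ModelInfo = {
  name: 'gpt-5.2-pro',
  model_type: LlmModelType.Chat,
  tags: ['CHAT', 'IMAGE2TEXT'],
};
const plainChat: ModelInfo = {
  name: 'text-only-chat',
  model_type: LlmModelType.Chat,
  tags: ['CHAT'],
};
const untagged: ModelInfo = {
  name: 'no-tags',
  model_type: LlmModelType.Chat,
};

console.log(isMultimodalByType(visionChat)); // false: primary category is chat
console.log(isMultimodalByTags(visionChat)); // true: IMAGE2TEXT tag is present
console.log(isMultimodalByTags(plainChat));  // false: tag is absent
console.log(isMultimodalByTags(untagged));   // false: tags is undefined
```

The optional chaining plus the ?? false fallback keeps the result a strict boolean when tags is missing, which is convenient when the value drives whether the multimodal input UI is rendered.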