-
Notifications
You must be signed in to change notification settings - Fork 8.2k
fix: Add IBM watsonx.ai support to EmbeddingModel #10677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added IBM watsonx.ai as a supported provider in EmbeddingModelComponent, updated dependencies and code to integrate ibm_watsonx_ai and pydantic. Updated starter project and component index metadata to reflect new dependencies and code changes.
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughIBM watsonx.ai integration added to EmbeddingModelComponent via a new fetch_ibm_models() helper for dynamic model discovery. Extended inputs include truncate_input_tokens and input_text. Updated watsonx model constants list. Provider-specific embedding paths and field visibility management implemented for IBM, Ollama, and OpenAI. Changes
Sequence DiagramsequenceDiagram
participant User
participant EmbeddingModelComponent
participant update_build_config
participant fetch_ibm_models
participant IBMWatsonX
participant EmbeddingAPI
User->>EmbeddingModelComponent: Select IBM watsonx.ai provider
activate EmbeddingModelComponent
EmbeddingModelComponent->>update_build_config: Trigger provider change
activate update_build_config
update_build_config->>fetch_ibm_models: Fetch available models
activate fetch_ibm_models
fetch_ibm_models->>IBMWatsonX: Query /ml/v1/foundation_model_specs
IBMWatsonX-->>fetch_ibm_models: Return model_ids
fetch_ibm_models-->>update_build_config: Return sorted models
deactivate fetch_ibm_models
update_build_config->>update_build_config: Set model options & visibility<br/>(truncate_input_tokens, input_text)
update_build_config-->>EmbeddingModelComponent: Update component state
deactivate update_build_config
User->>EmbeddingModelComponent: Update base_url_ibm_watsonx
EmbeddingModelComponent->>update_build_config: Refresh models for new URL
activate update_build_config
update_build_config->>fetch_ibm_models: Fetch models from new URL
fetch_ibm_models->>IBMWatsonX: Query with new base_url
IBMWatsonX-->>fetch_ibm_models: Return updated models
fetch_ibm_models-->>update_build_config: Return sorted models
update_build_config-->>EmbeddingModelComponent: Update model options
deactivate update_build_config
User->>EmbeddingModelComponent: Build embeddings
activate EmbeddingModelComponent
EmbeddingModelComponent->>EmbeddingAPI: Create watsonx client & call embed
EmbeddingAPI-->>EmbeddingModelComponent: Return embeddings
EmbeddingModelComponent-->>User: Embeddings result
deactivate EmbeddingModelComponent
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Areas requiring extra attention:
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touchesImportant Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 2 warnings)
✅ Passed checks (4 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds IBM watsonx.ai support to the EmbeddingModel component, enabling users to generate embeddings using IBM's watsonx.ai foundation models. The implementation includes dynamic model fetching from the watsonx.ai API, configuration of watsonx-specific parameters (truncate_input_tokens and input_text), and proper credential management using the IBM watsonx.ai SDK.
Key changes:
- Integrated IBM watsonx.ai as a new embedding provider alongside OpenAI and Ollama
- Added dynamic model discovery via watsonx.ai API to fetch available embedding models
- Implemented watsonx-specific embedding parameters for token truncation and text return options
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/lfx/src/lfx/components/models_and_agents/embedding_model.py | Added watsonx.ai provider support with API client initialization, dynamic model fetching, and configuration UI updates for watsonx-specific parameters |
| src/lfx/src/lfx/base/models/watsonx_constants.py | Updated default embedding models list to include newer models (slate and multilingual-e5) and renamed constant for clarity |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from lfx.log.logger import logger | ||
| from lfx.schema.dotdict import dotdict | ||
| from lfx.utils.util import transform_localhost_url | ||
| import requests |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import statement import requests should follow PEP 8 convention and be placed at the top of the file with other imports. Currently it's placed after the local imports, which is inconsistent with the import ordering convention used in the codebase. It should be placed before the from imports from third-party libraries.
| "version": "2024-09-16", | ||
| "filters": "function_embedding,!lifecycle_withdrawn:and", | ||
| } | ||
| response = requests.get(endpoint, params=params, timeout=10) |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API request to fetch IBM models is made without authentication. This endpoint likely requires authentication but the request doesn't include any API key or credentials. Consider adding authentication headers or verifying if this endpoint is intended to be publicly accessible.
| build_config["model"]["options"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx) | ||
| build_config["model"]["value"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0] |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fetch_ibm_models method is called twice with the same base_url parameter on consecutive lines. This results in duplicate API requests. Consider storing the result in a variable and reusing it:
ibm_models = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
build_config["model"]["options"] = ibm_models
build_config["model"]["value"] = ibm_models[0]| build_config["model"]["options"] = self.fetch_ibm_models(base_url=field_value) | ||
| build_config["model"]["value"] = self.fetch_ibm_models(base_url=field_value)[0] |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fetch_ibm_models method is called twice with the same field_value parameter on consecutive lines. This results in duplicate API requests. Consider storing the result in a variable and reusing it:
ibm_models = self.fetch_ibm_models(base_url=field_value)
build_config["model"]["options"] = ibm_models
build_config["model"]["value"] = ibm_models[0]| build_config["model"]["options"] = WATSONX_EMBEDDING_MODEL_NAMES | ||
| build_config["model"]["value"] = WATSONX_EMBEDDING_MODEL_NAMES[0] | ||
| build_config["model"]["options"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx) | ||
| build_config["model"]["value"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0] |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential IndexError if fetch_ibm_models returns an empty list. The code accesses [0] without checking if the list is non-empty. Consider adding a check or providing a fallback value:
ibm_models = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
build_config["model"]["options"] = ibm_models
build_config["model"]["value"] = ibm_models[0] if ibm_models else WATSONX_EMBEDDING_MODEL_NAMES[0]| build_config["input_text"]["show"] = True | ||
| elif field_name == "base_url_ibm_watsonx": | ||
| build_config["model"]["options"] = self.fetch_ibm_models(base_url=field_value) | ||
| build_config["model"]["value"] = self.fetch_ibm_models(base_url=field_value)[0] |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential IndexError if fetch_ibm_models returns an empty list. The code accesses [0] without checking if the list is non-empty. Consider adding a check or providing a fallback value:
ibm_models = self.fetch_ibm_models(base_url=field_value)
build_config["model"]["options"] = ibm_models
build_config["model"]["value"] = ibm_models[0] if ibm_models else WATSONX_EMBEDDING_MODEL_NAMES[0]|
|
||
|
|
||
|
|
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary blank lines. There are three consecutive blank lines here, which violates PEP 8 style guide that recommends at most two blank lines between top-level definitions. Remove the extra blank lines.
| endpoint = f"{base_url}/ml/v1/foundation_model_specs" | ||
| params = { | ||
| "version": "2024-09-16", | ||
| "filters": "function_embedding,!lifecycle_withdrawn:and", |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filter syntax "function_embedding,!lifecycle_withdrawn:and" has an unusual :and suffix at the end. Comparing with the similar implementation in language_model.py (line 57), which uses "function_text_chat,!lifecycle_withdrawn" without the :and suffix, this appears to be inconsistent. Consider removing the :and suffix or verifying the correct filter syntax with the IBM watsonx.ai API documentation.
| from lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url | ||
| from lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES | ||
| from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS, WATSONX_EMBEDDING_MODEL_NAMES | ||
| from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS, WATSONX_DEFAULT_EMBEDDING_MODELS, WATSONX_EMBEDDING_MODEL_NAMES |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'WATSONX_DEFAULT_EMBEDDING_MODELS' is not used.
Codecov Report❌ Patch coverage is
❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (40.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #10677 +/- ##
=======================================
Coverage 31.63% 31.63%
=======================================
Files 1350 1350
Lines 61154 61154
Branches 9142 9142
=======================================
Hits 19348 19348
Misses 40890 40890
Partials 916 916
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
Fixing Tests. |
Added IBM watsonx.ai as a supported provider in EmbeddingModelComponent, updated dependencies and code to integrate ibm_watsonx_ai and pydantic. Updated starter project and component index metadata to reflect new dependencies and code changes.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.