fix: Add IBM watsonx.ai support to EmbeddingModel #10677

edwinjosechittilappilly · 2025-11-20T21:54:21Z

Added IBM watsonx.ai as a supported provider in EmbeddingModelComponent, updated dependencies and code to integrate ibm_watsonx_ai and pydantic. Updated starter project and component index metadata to reflect new dependencies and code changes.

Summary by CodeRabbit

New Features
- Added IBM Watsonx.ai as an embedding provider with automatic model discovery
- Expanded available embedding models including sentence-transformers/all-minilm-l12-v2, IBM Slate models, and multilingual-e5-large
- Added new configuration options for token truncation and input text handling


coderabbitai · 2025-11-20T21:54:26Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

IBM watsonx.ai integration added to EmbeddingModelComponent via a new fetch_ibm_models() helper for dynamic model discovery. Extended inputs include truncate_input_tokens and input_text. Updated watsonx model constants list. Provider-specific embedding paths and field visibility management implemented for IBM, Ollama, and OpenAI.

Changes

Cohort / File(s)	Summary
WatsonX Model Constants `src/lfx/src/lfx/base/models/watsonx_constants.py`	Replaced WATSONX_EMBEDDING_MODELS_DETAILED with WATSONX_DEFAULT_EMBEDDING_MODELS containing updated model metadata; removed Granite models, added sentence-transformers/all-minilm-l12-v2, ibm/slate-125m-english-rtrvr-v2, ibm/slate-30m-english-rtrvr-v2, intfloat/multilingual-e5-large; updated WATSONX_EMBEDDING_MODEL_NAMES to reference new constant.
Embedding Model Component `src/lfx/src/lfx/components/models_and_agents/embedding_model.py`	Added static method fetch_ibm_models(base_url) for dynamic IBM model discovery; introduced truncate_input_tokens and input_text inputs; extended build_embeddings to support IBM watsonx.ai client instantiation; enhanced update_build_config with provider-specific field visibility and model option refreshing for IBM and Ollama providers.
Starter Project Configuration `src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json`	Extended EmbeddingModelComponent metadata dependencies to include requests and ibm_watsonx_ai; added helper for IBM model fetching; expanded inputs to include truncate_input_tokens, input_text, and provider-specific fields (base_url_ibm_watsonx, project_id); updated code_hash and build logic.
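To make the constants change concrete, here is a minimal sketch of how the flat name list could be derived from the new detailed constant. The entry shape (a dict with a single `name` key) is an assumption for illustration; the actual metadata fields in `watsonx_constants.py` may differ.

```python
# Hypothetical entry shape: each model is a dict with at least a "name" key.
# The model identifiers are the ones listed in the change summary above.
WATSONX_DEFAULT_EMBEDDING_MODELS = [
    {"name": "sentence-transformers/all-minilm-l12-v2"},
    {"name": "ibm/slate-125m-english-rtrvr-v2"},
    {"name": "ibm/slate-30m-english-rtrvr-v2"},
    {"name": "intfloat/multilingual-e5-large"},
]

# Derive the name list from the detailed constant so the two cannot drift apart.
WATSONX_EMBEDDING_MODEL_NAMES = [model["name"] for model in WATSONX_DEFAULT_EMBEDDING_MODELS]
```

Deriving one constant from the other keeps a single source of truth for the default models.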

Sequence Diagram

sequenceDiagram
    participant User
    participant EmbeddingModelComponent
    participant update_build_config
    participant fetch_ibm_models
    participant IBMWatsonX
    participant EmbeddingAPI

    User->>EmbeddingModelComponent: Select IBM watsonx.ai provider
    activate EmbeddingModelComponent
    EmbeddingModelComponent->>update_build_config: Trigger provider change
    activate update_build_config
    
    update_build_config->>fetch_ibm_models: Fetch available models
    activate fetch_ibm_models
    fetch_ibm_models->>IBMWatsonX: Query /ml/v1/foundation_model_specs
    IBMWatsonX-->>fetch_ibm_models: Return model_ids
    fetch_ibm_models-->>update_build_config: Return sorted models
    deactivate fetch_ibm_models
    
    update_build_config->>update_build_config: Set model options & visibility<br/>(truncate_input_tokens, input_text)
    update_build_config-->>EmbeddingModelComponent: Update component state
    deactivate update_build_config
    
    User->>EmbeddingModelComponent: Update base_url_ibm_watsonx
    EmbeddingModelComponent->>update_build_config: Refresh models for new URL
    activate update_build_config
    update_build_config->>fetch_ibm_models: Fetch models from new URL
    fetch_ibm_models->>IBMWatsonX: Query with new base_url
    IBMWatsonX-->>fetch_ibm_models: Return updated models
    fetch_ibm_models-->>update_build_config: Return sorted models
    update_build_config-->>EmbeddingModelComponent: Update model options
    deactivate update_build_config
    
    User->>EmbeddingModelComponent: Build embeddings
    activate EmbeddingModelComponent
    EmbeddingModelComponent->>EmbeddingAPI: Create watsonx client & call embed
    EmbeddingAPI-->>EmbeddingModelComponent: Return embeddings
    EmbeddingModelComponent-->>User: Embeddings result
    deactivate EmbeddingModelComponent
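
The discovery step in the diagram can be sketched as follows. This is not the PR's implementation: it uses the standard library instead of `requests`, takes an injectable `get_json` transport so it can be exercised offline, and assumes the response carries a `resources` list whose items have a `model_id` key. The endpoint path, `version`, and `filters` values are taken from the diff quoted in this thread.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Sequence


def _get_json(url: str, params: dict, timeout: float = 10.0) -> dict:
    """Default transport: GET the URL with query parameters and decode the JSON body."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{url}?{query}", timeout=timeout) as response:
        return json.load(response)


def fetch_ibm_models(
    base_url: str,
    fallback: Sequence[str],
    get_json: Callable[[str, dict], dict] = _get_json,
) -> list:
    """Return sorted model ids, or the fallback list if discovery fails or is empty."""
    endpoint = f"{base_url.rstrip('/')}/ml/v1/foundation_model_specs"
    params = {
        "version": "2024-09-16",
        "filters": "function_embedding,!lifecycle_withdrawn:and",
    }
    try:
        payload = get_json(endpoint, params)
        # Assumed response shape: {"resources": [{"model_id": "..."}, ...]}
        models = sorted(item["model_id"] for item in payload.get("resources", []))
    except Exception:
        # Network, decoding, or shape errors all degrade to the static defaults.
        return list(fallback)
    return models or list(fallback)
```

Returning the fallback on both failure and an empty result means callers can always index the first element safely.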

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

- New fetch_ibm_models() static method logic and API error handling for IBM watsonx.ai model discovery
- Provider-specific branching logic in update_build_config() for visibility toggling and model option refreshing across IBM, Ollama, and OpenAI paths
- Watsonx client instantiation with Credentials and APIClient in build_embeddings() method
- Dynamic model population and default value assignment when switching providers
- Integration of new inputs (truncate_input_tokens, input_text) across multiple provider paths and their corresponding parameter mapping in embedding API calls
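
The visibility toggling mentioned above can be illustrated with a table-driven sketch. The `["show"]` flag pattern and the IBM field names come from this PR; the `api_base` and `ollama_base_url` field names and the helper itself are hypothetical stand-ins, not the component's actual code.

```python
# Fields that should be visible for each provider.
# "api_base" and "ollama_base_url" are hypothetical names used for illustration.
PROVIDER_FIELDS = {
    "OpenAI": {"api_base"},
    "Ollama": {"ollama_base_url"},
    "IBM watsonx.ai": {
        "base_url_ibm_watsonx",
        "project_id",
        "truncate_input_tokens",
        "input_text",
    },
}


def apply_provider_visibility(build_config: dict, provider: str) -> dict:
    """Show the selected provider's fields and hide every other provider-specific field."""
    visible = PROVIDER_FIELDS.get(provider, set())
    all_fields = set().union(*PROVIDER_FIELDS.values())
    for field in all_fields:
        if field in build_config:
            build_config[field]["show"] = field in visible
    return build_config
```

A single lookup table avoids repeating show/hide assignments in each `if`/`elif` branch, which is where provider-switching bugs tend to hide.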

Possibly related PRs

- feat: Ollama and WatsonX embedding model support #10356: Modifies EmbeddingModelComponent and watsonx constants to add IBM watsonx.ai and Ollama provider support with provider-specific fields and model-discovery logic
- fix: changed embedding model to have api base and watsonx api endpoint #10524: Expands IBM watsonx.ai support in EmbeddingModelComponent and watsonx constants by adding watsonx API endpoint inputs and dynamic IBM model fetching
- feat: Add IBM watsonx.ai and Ollama to LanguageModelComponent #10471: Adds IBM watsonx.ai integration for LanguageModelComponent with a similar static fetch method and provider-specific inputs pattern

Suggested labels

size:L, lgtm

Suggested reviewers

lucaseduoli
erichare

Pre-merge checks and finishing touches

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

Check name	Status	Explanation	Resolution
Test Coverage For New Implementations	❌ Error	PR introduces IBM watsonx.ai integration with fetch_ibm_models() method and dynamic provider handling, but no test files were added to verify this new functionality.	Add comprehensive test coverage including unit tests for fetch_ibm_models(), update_build_config() logic, edge cases like empty model lists, and integration tests with mocked API calls.
Test Quality And Coverage	⚠️ Warning	PR introduces IBM watsonx.ai support to EmbeddingModelComponent with new fetch_ibm_models() method and input fields, but existing test file lacks test coverage for these new features.	Add comprehensive pytest tests for fetch_ibm_models(), IndexError prevention, authentication handling, update_build_config() behavior, and dynamic model refresh following patterns from test_language_model_component.py.
Test File Naming And Structure	⚠️ Warning	Pull request adds IBM watsonx.ai support without corresponding test files, leaving critical issues like unauthenticated API calls uncovered.	Add comprehensive pytest test files covering fetch_ibm_models() method, authentication failures, input fields, provider switching, and edge cases like empty model lists.
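
As a rough illustration of the tests these checks ask for, the sketch below exercises a stand-in fetch helper with `unittest.mock`. The `fetch_models` function is a hypothetical simplification written for this example (its `get` argument mimics `requests.get`); real tests would target the component's own `fetch_ibm_models()` and patch its HTTP call instead.

```python
from unittest.mock import Mock


def fetch_models(base_url, get, fallback=("fallback-model",)):
    """Hypothetical stand-in for the fetch helper; `get` mimics requests.get."""
    try:
        response = get(f"{base_url}/ml/v1/foundation_model_specs", timeout=10)
        response.raise_for_status()
        models = sorted(m["model_id"] for m in response.json().get("resources", []))
    except Exception:
        return list(fallback)
    return models or list(fallback)


def test_returns_sorted_model_ids():
    response = Mock()
    response.json.return_value = {"resources": [{"model_id": "b"}, {"model_id": "a"}]}
    get = Mock(return_value=response)
    assert fetch_models("https://example.invalid", get) == ["a", "b"]
    get.assert_called_once()


def test_empty_model_list_falls_back():
    response = Mock()
    response.json.return_value = {"resources": []}
    assert fetch_models("https://example.invalid", Mock(return_value=response)) == ["fallback-model"]


def test_network_error_falls_back():
    get = Mock(side_effect=ConnectionError("unreachable"))
    assert fetch_models("https://example.invalid", get) == ["fallback-model"]
```

The three cases map to the scenarios named in the checks: a normal response, an empty model list, and a failed API call.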

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'fix: Add IBM watsonx.ai support to EmbeddingModel' accurately reflects the main change: adding IBM watsonx.ai as a supported provider in the EmbeddingModelComponent.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Excessive Mock Usage Warning	✅ Passed	Test file appropriately mocks external dependencies while expanding coverage for watsonx integration with proper error handling and validation tests.

Copilot

Pull request overview

This PR adds IBM watsonx.ai support to the EmbeddingModel component, enabling users to generate embeddings using IBM's watsonx.ai foundation models. The implementation includes dynamic model fetching from the watsonx.ai API, configuration of watsonx-specific parameters (truncate_input_tokens and input_text), and proper credential management using the IBM watsonx.ai SDK.

Key changes:

- Integrated IBM watsonx.ai as a new embedding provider alongside OpenAI and Ollama
- Added dynamic model discovery via watsonx.ai API to fetch available embedding models
- Implemented watsonx-specific embedding parameters for token truncation and text return options
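
A minimal sketch of how the two new inputs might map onto embedding request parameters is shown below. The key names `truncate_input_tokens` and `return_options` reflect my understanding of the watsonx.ai text-embedding parameters and should be treated as assumptions to verify against the IBM SDK; the helper name is hypothetical and the real component may use SDK constants rather than literal strings.

```python
def build_watsonx_embed_params(truncate_input_tokens: int, input_text: bool) -> dict:
    """Map the component inputs onto an embedding parameter dict (assumed key names)."""
    if truncate_input_tokens < 1:
        raise ValueError("truncate_input_tokens must be a positive integer")
    return {
        # Upper bound on tokens kept per input before embedding.
        "truncate_input_tokens": truncate_input_tokens,
        # Whether the response should echo the input text alongside each vector.
        "return_options": {"input_text": input_text},
    }
```

For example, `build_watsonx_embed_params(200, True)` yields a dict that could be passed as the `params` argument when constructing the embeddings client.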

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 9 comments.

File	Description
src/lfx/src/lfx/components/models_and_agents/embedding_model.py	Added watsonx.ai provider support with API client initialization, dynamic model fetching, and configuration UI updates for watsonx-specific parameters
src/lfx/src/lfx/base/models/watsonx_constants.py	Updated default embedding models list to include newer models (slate and multilingual-e5) and renamed constant for clarity

Copilot · 2025-11-21T17:33:23Z

src/lfx/src/lfx/components/models_and_agents/embedding_model.py

 from lfx.log.logger import logger
 from lfx.schema.dotdict import dotdict
 from lfx.utils.util import transform_localhost_url
+import requests


The `import requests` statement should follow PEP 8 import grouping and sit with the other third-party imports, above the local `lfx` imports. It is currently placed after the local imports, which is inconsistent with the import ordering used elsewhere in the codebase.

Copilot · 2025-11-21T17:33:23Z

src/lfx/src/lfx/components/models_and_agents/embedding_model.py

+                "version": "2024-09-16",
+                "filters": "function_embedding,!lifecycle_withdrawn:and",
+            }
+            response = requests.get(endpoint, params=params, timeout=10)


The API request to fetch IBM models is made without authentication. This endpoint likely requires authentication but the request doesn't include any API key or credentials. Consider adding authentication headers or verifying if this endpoint is intended to be publicly accessible.
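
If the endpoint does turn out to require credentials, one way to keep both cases working is to build the headers conditionally. This is a sketch under that assumption; the helper name is hypothetical, and whether `foundation_model_specs` needs a token should be confirmed against the IBM documentation.

```python
from typing import Optional


def build_request_headers(bearer_token: Optional[str] = None) -> dict:
    """Return request headers, adding an Authorization header only when a token is given."""
    headers = {"Accept": "application/json"}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    return headers
```

The result could then be passed as `headers=build_request_headers(token)` to the existing `requests.get(...)` call.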

Copilot · 2025-11-21T17:33:24Z

src/lfx/src/lfx/components/models_and_agents/embedding_model.py

+                build_config["model"]["options"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
+                build_config["model"]["value"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]


The fetch_ibm_models method is called twice with the same base_url parameter on consecutive lines. This results in duplicate API requests. Consider storing the result in a variable and reusing it:

    ibm_models = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
    build_config["model"]["options"] = ibm_models
    build_config["model"]["value"] = ibm_models[0]

Copilot · 2025-11-21T17:33:24Z

src/lfx/src/lfx/components/models_and_agents/embedding_model.py

+            build_config["model"]["options"] = self.fetch_ibm_models(base_url=field_value)
+            build_config["model"]["value"] = self.fetch_ibm_models(base_url=field_value)[0]


The fetch_ibm_models method is called twice with the same field_value parameter on consecutive lines. This results in duplicate API requests. Consider storing the result in a variable and reusing it:

    ibm_models = self.fetch_ibm_models(base_url=field_value)
    build_config["model"]["options"] = ibm_models
    build_config["model"]["value"] = ibm_models[0]

Copilot · 2025-11-21T17:33:24Z

src/lfx/src/lfx/components/models_and_agents/embedding_model.py

-                build_config["model"]["options"] = WATSONX_EMBEDDING_MODEL_NAMES
-                build_config["model"]["value"] = WATSONX_EMBEDDING_MODEL_NAMES[0]
+                build_config["model"]["options"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
+                build_config["model"]["value"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]


Potential IndexError if fetch_ibm_models returns an empty list. The code accesses [0] without checking if the list is non-empty. Consider adding a check or providing a fallback value:

    ibm_models = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
    build_config["model"]["options"] = ibm_models
    build_config["model"]["value"] = ibm_models[0] if ibm_models else WATSONX_EMBEDDING_MODEL_NAMES[0]

Copilot · 2025-11-21T17:33:25Z

src/lfx/src/lfx/components/models_and_agents/embedding_model.py

+                build_config["input_text"]["show"] = True
+        elif field_name == "base_url_ibm_watsonx":
+            build_config["model"]["options"] = self.fetch_ibm_models(base_url=field_value)
+            build_config["model"]["value"] = self.fetch_ibm_models(base_url=field_value)[0]


Potential IndexError if fetch_ibm_models returns an empty list. The code accesses [0] without checking if the list is non-empty. Consider adding a check or providing a fallback value:

    ibm_models = self.fetch_ibm_models(base_url=field_value)
    build_config["model"]["options"] = ibm_models
    build_config["model"]["value"] = ibm_models[0] if ibm_models else WATSONX_EMBEDDING_MODEL_NAMES[0]
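
Since the duplicate-call and IndexError comments apply to both call sites, the two fixes could be folded into one small helper. The sketch below is an illustration with a hypothetical function name, not code from the PR; it takes the already-fetched list so the API is queried only once per update.

```python
from typing import Sequence


def set_model_options(build_config: dict, fetched: Sequence[str], fallback: Sequence[str]) -> dict:
    """Populate model options from a single fetch result, guarding against empty lists."""
    # Prefer the discovered models; fall back to the static defaults when discovery is empty.
    options = list(fetched) or list(fallback)
    build_config["model"]["options"] = options
    # Only index when there is something to index, so an empty fallback cannot raise.
    build_config["model"]["value"] = options[0] if options else ""
    return build_config
```

Both branches of `update_build_config` could then call it as `set_model_options(build_config, self.fetch_ibm_models(base_url=...), WATSONX_EMBEDDING_MODEL_NAMES)`.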

Copilot · 2025-11-21T17:33:25Z

src/lfx/src/lfx/base/models/watsonx_constants.py

+
+
+


Unnecessary blank lines. There are three consecutive blank lines here, which violates the PEP 8 recommendation of at most two blank lines between top-level definitions. Remove the extra blank line.

Copilot · 2025-11-21T17:33:25Z

src/lfx/src/lfx/components/models_and_agents/embedding_model.py

+            endpoint = f"{base_url}/ml/v1/foundation_model_specs"
+            params = {
+                "version": "2024-09-16",
+                "filters": "function_embedding,!lifecycle_withdrawn:and",


The filter syntax "function_embedding,!lifecycle_withdrawn:and" has an unusual :and suffix at the end. Comparing with the similar implementation in language_model.py (line 57), which uses "function_text_chat,!lifecycle_withdrawn" without the :and suffix, this appears to be inconsistent. Consider removing the :and suffix or verifying the correct filter syntax with the IBM watsonx.ai API documentation.
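
One way to keep the two components consistent is to build the filter string in a single place. The sketch below assumes the `:and` / `:or` suffix is a combinator for the comma-separated patterns, which is my reading and not something confirmed in this thread; the helper name is hypothetical.

```python
from typing import Optional, Sequence


def build_model_filter(patterns: Sequence[str], combine: Optional[str] = None) -> str:
    """Join filter patterns with commas and append an optional ':and' / ':or' combinator."""
    if combine not in (None, "and", "or"):
        raise ValueError("combine must be 'and', 'or', or None")
    joined = ",".join(patterns)
    return f"{joined}:{combine}" if combine else joined
```

With this helper, the embedding component's value is `build_model_filter(["function_embedding", "!lifecycle_withdrawn"], "and")` and the language model's is the same call without the combinator, which makes the difference between the two explicit and easy to review.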

Copilot · 2025-11-21T17:33:26Z

src/lfx/src/lfx/components/models_and_agents/embedding_model.py

 from lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url
 from lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES
-from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS, WATSONX_EMBEDDING_MODEL_NAMES
+from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS, WATSONX_DEFAULT_EMBEDDING_MODELS, WATSONX_EMBEDDING_MODEL_NAMES


Import of 'WATSONX_DEFAULT_EMBEDDING_MODELS' is not used.

github-actions · 2025-11-21T17:34:34Z

Frontend Unit Test Coverage Report

Coverage Summary

Lines	Statements	Branches	Functions
	14.68% (3985/27139)	7.56% (1560/20608)	8.94% (535/5984)

Unit Test Results

Tests	Skipped	Failures	Errors	Time
1630	0 💤	0 ❌	0 🔥	18.798s ⏱️

codecov · 2025-11-21T17:42:45Z

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 31.63%. Comparing base (2bddc04) to head (3f93924).
⚠️ Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
src/lfx/src/lfx/base/models/watsonx_constants.py	0.00%	2 Missing ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (40.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (38.95%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #10677   +/-   ##
=======================================
  Coverage   31.63%   31.63%           
=======================================
  Files        1350     1350           
  Lines       61154    61154           
  Branches     9142     9142           
=======================================
  Hits        19348    19348           
  Misses      40890    40890           
  Partials      916      916

Flag	Coverage Δ
backend	`51.82% <ø> (ø)`
frontend	`13.59% <ø> (ø)`
lfx	`38.95% <0.00%> (ø)`

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
src/lfx/src/lfx/base/models/watsonx_constants.py	`0.00% <0.00%> (ø)`

edwinjosechittilappilly · 2025-11-21T18:54:53Z

Fixing Tests.

github-actions bot added the bug label Nov 20, 2025

update watsonx default models

d927cc6

edwinjosechittilappilly marked this pull request as ready for review November 21, 2025 17:30

edwinjosechittilappilly requested review from Copilot and erichare November 21, 2025 17:30

github-actions bot removed the bug label Nov 21, 2025

edwinjosechittilappilly requested a review from lucaseduoli November 21, 2025 17:30

github-actions bot added the bug label Nov 21, 2025

Merge branch 'main' into fix-emebddings-component

ce8aabd

Copilot started reviewing on behalf of edwinjosechittilappilly November 21, 2025 17:30