@edwinjosechittilappilly (Collaborator) commented Nov 20, 2025

Added IBM watsonx.ai as a supported provider in EmbeddingModelComponent, updated dependencies and code to integrate ibm_watsonx_ai and pydantic. Updated starter project and component index metadata to reflect new dependencies and code changes.

Summary by CodeRabbit

  • New Features
    • Added IBM watsonx.ai as an embedding provider with automatic model discovery
    • Expanded available embedding models including sentence-transformers/all-minilm-l12-v2, IBM Slate models, and multilingual-e5-large
    • Added new configuration options for token truncation and input text handling

@coderabbitai bot (Contributor) commented Nov 20, 2025

Walkthrough

IBM watsonx.ai integration added to EmbeddingModelComponent via a new fetch_ibm_models() helper for dynamic model discovery. Extended inputs include truncate_input_tokens and input_text. Updated watsonx model constants list. Provider-specific embedding paths and field visibility management implemented for IBM, Ollama, and OpenAI.
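
The discovery step described in the walkthrough can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in the PR: the endpoint path and query parameters are copied from the diff excerpts in this conversation, while the `resources` / `model_id` response shape, the injectable `get_json` callable (used here instead of `requests` so the sketch runs offline), and the `DEFAULT_MODELS` fallback list are assumptions made for the example.

```python
import json
import urllib.parse
import urllib.request

# Fallback names taken from the model list in the PR summary.
DEFAULT_MODELS = [
    "sentence-transformers/all-minilm-l12-v2",
    "ibm/slate-125m-english-rtrvr-v2",
    "ibm/slate-30m-english-rtrvr-v2",
    "intfloat/multilingual-e5-large",
]


def _get_json(url, params, timeout=10):
    """Issue a GET request and decode the JSON body (stdlib only)."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{url}?{query}", timeout=timeout) as resp:
        return json.load(resp)


def fetch_ibm_models(base_url, get_json=_get_json):
    """Return sorted embedding model ids, or the defaults if discovery fails."""
    endpoint = f"{base_url.rstrip('/')}/ml/v1/foundation_model_specs"
    params = {
        "version": "2024-09-16",
        # Filter string as it appears in the PR diff.
        "filters": "function_embedding,!lifecycle_withdrawn:and",
    }
    try:
        data = get_json(endpoint, params)
        # Assumed response shape: {"resources": [{"model_id": "..."}, ...]}
        models = sorted(item["model_id"] for item in data.get("resources", []))
    except (OSError, ValueError, KeyError, TypeError, AttributeError):
        # Network errors, bad JSON, or an unexpected payload: use the defaults.
        return list(DEFAULT_MODELS)
    # An empty result is treated like a failure so callers always get options.
    return models or list(DEFAULT_MODELS)
```

Returning the default list on any failure keeps the model dropdown populated even when the endpoint is unreachable, which also sidesteps indexing into an empty list later on.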

Changes

| Cohort / File(s) | Summary |
|---|---|
| WatsonX Model Constants: src/lfx/src/lfx/base/models/watsonx_constants.py | Replaced WATSONX_EMBEDDING_MODELS_DETAILED with WATSONX_DEFAULT_EMBEDDING_MODELS containing updated model metadata; removed Granite models, added sentence-transformers/all-minilm-l12-v2, ibm/slate-125m-english-rtrvr-v2, ibm/slate-30m-english-rtrvr-v2, intfloat/multilingual-e5-large; updated WATSONX_EMBEDDING_MODEL_NAMES to reference new constant. |
| Embedding Model Component: src/lfx/src/lfx/components/models_and_agents/embedding_model.py | Added static method fetch_ibm_models(base_url) for dynamic IBM model discovery; introduced truncate_input_tokens and input_text inputs; extended build_embeddings to support IBM watsonx.ai client instantiation; enhanced update_build_config with provider-specific field visibility and model option refreshing for IBM and Ollama providers. |
| Starter Project Configuration: src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json | Extended EmbeddingModelComponent metadata dependencies to include requests and ibm_watsonx_ai; added helper for IBM model fetching; expanded inputs to include truncate_input_tokens, input_text, and provider-specific fields (base_url_ibm_watsonx, project_id); updated code_hash and build logic. |

Sequence Diagram

sequenceDiagram
    participant User
    participant EmbeddingModelComponent
    participant update_build_config
    participant fetch_ibm_models
    participant IBMWatsonX
    participant EmbeddingAPI

    User->>EmbeddingModelComponent: Select IBM watsonx.ai provider
    activate EmbeddingModelComponent
    EmbeddingModelComponent->>update_build_config: Trigger provider change
    activate update_build_config
    
    update_build_config->>fetch_ibm_models: Fetch available models
    activate fetch_ibm_models
    fetch_ibm_models->>IBMWatsonX: Query /ml/v1/foundation_model_specs
    IBMWatsonX-->>fetch_ibm_models: Return model_ids
    fetch_ibm_models-->>update_build_config: Return sorted models
    deactivate fetch_ibm_models
    
    update_build_config->>update_build_config: Set model options & visibility<br/>(truncate_input_tokens, input_text)
    update_build_config-->>EmbeddingModelComponent: Update component state
    deactivate update_build_config
    
    User->>EmbeddingModelComponent: Update base_url_ibm_watsonx
    EmbeddingModelComponent->>update_build_config: Refresh models for new URL
    activate update_build_config
    update_build_config->>fetch_ibm_models: Fetch models from new URL
    fetch_ibm_models->>IBMWatsonX: Query with new base_url
    IBMWatsonX-->>fetch_ibm_models: Return updated models
    fetch_ibm_models-->>update_build_config: Return sorted models
    update_build_config-->>EmbeddingModelComponent: Update model options
    deactivate update_build_config
    
    User->>EmbeddingModelComponent: Build embeddings
    activate EmbeddingModelComponent
    EmbeddingModelComponent->>EmbeddingAPI: Create watsonx client & call embed
    EmbeddingAPI-->>EmbeddingModelComponent: Return embeddings
    EmbeddingModelComponent-->>User: Embeddings result
    deactivate EmbeddingModelComponent

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • New fetch_ibm_models() static method logic and API error handling for IBM watsonx.ai model discovery
  • Provider-specific branching logic in update_build_config() for visibility toggling and model option refreshing across IBM, Ollama, and OpenAI paths
  • Watsonx client instantiation with Credentials and APIClient in build_embeddings() method
  • Dynamic model population and default value assignment when switching providers
  • Integration of new inputs (truncate_input_tokens, input_text) across multiple provider paths and their corresponding parameter mapping in embedding API calls
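
The visibility toggling mentioned in the list above could be handled with a table-driven helper instead of nested branches. The following is only a sketch under assumptions: the IBM field names come from the PR summary, whereas `ollama_base_url`, `api_base`, `PROVIDER_FIELDS`, and `apply_visibility` are hypothetical names chosen for illustration and are not taken from the Langflow codebase.

```python
# Fields that should be visible for each provider.
# IBM entries are named in the PR summary; the others are hypothetical.
PROVIDER_FIELDS = {
    "IBM watsonx.ai": {
        "base_url_ibm_watsonx",
        "project_id",
        "truncate_input_tokens",
        "input_text",
    },
    "Ollama": {"ollama_base_url"},  # hypothetical field name
    "OpenAI": {"api_base"},  # hypothetical field name
}


def apply_visibility(build_config, provider):
    """Show the selected provider's fields and hide every other provider's."""
    visible = PROVIDER_FIELDS.get(provider, set())
    all_fields = set().union(*PROVIDER_FIELDS.values())
    for name in all_fields:
        # Only touch provider-specific fields that exist in this config.
        if name in build_config:
            build_config[name]["show"] = name in visible
    return build_config
```

Keeping the mapping in one place means adding a provider is a data change rather than another `elif` branch, which makes the switching logic easier to review and test.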

Suggested labels

size:L, lgtm

Suggested reviewers

  • lucaseduoli
  • erichare

Pre-merge checks and finishing touches

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Coverage For New Implementations | ❌ Error | PR introduces IBM watsonx.ai integration with fetch_ibm_models() method and dynamic provider handling, but no test files were added to verify this new functionality. | Add comprehensive test coverage including unit tests for fetch_ibm_models(), update_build_config() logic, edge cases like empty model lists, and integration tests with mocked API calls. |
| Test Quality And Coverage | ⚠️ Warning | PR introduces IBM watsonx.ai support to EmbeddingModelComponent with new fetch_ibm_models() method and input fields, but existing test file lacks test coverage for these new features. | Add comprehensive pytest tests for fetch_ibm_models(), IndexError prevention, authentication handling, update_build_config() behavior, and dynamic model refresh following patterns from test_language_model_component.py. |
| Test File Naming And Structure | ⚠️ Warning | Pull request adds IBM watsonx.ai support without corresponding test files, leaving critical issues like unauthenticated API calls uncovered. | Add comprehensive pytest test files covering fetch_ibm_models() method, authentication failures, input fields, provider switching, and edge cases like empty model lists. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix: Add IBM watsonx.ai support to EmbeddingModel' accurately reflects the main change: adding IBM watsonx.ai as a supported provider in the EmbeddingModelComponent. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Excessive Mock Usage Warning | ✅ Passed | Test file appropriately mocks external dependencies while expanding coverage for watsonx integration with proper error handling and validation tests. |
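
The coverage requested by the failed checks could start from tests along these lines. This is a pytest-style sketch, not the project's test suite: `refresh_ibm_model_options` and `FALLBACK_MODELS` are hypothetical stand-ins for the IBM branch of update_build_config(), defined here only so the tests have something self-contained to exercise. The model fetcher is replaced by a `unittest.mock.Mock`, so no network call is made.

```python
from unittest.mock import Mock

# Hypothetical fallback list; the names come from the PR summary.
FALLBACK_MODELS = [
    "ibm/slate-125m-english-rtrvr-v2",
    "ibm/slate-30m-english-rtrvr-v2",
]


def refresh_ibm_model_options(build_config, fetch_models, base_url):
    """Hypothetical stand-in for the IBM branch of update_build_config()."""
    models = fetch_models(base_url=base_url)  # fetch once, reuse the result
    options = models or FALLBACK_MODELS
    build_config["model"]["options"] = options
    build_config["model"]["value"] = options[0]
    return build_config


def test_fetches_once_and_selects_first_model():
    fetch = Mock(return_value=["a/model", "b/model"])
    config = refresh_ibm_model_options({"model": {}}, fetch, "https://example.invalid")
    # Guards against the duplicate API request flagged in review.
    fetch.assert_called_once_with(base_url="https://example.invalid")
    assert config["model"]["options"] == ["a/model", "b/model"]
    assert config["model"]["value"] == "a/model"


def test_empty_model_list_falls_back_without_index_error():
    fetch = Mock(return_value=[])
    config = refresh_ibm_model_options({"model": {}}, fetch, "https://example.invalid")
    assert config["model"]["options"] == FALLBACK_MODELS
    assert config["model"]["value"] == FALLBACK_MODELS[0]
```

Against the real component, the same assertions would presumably be applied by patching fetch_ibm_models on EmbeddingModelComponent rather than passing a callable in.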

@github-actions github-actions bot added the bug Something isn't working label Nov 20, 2025
@edwinjosechittilappilly edwinjosechittilappilly marked this pull request as ready for review November 21, 2025 17:30
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds IBM watsonx.ai support to the EmbeddingModel component, enabling users to generate embeddings using IBM's watsonx.ai foundation models. The implementation includes dynamic model fetching from the watsonx.ai API, configuration of watsonx-specific parameters (truncate_input_tokens and input_text), and proper credential management using the IBM watsonx.ai SDK.

Key changes:

  • Integrated IBM watsonx.ai as a new embedding provider alongside OpenAI and Ollama
  • Added dynamic model discovery via watsonx.ai API to fetch available embedding models
  • Implemented watsonx-specific embedding parameters for token truncation and text return options

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
|---|---|
| src/lfx/src/lfx/components/models_and_agents/embedding_model.py | Added watsonx.ai provider support with API client initialization, dynamic model fetching, and configuration UI updates for watsonx-specific parameters |
| src/lfx/src/lfx/base/models/watsonx_constants.py | Updated default embedding models list to include newer models (slate and multilingual-e5) and renamed constant for clarity |

from lfx.log.logger import logger
from lfx.schema.dotdict import dotdict
from lfx.utils.util import transform_localhost_url
import requests

Copilot AI Nov 21, 2025

The `import requests` statement is placed after the local `lfx` imports, which is inconsistent with PEP 8 and with the import ordering used in the codebase. As a third-party import, it should be grouped with the other third-party imports, before the local `from lfx...` imports.

Copilot uses AI. Check for mistakes.
"version": "2024-09-16",
"filters": "function_embedding,!lifecycle_withdrawn:and",
}
response = requests.get(endpoint, params=params, timeout=10)

Copilot AI Nov 21, 2025

The API request to fetch IBM models is made without authentication. This endpoint likely requires authentication but the request doesn't include any API key or credentials. Consider adding authentication headers or verifying if this endpoint is intended to be publicly accessible.

Comment on lines +298 to +299
build_config["model"]["options"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
build_config["model"]["value"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]

Copilot AI Nov 21, 2025

The fetch_ibm_models method is called twice with the same base_url parameter on consecutive lines. This results in duplicate API requests. Consider storing the result in a variable and reusing it:

ibm_models = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
build_config["model"]["options"] = ibm_models
build_config["model"]["value"] = ibm_models[0]

Comment on lines +310 to +311
build_config["model"]["options"] = self.fetch_ibm_models(base_url=field_value)
build_config["model"]["value"] = self.fetch_ibm_models(base_url=field_value)[0]

Copilot AI Nov 21, 2025

The fetch_ibm_models method is called twice with the same field_value parameter on consecutive lines. This results in duplicate API requests. Consider storing the result in a variable and reusing it:

ibm_models = self.fetch_ibm_models(base_url=field_value)
build_config["model"]["options"] = ibm_models
build_config["model"]["value"] = ibm_models[0]

- build_config["model"]["options"] = WATSONX_EMBEDDING_MODEL_NAMES
- build_config["model"]["value"] = WATSONX_EMBEDDING_MODEL_NAMES[0]
+ build_config["model"]["options"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
+ build_config["model"]["value"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]

Copilot AI Nov 21, 2025

Potential IndexError if fetch_ibm_models returns an empty list. The code accesses [0] without checking if the list is non-empty. Consider adding a check or providing a fallback value:

ibm_models = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
build_config["model"]["options"] = ibm_models
build_config["model"]["value"] = ibm_models[0] if ibm_models else WATSONX_EMBEDDING_MODEL_NAMES[0]

build_config["input_text"]["show"] = True
elif field_name == "base_url_ibm_watsonx":
build_config["model"]["options"] = self.fetch_ibm_models(base_url=field_value)
build_config["model"]["value"] = self.fetch_ibm_models(base_url=field_value)[0]

Copilot AI Nov 21, 2025

Potential IndexError if fetch_ibm_models returns an empty list. The code accesses [0] without checking if the list is non-empty. Consider adding a check or providing a fallback value:

ibm_models = self.fetch_ibm_models(base_url=field_value)
build_config["model"]["options"] = ibm_models
build_config["model"]["value"] = ibm_models[0] if ibm_models else WATSONX_EMBEDDING_MODEL_NAMES[0]

Comment on lines 26 to 28




Copilot AI Nov 21, 2025

Unnecessary blank lines. There are three consecutive blank lines here, which violates PEP 8 style guide that recommends at most two blank lines between top-level definitions. Remove the extra blank lines.

endpoint = f"{base_url}/ml/v1/foundation_model_specs"
params = {
"version": "2024-09-16",
"filters": "function_embedding,!lifecycle_withdrawn:and",

Copilot AI Nov 21, 2025

The filter syntax "function_embedding,!lifecycle_withdrawn:and" has an unusual :and suffix at the end. Comparing with the similar implementation in language_model.py (line 57), which uses "function_text_chat,!lifecycle_withdrawn" without the :and suffix, this appears to be inconsistent. Consider removing the :and suffix or verifying the correct filter syntax with the IBM watsonx.ai API documentation.

from lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url
from lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES
- from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS, WATSONX_EMBEDDING_MODEL_NAMES
+ from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS, WATSONX_DEFAULT_EMBEDDING_MODELS, WATSONX_EMBEDDING_MODEL_NAMES

Copilot AI Nov 21, 2025

Import of 'WATSONX_DEFAULT_EMBEDDING_MODELS' is not used.

@github-actions bot (Contributor) commented Nov 21, 2025

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
|---|---|---|---|
| Coverage: 15% | 14.68% (3985/27139) | 7.56% (1560/20608) | 8.94% (535/5984) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 1630 | 0 💤 | 0 ❌ | 0 🔥 | 18.798s ⏱️ |

@codecov bot commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 31.63%. Comparing base (2bddc04) to head (3f93924).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/lfx/src/lfx/base/models/watsonx_constants.py | 0.00% | 2 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (40.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (38.95%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #10677   +/-   ##
=======================================
  Coverage   31.63%   31.63%           
=======================================
  Files        1350     1350           
  Lines       61154    61154           
  Branches     9142     9142           
=======================================
  Hits        19348    19348           
  Misses      40890    40890           
  Partials      916      916           

| Flag | Coverage Δ |
|---|---|
| backend | 51.82% <ø> (ø) |
| frontend | 13.59% <ø> (ø) |
| lfx | 38.95% <0.00%> (ø) |

| Files with missing lines | Coverage Δ |
|---|---|
| src/lfx/src/lfx/base/models/watsonx_constants.py | 0.00% <0.00%> (ø) |

@github-actions github-actions bot added the lgtm This PR has been approved by a maintainer label Nov 21, 2025
@edwinjosechittilappilly (Collaborator, Author) commented

Fixing tests.

@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Nov 21, 2025
Merged via the queue into main with commit 63ab3ac Nov 21, 2025
81 of 83 checks passed
@edwinjosechittilappilly edwinjosechittilappilly deleted the fix-emebddings-component branch November 21, 2025 20:18
Labels

bug (Something isn't working), lgtm (This PR has been approved by a maintainer)
