
Conversation

@edwinjosechittilappilly
Collaborator

@edwinjosechittilappilly edwinjosechittilappilly commented Nov 25, 2025

Introduces OpenSearchVectorStoreComponentMultimodalMultiEmbedding, supporting multi-model hybrid semantic and keyword search with dynamic vector fields, parallel embedding generation, advanced filtering, and flexible authentication. Enables ingestion and search across multiple embedding models in OpenSearch, with robust index management and UI configuration handling.


Key Features Added

  1. Multiple Embeddings Input
    • The embedding input accepts multiple embedding objects via is_list=True.
    • Users can connect multiple embedding models from different providers (OpenAI, Watsonx, Cohere, etc.).
    • Backward compatible: a single embedding still works seamlessly.
  2. Selective Ingestion (Single Model)
    • Ingestion uses one embedding model, selected via the embedding_model_name field.
    • Falls back to the first connected embedding if no model name is specified.
    • Documents are stored in a dynamic field: chunk_embedding_{model_name} (see the field-naming sketch after this list).
  3. Multi-Model Search
    • Search queries across all embedding models found in the index.
    • Automatically detects available models via an aggregation.
    • Generates query embeddings for each detected model.
    • Combines results using hybrid search (dis_max + keyword matching).
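
As a rough illustration of the field-naming convention above, here is a minimal sketch of what the normalize_model_name / get_embedding_field_name helpers (mentioned in the walkthrough below) could look like. The exact normalization rules are an assumption; only the chunk_embedding_ prefix is confirmed by the component code:

```python
import re


def normalize_model_name(model_name: str) -> str:
    # Assumed normalization: lowercase and collapse characters that are unsafe
    # in an OpenSearch field name into underscores.
    return re.sub(r"[^a-z0-9]+", "_", model_name.lower()).strip("_")


def get_embedding_field_name(model_name: str) -> str:
    # Matches the chunk_embedding_{model_name} convention described above.
    return f"chunk_embedding_{normalize_model_name(model_name)}"


# Example: "text-embedding-3-small" -> "chunk_embedding_text_embedding_3_small"
print(get_embedding_field_name("text-embedding-3-small"))
```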

Summary by CodeRabbit

  • New Features
    • Added multi-model embedding support across OpenAI, Ollama, and IBM watsonx.ai providers, enabling per-model embeddings.
    • Introduced OpenSearch vector store integration featuring hybrid search, dynamic field management, and multi-embedding support.


@coderabbitai
Contributor

coderabbitai bot commented Nov 25, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This pull request introduces multi-model embedding support by creating a new EmbeddingsWithModels wrapper class, refactoring EmbeddingModelComponent to return composite embeddings with per-model instances for OpenAI/Ollama/IBM watsonx.ai providers, and adding an OpenSearch vector store component with hybrid search supporting multiple embeddings simultaneously.
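
To make the delegation pattern concrete, here is a hedged sketch of the wrapper's shape based on the behavior described in this walkthrough and the line references in the review below; the real class lives in embeddings_class.py and its exact signatures may differ:

```python
from langchain_core.embeddings import Embeddings


class EmbeddingsWithModels(Embeddings):
    """Wraps a primary Embeddings instance plus optional per-model instances."""

    def __init__(self, embeddings: Embeddings, available_models: dict[str, Embeddings] | None = None):
        self.embeddings = embeddings
        self.available_models = available_models or {}

    # Sync and async embedding calls are delegated to the primary instance.
    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return self.embeddings.embed_documents(texts)

    def embed_query(self, text: str) -> list[float]:
        return self.embeddings.embed_query(text)

    async def aembed_documents(self, texts: list[str]) -> list[list[float]]:
        return await self.embeddings.aembed_documents(texts)

    async def aembed_query(self, text: str) -> list[float]:
        return await self.embeddings.aembed_query(text)

    def __getattr__(self, name: str):
        # Unknown attributes are forwarded to the wrapped primary instance.
        return getattr(self.embeddings, name)
```

Consumers that only know the LangChain Embeddings interface keep working unchanged, while the OpenSearch component can reach into available_models to embed a query once per model.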

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core embeddings wrapper<br>src/lfx/src/lfx/base/embeddings/embeddings_class.py | New EmbeddingsWithModels class extending LangChain Embeddings, storing primary embeddings and optional per-model mappings. Delegates embedding/async operations to primary instance, supports attribute forwarding and callable invocation. |
| Embedding model component refactoring<br>src/lfx/src/lfx/components/models_and_agents/embedding_model.py | Updated EmbeddingModelComponent.build_embeddings to async; now returns EmbeddingsWithModels with per-model embeddings for OpenAI, Ollama, and IBM watsonx.ai providers instead of a single embeddings instance. Adds import for EmbeddingsWithModels. |
| OpenSearch multi-model vector store<br>src/lfx/src/lfx/components/elastic/opensearch_multimodal.py | New OpenSearchVectorStoreComponentMultimodalMultiEmbedding class with hybrid (KNN + keyword) search across multiple embeddings, dynamic field naming, bulk ingestion with embedding tracking, AOSS compatibility checks, and JWT/basic auth support. Includes helper methods normalize_model_name and get_embedding_field_name. |
| Starter project configuration<br>src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json | Updated metadata code hash for EmbeddingModelComponent reflecting internal behavioral changes to return composite embeddings. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant EmbMod as EmbeddingModelComponent
    participant EmbWM as EmbeddingsWithModels
    participant Primary as Primary<br/>Embeddings
    participant PerModel as Per-Model<br/>Embeddings

    Client->>EmbMod: build_embeddings()
    activate EmbMod
    EmbMod->>EmbMod: Detect provider (OpenAI/Ollama/IBM)
    EmbMod->>Primary: Create primary embeddings instance
    EmbMod->>PerModel: Construct per-model instances<br/>(model_1, model_2, ...)
    EmbMod->>EmbWM: Create EmbeddingsWithModels<br/>(primary, {model_1, model_2, ...})
    EmbMod-->>Client: Return EmbeddingsWithModels
    deactivate EmbMod

    Note over Client,PerModel: Later usage:
    Client->>EmbWM: embed_documents(texts)
    activate EmbWM
    EmbWM->>Primary: Delegate to primary instance
    Primary-->>EmbWM: Return embeddings
    EmbWM-->>Client: Return embeddings list
    deactivate EmbWM
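
A hedged sketch of how the OpenAI branch of build_embeddings might assemble that composite object, using the wrapper sketched above (parameter handling is heavily simplified, and the per-model dimensions caveat raised in the review below is ignored for brevity):

```python
from langchain_openai import OpenAIEmbeddings

# Illustrative subset of OPENAI_EMBEDDING_MODEL_NAMES referenced in the review.
OPENAI_EMBEDDING_MODEL_NAMES = ["text-embedding-3-small", "text-embedding-3-large", "text-embedding-ada-002"]


def build_openai_composite(selected_model: str, api_key: str) -> "EmbeddingsWithModels":
    # Primary instance for the model the user selected in the component.
    primary = OpenAIEmbeddings(model=selected_model, api_key=api_key)
    # One dedicated instance per known model, keyed by model name, so the
    # vector store can embed queries for every model found in the index.
    per_model = {name: OpenAIEmbeddings(model=name, api_key=api_key) for name in OPENAI_EMBEDDING_MODEL_NAMES}
    return EmbeddingsWithModels(primary, per_model)
```
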
sequenceDiagram
    participant Client as Client
    participant OS as OpenSearchComponent
    participant EmbWM as EmbeddingsWithModels
    participant EmbN as Embedding N<br/>(Per-Model)
    participant OSClient as OpenSearch<br/>Client

    Client->>OS: search_documents(query_text, filters)
    activate OS
    OS->>OS: Detect available embedding models in index
    OS->>EmbWM: Generate embeddings for each model
    activate EmbWM
    loop For each model
        EmbWM->>EmbN: embed_query(query_text)
        EmbN-->>EmbWM: embedding_vector
    end
    deactivate EmbWM
    OS->>OS: Build per-model KNN queries
    OS->>OS: Build keyword query (multi_match)
    OS->>OSClient: Execute dis_max combination<br/>(KNN queries + keyword)
    OSClient-->>OS: Return ranked results
    OS->>OS: Convert results to Data objects
    OS-->>Client: Return search_documents results
    deactivate OS
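
For orientation, a minimal sketch of the hybrid query shape implied by this diagram: one KNN clause per detected model plus a keyword clause, combined under dis_max. Field names follow the chunk_embedding_{model} convention; the text field name and everything else here (filters, boosts, error handling) is an assumption, not the component's actual query builder:

```python
def build_hybrid_query(query_text: str, query_vectors: dict[str, list[float]], k: int = 10) -> dict:
    """Assemble an OpenSearch dis_max query over per-model KNN clauses plus keyword matching."""
    # One KNN clause per embedding model detected in the index.
    knn_clauses = [
        {"knn": {f"chunk_embedding_{model}": {"vector": vector, "k": k}}}
        for model, vector in query_vectors.items()
    ]
    # Keyword clause over the text field (field name assumed).
    keyword_clause = {"multi_match": {"query": query_text, "fields": ["text"]}}
    # dis_max scores each hit by its best-matching clause instead of summing them.
    return {"size": k, "query": {"dis_max": {"queries": [*knn_clauses, keyword_clause]}}}
```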

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • opensearch_multimodal.py: Extensive new component with complex hybrid search logic, dynamic field mapping, authentication handling, and error recovery paths requiring careful review of vector/metadata handling and AOSS compatibility checks.
  • embedding_model.py: Async refactoring combined with multi-provider support (OpenAI, Ollama, IBM watsonx.ai) and per-model instance construction logic needs verification for correctness across providers and URL/credential handling.
  • embeddings_class.py: Delegation pattern and attribute forwarding require review for potential issues with type safety, async delegation, and callable invocation edge cases.
  • Integration points: Cross-file dependencies between the new wrapper, updated component, and new OpenSearch integration need validation.

Possibly related PRs

Suggested labels

size:XXL, lgtm

Suggested reviewers

  • phact
  • erichare
  • lucaseduoli

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Quality And Coverage | ⚠️ Warning | Test coverage critically insufficient for major new implementations: EmbeddingsWithModels class (117 lines), OpenSearchVectorStoreComponentMultimodalMultiEmbedding (1,575 lines), and EmbeddingModelComponent async updates (423 lines) lack dedicated unit/integration tests. | Create test_embeddings_class.py, test_opensearch_multimodal.py, and enhance EmbeddingModelComponent tests covering initialization, delegation, async patterns, error handling, and edge cases before merge. |
| Test File Naming And Structure | ⚠️ Warning | PR introduces two major new components without corresponding test files and converts build_embeddings to async without updating existing test calls to await it. | Create test files for new components and update all build_embeddings calls to use await in existing test files. |
| Excessive Mock Usage Warning | ❓ Inconclusive | No test files testing the new components were found in the repository despite extensive searching, making assessment of mock usage patterns impossible. | Verify if test files exist in a separate location or branch; if absent, prioritize adding unit and integration tests for the new complex components before assessing mock usage. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and concisely describes the main change: introducing a new OpenSearch component for multimodal multi-embedding support. The title is specific, relevant, and directly reflects the primary objective. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%. |

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added the enhancement New feature or request label Nov 25, 2025
Introduces EmbeddingsWithModels class for wrapping embeddings and available models. Updates EmbeddingModelComponent to provide available model lists for OpenAI, Ollama, and IBM watsonx.ai providers, including synchronous Ollama model fetching using httpx. Updates starter project and component index metadata to reflect new dependencies and code changes.
Updated the EmbeddingModelComponent to fetch Ollama models asynchronously using await get_ollama_models instead of a synchronous httpx call. Removed httpx from dependencies in Nvidia Remix starter project and updated related metadata. This change improves consistency and reliability when fetching available models for the Ollama provider.
@edwinjosechittilappilly edwinjosechittilappilly marked this pull request as ready for review November 25, 2025 21:36
Added several Notion-related components to the component index, including AddContentToPage, NotionDatabaseProperties, NotionListPages, NotionPageContent, NotionPageCreator, NotionPageUpdate, and NotionSearch. These components enable interaction with Notion databases and pages, such as querying, updating, creating, and retrieving content.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py (1)

372-373: Redundant API calls to fetch IBM models.

fetch_ibm_models is called twice with the same URL. Cache the result to avoid duplicate HTTP requests:

             elif field_value == "IBM watsonx.ai":
-                build_config["model"]["options"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
-                build_config["model"]["value"] = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)[0]
+                ibm_models = self.fetch_ibm_models(base_url=self.base_url_ibm_watsonx)
+                build_config["model"]["options"] = ibm_models
+                build_config["model"]["value"] = ibm_models[0] if ibm_models else ""

The same issue exists at lines 384-385.

src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (4)

2170-2210: Bug: ollama_base_url update ignores the new field_value.

In update_build_config, the branch for field_name == "ollama_base_url" assigns ollama_url = self.ollama_base_url, ignoring the freshly provided field_value. This can leave the model list stale until a second refresh.

Apply this diff:

-        elif field_name == "ollama_base_url":
-            # # Refresh Ollama models when base URL changes
-            # if hasattr(self, "provider") and self.provider == "Ollama":
-            # Use field_value if provided, otherwise fall back to instance attribute
-            ollama_url = self.ollama_base_url
+        elif field_name == "ollama_base_url":
+            # Use field_value if provided, otherwise fall back to instance attribute
+            ollama_url = field_value or getattr(self, "ollama_base_url", None)
             if await is_valid_ollama_url(url=ollama_url):
                 try:
                     models = await get_ollama_models(
                         base_url_value=ollama_url,

2339-2355: Default unsafe pickle loading should be False.

Allow Dangerous Deserialization is set to true by default, enabling pickle loading. This is a security risk and should be opt-in.

Apply this diff in the FAISS component template:

                 "display_name": "Allow Dangerous Deserialization",
                 "dynamic": false,
                 "info": "Set to True to allow loading pickle files from untrusted sources. Only enable this if you trust the source of the data.",
@@
                 "type": "bool",
-                "value": true
+                "value": false

2095-2145: Guard OpenAI dimensions parameter per model to prevent 400 errors on text-embedding-ada-002.

The loop at src/lfx/src/lfx/components/models_and_agents/embedding_model.py:189-205 creates per-model OpenAIEmbeddings instances, passing dimensions to all models. However, text-embedding-ada-002 (included in OPENAI_EMBEDDING_MODEL_NAMES) does not accept the dimensions parameter and will error. Only text-embedding-3-small and text-embedding-3-large support it.

Apply this conditional guard:

+            DIMENSIONAL_MODELS = {"text-embedding-3-small", "text-embedding-3-large"}
             for model_name in OPENAI_EMBEDDING_MODEL_NAMES:
                 available_models_dict[model_name] = OpenAIEmbeddings(
                     model=model_name,
-                    dimensions=dimensions or None,  # Use same dimensions config for all
+                    dimensions=(dimensions if model_name in DIMENSIONAL_MODELS else None),
                     base_url=api_base or None,
                     api_key=api_key,
                     chunk_size=chunk_size,

2018-2252: Based on my comprehensive verification of the codebase, I can now provide the corrected review comment. Here are my key findings:

Key Verification Results:

  1. EmbeddingsWithModels IS compatible with FAISS - The class properly implements embed_documents(), embed_query(), async variants, __getattr__() delegation, and __call__() proxy. FAISS will work without modification.
  2. OpenAI dimensions issue IS valid - Only text-embedding-3-small and text-embedding-3-large support the dimensions parameter; text-embedding-ada-002 does not. Current code applies dimensions uniformly to all models.
  3. allow_dangerous_deserialization defaults to True - Confirmed security issue in FAISS component.
  4. Ollama field_value not used - Code line 390 uses self.ollama_base_url instead of the field_value parameter passed to the method.
  5. No httpx issue - Code uses requests.get with timeout=10, not httpx.

Guard OpenAI embedding models against unsupported dimension parameter.

The code applies the dimensions parameter uniformly to all OpenAI models, but text-embedding-ada-002 does not support this parameter and will raise an error. Only text-embedding-3-small and text-embedding-3-large support dimensions.

In the build_embeddings method's OpenAI provider block, guard dimensions per model:

 for model_name in OPENAI_EMBEDDING_MODEL_NAMES:
     available_models_dict[model_name] = OpenAIEmbeddings(
         model=model_name,
-        dimensions=dimensions or None,  # Use same dimensions config for all
+        dimensions=(dimensions or None) if model_name != "text-embedding-ada-002" else None,
         base_url=api_base or None,
         api_key=api_key,
         chunk_size=chunk_size,
         max_retries=max_retries,
         timeout=request_timeout or None,
         show_progress_bar=show_progress_bar,
         model_kwargs=model_kwargs,
     )

Set FAISS allow_dangerous_deserialization default to False.

The FAISS component currently defaults allow_dangerous_deserialization to True, which enables loading untrusted pickle files and poses a security risk. Change the default value in the component definition to False.

Fix Ollama URL refresh to use the field_value parameter.

In update_build_config, the ollama_base_url field handler ignores the field_value parameter and uses self.ollama_base_url instead. The comment indicates intent to use field_value. Update line 390 to use the passed parameter for consistency with base_url_ibm_watsonx handling:

 elif field_name == "ollama_base_url":
-    ollama_url = self.ollama_base_url
+    ollama_url = field_value or self.ollama_base_url
🧹 Nitpick comments (7)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py (1)

239-255: URL inconsistency and missing error handling for model fetch.

  1. URL inconsistency: get_ollama_models is called with self.ollama_base_url (raw input) while embedding instances use final_base_url (transformed). Although get_ollama_models transforms internally, this could cause subtle issues if the transformation logic diverges.

  2. No fallback on failure: If get_ollama_models fails, the entire build_embeddings method fails. Consider falling back to an empty available_models dict or using the user-selected model as the only entry:

            # Fetch available Ollama models
-            available_model_names = await get_ollama_models(
-                base_url_value=self.ollama_base_url,
-                desired_capability=DESIRED_CAPABILITY,
-                json_models_key=JSON_MODELS_KEY,
-                json_name_key=JSON_NAME_KEY,
-                json_capabilities_key=JSON_CAPABILITIES_KEY,
-            )
+            try:
+                available_model_names = await get_ollama_models(
+                    base_url_value=final_base_url,
+                    desired_capability=DESIRED_CAPABILITY,
+                    json_models_key=JSON_MODELS_KEY,
+                    json_name_key=JSON_NAME_KEY,
+                    json_capabilities_key=JSON_CAPABILITIES_KEY,
+                )
+            except ValueError:
+                logger.warning("Failed to fetch Ollama models, using selected model only")
+                available_model_names = [model] if model else []
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (2)

2065-2145: Avoid eager instantiation of N embedding clients; create on demand.

Creating an instance for every model on each build is wasteful and can slow UI updates. Prefer a lazy factory (dict of callables) or instantiate only when requested by the consumer.
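
For example, a lazy variant could hand out zero-argument factories and defer construction to the consumer (a sketch only; the names and the OpenAI-only focus are illustrative):

```python
from collections.abc import Callable

from langchain_openai import OpenAIEmbeddings


def make_embedding_factories(model_names: list[str], api_key: str) -> dict[str, Callable[[], OpenAIEmbeddings]]:
    # Map model name -> factory; no client is created until a factory is called.
    return {name: (lambda name=name: OpenAIEmbeddings(model=name, api_key=api_key)) for name in model_names}


# The consumer instantiates only the model it actually needs:
factories = make_embedding_factories(["text-embedding-3-small", "text-embedding-3-large"], api_key="...")
embeddings = factories["text-embedding-3-small"]()
```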


1759-1810: Add timeouts and error handling to documentation fetcher.

RemixDocumentation._fetch_all_documentation uses httpx.get without a timeout and minimal error handling. Add a short timeout and catch network errors to avoid hanging the flow.

Apply this diff inside the component code block:

-        response = httpx.get(search_index_url, follow_redirects=True)
+        try:
+            response = httpx.get(search_index_url, follow_redirects=True, timeout=10.0)
+        except httpx.HTTPError as e:
+            raise ValueError(f"Failed to fetch search index: {e!s}") from e
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py (4)

52-53: Remove or downgrade noisy logging in helper function.

logger.info is called every time get_embedding_field_name is invoked, which happens frequently during search operations with multiple models. This will clutter logs in production.

 def get_embedding_field_name(model_name: str) -> str:
-    logger.info(f"chunk_embedding_{normalize_model_name(model_name)}")
+    logger.debug(f"chunk_embedding_{normalize_model_name(model_name)}")
     return f"chunk_embedding_{normalize_model_name(model_name)}"

593-594: Consider handling bulk ingestion errors.

The helpers.bulk call doesn't have explicit error handling. If some documents fail to index, the method will still return all IDs as if successful. Consider using raise_on_error=True (default) and handling partial failures.

-        helpers.bulk(client, requests, max_chunk_bytes=max_chunk_bytes)
+        success, failed = helpers.bulk(
+            client, requests, max_chunk_bytes=max_chunk_bytes, stats_only=False
+        )
+        if failed:
+            logger.warning(f"Failed to index {len(failed)} documents: {failed[:3]}")
         return return_ids

646-646: Downgrade embedding debug log.

logger.warning is used for a debug log that shows the embedding object. This should be logger.debug or removed.

-        logger.warning(f"Embedding: {self.embedding}")
+        logger.debug(f"Embedding: {self.embedding}")

1034-1034: Fix type hint for optional parameter.

The parameter filter_clauses: list[dict] = None defaults to None but is not annotated as optional; add | None to the type hint for clarity.

-    def _detect_available_models(self, client: OpenSearch, filter_clauses: list[dict] = None) -> list[str]:
+    def _detect_available_models(self, client: OpenSearch, filter_clauses: list[dict] | None = None) -> list[str]:
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3caacf4 and bfbeec3.

📒 Files selected for processing (4)
  • src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (2 hunks)
  • src/lfx/src/lfx/base/embeddings/embeddings_class.py (1 hunks)
  • src/lfx/src/lfx/components/elastic/opensearch_multimodal.py (1 hunks)
  • src/lfx/src/lfx/components/models_and_agents/embedding_model.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py (2)
src/lfx/src/lfx/base/embeddings/embeddings_class.py (1)
  • EmbeddingsWithModels (6-116)
src/lfx/src/lfx/base/models/model_utils.py (1)
  • get_ollama_models (39-108)
src/lfx/src/lfx/base/embeddings/embeddings_class.py (2)
src/lfx/src/lfx/field_typing/constants.py (1)
  • Embeddings (49-50)
src/lfx/src/lfx/base/tools/flow_tool.py (1)
  • args (32-34)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py (3)
src/lfx/src/lfx/inputs/inputs.py (4)
  • BoolInput (419-432)
  • HandleInput (75-86)
  • IntInput (347-380)
  • StrInput (127-183)
src/lfx/src/lfx/schema/data.py (1)
  • Data (26-288)
src/lfx/src/lfx/base/embeddings/embeddings_class.py (2)
  • embed_documents (36-45)
  • embed_query (47-56)
🔇 Additional comments (10)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py (1)

7-7: LGTM!

Import correctly added for the new EmbeddingsWithModels wrapper class.

src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (2)

1856-1856: Code hash change acknowledged.

No action needed here; just confirming this corresponds to the EmbeddingModelComponent refactor.


2025-2050: fetch_ibm_models function does not exist in the langflow codebase; IBM model fetching is implemented via the watsonx.ai bundle, not a standalone function.

The review comment references a function that cannot be found in the repository. IBM integration in langflow uses a bundle-based architecture (watsonx.ai bundle) that handles dynamic model fetching, rather than a fetch_ibm_models function called by update_build_config. The suggestion about caching and request failure handling may be conceptually valid, but it is directed at the wrong implementation target.

The JSON configuration file snippet shown (lines 2025-2050) contains parameter definitions unrelated to model fetching logic, further confirming a mismatch between the review location and the actual concern.

Likely an incorrect or invalid review comment.

src/lfx/src/lfx/base/embeddings/embeddings_class.py (3)

6-34: LGTM!

The wrapper class is well-designed with proper inheritance from Embeddings, clear docstrings, and correct handling of the mutable default argument for available_models.


36-78: LGTM!

The embedding methods correctly delegate to the underlying embeddings instance with proper type annotations.


80-116: LGTM!

The __call__ method properly checks callability before delegation, __getattr__ correctly forwards unknown attributes to the wrapped instance, and __repr__ provides useful debug information.

src/lfx/src/lfx/components/elastic/opensearch_multimodal.py (4)

116-328: LGTM!

The input definitions are comprehensive and well-documented. The is_list=True on the embedding input correctly enables multi-model support.


330-392: LGTM!

The model name resolution logic correctly handles multiple embedding providers with a clear priority order. The fallback chain through deployment → model → model_id → model_name ensures compatibility across different providers.

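The resolution described amounts to probing a few well-known attributes in order; a hypothetical helper illustrating it (the real method in the component may differ):

```python
def resolve_model_name(embedding_obj) -> str | None:
    # Probe provider-specific attributes in the priority order described above:
    # deployment -> model -> model_id -> model_name.
    for attr in ("deployment", "model", "model_id", "model_name"):
        value = getattr(embedding_obj, attr, None)
        if value:
            return str(value)
    return None
```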

855-882: LGTM!

The retry logic with exponential backoff (1s → 2s → 4s, capped at 8s) for embedding generation is well-implemented. The ThreadPoolExecutor usage with bounded workers (max 8) prevents resource exhaustion.
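For reference, a minimal sketch of the pattern being described: exponential backoff (1s, 2s, 4s, capped at 8s) around embed_query, fanned out over a ThreadPoolExecutor bounded at 8 workers. Function and variable names are illustrative, not the component's actual ones:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def embed_with_retry(embeddings, text: str, max_attempts: int = 4) -> list[float]:
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            return embeddings.embed_query(text)
        except Exception:  # noqa: BLE001 - sketch only; the real code narrows the exception types
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay = min(delay * 2, 8.0)  # 1s -> 2s -> 4s, capped at 8s
    raise RuntimeError("unreachable")


def embed_queries_in_parallel(models: dict, query: str) -> dict[str, list[float]]:
    # Bounded pool: at most 8 workers, fewer if there are fewer models.
    with ThreadPoolExecutor(max_workers=min(8, max(1, len(models)))) as pool:
        futures = {name: pool.submit(embed_with_retry, emb, query) for name, emb in models.items()}
        return {name: future.result() for name, future in futures.items()}
```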


1547-1590: LGTM!

The dynamic UI configuration correctly toggles visibility and required status for authentication fields based on the selected auth mode.
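
The toggle amounts to flipping show/required flags on the auth-related fields when the auth mode changes; a hypothetical sketch (field names like username, password, and jwt_token are assumptions):

```python
def update_auth_fields(build_config: dict, auth_mode: str) -> dict:
    # Show basic-auth fields only for "basic", the token field only for "jwt".
    use_basic = auth_mode == "basic"
    for field in ("username", "password"):
        build_config[field]["show"] = use_basic
        build_config[field]["required"] = use_basic
    build_config["jwt_token"]["show"] = not use_basic
    build_config["jwt_token"]["required"] = not use_basic
    return build_config
```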

Comment on lines +428 to +435
"dimension": dim,
"method": {
"name": "disk_ann",
"space_type": space_type,
"engine": engine,
"parameters": {"ef_construction": ef_construction, "m": m},
},
},

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Script executed:

#!/bin/bash
# First, let's examine the file structure and understand the context around lines 428-435
cat -n src/lfx/src/lfx/components/elastic/opensearch_multimodal.py | sed -n '400,450p'

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

   400	        ef_construction: int = 100,
   401	        m: int = 16,
   402	        vector_field: str = "vector_field",
   403	    ) -> dict[str, Any]:
   404	        """Create the default OpenSearch index mapping for vector search.
   405	
   406	        This method generates the index configuration with k-NN settings optimized
   407	        for approximate nearest neighbor search using the specified vector engine.
   408	        Includes the embedding_model keyword field for tracking which model was used.
   409	
   410	        Args:
   411	            dim: Dimensionality of the vector embeddings
   412	            engine: Vector search engine (jvector, nmslib, faiss, lucene)
   413	            space_type: Distance metric for similarity calculation
   414	            ef_search: Size of dynamic list used during search
   415	            ef_construction: Size of dynamic list used during index construction
   416	            m: Number of bidirectional links for each vector
   417	            vector_field: Name of the field storing vector embeddings
   418	
   419	        Returns:
   420	            Dictionary containing OpenSearch index mapping configuration
   421	        """
   422	        return {
   423	            "settings": {"index": {"knn": True, "knn.algo_param.ef_search": ef_search}},
   424	            "mappings": {
   425	                "properties": {
   426	                    vector_field: {
   427	                        "type": "knn_vector",
   428	                        "dimension": dim,
   429	                        "method": {
   430	                            "name": "disk_ann",
   431	                            "space_type": space_type,
   432	                            "engine": engine,
   433	                            "parameters": {"ef_construction": ef_construction, "m": m},
   434	                        },
   435	                    },
   436	                    "embedding_model": {"type": "keyword"},  # Track which model was used
   437	                    "embedding_dimensions": {"type": "integer"},
   438	                }
   439	            },
   440	        }
   441	
   442	    def _ensure_embedding_field_mapping(
   443	        self,
   444	        client: OpenSearch,
   445	        index_name: str,
   446	        field_name: str,
   447	        dim: int,
   448	        engine: str,
   449	        space_type: str,
   450	        ef_construction: int,

</shell_result>


Script executed:

#!/bin/bash
# Search for "disk_ann" and "hnsw" occurrences in the file
echo "=== Searching for disk_ann occurrences ==="
rg -n "disk_ann" src/lfx/src/lfx/components/elastic/opensearch_multimodal.py

echo -e "\n=== Searching for hnsw occurrences ==="
rg -n "hnsw" src/lfx/src/lfx/components/elastic/opensearch_multimodal.py

echo -e "\n=== Searching for engine parameter usage ==="
rg -n "engine" src/lfx/src/lfx/components/elastic/opensearch_multimodal.py | head -30

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

=== Searching for disk_ann occurrences ===
430:                            "name": "disk_ann",
475:                            "name": "disk_ann",

=== Searching for hnsw occurrences ===

=== Searching for engine parameter usage ===
74:    - Vector storage with configurable engines (jvector, nmslib, faiss, lucene)
108:        "engine",
160:            name="engine",
165:                "Vector search engine for similarity calculations. 'jvector' is recommended for most use cases. "
397:        engine: str = "jvector",
407:        for approximate nearest neighbor search using the specified vector engine.
412:            engine: Vector search engine (jvector, nmslib, faiss, lucene)
432:                            "engine": engine,
448:        engine: str,
463:            engine: Vector search engine
477:                            "engine": engine,
498:    def _validate_aoss_with_engines(self, *, is_aoss: bool, engine: str) -> None:
499:        """Validate engine compatibility with Amazon OpenSearch Serverless (AOSS).
501:        Amazon OpenSearch Serverless has restrictions on which vector engines
502:        can be used. This method ensures the selected engine is compatible.
506:            engine: The selected vector search engine
509:            ValueError: If AOSS is used with an incompatible engine
511:        if is_aoss and engine not in {"nmslib", "faiss"}:
512:            msg = "Amazon OpenSearch Service Serverless only supports `nmslib` or `faiss` engines"
901:        # Validate engine with AOSS
902:        engine = getattr(self, "engine", "jvector")
903:        self._validate_aoss_with_engines(is_aoss=is_aoss, engine=engine)
912:            engine=engine,
934:            engine=engine,

</shell_result>


OpenSearch vector search methods disk_ann hnsw nmslib jvector faiss lucene engine compatibility


<web_search_result>
Summary of OpenSearch k-NN methods & engine compatibility (concise):

  • Engines supported in OpenSearch: Lucene (native), Faiss, NMSLIB (deprecated). [1][2].
  • HNSW: implemented across engines (Lucene, Faiss, NMSLIB). Use HNSW for most workloads; Lucene HNSW supports l2/cosinesimil/innerproduct and ignores ef_search (uses k). [1][2].
  • Faiss: supports HNSW and IVF (IVF requires training). Faiss is used for on-disk/large-scale and supports product quantization/compression options. [1][4].
  • NMSLIB: legacy HNSW engine (supported spaces: l2, innerproduct, cosinesimil, l1, linf) but deprecated—prefer Faiss or Lucene. [1].
  • Lucene engine: native HNSW, best for small-to-mid indexes and efficient filtering (can apply filters during search). [1][5].
  • on_disk / disk-based ANN modes: OpenSearch “on_disk” (low-cost) mode uses Faiss by default; there are also disk-based ANNs in the ecosystem (DiskANN/DISK_ANN in other vector DBs) and emerging disk_ann usage in OpenSearch examples (e.g., disk_ann method with jvector engine in recent posts/features). Check your OpenSearch version for exact support. [2][3][6].
  • jvector / disk_ann: OpenSearch examples and recent posts show a "jvector" engine and a "disk_ann" method used for concurrent / disk-optimized graph builds — this is an emergent/experimental option; verify availability in your OpenSearch release. [3].
  • AWS OpenSearch Service (managed) notes: serverless/vector-search collections currently restrict engines/features (e.g., some modes/engines like Lucene may not be supported in vector collections; AWS may restrict Faiss features such as IVF). Always consult the AWS docs for service-specific limitations. [2].

If you want, tell me your OpenSearch version (or AWS OpenSearch service type) and I’ll list exactly which methods/engine combinations and parameters are available and show example mappings.

Sources: OpenSearch docs & blog posts and AWS OpenSearch docs.
[1] OpenSearch docs — Methods and engines.
[2] OpenSearch docs — k-NN vector / modes (in_memory, on_disk).
[3] OpenSearch blog / changelog examples showing disk_ann + jvector.
[4] OpenSearch blog “Lucene-on-Faiss” (Faiss+Lucene integration / on-disk).
[5] OpenSearch vector search page (comparison table).
</web_search_result>


Script executed:

#!/bin/bash
# Search for test files or examples that show engine-method combinations
echo "=== Looking for test files ==="
find src/lfx -name "*test*" -o -name "*opensearch*" | head -20

echo -e "\n=== Searching for requirements/version info ==="
find . -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" | xargs grep -l "opensearch" 2>/dev/null

echo -e "\n=== Looking for any method selection logic ==="
rg -n "method.*name" src/lfx/src/lfx/components/elastic/opensearch_multimodal.py -A 2 -B 2

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

=== Looking for test files ===
src/lfx/src/lfx/components/elastic/opensearch.py
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py
src/lfx/tests
src/lfx/tests/conftest.py
src/lfx/tests/data/env_variable_test.json
src/lfx/tests/unit/base/data/test_base_file.py
src/lfx/tests/unit/base/data/test_storage_utils.py
src/lfx/tests/unit/cli/test_common.py
src/lfx/tests/unit/cli/test_run_command.py
src/lfx/tests/unit/cli/test_run_real_flows.py
src/lfx/tests/unit/cli/test_run_starter_projects.py
src/lfx/tests/unit/cli/test_run_starter_projects_backward_compatibility.py
src/lfx/tests/unit/cli/test_script_loader.py
src/lfx/tests/unit/cli/test_serve.py
src/lfx/tests/unit/cli/test_serve_app.py
src/lfx/tests/unit/cli/test_serve_app_streaming.py
src/lfx/tests/unit/cli/test_serve_components.py
src/lfx/tests/unit/cli/test_serve_simple.py
src/lfx/tests/unit/cli/test_validation.py
src/lfx/tests/unit/components/langchain_utilities/test_csv_agent.py

=== Searching for requirements/version info ===
./pyproject.toml

=== Looking for any method selection logic ===
265-            options=["basic", "jwt"],
266-            info=(
267:                "Authentication method: 'basic' for username/password authentication, "
268-                "or 'jwt' for JSON Web Token (Bearer) authentication."
269-            ),

</shell_result>


Script executed:

#!/bin/bash
# Search for method configuration patterns and how engines are actually used
echo "=== Full method definitions to see parameter patterns ==="
rg -n "\"method\":" src/lfx/src/lfx/components/elastic/opensearch_multimodal.py -A 5 -B 2

echo -e "\n=== Check if there are any comments about method names ==="
rg -n "method|disk_ann|hnsw" src/lfx/src/lfx/components/elastic/opensearch_multimodal.py | grep -E "(#|method|disk_ann|hnsw)"

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

=== Full method definitions to see parameter patterns ===
427-                        "type": "knn_vector",
428-                        "dimension": dim,
429:                        "method": {
430-                            "name": "disk_ann",
431-                            "space_type": space_type,
432-                            "engine": engine,
433-                            "parameters": {"ef_construction": ef_construction, "m": m},
434-                        },
--
472-                        "type": "knn_vector",
473-                        "dimension": dim,
474:                        "method": {
475-                            "name": "disk_ann",
476-                            "space_type": space_type,
477-                            "engine": engine,
478-                            "parameters": {"ef_construction": ef_construction, "m": m},
479-                        },

=== Check if there are any comments about method names ===
267:                "Authentication method: 'basic' for username/password authentication, "
406:        This method generates the index configuration with k-NN settings optimized
429:                        "method": {
430:                            "name": "disk_ann",
474:                        "method": {
475:                            "name": "disk_ann",
502:        can be used. This method ensures the selected engine is compatible.
544:        This method uses bulk operations to insert documents with their vector
654:        This method handles the complete document ingestion pipeline:
663:        # Convert DataFrame to Data if needed using parent's method
969:        This method accepts two filter formats and converts them to standardized
1137:        This method executes a sophisticated search that:
1518:        This is the main interface method that performs the multi-model search using the
1539:        This method handles real-time UI updates, particularly for authentication

</shell_result>


OpenSearch disk_ann method supported engines which engines work


<web_search_result>
Short answer: the disk_ann method in OpenSearch is implemented to use the jvector engine. See the OpenSearch blog example showing method.name = "disk_ann" with engine = "jvector". [1]

Context: OpenSearch's k-NN plugin generally supports engines faiss, lucene and (deprecated) nmslib for other methods (hnsw/ivf), but disk_ann specifically is shown using jvector in the OpenSearch blog announcing concurrent graph construction. [2][1]

Sources:

  • OpenSearch blog: "Concurrent vector graph construction" (disk_ann example using engine: "jvector"). [1]
  • OpenSearch k‑NN methods & engines documentation (faiss, lucene, nmslib overview). [2]
    </web_search_result>

Hardcoded disk_ann method is incompatible with multiple supported engines.

The method name "disk_ann" is hardcoded at lines 430 and 475, but OpenSearch's disk_ann method only works with the jvector engine. Your component supports four engines (jvector, nmslib, faiss, lucene), but when users select nmslib, faiss, or lucene, the index mapping will fail because those engines don't support the disk_ann method—they require hnsw or other engine-specific methods instead.

The method name must be conditional based on the selected engine parameter.

🤖 Prompt for AI Agents
In src/lfx/src/lfx/components/elastic/opensearch_multimodal.py around lines
428-435 (and similarly at ~475) the vector index mapping hardcodes the method
name "disk_ann", which only works with the jvector engine; this will fail for
nmslib, faiss and lucene. Change the code to pick the method name conditionally
based on the selected engine (e.g., use "disk_ann" only when engine ==
"jvector", otherwise use the engine-appropriate method such as "hnsw" for
nmslib/faiss/lucene or another supported method), and ensure any engine-specific
parameters (ef_construction, m, etc.) are set/renamed to match the chosen
method; update both locations so the mapping construction uses the conditional
method string and appropriate parameter set for that engine.
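
A hedged sketch of that conditional selection, treating hnsw as the fallback method for the non-jvector engines and keeping the existing parameter names (verify the exact method/parameter support for your OpenSearch version):

```python
def knn_method_for_engine(engine: str, space_type: str, ef_construction: int, m: int) -> dict:
    # disk_ann is only valid with the jvector engine; other engines use hnsw.
    method_name = "disk_ann" if engine == "jvector" else "hnsw"
    return {
        "name": method_name,
        "space_type": space_type,
        "engine": engine,
        "parameters": {"ef_construction": ef_construction, "m": m},
    }


# In the index mapping, replace the hardcoded "disk_ann" block with:
# "method": knn_method_for_engine(engine, space_type, ef_construction, m)
```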

Comment on lines +1182 to +1186
available_models = self._detect_available_models(client, filter_clauses)

if not available_models:
logger.warning("No embedding models found in index, using current model")
available_models = [self._get_embedding_model_name()]

⚠️ Potential issue | 🔴 Critical

Critical: Variable shadowing causes logic error.

On line 1209, available_models = getattr(emb_obj, "available_models", None) shadows the available_models list from line 1182 that contains the detected models in the index. After the first loop iteration, the outer available_models is overwritten, causing the for model_name in available_models: loop on line 1282 to iterate over the wrong data.

Rename the inner variable to avoid shadowing:

             model_name = getattr(emb_obj, "model_name", None)
-            available_models = getattr(emb_obj, "available_models", None)
+            emb_available_models = getattr(emb_obj, "available_models", None)

             logger.info(
                 f"Embedding object {idx}: deployment={deployment}, model={model}, "
                 f"model_id={model_id}, model_name={model_name}, dimensions={dimensions}, "
-                f"available_models={available_models}"
+                f"available_models={emb_available_models}"
             )

             # If this embedding has available_models dict, map all models to their dedicated instances
-            if available_models and isinstance(available_models, dict):
+            if emb_available_models and isinstance(emb_available_models, dict):
                 logger.info(
-                    f"Embedding object {idx} provides {len(available_models)} models via available_models dict"
+                    f"Embedding object {idx} provides {len(emb_available_models)} models via available_models dict"
                 )
-                for model_name_key, dedicated_embedding in available_models.items():
+                for model_name_key, dedicated_embedding in emb_available_models.items():

Apply similar changes to all subsequent usages within the loop (lines 1218-1238).

Also applies to: 1209-1209

🤖 Prompt for AI Agents
In src/lfx/src/lfx/components/elastic/opensearch_multimodal.py around lines
1182-1282, the local variable available_models set at line 1182 is being
shadowed by a second assignment at line 1209 (available_models =
getattr(emb_obj, "available_models", None)), which breaks the outer loop later
(line 1282) — rename the inner variable (for example emb_available_models) and
update all its subsequent uses within that loop (lines ~1218-1238 and any other
occurrences in the same block) so the outer available_models list remains
untouched.

Comment on lines +1300 to +1305
# Check if this is a dedicated instance from available_models dict
if emb_available_models and isinstance(emb_available_models, dict):
logger.info(
f"Model '{model_name}' using dedicated instance from available_models dict "
f"(pre-configured with correct model and dimensions)"
)

⚠️ Potential issue | 🔴 Critical

Continue fix for variable shadowing.

These lines also reference emb_available_models (after renaming) and need the same fix applied.

                     # Check if this is a dedicated instance from available_models dict
-                    if emb_available_models and isinstance(emb_available_models, dict):
+                    emb_avail_models = getattr(emb_obj, "available_models", None)
+                    if emb_avail_models and isinstance(emb_avail_models, dict):
                         logger.info(
                             f"Model '{model_name}' using dedicated instance from available_models dict "
                             f"(pre-configured with correct model and dimensions)"
                         )

Note: This is part of the same variable shadowing issue flagged earlier.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Check if this is a dedicated instance from available_models dict
if emb_available_models and isinstance(emb_available_models, dict):
logger.info(
f"Model '{model_name}' using dedicated instance from available_models dict "
f"(pre-configured with correct model and dimensions)"
)
# Check if this is a dedicated instance from available_models dict
emb_avail_models = getattr(emb_obj, "available_models", None)
if emb_avail_models and isinstance(emb_avail_models, dict):
logger.info(
f"Model '{model_name}' using dedicated instance from available_models dict "
f"(pre-configured with correct model and dimensions)"
)
🤖 Prompt for AI Agents
In src/lfx/src/lfx/components/elastic/opensearch_multimodal.py around lines 1300
to 1305, the code still references the old name `emb_available_models`; replace
this reference with the new variable name you introduced earlier (the one used
elsewhere in the file to avoid shadowing), and keep the same isinstance(dict)
check and logging text; ensure the variable used matches the prior rename so
there are no shadowed/undefined names at runtime.

Comment on lines +1539 to +1545
try:
raw = self.search(self.search_query or "")
return [Data(text=hit["page_content"], **hit["metadata"]) for hit in raw]
self.log(self.ingest_data)
except Exception as e:
self.log(f"search_documents error: {e}")
raise

⚠️ Potential issue | 🟡 Minor

Unreachable code after return statement.

Line 1542 (self.log(self.ingest_data)) is placed after the return statement on line 1541, making it unreachable. This is likely a debugging statement that should be removed or moved before the return.

     def search_documents(self) -> list[Data]:
         try:
             raw = self.search(self.search_query or "")
-            return [Data(text=hit["page_content"], **hit["metadata"]) for hit in raw]
-            self.log(self.ingest_data)
+            results = [Data(text=hit["page_content"], **hit["metadata"]) for hit in raw]
+            return results
         except Exception as e:
             self.log(f"search_documents error: {e}")
             raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
raw = self.search(self.search_query or "")
return [Data(text=hit["page_content"], **hit["metadata"]) for hit in raw]
self.log(self.ingest_data)
except Exception as e:
self.log(f"search_documents error: {e}")
raise
try:
raw = self.search(self.search_query or "")
results = [Data(text=hit["page_content"], **hit["metadata"]) for hit in raw]
return results
except Exception as e:
self.log(f"search_documents error: {e}")
raise
🤖 Prompt for AI Agents
In src/lfx/src/lfx/components/elastic/opensearch_multimodal.py around lines
1539-1545 there is an unreachable self.log(self.ingest_data) placed after a
return; remove the unreachable line or move the logging before the return so it
executes. Better: call and assign raw = self.search(...), perform any needed
self.log(...) before returning, then return the list comprehension; ensure
self.search is only called once and preserve exception handling.

return WATSONX_EMBEDDING_MODEL_NAMES

def build_embeddings(self) -> Embeddings:
async def build_embeddings(self) -> Embeddings:

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Script executed:

#!/bin/bash
# Search for all usages of build_embeddings method
rg -n --type=py 'build_embeddings' -B2 -A2

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Script execution failed with unrecoverable error: [CAPABILITY:RUN_SHELL_SCRIPT] Failed to clone repository langflow-ai/langflow: Clone operation failed
</shell_result>


Script executed:

#!/bin/bash
# Search for all usages of build_embeddings method across the codebase
rg -n --type=py 'build_embeddings' -B2 -A2

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

src/backend/tests/unit/initial_setup/starter_projects/test_vector_store_rag.py-41-    vector_store.set_on_output(name="dataframe", value=DataFrame(data=[Data(text="This is a test file.")]), cache=True)
src/backend/tests/unit/initial_setup/starter_projects/test_vector_store_rag.py-42-    vector_store.set(
src/backend/tests/unit/initial_setup/starter_projects/test_vector_store_rag.py:43:        embedding_model=openai_embeddings.build_embeddings,
src/backend/tests/unit/initial_setup/starter_projects/test_vector_store_rag.py-44-        ingest_data=text_splitter.split_text,
src/backend/tests/unit/initial_setup/starter_projects/test_vector_store_rag.py-45-        api_endpoint="https://astra.example.com",
--
src/backend/tests/unit/initial_setup/starter_projects/test_vector_store_rag.py-60-        api_endpoint="https://astra.example.com",
src/backend/tests/unit/initial_setup/starter_projects/test_vector_store_rag.py-61-        token="token",  # noqa: S106
src/backend/tests/unit/initial_setup/starter_projects/test_vector_store_rag.py:62:        embedding_model=openai_embeddings.build_embeddings,
src/backend/tests/unit/initial_setup/starter_projects/test_vector_store_rag.py-63-    )
src/backend/tests/unit/initial_setup/starter_projects/test_vector_store_rag.py-64-    # Mock search_documents
--
src/backend/tests/unit/components/vectorstores/test_local_db_component.py-32-
src/backend/tests/unit/components/vectorstores/test_local_db_component.py-33-        return {
src/backend/tests/unit/components/vectorstores/test_local_db_component.py:34:            "embedding": OpenAIEmbeddingsComponent(openai_api_key=api_key).build_embeddings(),
src/backend/tests/unit/components/vectorstores/test_local_db_component.py-35-            "collection_name": "test_collection",
src/backend/tests/unit/components/vectorstores/test_local_db_component.py-36-            "persist": True,
--
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-120-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-121-    @patch("lfx.components.models_and_agents.embedding_model.OpenAIEmbeddings")
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:122:    async def test_build_embeddings_openai(self, mock_openai_embeddings, component_class, default_kwargs):
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-123-        # Setup mock
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-124-        mock_instance = MagicMock()
--
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-135-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-136-        # Build the embeddings
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:137:        embeddings = component.build_embeddings()
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-138-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-139-        # Verify the OpenAIEmbeddings was called with the correct parameters
--
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-152-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-153-    @patch("langchain_ollama.OllamaEmbeddings")
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:154:    async def test_build_embeddings_ollama(self, mock_ollama_embeddings, component_class, default_kwargs):
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-155-        # Setup mock
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-156-        mock_instance = MagicMock()
--
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-166-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-167-        # Build the embeddings
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:168:        embeddings = component.build_embeddings()
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-169-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-170-        # Verify the OllamaEmbeddings was called with the correct parameters
--
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-178-    @patch("ibm_watsonx_ai.Credentials")
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-179-    @patch("langchain_ibm.WatsonxEmbeddings")
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:180:    async def test_build_embeddings_watsonx(
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-181-        self, mock_watsonx_embeddings, mock_credentials, mock_api_client, component_class, default_kwargs
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-182-    ):
--
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-199-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-200-        # Build the embeddings
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:201:        embeddings = component.build_embeddings()
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-202-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-203-        # Verify Credentials was created correctly
--
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-225-        assert embeddings == mock_instance
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-226-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:227:    async def test_build_embeddings_watsonx_missing_project_id(self, component_class, default_kwargs):
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-228-        kwargs = default_kwargs.copy()
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-229-        kwargs["provider"] = "IBM watsonx.ai"
--
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-232-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-233-        with pytest.raises(ValueError, match=r"Project ID is required for IBM watsonx.ai"):
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:234:            component.build_embeddings()
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-235-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:236:    async def test_build_embeddings_openai_missing_api_key(self, component_class, default_kwargs):
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-237-        component = component_class(**default_kwargs)
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-238-        component.provider = "OpenAI"
--
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-240-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-241-        with pytest.raises(ValueError, match="OpenAI API key is required when using OpenAI provider"):
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:242:            component.build_embeddings()
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-243-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:244:    async def test_build_embeddings_watsonx_missing_api_key(self, component_class, default_kwargs):
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-245-        kwargs = default_kwargs.copy()
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-246-        kwargs["provider"] = "IBM watsonx.ai"
--
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-251-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-252-        with pytest.raises(ValueError, match=r"IBM watsonx.ai API key is required when using IBM watsonx.ai provider"):
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:253:            component.build_embeddings()
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-254-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:255:    async def test_build_embeddings_unknown_provider(self, component_class, default_kwargs):
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-256-        component = component_class(**default_kwargs)
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-257-        component.provider = "Unknown"
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-258-
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py-259-        with pytest.raises(ValueError, match="Unknown provider: Unknown"):
src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:260:            component.build_embeddings()
--
src/backend/tests/unit/components/vectorstores/test_chroma_vector_store_component.py-29-
src/backend/tests/unit/components/vectorstores/test_chroma_vector_store_component.py-30-        return {
src/backend/tests/unit/components/vectorstores/test_chroma_vector_store_component.py:31:            "embedding": OpenAIEmbeddingsComponent(openai_api_key=api_key).build_embeddings(),
src/backend/tests/unit/components/vectorstores/test_chroma_vector_store_component.py-32-            "collection_name": "test_collection",
src/backend/tests/unit/components/vectorstores/test_chroma_vector_store_component.py-33-            "persist_directory": tmp_path,
--
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-114-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-115-    @patch("langchain_huggingface.HuggingFaceEmbeddings")
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:116:    def test_build_embeddings_huggingface(self, mock_hf_embeddings, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-117-        """Test building HuggingFace embeddings."""
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-118-        component = component_class(**default_kwargs)
--
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-121-        mock_hf_embeddings.return_value = mock_embeddings
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-122-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:123:        result = component._build_embeddings("sentence-transformers/all-MiniLM-L6-v2", None)
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-124-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-125-        mock_hf_embeddings.assert_called_once_with(model="sentence-transformers/all-MiniLM-L6-v2")
--
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-127-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-128-    @patch("langchain_openai.OpenAIEmbeddings")
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:129:    def test_build_embeddings_openai(self, mock_openai_embeddings, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-130-        """Test building OpenAI embeddings."""
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-131-        component = component_class(**default_kwargs)
--
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-134-        mock_openai_embeddings.return_value = mock_embeddings
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-135-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:136:        result = component._build_embeddings("text-embedding-ada-002", "test-api-key")
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-137-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-138-        mock_openai_embeddings.assert_called_once_with(
--
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-143-        assert result == mock_embeddings
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-144-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:145:    def test_build_embeddings_openai_no_key(self, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-146-        """Test building OpenAI embeddings without API key raises error."""
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-147-        component = component_class(**default_kwargs)
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-148-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-149-        with pytest.raises(ValueError, match="OpenAI API key is required"):
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:150:            component._build_embeddings("text-embedding-ada-002", None)
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-151-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-152-    @patch("langchain_cohere.CohereEmbeddings")
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:153:    def test_build_embeddings_cohere(self, mock_cohere_embeddings, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-154-        """Test building Cohere embeddings."""
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-155-        component = component_class(**default_kwargs)
--
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-158-        mock_cohere_embeddings.return_value = mock_embeddings
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-159-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:160:        result = component._build_embeddings("embed-english-v3.0", "test-api-key")
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-161-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-162-        mock_cohere_embeddings.assert_called_once_with(
--
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-166-        assert result == mock_embeddings
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-167-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:168:    def test_build_embeddings_cohere_no_key(self, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-169-        """Test building Cohere embeddings without API key raises error."""
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-170-        component = component_class(**default_kwargs)
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-171-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-172-        with pytest.raises(ValueError, match="Cohere API key is required"):
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:173:            component._build_embeddings("embed-english-v3.0", None)
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-174-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:175:    def test_build_embeddings_custom_not_supported(self, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-176-        """Test building custom embeddings raises NotImplementedError."""
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-177-        component = component_class(**default_kwargs)
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-178-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-179-        with pytest.raises(NotImplementedError, match="Custom embedding models not yet supported"):
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:180:            component._build_embeddings("custom-model", "test-key")
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-181-
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-182-    @patch("langflow.components.knowledge_bases.ingestion.get_settings_service")
--
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-331-        # Mock embedding validation
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-332-        with (
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py:333:            patch.object(component, "_build_embeddings") as mock_build_emb,
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-334-            patch.object(component, "_save_embedding_metadata"),
src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py-335-        ):
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-159-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-160-    @patch("langchain_huggingface.HuggingFaceEmbeddings")
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:161:    def test_build_embeddings_huggingface(self, mock_hf_embeddings, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-162-        """Test building HuggingFace embeddings."""
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-163-        component = component_class(**default_kwargs)
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-172-        mock_hf_embeddings.return_value = mock_embeddings
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-173-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:174:        result = component._build_embeddings(metadata)
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-175-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-176-        mock_hf_embeddings.assert_called_once_with(model="sentence-transformers/all-MiniLM-L6-v2")
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-178-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-179-    @patch("langchain_openai.OpenAIEmbeddings")
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:180:    def test_build_embeddings_openai(self, mock_openai_embeddings, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-181-        """Test building OpenAI embeddings."""
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-182-        component = component_class(**default_kwargs)
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-192-        mock_openai_embeddings.return_value = mock_embeddings
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-193-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:194:        result = component._build_embeddings(metadata)
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-195-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-196-        mock_openai_embeddings.assert_called_once_with(
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-201-        assert result == mock_embeddings
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-202-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:203:    def test_build_embeddings_openai_no_key(self, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-204-        """Test building OpenAI embeddings without API key raises error."""
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-205-        component = component_class(**default_kwargs)
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-213-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-214-        with pytest.raises(ValueError, match="OpenAI API key is required"):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:215:            component._build_embeddings(metadata)
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-216-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-217-    @patch("langchain_cohere.CohereEmbeddings")
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:218:    def test_build_embeddings_cohere(self, mock_cohere_embeddings, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-219-        """Test building Cohere embeddings."""
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-220-        component = component_class(**default_kwargs)
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-230-        mock_cohere_embeddings.return_value = mock_embeddings
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-231-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:232:        result = component._build_embeddings(metadata)
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-233-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-234-        mock_cohere_embeddings.assert_called_once_with(
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-238-        assert result == mock_embeddings
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-239-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:240:    def test_build_embeddings_cohere_no_key(self, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-241-        """Test building Cohere embeddings without API key raises error."""
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-242-        component = component_class(**default_kwargs)
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-250-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-251-        with pytest.raises(ValueError, match="Cohere API key is required"):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:252:            component._build_embeddings(metadata)
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-253-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:254:    def test_build_embeddings_custom_not_supported(self, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-255-        """Test building custom embeddings raises NotImplementedError."""
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-256-        component = component_class(**default_kwargs)
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-263-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-264-        with pytest.raises(NotImplementedError, match="Custom embedding models not yet supported"):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:265:            component._build_embeddings(metadata)
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-266-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:267:    def test_build_embeddings_unsupported_provider(self, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-268-        """Test building embeddings with unsupported provider raises NotImplementedError."""
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-269-        component = component_class(**default_kwargs)
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-276-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-277-        with pytest.raises(NotImplementedError, match="Embedding provider 'UnsupportedProvider' is not supported"):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:278:            component._build_embeddings(metadata)
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-279-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:280:    def test_build_embeddings_with_user_api_key(self, component_class, default_kwargs):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-281-        """Test that user-provided API key overrides stored one."""
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-282-        # Use a real SecretStr object instead of a mock
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-297-            mock_openai.return_value = mock_embeddings
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-298-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:299:            component._build_embeddings(metadata)
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-300-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-301-            # The user-provided key should override the stored key in metadata
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-348-        with (
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-349-            patch.object(component, "_get_kb_metadata") as mock_get_metadata,
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:350:            patch.object(component, "_build_embeddings") as mock_build_embeddings,
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-351-            patch("langchain_chroma.Chroma"),
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-352-        ):
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-353-            mock_get_metadata.return_value = {"embedding_provider": "HuggingFace", "embedding_model": "test-model"}
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:354:            mock_build_embeddings.return_value = MagicMock()
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-355-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-356-            # This is a unit test focused on the component's internal logic
--
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-360-            # Verify internal methods were called
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-361-            mock_get_metadata.assert_called_once()
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py:362:            mock_build_embeddings.assert_called_once()
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-363-
src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py-364-    def test_include_embeddings_parameter(self, component_class, default_kwargs):
--
src/backend/base/langflow/initial_setup/starter_projects/vector_store_rag.py-20-    vector_store = AstraDBVectorStoreComponent()
src/backend/base/langflow/initial_setup/starter_projects/vector_store_rag.py-21-    vector_store.set(
src/backend/base/langflow/initial_setup/starter_projects/vector_store_rag.py:22:        embedding_model=openai_embeddings.build_embeddings,
src/backend/base/langflow/initial_setup/starter_projects/vector_store_rag.py-23-        ingest_data=text_splitter.split_text,
src/backend/base/langflow/initial_setup/starter_projects/vector_store_rag.py-24-    )
--
src/backend/base/langflow/initial_setup/starter_projects/vector_store_rag.py-34-    rag_vector_store.set(
src/backend/base/langflow/initial_setup/starter_projects/vector_store_rag.py-35-        search_query=chat_input.message_response,
src/backend/base/langflow/initial_setup/starter_projects/vector_store_rag.py:36:        embedding_model=openai_embeddings.build_embeddings,
src/backend/base/langflow/initial_setup/starter_projects/vector_store_rag.py-37-    )
src/backend/base/langflow/initial_setup/starter_projects/vector_store_rag.py-38-
--
src/lfx/src/lfx/components/vertexai/vertexai_embeddings.py-34-
src/lfx/src/lfx/components/vertexai/vertexai_embeddings.py-35-    outputs = [
src/lfx/src/lfx/components/vertexai/vertexai_embeddings.py:36:        Output(display_name="Embeddings", name="embeddings", method="build_embeddings"),
src/lfx/src/lfx/components/vertexai/vertexai_embeddings.py-37-    ]
src/lfx/src/lfx/components/vertexai/vertexai_embeddings.py-38-
src/lfx/src/lfx/components/vertexai/vertexai_embeddings.py:39:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/vertexai/vertexai_embeddings.py-40-        try:
src/lfx/src/lfx/components/vertexai/vertexai_embeddings.py-41-            from langchain_google_vertexai import VertexAIEmbeddings
--
src/lfx/src/lfx/components/ollama/ollama_embeddings.py-41-
src/lfx/src/lfx/components/ollama/ollama_embeddings.py-42-    outputs = [
src/lfx/src/lfx/components/ollama/ollama_embeddings.py:43:        Output(display_name="Embeddings", name="embeddings", method="build_embeddings"),
src/lfx/src/lfx/components/ollama/ollama_embeddings.py-44-    ]
src/lfx/src/lfx/components/ollama/ollama_embeddings.py-45-
src/lfx/src/lfx/components/ollama/ollama_embeddings.py:46:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/ollama/ollama_embeddings.py-47-        transformed_base_url = transform_localhost_url(self.base_url)
src/lfx/src/lfx/components/ollama/ollama_embeddings.py-48-        try:
--
src/lfx/src/lfx/components/twelvelabs/text_embeddings.py-54-    ]
src/lfx/src/lfx/components/twelvelabs/text_embeddings.py-55-
src/lfx/src/lfx/components/twelvelabs/text_embeddings.py:56:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/twelvelabs/text_embeddings.py-57-        return TwelveLabsTextEmbeddings(api_key=self.api_key, model=self.model)
--
src/lfx/src/lfx/components/twelvelabs/video_embeddings.py-97-    ]
src/lfx/src/lfx/components/twelvelabs/video_embeddings.py-98-
src/lfx/src/lfx/components/twelvelabs/video_embeddings.py:99:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/twelvelabs/video_embeddings.py-100-        return TwelveLabsVideoEmbeddings(api_key=self.api_key, model_name=self.model_name)
--
src/lfx/src/lfx/components/openai/openai.py-73-    ]
src/lfx/src/lfx/components/openai/openai.py-74-
src/lfx/src/lfx/components/openai/openai.py:75:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/openai/openai.py-76-        return OpenAIEmbeddings(
src/lfx/src/lfx/components/openai/openai.py-77-            client=self.client or None,
--
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-50-        if field_name == "base_url" and field_value:
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-51-            try:
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py:52:                build_model = self.build_embeddings()
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-53-                ids = [model.id for model in build_model.available_models]
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-54-                build_config["model"]["options"] = ids
--
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-59-        return build_config
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-60-
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py:61:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-62-        try:
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-63-            from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
--
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-153-            return WATSONX_EMBEDDING_MODEL_NAMES
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-154-
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:155:    async def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-156-        provider = self.provider
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-157-        model = self.model
--
src/lfx/src/lfx/components/lmstudio/lmstudioembeddings.py-71-    ]
src/lfx/src/lfx/components/lmstudio/lmstudioembeddings.py-72-
src/lfx/src/lfx/components/lmstudio/lmstudioembeddings.py:73:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/lmstudio/lmstudioembeddings.py-74-        try:
src/lfx/src/lfx/components/lmstudio/lmstudioembeddings.py-75-            from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
--
src/lfx/src/lfx/components/mistral/mistral_embeddings.py-39-
src/lfx/src/lfx/components/mistral/mistral_embeddings.py-40-    outputs = [
src/lfx/src/lfx/components/mistral/mistral_embeddings.py:41:        Output(display_name="Embeddings", name="embeddings", method="build_embeddings"),
src/lfx/src/lfx/components/mistral/mistral_embeddings.py-42-    ]
src/lfx/src/lfx/components/mistral/mistral_embeddings.py-43-
src/lfx/src/lfx/components/mistral/mistral_embeddings.py:44:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/mistral/mistral_embeddings.py-45-        if not self.mistral_api_key:
src/lfx/src/lfx/components/mistral/mistral_embeddings.py-46-            msg = "Mistral API Key is required"
--
src/lfx/src/lfx/components/langchain_utilities/fake_embeddings.py-21-    ]
src/lfx/src/lfx/components/langchain_utilities/fake_embeddings.py-22-
src/lfx/src/lfx/components/langchain_utilities/fake_embeddings.py:23:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/langchain_utilities/fake_embeddings.py-24-        return FakeEmbeddings(
src/lfx/src/lfx/components/langchain_utilities/fake_embeddings.py-25-            size=self.dimensions or 5,
--
src/lfx/src/lfx/components/ibm/watsonx_embeddings.py-115-                logger.exception("Error updating model options.")
src/lfx/src/lfx/components/ibm/watsonx_embeddings.py-116-
src/lfx/src/lfx/components/ibm/watsonx_embeddings.py:117:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/ibm/watsonx_embeddings.py-118-        credentials = Credentials(
src/lfx/src/lfx/components/ibm/watsonx_embeddings.py-119-            api_key=SecretStr(self.api_key).get_secret_value(),
--
src/lfx/src/lfx/components/huggingface/huggingface_inference_api.py-44-
src/lfx/src/lfx/components/huggingface/huggingface_inference_api.py-45-    outputs = [
src/lfx/src/lfx/components/huggingface/huggingface_inference_api.py:46:        Output(display_name="Embeddings", name="embeddings", method="build_embeddings"),
src/lfx/src/lfx/components/huggingface/huggingface_inference_api.py-47-    ]
src/lfx/src/lfx/components/huggingface/huggingface_inference_api.py-48-
--
src/lfx/src/lfx/components/huggingface/huggingface_inference_api.py-83-        return HuggingFaceInferenceAPIEmbeddings(api_key=api_key, api_url=api_url, model_name=model_name)
src/lfx/src/lfx/components/huggingface/huggingface_inference_api.py-84-
src/lfx/src/lfx/components/huggingface/huggingface_inference_api.py:85:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/huggingface/huggingface_inference_api.py-86-        api_url = self.get_api_url()
src/lfx/src/lfx/components/huggingface/huggingface_inference_api.py-87-
--
src/lfx/src/lfx/components/google/google_generative_ai_embeddings.py-34-
src/lfx/src/lfx/components/google/google_generative_ai_embeddings.py-35-    outputs = [
src/lfx/src/lfx/components/google/google_generative_ai_embeddings.py:36:        Output(display_name="Embeddings", name="embeddings", method="build_embeddings"),
src/lfx/src/lfx/components/google/google_generative_ai_embeddings.py-37-    ]
src/lfx/src/lfx/components/google/google_generative_ai_embeddings.py-38-
src/lfx/src/lfx/components/google/google_generative_ai_embeddings.py:39:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/google/google_generative_ai_embeddings.py-40-        if not self.api_key:
src/lfx/src/lfx/components/google/google_generative_ai_embeddings.py-41-            msg = "API Key is required"
--
src/lfx/src/lfx/components/files_and_knowledge/retrieval.py-135-        return metadata
src/lfx/src/lfx/components/files_and_knowledge/retrieval.py-136-
src/lfx/src/lfx/components/files_and_knowledge/retrieval.py:137:    def _build_embeddings(self, metadata: dict):
src/lfx/src/lfx/components/files_and_knowledge/retrieval.py-138-        """Build embedding model from metadata."""
src/lfx/src/lfx/components/files_and_knowledge/retrieval.py-139-        runtime_api_key = self.api_key.get_secret_value() if isinstance(self.api_key, SecretStr) else self.api_key
--
src/lfx/src/lfx/components/files_and_knowledge/retrieval.py-203-
src/lfx/src/lfx/components/files_and_knowledge/retrieval.py-204-        # Build the embedder for the knowledge base
src/lfx/src/lfx/components/files_and_knowledge/retrieval.py:205:        embedding_function = self._build_embeddings(metadata)
src/lfx/src/lfx/components/files_and_knowledge/retrieval.py-206-
src/lfx/src/lfx/components/files_and_knowledge/retrieval.py-207-        # Load vector store
--
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-243-        return "Custom"
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-244-
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py:245:    def _build_embeddings(self, embedding_model: str, api_key: str):
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-246-        """Build embedding model using provider patterns."""
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-247-        # Get provider by matching model name to lists
--
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-385-
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-386-            # Create embeddings model
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py:387:            embedding_function = self._build_embeddings(embedding_model, api_key)
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-388-
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-389-            # Convert DataFrame to Data objects (following Local DB pattern)
--
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-655-
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-656-                # We need to test the API Key one time against the embedding model
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py:657:                embed_model = self._build_embeddings(embedding_model=field_value["02_embedding_model"], api_key=api_key)
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-658-
src/lfx/src/lfx/components/files_and_knowledge/ingestion.py-659-                # Try to generate a dummy embedding to validate the API key without blocking the event loop
--
src/lfx/src/lfx/components/azure/azure_openai_embeddings.py-64-
src/lfx/src/lfx/components/azure/azure_openai_embeddings.py-65-    outputs = [
src/lfx/src/lfx/components/azure/azure_openai_embeddings.py:66:        Output(display_name="Embeddings", name="embeddings", method="build_embeddings"),
src/lfx/src/lfx/components/azure/azure_openai_embeddings.py-67-    ]
src/lfx/src/lfx/components/azure/azure_openai_embeddings.py-68-
src/lfx/src/lfx/components/azure/azure_openai_embeddings.py:69:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/azure/azure_openai_embeddings.py-70-        try:
src/lfx/src/lfx/components/azure/azure_openai_embeddings.py-71-            embeddings = AzureOpenAIEmbeddings(
--
src/lfx/src/lfx/components/cloudflare/cloudflare.py-61-
src/lfx/src/lfx/components/cloudflare/cloudflare.py-62-    outputs = [
src/lfx/src/lfx/components/cloudflare/cloudflare.py:63:        Output(display_name="Embeddings", name="embeddings", method="build_embeddings"),
src/lfx/src/lfx/components/cloudflare/cloudflare.py-64-    ]
src/lfx/src/lfx/components/cloudflare/cloudflare.py-65-
src/lfx/src/lfx/components/cloudflare/cloudflare.py:66:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/cloudflare/cloudflare.py-67-        try:
src/lfx/src/lfx/components/cloudflare/cloudflare.py-68-            embeddings = CloudflareWorkersAIEmbeddings(
--
src/lfx/src/lfx/components/cohere/cohere_embeddings.py-40-
src/lfx/src/lfx/components/cohere/cohere_embeddings.py-41-    outputs = [
src/lfx/src/lfx/components/cohere/cohere_embeddings.py:42:        Output(display_name="Embeddings", name="embeddings", method="build_embeddings"),
src/lfx/src/lfx/components/cohere/cohere_embeddings.py-43-    ]
src/lfx/src/lfx/components/cohere/cohere_embeddings.py-44-
src/lfx/src/lfx/components/cohere/cohere_embeddings.py:45:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/cohere/cohere_embeddings.py-46-        data = None
src/lfx/src/lfx/components/cohere/cohere_embeddings.py-47-        try:
--
src/lfx/src/lfx/components/amazon/amazon_bedrock_embedding.py-69-
src/lfx/src/lfx/components/amazon/amazon_bedrock_embedding.py-70-    outputs = [
src/lfx/src/lfx/components/amazon/amazon_bedrock_embedding.py:71:        Output(display_name="Embeddings", name="embeddings", method="build_embeddings"),
src/lfx/src/lfx/components/amazon/amazon_bedrock_embedding.py-72-    ]
src/lfx/src/lfx/components/amazon/amazon_bedrock_embedding.py-73-
src/lfx/src/lfx/components/amazon/amazon_bedrock_embedding.py:74:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/amazon/amazon_bedrock_embedding.py-75-        try:
src/lfx/src/lfx/components/amazon/amazon_bedrock_embedding.py-76-            from langchain_aws import BedrockEmbeddings
--
src/lfx/src/lfx/components/aiml/aiml_embeddings.py-31-    ]
src/lfx/src/lfx/components/aiml/aiml_embeddings.py-32-
src/lfx/src/lfx/components/aiml/aiml_embeddings.py:33:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/components/aiml/aiml_embeddings.py-34-        return AIMLEmbeddingsImpl(
src/lfx/src/lfx/components/aiml/aiml_embeddings.py-35-            api_key=self.aiml_api_key,
--
src/lfx/src/lfx/base/embeddings/model.py-8-
src/lfx/src/lfx/base/embeddings/model.py-9-    outputs = [
src/lfx/src/lfx/base/embeddings/model.py:10:        Output(display_name="Embedding Model", name="embeddings", method="build_embeddings"),
src/lfx/src/lfx/base/embeddings/model.py-11-    ]
src/lfx/src/lfx/base/embeddings/model.py-12-
src/lfx/src/lfx/base/embeddings/model.py-13-    def _validate_outputs(self) -> None:
src/lfx/src/lfx/base/embeddings/model.py:14:        required_output_methods = ["build_embeddings"]
src/lfx/src/lfx/base/embeddings/model.py-15-        output_names = [output.name for output in self.outputs]
src/lfx/src/lfx/base/embeddings/model.py-16-        for method_name in required_output_methods:
--
src/lfx/src/lfx/base/embeddings/model.py-22-                raise ValueError(msg)
src/lfx/src/lfx/base/embeddings/model.py-23-
src/lfx/src/lfx/base/embeddings/model.py:24:    def build_embeddings(self) -> Embeddings:
src/lfx/src/lfx/base/embeddings/model.py:25:        msg = "You must implement the build_embeddings method in your class."
src/lfx/src/lfx/base/embeddings/model.py-26-        raise NotImplementedError(msg)

</shell_result>


Script executed:

#!/bin/bash
# Check test_local_db_component.py around line 34
sed -n '30,40p' src/backend/tests/unit/components/vectorstores/test_local_db_component.py

# Check test_embedding_model_component.py around lines 137, 168
sed -n '130,145p' src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py

# Check if the test functions are async
sed -n '120,125p' src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

        except ValueError:
            pytest.skip("OPENAI_API_KEY is not set")

        return {
            "embedding": OpenAIEmbeddingsComponent(openai_api_key=api_key).build_embeddings(),
            "collection_name": "test_collection",
            "persist": True,
            "persist_directory": str(tmp_path),  # Convert Path to string
            "mode": "Ingest",
        }

        component.model = "text-embedding-3-small"
        component.api_key = "test-key"  # pragma:allowlist secret
        component.chunk_size = 1000
        component.max_retries = 3
        component.show_progress_bar = False

        # Build the embeddings
        embeddings = component.build_embeddings()

        # Verify the OpenAIEmbeddings was called with the correct parameters
        mock_openai_embeddings.assert_called_once_with(
            model="text-embedding-3-small",
            dimensions=None,
            base_url=None,
            api_key="test-key",  # pragma:allowlist secret
            chunk_size=1000,

    @patch("lfx.components.models_and_agents.embedding_model.OpenAIEmbeddings")
    async def test_build_embeddings_openai(self, mock_openai_embeddings, component_class, default_kwargs):
        # Setup mock
        mock_instance = MagicMock()
        mock_openai_embeddings.return_value = mock_instance

</shell_result>


Script executed:

#!/bin/bash
# Search for any uses of "await build_embeddings" to see if callers are properly awaiting
rg -n --type=py 'await.*build_embeddings' 

# Check the actual implementation of the async method
sed -n '155,200p' src/lfx/src/lfx/components/models_and_agents/embedding_model.py

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

    async def build_embeddings(self) -> Embeddings:
        provider = self.provider
        model = self.model
        api_key = self.api_key
        api_base = self.api_base
        base_url_ibm_watsonx = self.base_url_ibm_watsonx
        ollama_base_url = self.ollama_base_url
        dimensions = self.dimensions
        chunk_size = self.chunk_size
        request_timeout = self.request_timeout
        max_retries = self.max_retries
        show_progress_bar = self.show_progress_bar
        model_kwargs = self.model_kwargs or {}

        if provider == "OpenAI":
            if not api_key:
                msg = "OpenAI API key is required when using OpenAI provider"
                raise ValueError(msg)

            # Create the primary embedding instance
            embeddings_instance = OpenAIEmbeddings(
                model=model,
                dimensions=dimensions or None,
                base_url=api_base or None,
                api_key=api_key,
                chunk_size=chunk_size,
                max_retries=max_retries,
                timeout=request_timeout or None,
                show_progress_bar=show_progress_bar,
                model_kwargs=model_kwargs,
            )

            # Create dedicated instances for each available model
            available_models_dict = {}
            for model_name in OPENAI_EMBEDDING_MODEL_NAMES:
                available_models_dict[model_name] = OpenAIEmbeddings(
                    model=model_name,
                    dimensions=dimensions or None,  # Use same dimensions config for all
                    base_url=api_base or None,
                    api_key=api_key,
                    chunk_size=chunk_size,
                    max_retries=max_retries,
                    timeout=request_timeout or None,
                    show_progress_bar=show_progress_bar,
                    model_kwargs=model_kwargs,
                )

</shell_result>


Breaking change not properly propagated: callers are not awaiting the async method.

The build_embeddings method was converted to async but call sites throughout the codebase are not awaiting it:

  • src/backend/tests/unit/components/vectorstores/test_local_db_component.py:34
  • src/backend/tests/unit/components/vectorstores/test_chroma_vector_store_component.py:31
  • src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py:137, 168, 201, 234, 242, 253, 260

Additionally, the method body contains no await operations despite being declared async, and all other embedding components in the codebase remain synchronous. Either update all call sites to use await, or reconsider whether this method should be async.
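As a rough illustration of the first option (updating call sites to await), a minimal sketch of one test rewritten for the async signature, assuming pytest-asyncio and the existing component_class/default_kwargs fixtures from the test file quoted above; the fixture wiring and final assertion are placeholders, not the repository's actual test:

from unittest.mock import MagicMock, patch

import pytest


@pytest.mark.asyncio
@patch("lfx.components.models_and_agents.embedding_model.OpenAIEmbeddings")
async def test_build_embeddings_openai(mock_openai_embeddings, component_class, default_kwargs):
    mock_openai_embeddings.return_value = MagicMock()

    component = component_class(**default_kwargs)
    component.provider = "OpenAI"
    component.api_key = "test-key"  # pragma: allowlist secret

    # The only call-site change: the coroutine must be awaited now that the method is async.
    embeddings = await component.build_embeddings()

    mock_openai_embeddings.assert_called()
    assert embeddings is not None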

🤖 Prompt for AI Agents
In src/lfx/src/lfx/components/models_and_agents/embedding_model.py around line
155, the method build_embeddings was changed to async but its body contains no
awaits and many callers are not awaiting it, causing breaking behavior; either
revert build_embeddings to a synchronous def and restore the previous
(non-async) signature so all existing call sites continue to work, or keep it
async and (1) introduce real awaitable operations inside (or wrap the work in an
executor) and (2) update every caller/test listed to await build_embeddings;
pick one approach and apply it consistently across the embedding component
interface and all referenced call sites/tests.
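And a minimal sketch of the second option, keeping the method async by moving the synchronous client construction off the event loop with asyncio.to_thread; the standalone function name and parameters here are illustrative only and do not reflect the component's actual signature:

import asyncio

from langchain_core.embeddings import Embeddings
from langchain_openai import OpenAIEmbeddings


async def build_openai_embeddings_async(model: str, api_key: str, chunk_size: int = 1000) -> Embeddings:
    if not api_key:
        msg = "OpenAI API key is required when using OpenAI provider"
        raise ValueError(msg)

    def _build() -> OpenAIEmbeddings:
        # Synchronous construction of the embeddings client.
        return OpenAIEmbeddings(model=model, api_key=api_key, chunk_size=chunk_size)

    # Run the blocking setup in a worker thread so the coroutine awaits real work.
    return await asyncio.to_thread(_build)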

Comment on lines +187 to +200
# Create dedicated instances for each available model
available_models_dict = {}
for model_name in OPENAI_EMBEDDING_MODEL_NAMES:
    available_models_dict[model_name] = OpenAIEmbeddings(
        model=model_name,
        dimensions=dimensions or None,  # Use same dimensions config for all
        base_url=api_base or None,
        api_key=api_key,
        chunk_size=chunk_size,
        max_retries=max_retries,
        timeout=request_timeout or None,
        show_progress_bar=show_progress_bar,
        model_kwargs=model_kwargs,
    )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Script executed:

# Find the OPENAI_EMBEDDING_MODEL_NAMES constant definition
rg "OPENAI_EMBEDDING_MODEL_NAMES" -n -B2 -A5

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

src/lfx/tests/data/starter_projects_1_6_0/Nvidia Remix.json-1783-                "title_case": false,
src/lfx/tests/data/starter_projects_1_6_0/Nvidia Remix.json-1784-                "type": "code",
src/lfx/tests/data/starter_projects_1_6_0/Nvidia Remix.json:1785:                "value": "from typing import Any\n\nfrom langchain_openai import OpenAIEmbeddings\n\nfrom langflow.base.embeddings.model import LCEmbeddingsModel\nfrom langflow.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom langflow.field_typing import Embeddings\nfrom langflow.io import (\n    BoolInput,\n    DictInput,\n    DropdownInput,\n    FloatInput,\n    IntInput,\n    MessageTextInput,\n    SecretStrInput,\n)\nfrom langflow.schema.dotdict import dotdict\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n    display_name = \"Embedding Model\"\n    description = \"Generate embeddings using a specified provider.\"\n    documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n    icon = \"binary\"\n    name = \"EmbeddingModel\"\n    category = \"models\"\n\n    inputs = [\n        DropdownInput(\n            name=\"provider\",\n            display_name=\"Model Provider\",\n            options=[\"OpenAI\"],\n            value=\"OpenAI\",\n            info=\"Select the embedding model provider\",\n            real_time_refresh=True,\n            options_metadata=[{\"icon\": \"OpenAI\"}],\n        ),\n        DropdownInput(\n            name=\"model\",\n            display_name=\"Model Name\",\n            options=OPENAI_EMBEDDING_MODEL_NAMES,\n            value=OPENAI_EMBEDDING_MODEL_NAMES[0],\n            info=\"Select the embedding model to use\",\n        ),\n        SecretStrInput(\n            name=\"api_key\",\n            display_name=\"OpenAI API Key\",\n            info=\"Model Provider API key\",\n            required=True,\n            show=True,\n            real_time_refresh=True,\n        ),\n        MessageTextInput(\n            name=\"api_base\",\n            display_name=\"API Base URL\",\n            info=\"Base URL for the API. Leave empty for default.\",\n            advanced=True,\n        ),\n        IntInput(\n            name=\"dimensions\",\n            display_name=\"Dimensions\",\n            info=\"The number of dimensions the resulting output embeddings should have. 
\"\n            \"Only supported by certain models.\",\n            advanced=True,\n        ),\n        IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n        FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n        IntInput(name=\"max_retries\", display_name=\"Max Retries\", advanced=True, value=3),\n        BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n        DictInput(\n            name=\"model_kwargs\",\n            display_name=\"Model Kwargs\",\n            advanced=True,\n            info=\"Additional keyword arguments to pass to the model.\",\n        ),\n    ]\n\n    def build_embeddings(self) -> Embeddings:\n        provider = self.provider\n        model = self.model\n        api_key = self.api_key\n        api_base = self.api_base\n        dimensions = self.dimensions\n        chunk_size = self.chunk_size\n        request_timeout = self.request_timeout\n        max_retries = self.max_retries\n        show_progress_bar = self.show_progress_bar\n        model_kwargs = self.model_kwargs or {}\n\n        if provider == \"OpenAI\":\n            if not api_key:\n                msg = \"OpenAI API key is required when using OpenAI provider\"\n                raise ValueError(msg)\n            return OpenAIEmbeddings(\n                model=model,\n                dimensions=dimensions or None,\n                base_url=api_base or None,\n                api_key=api_key,\n                chunk_size=chunk_size,\n                max_retries=max_retries,\n                timeout=request_timeout or None,\n                show_progress_bar=show_progress_bar,\n                model_kwargs=model_kwargs,\n            )\n        msg = f\"Unknown provider: {provider}\"\n        raise ValueError(msg)\n\n    def update_build_config(self, build_config: dotdict, field_value: Any, field_name: str | None = None) -> dotdict:\n        if field_name == \"provider\" and field_value == \"OpenAI\":\n            build_config[\"model\"][\"options\"] = OPENAI_EMBEDDING_MODEL_NAMES\n            build_config[\"model\"][\"value\"] = OPENAI_EMBEDDING_MODEL_NAMES[0]\n            build_config[\"api_key\"][\"display_name\"] = \"OpenAI API Key\"\n            build_config[\"api_base\"][\"display_name\"] = \"OpenAI API Base URL\"\n        return build_config\n"
src/lfx/tests/data/starter_projects_1_6_0/Nvidia Remix.json-1786-              },
src/lfx/tests/data/starter_projects_1_6_0/Nvidia Remix.json-1787-              "dimensions": {
src/lfx/tests/data/starter_projects_1_6_0/Nvidia Remix.json-1788-                "_input_type": "IntInput",
src/lfx/tests/data/starter_projects_1_6_0/Nvidia Remix.json-1789-                "advanced": true,
src/lfx/tests/data/starter_projects_1_6_0/Nvidia Remix.json-1790-                "display_name": "Dimensions",
--
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1317-                "title_case": false,
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1318-                "type": "code",
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json:1319:                "value": "from langchain_openai import OpenAIEmbeddings\n\nfrom langflow.base.embeddings.model import LCEmbeddingsModel\nfrom langflow.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom langflow.field_typing import Embeddings\nfrom langflow.io import BoolInput, DictInput, DropdownInput, FloatInput, IntInput, MessageTextInput, SecretStrInput\n\n\nclass OpenAIEmbeddingsComponent(LCEmbeddingsModel):\n    display_name = \"OpenAI Embeddings\"\n    description = \"Generate embeddings using OpenAI models.\"\n    icon = \"OpenAI\"\n    name = \"OpenAIEmbeddings\"\n\n    inputs = [\n        DictInput(\n            name=\"default_headers\",\n            display_name=\"Default Headers\",\n            advanced=True,\n            info=\"Default headers to use for the API request.\",\n        ),\n        DictInput(\n            name=\"default_query\",\n            display_name=\"Default Query\",\n            advanced=True,\n            info=\"Default query parameters to use for the API request.\",\n        ),\n        IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n        MessageTextInput(name=\"client\", display_name=\"Client\", advanced=True),\n        MessageTextInput(name=\"deployment\", display_name=\"Deployment\", advanced=True),\n        IntInput(name=\"embedding_ctx_length\", display_name=\"Embedding Context Length\", advanced=True, value=1536),\n        IntInput(name=\"max_retries\", display_name=\"Max Retries\", value=3, advanced=True),\n        DropdownInput(\n            name=\"model\",\n            display_name=\"Model\",\n            advanced=False,\n            options=OPENAI_EMBEDDING_MODEL_NAMES,\n            value=\"text-embedding-3-small\",\n        ),\n        DictInput(name=\"model_kwargs\", display_name=\"Model Kwargs\", advanced=True),\n        SecretStrInput(name=\"openai_api_key\", display_name=\"OpenAI API Key\", value=\"OPENAI_API_KEY\", required=True),\n        MessageTextInput(name=\"openai_api_base\", display_name=\"OpenAI API Base\", advanced=True),\n        MessageTextInput(name=\"openai_api_type\", display_name=\"OpenAI API Type\", advanced=True),\n        MessageTextInput(name=\"openai_api_version\", display_name=\"OpenAI API Version\", advanced=True),\n        MessageTextInput(\n            name=\"openai_organization\",\n            display_name=\"OpenAI Organization\",\n            advanced=True,\n        ),\n        MessageTextInput(name=\"openai_proxy\", display_name=\"OpenAI Proxy\", advanced=True),\n        FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n        BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n        BoolInput(name=\"skip_empty\", display_name=\"Skip Empty\", advanced=True),\n        MessageTextInput(\n            name=\"tiktoken_model_name\",\n            display_name=\"TikToken Model Name\",\n            advanced=True,\n        ),\n        BoolInput(\n            name=\"tiktoken_enable\",\n            display_name=\"TikToken Enable\",\n            advanced=True,\n            value=True,\n            info=\"If False, you must have transformers installed.\",\n        ),\n        IntInput(\n            name=\"dimensions\",\n            display_name=\"Dimensions\",\n            info=\"The number of dimensions the resulting output embeddings should have. 
\"\n            \"Only supported by certain models.\",\n            advanced=True,\n        ),\n    ]\n\n    def build_embeddings(self) -> Embeddings:\n        return OpenAIEmbeddings(\n            client=self.client or None,\n            model=self.model,\n            dimensions=self.dimensions or None,\n            deployment=self.deployment or None,\n            api_version=self.openai_api_version or None,\n            base_url=self.openai_api_base or None,\n            openai_api_type=self.openai_api_type or None,\n            openai_proxy=self.openai_proxy or None,\n            embedding_ctx_length=self.embedding_ctx_length,\n            api_key=self.openai_api_key or None,\n            organization=self.openai_organization or None,\n            allowed_special=\"all\",\n            disallowed_special=\"all\",\n            chunk_size=self.chunk_size,\n            max_retries=self.max_retries,\n            timeout=self.request_timeout or None,\n            tiktoken_enabled=self.tiktoken_enable,\n            tiktoken_model_name=self.tiktoken_model_name or None,\n            show_progress_bar=self.show_progress_bar,\n            model_kwargs=self.model_kwargs,\n            skip_empty=self.skip_empty,\n            default_headers=self.default_headers or None,\n            default_query=self.default_query or None,\n        )\n"
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1320-              },
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1321-              "default_headers": {
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1322-                "_input_type": "DictInput",
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1323-                "advanced": true,
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1324-                "display_name": "Default Headers",
--
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1850-                "title_case": false,
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1851-                "type": "code",
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json:1852:                "value": "from langchain_openai import OpenAIEmbeddings\n\nfrom langflow.base.embeddings.model import LCEmbeddingsModel\nfrom langflow.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom langflow.field_typing import Embeddings\nfrom langflow.io import BoolInput, DictInput, DropdownInput, FloatInput, IntInput, MessageTextInput, SecretStrInput\n\n\nclass OpenAIEmbeddingsComponent(LCEmbeddingsModel):\n    display_name = \"OpenAI Embeddings\"\n    description = \"Generate embeddings using OpenAI models.\"\n    icon = \"OpenAI\"\n    name = \"OpenAIEmbeddings\"\n\n    inputs = [\n        DictInput(\n            name=\"default_headers\",\n            display_name=\"Default Headers\",\n            advanced=True,\n            info=\"Default headers to use for the API request.\",\n        ),\n        DictInput(\n            name=\"default_query\",\n            display_name=\"Default Query\",\n            advanced=True,\n            info=\"Default query parameters to use for the API request.\",\n        ),\n        IntInput(name=\"chunk_size\", display_name=\"Chunk Size\", advanced=True, value=1000),\n        MessageTextInput(name=\"client\", display_name=\"Client\", advanced=True),\n        MessageTextInput(name=\"deployment\", display_name=\"Deployment\", advanced=True),\n        IntInput(name=\"embedding_ctx_length\", display_name=\"Embedding Context Length\", advanced=True, value=1536),\n        IntInput(name=\"max_retries\", display_name=\"Max Retries\", value=3, advanced=True),\n        DropdownInput(\n            name=\"model\",\n            display_name=\"Model\",\n            advanced=False,\n            options=OPENAI_EMBEDDING_MODEL_NAMES,\n            value=\"text-embedding-3-small\",\n        ),\n        DictInput(name=\"model_kwargs\", display_name=\"Model Kwargs\", advanced=True),\n        SecretStrInput(name=\"openai_api_key\", display_name=\"OpenAI API Key\", value=\"OPENAI_API_KEY\", required=True),\n        MessageTextInput(name=\"openai_api_base\", display_name=\"OpenAI API Base\", advanced=True),\n        MessageTextInput(name=\"openai_api_type\", display_name=\"OpenAI API Type\", advanced=True),\n        MessageTextInput(name=\"openai_api_version\", display_name=\"OpenAI API Version\", advanced=True),\n        MessageTextInput(\n            name=\"openai_organization\",\n            display_name=\"OpenAI Organization\",\n            advanced=True,\n        ),\n        MessageTextInput(name=\"openai_proxy\", display_name=\"OpenAI Proxy\", advanced=True),\n        FloatInput(name=\"request_timeout\", display_name=\"Request Timeout\", advanced=True),\n        BoolInput(name=\"show_progress_bar\", display_name=\"Show Progress Bar\", advanced=True),\n        BoolInput(name=\"skip_empty\", display_name=\"Skip Empty\", advanced=True),\n        MessageTextInput(\n            name=\"tiktoken_model_name\",\n            display_name=\"TikToken Model Name\",\n            advanced=True,\n        ),\n        BoolInput(\n            name=\"tiktoken_enable\",\n            display_name=\"TikToken Enable\",\n            advanced=True,\n            value=True,\n            info=\"If False, you must have transformers installed.\",\n        ),\n        IntInput(\n            name=\"dimensions\",\n            display_name=\"Dimensions\",\n            info=\"The number of dimensions the resulting output embeddings should have. 
\"\n            \"Only supported by certain models.\",\n            advanced=True,\n        ),\n    ]\n\n    def build_embeddings(self) -> Embeddings:\n        return OpenAIEmbeddings(\n            client=self.client or None,\n            model=self.model,\n            dimensions=self.dimensions or None,\n            deployment=self.deployment or None,\n            api_version=self.openai_api_version or None,\n            base_url=self.openai_api_base or None,\n            openai_api_type=self.openai_api_type or None,\n            openai_proxy=self.openai_proxy or None,\n            embedding_ctx_length=self.embedding_ctx_length,\n            api_key=self.openai_api_key or None,\n            organization=self.openai_organization or None,\n            allowed_special=\"all\",\n            disallowed_special=\"all\",\n            chunk_size=self.chunk_size,\n            max_retries=self.max_retries,\n            timeout=self.request_timeout or None,\n            tiktoken_enabled=self.tiktoken_enable,\n            tiktoken_model_name=self.tiktoken_model_name or None,\n            show_progress_bar=self.show_progress_bar,\n            model_kwargs=self.model_kwargs,\n            skip_empty=self.skip_empty,\n            default_headers=self.default_headers or None,\n            default_query=self.default_query or None,\n        )\n"
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1853-              },
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1854-              "default_headers": {
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1855-                "_input_type": "DictInput",
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1856-                "advanced": true,
src/lfx/tests/data/starter_projects_1_6_0/Vector Store RAG.json-1857-                "display_name": "Default Headers",
--
src/lfx/tests/data/starter_projects_1_6_0/Knowledge Ingestion.json-786-                "title_case": false,
src/lfx/tests/data/starter_projects_1_6_0/Knowledge Ingestion.json-787-                "type": "code",
src/lfx/tests/data/starter_projects_1_6_0/Knowledge Ingestion.json:788:                "value": "from __future__ import annotations\n\nimport asyncio\nimport contextlib\nimport hashlib\nimport json\nimport re\nimport uuid\nfrom dataclasses import asdict, dataclass, field\nfrom datetime import datetime, timezone\nfrom pathlib import Path\nfrom typing import TYPE_CHECKING, Any\n\nimport pandas as pd\nfrom cryptography.fernet import InvalidToken\nfrom langchain_chroma import Chroma\nfrom loguru import logger\n\nfrom langflow.base.knowledge_bases.knowledge_base_utils import get_knowledge_bases\nfrom langflow.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES\nfrom langflow.components.processing.converter import convert_to_dataframe\nfrom langflow.custom import Component\nfrom langflow.io import (\n    BoolInput,\n    DropdownInput,\n    HandleInput,\n    IntInput,\n    Output,\n    SecretStrInput,\n    StrInput,\n    TableInput,\n)\nfrom langflow.schema.data import Data\nfrom langflow.schema.dotdict import dotdict  # noqa: TC001\nfrom langflow.schema.table import EditMode\nfrom langflow.services.auth.utils import decrypt_api_key, encrypt_api_key\nfrom langflow.services.database.models.user.crud import get_user_by_id\nfrom langflow.services.deps import (\n    get_settings_service,\n    get_variable_service,\n    session_scope,\n)\n\nif TYPE_CHECKING:\n    from langflow.schema.dataframe import DataFrame\n\nHUGGINGFACE_MODEL_NAMES = [\n    \"sentence-transformers/all-MiniLM-L6-v2\",\n    \"sentence-transformers/all-mpnet-base-v2\",\n]\nCOHERE_MODEL_NAMES = [\"embed-english-v3.0\", \"embed-multilingual-v3.0\"]\n\nsettings = get_settings_service().settings\nknowledge_directory = settings.knowledge_bases_dir\nif not knowledge_directory:\n    msg = \"Knowledge bases directory is not set in the settings.\"\n    raise ValueError(msg)\nKNOWLEDGE_BASES_ROOT_PATH = Path(knowledge_directory).expanduser()\n\n\nclass KnowledgeIngestionComponent(Component):\n    \"\"\"Create or append to Langflow Knowledge from a DataFrame.\"\"\"\n\n    # ------ UI metadata ---------------------------------------------------\n    display_name = \"Knowledge Ingestion\"\n    description = \"Create or update knowledge in Langflow.\"\n    icon = \"upload\"\n    name = \"KnowledgeIngestion\"\n\n    def __init__(self, *args, **kwargs) -> None:\n        super().__init__(*args, **kwargs)\n        self._cached_kb_path: Path | None = None\n\n    @dataclass\n    class NewKnowledgeBaseInput:\n        functionality: str = \"create\"\n        fields: dict[str, dict] = field(\n            default_factory=lambda: {\n                \"data\": {\n                    \"node\": {\n                        \"name\": \"create_knowledge_base\",\n                        \"description\": \"Create new knowledge in Langflow.\",\n                        \"display_name\": \"Create new knowledge\",\n                        \"field_order\": [\n                            \"01_new_kb_name\",\n                            \"02_embedding_model\",\n                            \"03_api_key\",\n                        ],\n                        \"template\": {\n                            \"01_new_kb_name\": StrInput(\n                                name=\"new_kb_name\",\n                                display_name=\"Knowledge Name\",\n                                info=\"Name of the new knowledge to create.\",\n                                required=True,\n                            ),\n                            \"02_embedding_model\": 
DropdownInput(\n                                name=\"embedding_model\",\n                                display_name=\"Choose Embedding\",\n                                info=\"Select the embedding model to use for this knowledge base.\",\n                                required=True,\n                                options=OPENAI_EMBEDDING_MODEL_NAMES + HUGGINGFACE_MODEL_NAMES + COHERE_MODEL_NAMES,\n                                options_metadata=[{\"icon\": \"OpenAI\"} for _ in OPENAI_EMBEDDING_MODEL_NAMES]\n                                + [{\"icon\": \"HuggingFace\"} for _ in HUGGINGFACE_MODEL_NAMES]\n                                + [{\"icon\": \"Cohere\"} for _ in COHERE_MODEL_NAMES],\n                            ),\n                            \"03_api_key\": SecretStrInput(\n                                name=\"api_key\",\n                                display_name=\"API Key\",\n                                info=\"Provider API key for embedding model\",\n                                required=True,\n                                load_from_db=False,\n                            ),\n                        },\n                    },\n                }\n            }\n        )\n\n    # ------ Inputs --------------------------------------------------------\n    inputs = [\n        DropdownInput(\n            name=\"knowledge_base\",\n            display_name=\"Knowledge\",\n            info=\"Select the knowledge to load data from.\",\n            required=True,\n            options=[],\n            refresh_button=True,\n            real_time_refresh=True,\n            dialog_inputs=asdict(NewKnowledgeBaseInput()),\n        ),\n        HandleInput(\n            name=\"input_df\",\n            display_name=\"Input\",\n            info=(\n                \"Table with all original columns (already chunked / processed). \"\n                \"Accepts Data or DataFrame. 
If Data is provided, it is converted to a DataFrame automatically.\"\n            ),\n            input_types=[\"Data\", \"DataFrame\"],\n            required=True,\n        ),\n        TableInput(\n            name=\"column_config\",\n            display_name=\"Column Configuration\",\n            info=\"Configure column behavior for the knowledge base.\",\n            required=True,\n            table_schema=[\n                {\n                    \"name\": \"column_name\",\n                    \"display_name\": \"Column Name\",\n                    \"type\": \"str\",\n                    \"description\": \"Name of the column in the source DataFrame\",\n                    \"edit_mode\": EditMode.INLINE,\n                },\n                {\n                    \"name\": \"vectorize\",\n                    \"display_name\": \"Vectorize\",\n                    \"type\": \"boolean\",\n                    \"description\": \"Create embeddings for this column\",\n                    \"default\": False,\n                    \"edit_mode\": EditMode.INLINE,\n                },\n                {\n                    \"name\": \"identifier\",\n                    \"display_name\": \"Identifier\",\n                    \"type\": \"boolean\",\n                    \"description\": \"Use this column as unique identifier\",\n                    \"default\": False,\n                    \"edit_mode\": EditMode.INLINE,\n                },\n            ],\n            value=[\n                {\n                    \"column_name\": \"text\",\n                    \"vectorize\": True,\n                    \"identifier\": True,\n                },\n            ],\n        ),\n        IntInput(\n            name=\"chunk_size\",\n            display_name=\"Chunk Size\",\n            info=\"Batch size for processing embeddings\",\n            advanced=True,\n            value=1000,\n        ),\n        SecretStrInput(\n            name=\"api_key\",\n            display_name=\"Embedding Provider API Key\",\n            info=\"API key for the embedding provider to generate embeddings.\",\n            advanced=True,\n            required=False,\n        ),\n        BoolInput(\n            name=\"allow_duplicates\",\n            display_name=\"Allow Duplicates\",\n            info=\"Allow duplicate rows in the knowledge base\",\n            advanced=True,\n            value=False,\n        ),\n    ]\n\n    # ------ Outputs -------------------------------------------------------\n    outputs = [Output(display_name=\"Results\", name=\"dataframe_output\", method=\"build_kb_info\")]\n\n    # ------ Internal helpers ---------------------------------------------\n    def _get_kb_root(self) -> Path:\n        \"\"\"Return the root directory for knowledge bases.\"\"\"\n        return KNOWLEDGE_BASES_ROOT_PATH\n\n    def _validate_column_config(self, df_source: pd.DataFrame) -> list[dict[str, Any]]:\n        \"\"\"Validate column configuration using Structured Output patterns.\"\"\"\n        if not self.column_config:\n            msg = \"Column configuration cannot be empty\"\n            raise ValueError(msg)\n\n        # Convert table input to list of dicts (similar to Structured Output)\n        config_list = self.column_config if isinstance(self.column_config, list) else []\n\n        # Validate column names exist in DataFrame\n        df_columns = set(df_source.columns)\n        for config in config_list:\n            col_name = config.get(\"column_name\")\n            if col_name not in df_columns:\n                
msg = f\"Column '{col_name}' not found in DataFrame. Available columns: {sorted(df_columns)}\"\n                raise ValueError(msg)\n\n        return config_list\n\n    def _get_embedding_provider(self, embedding_model: str) -> str:\n        \"\"\"Get embedding provider by matching model name to lists.\"\"\"\n        if embedding_model in OPENAI_EMBEDDING_MODEL_NAMES:\n            return \"OpenAI\"\n        if embedding_model in HUGGINGFACE_MODEL_NAMES:\n            return \"HuggingFace\"\n        if embedding_model in COHERE_MODEL_NAMES:\n            return \"Cohere\"\n        return \"Custom\"\n\n    def _build_embeddings(self, embedding_model: str, api_key: str):\n        \"\"\"Build embedding model using provider patterns.\"\"\"\n        # Get provider by matching model name to lists\n        provider = self._get_embedding_provider(embedding_model)\n\n        # Validate provider and model\n        if provider == \"OpenAI\":\n            from langchain_openai import OpenAIEmbeddings\n\n            if not api_key:\n                msg = \"OpenAI API key is required when using OpenAI provider\"\n                raise ValueError(msg)\n            return OpenAIEmbeddings(\n                model=embedding_model,\n                api_key=api_key,\n                chunk_size=self.chunk_size,\n            )\n        if provider == \"HuggingFace\":\n            from langchain_huggingface import HuggingFaceEmbeddings\n\n            return HuggingFaceEmbeddings(\n                model=embedding_model,\n            )\n        if provider == \"Cohere\":\n            from langchain_cohere import CohereEmbeddings\n\n            if not api_key:\n                msg = \"Cohere API key is required when using Cohere provider\"\n                raise ValueError(msg)\n            return CohereEmbeddings(\n                model=embedding_model,\n                cohere_api_key=api_key,\n            )\n        if provider == \"Custom\":\n            # For custom embedding models, we would need additional configuration\n            msg = \"Custom embedding models not yet supported\"\n            raise NotImplementedError(msg)\n        msg = f\"Unknown provider: {provider}\"\n        raise ValueError(msg)\n\n    def _build_embedding_metadata(self, embedding_model, api_key) -> dict[str, Any]:\n        \"\"\"Build embedding model metadata.\"\"\"\n        # Get provider by matching model name to lists\n        embedding_provider = self._get_embedding_provider(embedding_model)\n\n        api_key_to_save = None\n        if api_key and hasattr(api_key, \"get_secret_value\"):\n            api_key_to_save = api_key.get_secret_value()\n        elif isinstance(api_key, str):\n            api_key_to_save = api_key\n\n        encrypted_api_key = None\n        if api_key_to_save:\n            settings_service = get_settings_service()\n            try:\n                encrypted_api_key = encrypt_api_key(api_key_to_save, settings_service=settings_service)\n            except (TypeError, ValueError) as e:\n                self.log(f\"Could not encrypt API key: {e}\")\n                logger.error(f\"Could not encrypt API key: {e}\")\n\n        return {\n            \"embedding_provider\": embedding_provider,\n            \"embedding_model\": embedding_model,\n            \"api_key\": encrypted_api_key,\n            \"api_key_used\": bool(api_key),\n            \"chunk_size\": self.chunk_size,\n            \"created_at\": datetime.now(timezone.utc).isoformat(),\n        }\n\n    def _save_embedding_metadata(self, kb_path: 
Path, embedding_model: str, api_key: str) -> None:\n        \"\"\"Save embedding model metadata.\"\"\"\n        embedding_metadata = self._build_embedding_metadata(embedding_model, api_key)\n        metadata_path = kb_path / \"embedding_metadata.json\"\n        metadata_path.write_text(json.dumps(embedding_metadata, indent=2))\n\n    def _save_kb_files(\n        self,\n        kb_path: Path,\n        config_list: list[dict[str, Any]],\n    ) -> None:\n        \"\"\"Save KB files using File Component storage patterns.\"\"\"\n        try:\n            # Create directory (following File Component patterns)\n            kb_path.mkdir(parents=True, exist_ok=True)\n\n            # Save column configuration\n            # Only do this if the file doesn't exist already\n            cfg_path = kb_path / \"schema.json\"\n            if not cfg_path.exists():\n                cfg_path.write_text(json.dumps(config_list, indent=2))\n\n        except (OSError, TypeError, ValueError) as e:\n            self.log(f\"Error saving KB files: {e}\")\n\n    def _build_column_metadata(self, config_list: list[dict[str, Any]], df_source: pd.DataFrame) -> dict[str, Any]:\n        \"\"\"Build detailed column metadata.\"\"\"\n        metadata: dict[str, Any] = {\n            \"total_columns\": len(df_source.columns),\n            \"mapped_columns\": len(config_list),\n            \"unmapped_columns\": len(df_source.columns) - len(config_list),\n            \"columns\": [],\n            \"summary\": {\"vectorized_columns\": [], \"identifier_columns\": []},\n        }\n\n        for config in config_list:\n            col_name = config.get(\"column_name\")\n            vectorize = config.get(\"vectorize\") == \"True\" or config.get(\"vectorize\") is True\n            identifier = config.get(\"identifier\") == \"True\" or config.get(\"identifier\") is True\n\n            # Add to columns list\n            metadata[\"columns\"].append(\n                {\n                    \"name\": col_name,\n                    \"vectorize\": vectorize,\n                    \"identifier\": identifier,\n                }\n            )\n\n            # Update summary\n            if vectorize:\n                metadata[\"summary\"][\"vectorized_columns\"].append(col_name)\n            if identifier:\n                metadata[\"summary\"][\"identifier_columns\"].append(col_name)\n\n        return metadata\n\n    async def _create_vector_store(\n        self,\n        df_source: pd.DataFrame,\n        config_list: list[dict[str, Any]],\n        embedding_model: str,\n        api_key: str,\n    ) -> None:\n        \"\"\"Create vector store following Local DB component pattern.\"\"\"\n        try:\n            # Set up vector store directory\n            vector_store_dir = await self._kb_path()\n            if not vector_store_dir:\n                msg = \"Knowledge base path is not set. 
Please create a new knowledge base first.\"\n                raise ValueError(msg)\n            vector_store_dir.mkdir(parents=True, exist_ok=True)\n\n            # Create embeddings model\n            embedding_function = self._build_embeddings(embedding_model, api_key)\n\n            # Convert DataFrame to Data objects (following Local DB pattern)\n            data_objects = await self._convert_df_to_data_objects(df_source, config_list)\n\n            # Create vector store\n            chroma = Chroma(\n                persist_directory=str(vector_store_dir),\n                embedding_function=embedding_function,\n                collection_name=self.knowledge_base,\n            )\n\n            # Convert Data objects to LangChain Documents\n            documents = []\n            for data_obj in data_objects:\n                doc = data_obj.to_lc_document()\n                documents.append(doc)\n\n            # Add documents to vector store\n            if documents:\n                chroma.add_documents(documents)\n                self.log(f\"Added {len(documents)} documents to vector store '{self.knowledge_base}'\")\n\n        except (OSError, ValueError, RuntimeError) as e:\n            self.log(f\"Error creating vector store: {e}\")\n\n    async def _convert_df_to_data_objects(\n        self, df_source: pd.DataFrame, config_list: list[dict[str, Any]]\n    ) -> list[Data]:\n        \"\"\"Convert DataFrame to Data objects for vector store.\"\"\"\n        data_objects: list[Data] = []\n\n        # Set up vector store directory\n        kb_path = await self._kb_path()\n\n        # If we don't allow duplicates, we need to get the existing hashes\n        chroma = Chroma(\n            persist_directory=str(kb_path),\n            collection_name=self.knowledge_base,\n        )\n\n        # Get all documents and their metadata\n        all_docs = chroma.get()\n\n        # Extract all _id values from metadata\n        id_list = [metadata.get(\"_id\") for metadata in all_docs[\"metadatas\"] if metadata.get(\"_id\")]\n\n        # Get column roles\n        content_cols = []\n        identifier_cols = []\n\n        for config in config_list:\n            col_name = config.get(\"column_name\")\n            vectorize = config.get(\"vectorize\") == \"True\" or config.get(\"vectorize\") is True\n            identifier = config.get(\"identifier\") == \"True\" or config.get(\"identifier\") is True\n\n            if vectorize:\n                content_cols.append(col_name)\n            elif identifier:\n                identifier_cols.append(col_name)\n\n        # Convert each row to a Data object\n        for _, row in df_source.iterrows():\n            # Build content text from identifier columns using list comprehension\n            identifier_parts = [str(row[col]) for col in content_cols if col in row and pd.notna(row[col])]\n\n            # Join all parts into a single string\n            page_content = \" \".join(identifier_parts)\n\n            # Build metadata from NON-vectorized columns only (simple key-value pairs)\n            data_dict = {\n                \"text\": page_content,  # Main content for vectorization\n            }\n\n            # Add identifier columns if they exist\n            if identifier_cols:\n                identifier_parts = [str(row[col]) for col in identifier_cols if col in row and pd.notna(row[col])]\n                page_content = \" \".join(identifier_parts)\n\n            # Add metadata columns as simple key-value pairs\n            for col in 
df_source.columns:\n                if col not in content_cols and col in row and pd.notna(row[col]):\n                    # Convert to simple types for Chroma metadata\n                    value = row[col]\n                    data_dict[col] = str(value)  # Convert complex types to string\n\n            # Hash the page_content for unique ID\n            page_content_hash = hashlib.sha256(page_content.encode()).hexdigest()\n            data_dict[\"_id\"] = page_content_hash\n\n            # If duplicates are disallowed, and hash exists, prevent adding this row\n            if not self.allow_duplicates and page_content_hash in id_list:\n                self.log(f\"Skipping duplicate row with hash {page_content_hash}\")\n                continue\n\n            # Create Data object - everything except \"text\" becomes metadata\n            data_obj = Data(data=data_dict)\n            data_objects.append(data_obj)\n\n        return data_objects\n\n    def is_valid_collection_name(self, name, min_length: int = 3, max_length: int = 63) -> bool:\n        \"\"\"Validates collection name against conditions 1-3.\n\n        1. Contains 3-63 characters\n        2. Starts and ends with alphanumeric character\n        3. Contains only alphanumeric characters, underscores, or hyphens.\n\n        Args:\n            name (str): Collection name to validate\n            min_length (int): Minimum length of the name\n            max_length (int): Maximum length of the name\n\n        Returns:\n            bool: True if valid, False otherwise\n        \"\"\"\n        # Check length (condition 1)\n        if not (min_length <= len(name) <= max_length):\n            return False\n\n        # Check start/end with alphanumeric (condition 2)\n        if not (name[0].isalnum() and name[-1].isalnum()):\n            return False\n\n        # Check allowed characters (condition 3)\n        return re.match(r\"^[a-zA-Z0-9_-]+$\", name) is not None\n\n    async def _kb_path(self) -> Path | None:\n        # Check if we already have the path cached\n        cached_path = getattr(self, \"_cached_kb_path\", None)\n        if cached_path is not None:\n            return cached_path\n\n        # If not cached, compute it\n        async with session_scope() as db:\n            if not self.user_id:\n                msg = \"User ID is required for fetching knowledge base path.\"\n                raise ValueError(msg)\n            current_user = await get_user_by_id(db, self.user_id)\n            if not current_user:\n                msg = f\"User with ID {self.user_id} not found.\"\n                raise ValueError(msg)\n            kb_user = current_user.username\n\n        kb_root = self._get_kb_root()\n\n        # Cache the result\n        self._cached_kb_path = kb_root / kb_user / self.knowledge_base\n\n        return self._cached_kb_path\n\n    # ---------------------------------------------------------------------\n    #                         OUTPUT METHODS\n    # ---------------------------------------------------------------------\n    async def build_kb_info(self) -> Data:\n        \"\"\"Main ingestion routine → returns a dict with KB metadata.\"\"\"\n        try:\n            input_value = self.input_df[0] if isinstance(self.input_df, list) else self.input_df\n            df_source: DataFrame = convert_to_dataframe(input_value)\n\n            # Validate column configuration (using Structured Output patterns)\n            config_list = self._validate_column_config(df_source)\n            column_metadata = 
self._build_column_metadata(config_list, df_source)\n\n            # Read the embedding info from the knowledge base folder\n            kb_path = await self._kb_path()\n            if not kb_path:\n                msg = \"Knowledge base path is not set. Please create a new knowledge base first.\"\n                raise ValueError(msg)\n            metadata_path = kb_path / \"embedding_metadata.json\"\n\n            # If the API key is not provided, try to read it from the metadata file\n            if metadata_path.exists():\n                settings_service = get_settings_service()\n                metadata = json.loads(metadata_path.read_text())\n                embedding_model = metadata.get(\"embedding_model\")\n                try:\n                    api_key = decrypt_api_key(metadata[\"api_key\"], settings_service)\n                except (InvalidToken, TypeError, ValueError) as e:\n                    logger.error(f\"Could not decrypt API key. Please provide it manually. Error: {e}\")\n\n            # Check if a custom API key was provided, update metadata if so\n            if self.api_key:\n                api_key = self.api_key\n                self._save_embedding_metadata(\n                    kb_path=kb_path,\n                    embedding_model=embedding_model,\n                    api_key=api_key,\n                )\n\n            # Create vector store following Local DB component pattern\n            await self._create_vector_store(df_source, config_list, embedding_model=embedding_model, api_key=api_key)\n\n            # Save KB files (using File Component storage patterns)\n            self._save_kb_files(kb_path, config_list)\n\n            # Build metadata response\n            meta: dict[str, Any] = {\n                \"kb_id\": str(uuid.uuid4()),\n                \"kb_name\": self.knowledge_base,\n                \"rows\": len(df_source),\n                \"column_metadata\": column_metadata,\n                \"path\": str(kb_path),\n                \"config_columns\": len(config_list),\n                \"timestamp\": datetime.now(tz=timezone.utc).isoformat(),\n            }\n\n            # Set status message\n            self.status = f\"✅ KB **{self.knowledge_base}** saved · {len(df_source)} chunks.\"\n\n            return Data(data=meta)\n\n        except (OSError, ValueError, RuntimeError, KeyError) as e:\n            msg = f\"Error during KB ingestion: {e}\"\n            raise RuntimeError(msg) from e\n\n    async def _get_api_key_variable(self, field_value: dict[str, Any]):\n        async with session_scope() as db:\n            if not self.user_id:\n                msg = \"User ID is required for fetching global variables.\"\n                raise ValueError(msg)\n            current_user = await get_user_by_id(db, self.user_id)\n            if not current_user:\n                msg = f\"User with ID {self.user_id} not found.\"\n                raise ValueError(msg)\n            variable_service = get_variable_service()\n\n            # Process the api_key field variable\n            return await variable_service.get_variable(\n                user_id=current_user.id,\n                name=field_value[\"03_api_key\"],\n                field=\"\",\n                session=db,\n            )\n\n    async def update_build_config(\n        self,\n        build_config: dotdict,\n        field_value: Any,\n        field_name: str | None = None,\n    ) -> dotdict:\n        \"\"\"Update build configuration based on provider selection.\"\"\"\n        # Create a new 
knowledge base\n        if field_name == \"knowledge_base\":\n            async with session_scope() as db:\n                if not self.user_id:\n                    msg = \"User ID is required for fetching knowledge base list.\"\n                    raise ValueError(msg)\n                current_user = await get_user_by_id(db, self.user_id)\n                if not current_user:\n                    msg = f\"User with ID {self.user_id} not found.\"\n                    raise ValueError(msg)\n                kb_user = current_user.username\n            if isinstance(field_value, dict) and \"01_new_kb_name\" in field_value:\n                # Validate the knowledge base name - Make sure it follows these rules:\n                if not self.is_valid_collection_name(field_value[\"01_new_kb_name\"]):\n                    msg = f\"Invalid knowledge base name: {field_value['01_new_kb_name']}\"\n                    raise ValueError(msg)\n\n                api_key = field_value.get(\"03_api_key\", None)\n                with contextlib.suppress(Exception):\n                    # If the API key is a variable, resolve it\n                    api_key = await self._get_api_key_variable(field_value)\n\n                # Make sure api_key is a string\n                if not isinstance(api_key, str):\n                    msg = \"API key must be a string.\"\n                    raise ValueError(msg)\n\n                # We need to test the API Key one time against the embedding model\n                embed_model = self._build_embeddings(embedding_model=field_value[\"02_embedding_model\"], api_key=api_key)\n\n                # Try to generate a dummy embedding to validate the API key without blocking the event loop\n                try:\n                    await asyncio.wait_for(\n                        asyncio.to_thread(embed_model.embed_query, \"test\"),\n                        timeout=10,\n                    )\n                except TimeoutError as e:\n                    msg = \"Embedding validation timed out. Please verify network connectivity and key.\"\n                    raise ValueError(msg) from e\n                except Exception as e:\n                    msg = f\"Embedding validation failed: {e!s}\"\n                    raise ValueError(msg) from e\n\n                # Create the new knowledge base directory\n                kb_path = KNOWLEDGE_BASES_ROOT_PATH / kb_user / field_value[\"01_new_kb_name\"]\n                kb_path.mkdir(parents=True, exist_ok=True)\n\n                # Save the embedding metadata\n                build_config[\"knowledge_base\"][\"value\"] = field_value[\"01_new_kb_name\"]\n                self._save_embedding_metadata(\n                    kb_path=kb_path,\n                    embedding_model=field_value[\"02_embedding_model\"],\n                    api_key=api_key,\n                )\n\n            # Update the knowledge base options dynamically\n            build_config[\"knowledge_base\"][\"options\"] = await get_knowledge_bases(\n                KNOWLEDGE_BASES_ROOT_PATH,\n                user_id=self.user_id,\n            )\n\n            # If the selected knowledge base is not available, reset it\n            if build_config[\"knowledge_base\"][\"value\"] not in build_config[\"knowledge_base\"][\"options\"]:\n                build_config[\"knowledge_base\"][\"value\"] = None\n\n        return build_config\n"
src/lfx/tests/data/starter_projects_1_6_0/Knowledge Ingestion.json-789-              },
src/lfx/tests/data/starter_projects_1_6_0/Knowledge Ingestion.json-790-              "column_config": {
src/lfx/tests/data/starter_projects_1_6_0/Knowledge Ingestion.json-791-                "_input_type": "TableInput",
src/lfx/tests/data/starter_projects_1_6_0/Knowledge Ingestion.json-792-                "advanced": false,
src/lfx/tests/data/starter_projects_1_6_0/Knowledge Ingestion.json-793-                "display_name": "Column Configuration",
--
src/lfx/src/lfx/components/openai/openai.py-2-
src/lfx/src/lfx/components/openai/openai.py-3-from lfx.base.embeddings.model import LCEmbeddingsModel
src/lfx/src/lfx/components/openai/openai.py:4:from lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES
src/lfx/src/lfx/components/openai/openai.py-5-from lfx.field_typing import Embeddings
src/lfx/src/lfx/components/openai/openai.py-6-from lfx.io import BoolInput, DictInput, DropdownInput, FloatInput, IntInput, MessageTextInput, SecretStrInput
src/lfx/src/lfx/components/openai/openai.py-7-
src/lfx/src/lfx/components/openai/openai.py-8-
src/lfx/src/lfx/components/openai/openai.py-9-class OpenAIEmbeddingsComponent(LCEmbeddingsModel):
--
src/lfx/src/lfx/components/openai/openai.py-35-            display_name="Model",
src/lfx/src/lfx/components/openai/openai.py-36-            advanced=False,
src/lfx/src/lfx/components/openai/openai.py:37:            options=OPENAI_EMBEDDING_MODEL_NAMES,
src/lfx/src/lfx/components/openai/openai.py-38-            value="text-embedding-3-small",
src/lfx/src/lfx/components/openai/openai.py-39-        ),
src/lfx/src/lfx/components/openai/openai.py-40-        DictInput(name="model_kwargs", display_name="Model Kwargs", advanced=True),
src/lfx/src/lfx/components/openai/openai.py-41-        SecretStrInput(name="openai_api_key", display_name="OpenAI API Key", value="OPENAI_API_KEY", required=True),
src/lfx/src/lfx/components/openai/openai.py-42-        MessageTextInput(name="openai_api_base", display_name="OpenAI API Base", advanced=True),
--
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-8-from lfx.base.embeddings.model import LCEmbeddingsModel
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-9-from lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:10:from lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-11-from lfx.base.models.watsonx_constants import (
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-12-    IBM_WATSONX_URLS,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-13-    WATSONX_EMBEDDING_MODEL_NAMES,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-14-)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-15-from lfx.field_typing import Embeddings
--
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-81-            name="model",
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-82-            display_name="Model Name",
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:83:            options=OPENAI_EMBEDDING_MODEL_NAMES,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:84:            value=OPENAI_EMBEDDING_MODEL_NAMES[0],
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-85-            info="Select the embedding model to use",
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-86-            real_time_refresh=True,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-87-            refresh_button=True,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-88-        ),
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-89-     

... [stdout truncated: output exceeded 50000 characters (4027478 total). Please modify your command to reduce output size, for example by:
- Using grep/rg to filter results
- Using head/tail to limit lines
- Using --max-count flag with grep/rg
- Processing output with awk/sed to extract only needed information]

</shell_result>


OpenAI embedding models dimensions parameter support text-embedding-3 ada-002


<web_search_result>
Short answer: The dimensions parameter is supported for the text-embedding-3 family (e.g., text-embedding-3-small / -large) but not for text-embedding-ada-002 (ada-002 uses a fixed 1536-d vector). [1][2][3]

Sources:

  • OpenAI Embeddings guide: default sizes and ability to shorten text-embedding-3 via dimensions. [1]
  • OpenAI announcement: text-embedding-3-large up to 3072 dims and native support for shortening with a dimensions parameter. [2]
  • API docs / reference noting the dimensions parameter is only supported on text-embedding-3 and newer (ada-002 is fixed-size 1536). [3]
    </web_search_result>
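
For a concrete sense of that constraint, a minimal illustration using the standard langchain_openai wrapper (the dimension value 256 and the placeholder key are arbitrary; nothing here is taken from the component itself):

from langchain_openai import OpenAIEmbeddings

# The text-embedding-3 family accepts a reduced vector size via `dimensions`.
small = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=256, api_key="sk-...")  # placeholder key

# text-embedding-ada-002 always returns 1536-dimensional vectors, so
# `dimensions` must be left unset for it.
ada = OpenAIEmbeddings(model="text-embedding-ada-002", api_key="sk-...")  # placeholder key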

Script executed:

# Find and read the openai_constants.py file
fd "openai_constants.py" -type f

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 2

stderr:

error: 'ype' isn't a valid value for '--type <filetype>'
  [possible values: file, directory, symlink, executable, empty, socket, pipe]

For more information try '--help'

</shell_result>


Script executed:

# Read the embedding_model.py file around lines 187-200 for full context
cat -n src/lfx/src/lfx/components/models_and_agents/embedding_model.py | sed -n '160,220p'

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

   160	        base_url_ibm_watsonx = self.base_url_ibm_watsonx
   161	        ollama_base_url = self.ollama_base_url
   162	        dimensions = self.dimensions
   163	        chunk_size = self.chunk_size
   164	        request_timeout = self.request_timeout
   165	        max_retries = self.max_retries
   166	        show_progress_bar = self.show_progress_bar
   167	        model_kwargs = self.model_kwargs or {}
   168	
   169	        if provider == "OpenAI":
   170	            if not api_key:
   171	                msg = "OpenAI API key is required when using OpenAI provider"
   172	                raise ValueError(msg)
   173	
   174	            # Create the primary embedding instance
   175	            embeddings_instance = OpenAIEmbeddings(
   176	                model=model,
   177	                dimensions=dimensions or None,
   178	                base_url=api_base or None,
   179	                api_key=api_key,
   180	                chunk_size=chunk_size,
   181	                max_retries=max_retries,
   182	                timeout=request_timeout or None,
   183	                show_progress_bar=show_progress_bar,
   184	                model_kwargs=model_kwargs,
   185	            )
   186	
   187	            # Create dedicated instances for each available model
   188	            available_models_dict = {}
   189	            for model_name in OPENAI_EMBEDDING_MODEL_NAMES:
   190	                available_models_dict[model_name] = OpenAIEmbeddings(
   191	                    model=model_name,
   192	                    dimensions=dimensions or None,  # Use same dimensions config for all
   193	                    base_url=api_base or None,
   194	                    api_key=api_key,
   195	                    chunk_size=chunk_size,
   196	                    max_retries=max_retries,
   197	                    timeout=request_timeout or None,
   198	                    show_progress_bar=show_progress_bar,
   199	                    model_kwargs=model_kwargs,
   200	                )
   201	
   202	            return EmbeddingsWithModels(
   203	                embeddings=embeddings_instance,
   204	                available_models=available_models_dict,
   205	            )
   206	
   207	        if provider == "Ollama":
   208	            try:
   209	                from langchain_ollama import OllamaEmbeddings
   210	            except ImportError:
   211	                try:
   212	                    from langchain_community.embeddings import OllamaEmbeddings
   213	                except ImportError:
   214	                    msg = "Please install langchain-ollama: pip install langchain-ollama"
   215	                    raise ImportError(msg) from None
   216	
   217	            transformed_base_url = transform_localhost_url(ollama_base_url)
   218	
   219	            # Check if URL contains /v1 suffix (OpenAI-compatible mode)
   220	            if transformed_base_url and transformed_base_url.rstrip("/").endswith("/v1"):

</shell_result>


Script executed:

# Find openai_constants.py file
fd openai_constants.py

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

src/backend/base/langflow/base/models/openai_constants.py
src/lfx/src/lfx/base/models/openai_constants.py

</shell_result>


Script executed:

# Read the lfx openai_constants.py file
cat src/lfx/src/lfx/base/models/openai_constants.py

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

from .model_metadata import create_model_metadata

# Unified model metadata - single source of truth
OPENAI_MODELS_DETAILED = [
    # GPT-5 Series
    create_model_metadata(
        provider="OpenAI",
        name="gpt-5.1",
        icon="OpenAI",
        tool_calling=True,
        reasoning=True,
    ),
    create_model_metadata(
        provider="OpenAI",
        name="gpt-5",
        icon="OpenAI",
        tool_calling=True,
        reasoning=True,
    ),
    create_model_metadata(
        provider="OpenAI",
        name="gpt-5-mini",
        icon="OpenAI",
        tool_calling=True,
        reasoning=True,
    ),
    create_model_metadata(
        provider="OpenAI",
        name="gpt-5-nano",
        icon="OpenAI",
        tool_calling=True,
        reasoning=True,
    ),
    create_model_metadata(
        provider="OpenAI",
        name="gpt-5-chat-latest",
        icon="OpenAI",
        tool_calling=False,
        reasoning=True,
    ),
    # Regular OpenAI Models
    create_model_metadata(provider="OpenAI", name="gpt-4o-mini", icon="OpenAI", tool_calling=True),
    create_model_metadata(provider="OpenAI", name="gpt-4o", icon="OpenAI", tool_calling=True),
    create_model_metadata(provider="OpenAI", name="gpt-4.1", icon="OpenAI", tool_calling=True),
    create_model_metadata(provider="OpenAI", name="gpt-4.1-mini", icon="OpenAI", tool_calling=True),
    create_model_metadata(provider="OpenAI", name="gpt-4.1-nano", icon="OpenAI", tool_calling=True),
    create_model_metadata(
        provider="OpenAI", name="gpt-4.5-preview", icon="OpenAI", tool_calling=True, preview=True, not_supported=True
    ),
    create_model_metadata(provider="OpenAI", name="gpt-4-turbo", icon="OpenAI", tool_calling=True),
    create_model_metadata(
        provider="OpenAI", name="gpt-4-turbo-preview", icon="OpenAI", tool_calling=True, preview=True
    ),
    create_model_metadata(provider="OpenAI", name="gpt-4", icon="OpenAI", tool_calling=True),
    create_model_metadata(provider="OpenAI", name="gpt-3.5-turbo", icon="OpenAI", tool_calling=True),
    # Reasoning Models
    create_model_metadata(provider="OpenAI", name="o1", icon="OpenAI", reasoning=True),
    create_model_metadata(provider="OpenAI", name="o1-mini", icon="OpenAI", reasoning=True, not_supported=True),
    create_model_metadata(provider="OpenAI", name="o1-pro", icon="OpenAI", reasoning=True, not_supported=True),
    create_model_metadata(provider="OpenAI", name="o3-mini", icon="OpenAI", reasoning=True),
    create_model_metadata(provider="OpenAI", name="o3", icon="OpenAI", reasoning=True),
    create_model_metadata(provider="OpenAI", name="o3-pro", icon="OpenAI", reasoning=True),
    create_model_metadata(provider="OpenAI", name="o4-mini", icon="OpenAI", reasoning=True),
    create_model_metadata(provider="OpenAI", name="o4-mini-high", icon="OpenAI", reasoning=True),
    # Search Models
    create_model_metadata(
        provider="OpenAI",
        name="gpt-4o-mini-search-preview",
        icon="OpenAI",
        tool_calling=True,
        search=True,
        preview=True,
    ),
    create_model_metadata(
        provider="OpenAI",
        name="gpt-4o-search-preview",
        icon="OpenAI",
        tool_calling=True,
        search=True,
        preview=True,
    ),
    # Not Supported Models
    create_model_metadata(
        provider="OpenAI", name="computer-use-preview", icon="OpenAI", not_supported=True, preview=True
    ),
    create_model_metadata(
        provider="OpenAI", name="gpt-4o-audio-preview", icon="OpenAI", not_supported=True, preview=True
    ),
    create_model_metadata(
        provider="OpenAI", name="gpt-4o-realtime-preview", icon="OpenAI", not_supported=True, preview=True
    ),
    create_model_metadata(
        provider="OpenAI", name="gpt-4o-mini-audio-preview", icon="OpenAI", not_supported=True, preview=True
    ),
    create_model_metadata(
        provider="OpenAI", name="gpt-4o-mini-realtime-preview", icon="OpenAI", not_supported=True, preview=True
    ),
]
OPENAI_CHAT_MODEL_NAMES = [
    metadata["name"]
    for metadata in OPENAI_MODELS_DETAILED
    if not metadata.get("not_supported", False)
    and not metadata.get("reasoning", False)
    and not metadata.get("search", False)
]

OPENAI_REASONING_MODEL_NAMES = [
    metadata["name"]
    for metadata in OPENAI_MODELS_DETAILED
    if metadata.get("reasoning", False) and not metadata.get("not_supported", False)
]

OPENAI_SEARCH_MODEL_NAMES = [
    metadata["name"]
    for metadata in OPENAI_MODELS_DETAILED
    if metadata.get("search", False) and not metadata.get("not_supported", False)
]

NOT_SUPPORTED_MODELS = [metadata["name"] for metadata in OPENAI_MODELS_DETAILED if metadata.get("not_supported", False)]

OPENAI_EMBEDDING_MODEL_NAMES = [
    "text-embedding-3-small",
    "text-embedding-3-large",
    "text-embedding-ada-002",
]

# Backwards compatibility
MODEL_NAMES = OPENAI_CHAT_MODEL_NAMES
OPENAI_MODEL_NAMES = OPENAI_CHAT_MODEL_NAMES

</shell_result>


Both performance and dimension compatibility concerns are confirmed; the suggested fix is accurate.

The code creates embedding instances for all three OpenAI models (text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002) upfront on lines 189–200, regardless of which model is selected. Additionally, the dimensions parameter is applied identically to all models on line 192.

Issues confirmed:

  1. Eager instantiation is wasteful: instances are created for every available model even though typically only one is used.
  2. Dimension compatibility bug: Only text-embedding-3-* models support the dimensions parameter; text-embedding-ada-002 uses a fixed 1536-dimensional embedding and does not accept this parameter. Passing a non-None value would cause an API error.

The suggested fix—filtering dimensions based on model name—is correct and matches OpenAI's API behavior:

             # Create dedicated instances for each available model
             available_models_dict = {}
             for model_name in OPENAI_EMBEDDING_MODEL_NAMES:
+                # Only apply dimensions to models that support it
+                model_dimensions = dimensions if model_name.startswith("text-embedding-3") else None
                 available_models_dict[model_name] = OpenAIEmbeddings(
                     model=model_name,
-                    dimensions=dimensions or None,  # Use same dimensions config for all
+                    dimensions=model_dimensions,
                     base_url=api_base or None,
                     api_key=api_key,
                     chunk_size=chunk_size,
                     max_retries=max_retries,
                     timeout=request_timeout or None,
                     show_progress_bar=show_progress_bar,
                     model_kwargs=model_kwargs,
                 )
🤖 Prompt for AI Agents
In src/lfx/src/lfx/components/models_and_agents/embedding_model.py around lines
187–200, fix two issues: stop passing a global dimensions value to models that
don't accept it (text-embedding-ada-002) and avoid eagerly creating all model
instances. Update the loop to set model_dimensions = dimensions only for models
whose name starts with "text-embedding-3" (otherwise use None) and either
instantiate models lazily (store a factory/callable and create only when
requested) or only create the single selected model instead of pre-instantiating
all of them; ensure other parameters remain unchanged.
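
A minimal sketch of the lazy-factory variant, assuming the same values already resolved inside build_embeddings (api_key, api_base, dimensions, chunk_size, and so on); this illustrates the suggested direction and is not the component's current code:

from functools import lru_cache

from langchain_openai import OpenAIEmbeddings


def make_openai_embeddings_factory(api_key, api_base, dimensions, chunk_size,
                                    max_retries, request_timeout,
                                    show_progress_bar, model_kwargs):
    """Build OpenAIEmbeddings instances per model name, lazily and at most once each."""

    @lru_cache(maxsize=None)
    def get_embeddings(model_name: str) -> OpenAIEmbeddings:
        # Only the text-embedding-3 family accepts `dimensions`;
        # text-embedding-ada-002 is fixed at 1536 dimensions.
        model_dimensions = dimensions if model_name.startswith("text-embedding-3") else None
        return OpenAIEmbeddings(
            model=model_name,
            dimensions=model_dimensions,
            base_url=api_base or None,
            api_key=api_key,
            chunk_size=chunk_size,
            max_retries=max_retries,
            timeout=request_timeout or None,
            show_progress_bar=show_progress_bar,
            model_kwargs=model_kwargs,
        )

    return get_embeddings

An EmbeddingsWithModels-style wrapper could then hold this factory instead of a pre-built dict, resolving an instance only when the OpenSearch component actually asks for that model.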

Comment on lines +303 to +304
            # Fetch available IBM watsonx.ai models
            available_model_names = self.fetch_ibm_models(final_url)

⚠️ Potential issue | 🟠 Major

Blocking I/O in async context.

fetch_ibm_models uses synchronous requests.get() (lines 138-153), which blocks the event loop when called from this async method. This can cause performance degradation and unresponsiveness in async applications.

Consider either:

  1. Converting fetch_ibm_models to an async method using httpx.AsyncClient
  2. Running it in a thread executor:
+            import asyncio
+            
             # Fetch available IBM watsonx.ai models
-            available_model_names = self.fetch_ibm_models(final_url)
+            available_model_names = await asyncio.get_event_loop().run_in_executor(
+                None, self.fetch_ibm_models, final_url
+            )

Alternatively, convert fetch_ibm_models to async similar to get_ollama_models.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            # Fetch available IBM watsonx.ai models
-            available_model_names = self.fetch_ibm_models(final_url)
+            import asyncio
+
+            # Fetch available IBM watsonx.ai models
+            available_model_names = await asyncio.get_event_loop().run_in_executor(
+                None, self.fetch_ibm_models, final_url
+            )
🤖 Prompt for AI Agents
In src/lfx/src/lfx/components/models_and_agents/embedding_model.py around lines
303-304, the call to fetch_ibm_models(final_url) is invoked from an async method
but fetch_ibm_models (lines ~138-153) uses blocking requests.get(), which will
block the event loop; fix by either converting fetch_ibm_models to an async
function using an async HTTP client (e.g., httpx.AsyncClient) and await it where
called, or keep it sync but call it via an executor (e.g., await
asyncio.to_thread(self.fetch_ibm_models, final_url)); update the function
signature and imports accordingly and ensure error handling and timeouts are
preserved in the chosen async approach.
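
If rewriting fetch_ibm_models around an async HTTP client is too invasive, a lighter option is to keep it synchronous and hop off the event loop at the call site. A minimal sketch; the helper name and the 10-second bound are illustrative assumptions, and fetch_ibm_models is the component's existing synchronous method:

import asyncio


async def fetch_ibm_models_async(fetch_ibm_models, final_url: str, timeout_s: float = 10.0) -> list[str]:
    # Run the blocking requests-based fetch in a worker thread so the event
    # loop stays responsive, and bound how long we are willing to wait.
    return await asyncio.wait_for(
        asyncio.to_thread(fetch_ibm_models, final_url),
        timeout=timeout_s,
    )

At the call site, the direct call would become something like available_model_names = await fetch_ibm_models_async(self.fetch_ibm_models, final_url).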

Comment on lines +306 to +314
            # Create dedicated instances for each available model
            available_models_dict = {}
            for model_name in available_model_names:
                available_models_dict[model_name] = WatsonxEmbeddings(
                    model_id=model_name,
                    params=params,
                    watsonx_client=api_client,
                    project_id=project_id,
                )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

IBM watsonx.ai Python SDK APIClient thread safety documentation


<web_search_result>
Short answer: IBM's watsonx.ai Python SDK does not publish an explicit, general "APIClient is thread‑safe" guarantee. Practical guidance from source/docs implies you should treat APIClient/persistent HTTP connections cautiously in multithreaded code — best practice is to create one APIClient (or one HTTP session) per thread or otherwise avoid sharing a single APIClient across threads. [1][2][3]

Sources:

  • SDK source (party_wrapper) — shows internal use of requests/connection wrappers (no public thread‑safety claim). [1]
  • LlamaIndex integration note — notes the SDK enabled persistent_connection by default (since 1.1.2) and that persistent connections can cause problems in environments without long‑running connections (i.e., indicates connection behavior you must consider in concurrent scenarios). [2]
  • LangChain / integration docs — show patterns where callers create and pass an APIClient instance into higher‑level wrappers (implying explicit management of the client object). [3]

If you want, I can fetch specific SDK README or source lines (or check a specific SDK version) and show exact code/comments about connection/session handling.
</web_search_result>
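
Until the SDK documents a stronger guarantee, a defensive pattern is one client per thread rather than sharing a single api_client across every WatsonxEmbeddings instance. A rough sketch; client_factory is a hypothetical zero-argument callable that rebuilds an APIClient from the component's URL, API key, and project ID:

import threading

_thread_local = threading.local()


def get_thread_client(client_factory):
    # Each thread lazily builds and keeps its own client on first use,
    # avoiding concurrent reuse of one shared connection/session.
    if not hasattr(_thread_local, "client"):
        _thread_local.client = client_factory()
    return _thread_local.client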


Script executed:

#!/bin/bash

# First, let's look at the file in question
echo "=== File context ==="
wc -l src/lfx/src/lfx/components/models_and_agents/embedding_model.py

# Get the full context around lines 306-314
echo -e "\n=== Code context around lines 306-314 ==="
sed -n '290,330p' src/lfx/src/lfx/components/models_and_agents/embedding_model.py

# Look for imports related to WatsonxEmbeddings and APIClient
echo -e "\n=== Imports in the file ==="
head -30 src/lfx/src/lfx/components/models_and_agents/embedding_model.py

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

=== File context ===
423 src/lfx/src/lfx/components/models_and_agents/embedding_model.py

=== Code context around lines 306-314 ===
            params = {
                EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,
                EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": self.input_text},
            }

            # Create the primary embedding instance
            embeddings_instance = WatsonxEmbeddings(
                model_id=model,
                params=params,
                watsonx_client=api_client,
                project_id=project_id,
            )

            # Fetch available IBM watsonx.ai models
            available_model_names = self.fetch_ibm_models(final_url)

            # Create dedicated instances for each available model
            available_models_dict = {}
            for model_name in available_model_names:
                available_models_dict[model_name] = WatsonxEmbeddings(
                    model_id=model_name,
                    params=params,
                    watsonx_client=api_client,
                    project_id=project_id,
                )

            return EmbeddingsWithModels(
                embeddings=embeddings_instance,
                available_models=available_models_dict,
            )

        msg = f"Unknown provider: {provider}"
        raise ValueError(msg)

    async def update_build_config(
        self, build_config: dotdict, field_value: Any, field_name: str | None = None
    ) -> dotdict:
        if field_name == "provider":
            if field_value == "OpenAI":
                build_config["model"]["options"] = OPENAI_EMBEDDING_MODEL_NAMES
                build_config["model"]["value"] = OPENAI_EMBEDDING_MODEL_NAMES[0]

=== Imports in the file ===
from typing import Any

import requests
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames
from langchain_openai import OpenAIEmbeddings

from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
from lfx.base.embeddings.model import LCEmbeddingsModel
from lfx.base.models.model_utils import get_ollama_models, is_valid_ollama_url
from lfx.base.models.openai_constants import OPENAI_EMBEDDING_MODEL_NAMES
from lfx.base.models.watsonx_constants import (
    IBM_WATSONX_URLS,
    WATSONX_EMBEDDING_MODEL_NAMES,
)
from lfx.field_typing import Embeddings
from lfx.io import (
    BoolInput,
    DictInput,
    DropdownInput,
    FloatInput,
    IntInput,
    MessageTextInput,
    SecretStrInput,
)
from lfx.log.logger import logger
from lfx.schema.dotdict import dotdict
from lfx.utils.util import transform_localhost_url

# Ollama API constants
HTTP_STATUS_OK = 200

</shell_result>


Script executed:

#!/bin/bash

# Find all references to EmbeddingsWithModels and available_models
echo "=== Searching for EmbeddingsWithModels usage ==="
rg -n "EmbeddingsWithModels" --type py

echo -e "\n=== Searching for available_models usage ==="
rg -n "available_models" --type py -A 2 -B 2

echo -e "\n=== Check for async/threading patterns in embedding model ==="
rg -n "async def|Thread|asyncio|concurrent" src/lfx/src/lfx/components/models_and_agents/embedding_model.py

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

=== Searching for EmbeddingsWithModels usage ===
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:7:from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:202:            return EmbeddingsWithModels(
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:257:            return EmbeddingsWithModels(
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:316:            return EmbeddingsWithModels(
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:695:                # Also check available_models list from EmbeddingsWithModels
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1193:        # Also leverage available_models list from EmbeddingsWithModels
src/lfx/src/lfx/base/embeddings/embeddings_class.py:6:class EmbeddingsWithModels(Embeddings):
src/lfx/src/lfx/base/embeddings/embeddings_class.py:24:        """Initialize the EmbeddingsWithModels wrapper.
src/lfx/src/lfx/base/embeddings/embeddings_class.py:114:            f"EmbeddingsWithModels(embeddings={self.embeddings!r}, "

=== Searching for available_models usage ===
src/lfx/src/lfx/base/models/groq_model_discovery.py-71-
src/lfx/src/lfx/base/models/groq_model_discovery.py-72-            # Step 1: Get list of available models
src/lfx/src/lfx/base/models/groq_model_discovery.py:73:            available_models = self._fetch_available_models()
src/lfx/src/lfx/base/models/groq_model_discovery.py:74:            logger.info(f"Found {len(available_models)} models from Groq API")
src/lfx/src/lfx/base/models/groq_model_discovery.py-75-
src/lfx/src/lfx/base/models/groq_model_discovery.py-76-            # Step 2: Categorize models
--
src/lfx/src/lfx/base/models/groq_model_discovery.py-78-            non_llm_models = []
src/lfx/src/lfx/base/models/groq_model_discovery.py-79-
src/lfx/src/lfx/base/models/groq_model_discovery.py:80:            for model_id in available_models:
src/lfx/src/lfx/base/models/groq_model_discovery.py-81-                if any(pattern in model_id.lower() for pattern in self.SKIP_PATTERNS):
src/lfx/src/lfx/base/models/groq_model_discovery.py-82-                    non_llm_models.append(model_id)
--
src/lfx/src/lfx/base/models/groq_model_discovery.py-115-            return models_metadata
src/lfx/src/lfx/base/models/groq_model_discovery.py-116-
src/lfx/src/lfx/base/models/groq_model_discovery.py:117:    def _fetch_available_models(self) -> list[str]:
src/lfx/src/lfx/base/models/groq_model_discovery.py-118-        """Fetch list of available models from Groq API."""
src/lfx/src/lfx/base/models/groq_model_discovery.py-119-        url = f"{self.base_url}/openai/v1/models"
--
src/lfx/src/lfx/base/embeddings/embeddings_class.py-13-    Attributes:
src/lfx/src/lfx/base/embeddings/embeddings_class.py-14-        embeddings: The primary LangChain Embeddings instance (used as fallback).
src/lfx/src/lfx/base/embeddings/embeddings_class.py:15:        available_models: Dict mapping model names to their dedicated Embeddings instances.
src/lfx/src/lfx/base/embeddings/embeddings_class.py-16-                         Each model has its own pre-configured instance with specific parameters.
src/lfx/src/lfx/base/embeddings/embeddings_class.py-17-    """
--
src/lfx/src/lfx/base/embeddings/embeddings_class.py-20-        self,
src/lfx/src/lfx/base/embeddings/embeddings_class.py-21-        embeddings: Embeddings,
src/lfx/src/lfx/base/embeddings/embeddings_class.py:22:        available_models: dict[str, Embeddings] | None = None,
src/lfx/src/lfx/base/embeddings/embeddings_class.py-23-    ):
src/lfx/src/lfx/base/embeddings/embeddings_class.py-24-        """Initialize the EmbeddingsWithModels wrapper.
--
src/lfx/src/lfx/base/embeddings/embeddings_class.py-26-        Args:
src/lfx/src/lfx/base/embeddings/embeddings_class.py-27-            embeddings: The primary LangChain Embeddings instance (used as default/fallback).
src/lfx/src/lfx/base/embeddings/embeddings_class.py:28:            available_models: Dict mapping model names to dedicated Embeddings instances.
src/lfx/src/lfx/base/embeddings/embeddings_class.py-29-                            Each value should be a fully configured Embeddings object ready to use.
src/lfx/src/lfx/base/embeddings/embeddings_class.py-30-                            Defaults to empty dict if not provided.
--
src/lfx/src/lfx/base/embeddings/embeddings_class.py-32-        super().__init__()
src/lfx/src/lfx/base/embeddings/embeddings_class.py-33-        self.embeddings = embeddings
src/lfx/src/lfx/base/embeddings/embeddings_class.py:34:        self.available_models = available_models if available_models is not None else {}
src/lfx/src/lfx/base/embeddings/embeddings_class.py-35-
src/lfx/src/lfx/base/embeddings/embeddings_class.py-36-    def embed_documents(self, texts: list[str]) -> list[list[float]]:
--
src/lfx/src/lfx/base/embeddings/embeddings_class.py-113-        return (
src/lfx/src/lfx/base/embeddings/embeddings_class.py-114-            f"EmbeddingsWithModels(embeddings={self.embeddings!r}, "
src/lfx/src/lfx/base/embeddings/embeddings_class.py:115:            f"available_models={self.available_models!r})"
src/lfx/src/lfx/base/embeddings/embeddings_class.py-116-        )
src/lfx/src/lfx/base/embeddings/embeddings_class.py-117-
--
src/lfx/src/lfx/components/nvidia/nvidia_rerank.py-47-            try:
src/lfx/src/lfx/components/nvidia/nvidia_rerank.py-48-                build_model = self.build_compressor()
src/lfx/src/lfx/components/nvidia/nvidia_rerank.py:49:                ids = [model.id for model in build_model.available_models]
src/lfx/src/lfx/components/nvidia/nvidia_rerank.py-50-                build_config["model"]["options"] = ids
src/lfx/src/lfx/components/nvidia/nvidia_rerank.py-51-                build_config["model"]["value"] = ids[0]
--
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-51-            try:
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-52-                build_model = self.build_embeddings()
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py:53:                ids = [model.id for model in build_model.available_models]
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-54-                build_config["model"]["options"] = ids
src/lfx/src/lfx/components/nvidia/nvidia_embedding.py-55-                build_config["model"]["value"] = ids[0]
--
src/lfx/src/lfx/components/nvidia/nvidia.py-21-        from langchain_nvidia_ai_endpoints import ChatNVIDIA
src/lfx/src/lfx/components/nvidia/nvidia.py-22-
src/lfx/src/lfx/components/nvidia/nvidia.py:23:        all_models = ChatNVIDIA().get_available_models()
src/lfx/src/lfx/components/nvidia/nvidia.py-24-    except ImportError as e:
src/lfx/src/lfx/components/nvidia/nvidia.py-25-        msg = "Please install langchain-nvidia-ai-endpoints to use the NVIDIA model."
--
src/lfx/src/lfx/components/nvidia/nvidia.py-102-        model = ChatNVIDIA(base_url=self.base_url, api_key=self.api_key)
src/lfx/src/lfx/components/nvidia/nvidia.py-103-        if tool_model_enabled:
src/lfx/src/lfx/components/nvidia/nvidia.py:104:            tool_models = [m for m in model.get_available_models() if m.supports_tools]
src/lfx/src/lfx/components/nvidia/nvidia.py-105-            return sorted(m.id for m in tool_models)
src/lfx/src/lfx/components/nvidia/nvidia.py:106:        return sorted(m.id for m in model.available_models)
src/lfx/src/lfx/components/nvidia/nvidia.py-107-
src/lfx/src/lfx/components/nvidia/nvidia.py-108-    def update_build_config(self, build_config: dotdict, _field_value: Any, field_name: str | None = None):
--
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-186-
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-187-            # Create dedicated instances for each available model
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:188:            available_models_dict = {}
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-189-            for model_name in OPENAI_EMBEDDING_MODEL_NAMES:
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:190:                available_models_dict[model_name] = OpenAIEmbeddings(
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-191-                    model=model_name,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-192-                    dimensions=dimensions or None,  # Use same dimensions config for all
--
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-202-            return EmbeddingsWithModels(
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-203-                embeddings=embeddings_instance,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:204:                available_models=available_models_dict,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-205-            )
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-206-
--
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-247-
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-248-            # Create dedicated instances for each available model
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:249:            available_models_dict = {}
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-250-            for model_name in available_model_names:
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:251:                available_models_dict[model_name] = OllamaEmbeddings(
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-252-                    model=model_name,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-253-                    base_url=final_base_url,
--
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-257-            return EmbeddingsWithModels(
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-258-                embeddings=embeddings_instance,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:259:                available_models=available_models_dict,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-260-            )
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-261-
--
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-305-
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-306-            # Create dedicated instances for each available model
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:307:            available_models_dict = {}
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-308-            for model_name in available_model_names:
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:309:                available_models_dict[model_name] = WatsonxEmbeddings(
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-310-                    model_id=model_name,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-311-                    params=params,
--
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-316-            return EmbeddingsWithModels(
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-317-                embeddings=embeddings_instance,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:318:                available_models=available_models_dict,
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-319-            )
src/lfx/src/lfx/components/models_and_agents/embedding_model.py-320-
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-693-            for emb_obj in embeddings_list:
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-694-                # Check all possible model identifiers (deployment, model, model_id, model_name)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:695:                # Also check available_models list from EmbeddingsWithModels
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-696-                possible_names = []
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-697-                deployment = getattr(emb_obj, "deployment", None)
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-699-                model_id = getattr(emb_obj, "model_id", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-700-                model_name = getattr(emb_obj, "model_name", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:701:                available_models_attr = getattr(emb_obj, "available_models", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-702-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-703-                if deployment:
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-714-                    possible_names.append(f"{deployment}:{model}")
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-715-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:716:                # Add all models from available_models dict
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:717:                if available_models_attr and isinstance(available_models_attr, dict):
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-718-                    possible_names.extend(
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-719-                        str(model_key).strip()
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:720:                        for model_key in available_models_attr
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-721-                        if model_key and str(model_key).strip()
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-722-                    )
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-724-                # Match if target matches any of the possible names
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-725-                if target_model_name in possible_names:
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:726:                    # Check if target is in available_models dict - use dedicated instance
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-727-                    if (
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:728:                        available_models_attr
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:729:                        and isinstance(available_models_attr, dict)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:730:                        and target_model_name in available_models_attr
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-731-                    ):
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-732-                        # Use the dedicated embedding instance from the dict
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:733:                        selected_embedding = available_models_attr[target_model_name]
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-734-                        embedding_model = target_model_name
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:735:                        self.log(f"Found dedicated embedding instance for '{embedding_model}' in available_models dict")
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-736-                    else:
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-737-                        # Traditional identifier match
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-751-                    model_id = getattr(emb, "model_id", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-752-                    model_name = getattr(emb, "model_name", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:753:                    available_models_attr = getattr(emb, "available_models", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-754-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-755-                    if deployment:
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-766-                        identifiers.append(f"combined='{deployment}:{model}'")
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-767-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:768:                    # Add available_models dict if present
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:769:                    if available_models_attr and isinstance(available_models_attr, dict):
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:770:                        identifiers.append(f"available_models={list(available_models_attr.keys())}")
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-771-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-772-                    available_info.append(
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-803-        if hasattr(selected_embedding, "dimensions"):
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-804-            logger.info(f"Embedding dimensions: {selected_embedding.dimensions}")
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:805:        if hasattr(selected_embedding, "available_models"):
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:806:            logger.info(f"Embedding available_models: {selected_embedding.available_models}")
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-807-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:808:        # No model switching needed - each model in available_models has its own dedicated instance
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-809-        # The selected_embedding is already configured correctly for the target model
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-810-        logger.info(f"Using embedding instance for '{embedding_model}' - pre-configured and ready to use")
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1031-        return context_clauses
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1032-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1033:    def _detect_available_models(self, client: OpenSearch, filter_clauses: list[dict] | None = None) -> list[str]:
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1034-        """Detect which embedding models have documents in the index.
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1035-
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1177-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1178-        # Detect available embedding models in the index (scoped by filters)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1179:        available_models = self._detect_available_models(client, filter_clauses)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1180-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1181:        if not available_models:
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1182-            logger.warning("No embedding models found in index, using current model")
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1183:            available_models = [self._get_embedding_model_name()]
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1184-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1185-        # Generate embeddings for ALL detected models
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1191-        # Create a comprehensive map of model names to embedding objects
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1192-        # Check all possible identifiers (deployment, model, model_id, model_name)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1193:        # Also leverage available_models list from EmbeddingsWithModels
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1194-        # Handle duplicate identifiers by creating combined keys
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1195-        embedding_by_model = {}
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1204-            model_name = getattr(emb_obj, "model_name", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1205-            dimensions = getattr(emb_obj, "dimensions", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1206:            available_models = getattr(emb_obj, "available_models", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1207-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1208-            logger.info(
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1209-                f"Embedding object {idx}: deployment={deployment}, model={model}, "
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1210-                f"model_id={model_id}, model_name={model_name}, dimensions={dimensions}, "
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1211:                f"available_models={available_models}"
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1212-            )
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1213-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1214:            # If this embedding has available_models dict, map all models to their dedicated instances
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1215:            if available_models and isinstance(available_models, dict):
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1216:                logger.info(f"Embedding object {idx} provides {len(available_models)} models via available_models dict")
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1217:                for model_name_key, dedicated_embedding in available_models.items():
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1218-                    if model_name_key and str(model_name_key).strip():
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1219-                        model_str = str(model_name_key).strip()
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1268-                logger.warning(f"  Conflict on '{conflict_id}': {len(emb_list)} embeddings use this identifier")
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1269-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1270:        logger.info(f"Generating embeddings for {len(available_models)} models in index")
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1271-        logger.info(f"Available embedding identifiers: {list(embedding_by_model.keys())}")
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1272-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1273:        for model_name in available_models:
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1274-            try:
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1275-                # Check if we have an embedding object for this model
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1281-                    emb_model_id = getattr(emb_obj, "model_id", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1282-                    emb_dimensions = getattr(emb_obj, "dimensions", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1283:                    emb_available_models = getattr(emb_obj, "available_models", None)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1284-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1285-                    logger.info(
--
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1289-                    )
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1290-
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1291:                    # Check if this is a dedicated instance from available_models dict
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1292:                    if emb_available_models and isinstance(emb_available_models, dict):
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1293-                        logger.info(
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py:1294:                            f"Model '{model_name}' using dedicated instance from available_models dict "
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1295-                            f"(pre-configured with correct model and dimensions)"
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py-1296-                        )
--
src/backend/tests/unit/groq/test_groq_model_discovery.py-36-    @patch("lfx.base.models.groq_model_discovery.requests.get")
src/backend/tests/unit/groq/test_groq_model_discovery.py-37-    @patch("groq.Groq")
src/backend/tests/unit/groq/test_groq_model_discovery.py:38:    def test_fetch_available_models_success(
src/backend/tests/unit/groq/test_groq_model_discovery.py-39-        self, mock_groq, mock_get, mock_api_key, mock_groq_models_response, mock_groq_client_tool_calling_success
src/backend/tests/unit/groq/test_groq_model_discovery.py-40-    ):
--
src/backend/tests/unit/groq/test_groq_model_discovery.py-51-
src/backend/tests/unit/groq/test_groq_model_discovery.py-52-        discovery = GroqModelDiscovery(api_key=mock_api_key)
src/backend/tests/unit/groq/test_groq_model_discovery.py:53:        models = discovery._fetch_available_models()
src/backend/tests/unit/groq/test_groq_model_discovery.py-54-
src/backend/tests/unit/groq/test_groq_model_discovery.py-55-        assert isinstance(models, list)

=== Check for async/threading patterns in embedding model ===
155:    async def build_embeddings(self) -> Embeddings:
324:    async def update_build_config(

</shell_result>


Script executed:

#!/bin/bash

# Search for APIClient creation and usage
echo "=== Searching for APIClient ==="
rg -n "APIClient|api_client" src/lfx/src/lfx/components/models_and_agents/embedding_model.py -B 2 -A 2

# Get the full method that contains lines 306-314
echo -e "\n=== Full method context ==="
sed -n '250,330p' src/lfx/src/lfx/components/models_and_agents/embedding_model.py

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

=== Searching for APIClient ===
277-                raise ValueError(msg)
278-
279:            from ibm_watsonx_ai import APIClient, Credentials
280-
281-            final_url = base_url_ibm_watsonx or "https://us-south.ml.cloud.ibm.com"
--
286-            )
287-
288:            api_client = APIClient(credentials)
289-
290-            params = {
--
297-                model_id=model,
298-                params=params,
299:                watsonx_client=api_client,
300-                project_id=project_id,
301-            )
--
310-                    model_id=model_name,
311-                    params=params,
312:                    watsonx_client=api_client,
313-                    project_id=project_id,
314-                )

=== Full method context ===
            for model_name in available_model_names:
                available_models_dict[model_name] = OllamaEmbeddings(
                    model=model_name,
                    base_url=final_base_url,
                    **model_kwargs,
                )

            return EmbeddingsWithModels(
                embeddings=embeddings_instance,
                available_models=available_models_dict,
            )

        if provider == "IBM watsonx.ai":
            try:
                from langchain_ibm import WatsonxEmbeddings
            except ImportError:
                msg = "Please install langchain-ibm: pip install langchain-ibm"
                raise ImportError(msg) from None

            if not api_key:
                msg = "IBM watsonx.ai API key is required when using IBM watsonx.ai provider"
                raise ValueError(msg)

            project_id = self.project_id

            if not project_id:
                msg = "Project ID is required for IBM watsonx.ai provider"
                raise ValueError(msg)

            from ibm_watsonx_ai import APIClient, Credentials

            final_url = base_url_ibm_watsonx or "https://us-south.ml.cloud.ibm.com"

            credentials = Credentials(
                api_key=self.api_key,
                url=final_url,
            )

            api_client = APIClient(credentials)

            params = {
                EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: self.truncate_input_tokens,
                EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": self.input_text},
            }

            # Create the primary embedding instance
            embeddings_instance = WatsonxEmbeddings(
                model_id=model,
                params=params,
                watsonx_client=api_client,
                project_id=project_id,
            )

            # Fetch available IBM watsonx.ai models
            available_model_names = self.fetch_ibm_models(final_url)

            # Create dedicated instances for each available model
            available_models_dict = {}
            for model_name in available_model_names:
                available_models_dict[model_name] = WatsonxEmbeddings(
                    model_id=model_name,
                    params=params,
                    watsonx_client=api_client,
                    project_id=project_id,
                )

            return EmbeddingsWithModels(
                embeddings=embeddings_instance,
                available_models=available_models_dict,
            )

        msg = f"Unknown provider: {provider}"
        raise ValueError(msg)

    async def update_build_config(
        self, build_config: dotdict, field_value: Any, field_name: str | None = None
    ) -> dotdict:
        if field_name == "provider":
            if field_value == "OpenAI":
                build_config["model"]["options"] = OPENAI_EMBEDDING_MODEL_NAMES
                build_config["model"]["value"] = OPENAI_EMBEDDING_MODEL_NAMES[0]

</shell_result>


Web search query: ibm_watsonx_ai WatsonxEmbeddings watsonx_client shared instance concurrent usage


<web_search_result>
Short answer — yes, you can share a single WatsonxEmbeddings / watsonx_client instance across callers, but you should control concurrent requests (use the integration's built‑in concurrency option or your own queue/pool).

Key points and practical guidance

  • LangChain's WatsonxEmbeddings supports a concurrency control parameter (often called concurrency_limit / maxConcurrency) so multiple callers can safely share one embeddings instance while limiting parallel requests to the provider. [1][3]
  • The LangChain IBM embedding wrapper constructs/uses an underlying watsonx client (api_client) — you can pass a preconfigured client to the wrapper to reuse connections. If you share that client, ensure your app enforces a concurrency limit (or rely on the wrapper's concurrency option) to avoid rate limits or resource contention. [2][1]
  • If you need high throughput: use a shared client + set concurrency_limit / maxConcurrency to a value that matches your plan/rate limits, or implement a worker pool/queue to throttle requests and retry on 429/errors.

Sources

  • LangChain WatsonxEmbeddings reference (instantiation, parameters). [1]
  • LangChain IBM embeddings module source (shows api_client / internal client use). [2]
  • LangChain docs on rate‑limits / maxConcurrency for embeddings. [3]

If you want, tell me which SDK/library and language (Python, JS, LangChain version) you use and your target throughput, and I’ll suggest exact code/config values.
</web_search_result>
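
As an editorial illustration of the guidance above, here is a minimal sketch of throttling embed calls that all flow through one shared client: a bounded ThreadPoolExecutor plus a semaphore stands in for the wrapper's built-in concurrency option. The helper name and the cap of 5 are assumptions for the example, not code from this PR.

from concurrent.futures import ThreadPoolExecutor
from threading import Semaphore


def embed_queries_throttled(embeddings, queries, max_parallel=5):
    """Fan embed_query calls across threads while capping in-flight requests.

    `embeddings` is any LangChain Embeddings-style object whose calls go through
    one shared client; `max_parallel` is an assumed, rate-limit-friendly cap.
    """
    throttle = Semaphore(max_parallel)

    def _embed_one(text):
        with throttle:  # never more than max_parallel concurrent provider calls
            return embeddings.embed_query(text)

    # The executor fans the work out; the semaphore keeps the cap even if the
    # pool is later resized or reused with more workers.
    with ThreadPoolExecutor(max_workers=max_parallel) as executor:
        return list(executor.map(_embed_one, queries))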


Script executed:

#!/bin/bash

# Check the full WatsonxEmbeddings initialization in embedding_model.py
echo "=== WatsonxEmbeddings initialization parameters ==="
sed -n '295,315p' src/lfx/src/lfx/components/models_and_agents/embedding_model.py

# Search for concurrency_limit or similar parameters in the file
echo -e "\n=== Search for concurrency parameters ==="
rg -n "concurrency|maxConcurrency|max_concurrency" src/lfx/src/lfx/components/models_and_agents/embedding_model.py

# Check how embed_documents is called in opensearch_multimodal
echo -e "\n=== How embeddings are called in opensearch_multimodal ==="
rg -n "embed_documents\|embed_query" src/lfx/src/lfx/components/elastic/opensearch_multimodal.py -B 2 -A 3 | head -50

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

=== WatsonxEmbeddings initialization parameters ===
            # Create the primary embedding instance
            embeddings_instance = WatsonxEmbeddings(
                model_id=model,
                params=params,
                watsonx_client=api_client,
                project_id=project_id,
            )

            # Fetch available IBM watsonx.ai models
            available_model_names = self.fetch_ibm_models(final_url)

            # Create dedicated instances for each available model
            available_models_dict = {}
            for model_name in available_model_names:
                available_models_dict[model_name] = WatsonxEmbeddings(
                    model_id=model_name,
                    params=params,
                    watsonx_client=api_client,
                    project_id=project_id,
                )


=== Search for concurrency parameters ===

=== How embeddings are called in opensearch_multimodal ===

</shell_result>


Web search query: langchain_ibm WatsonxEmbeddings parameters concurrency_limit maxConcurrency source code


<web_search_result>
Short answer — both names exist but in different places:

  • In the langchain-ibm (LangChain) integrations:

    • JS/TS integration exposes maxConcurrency / maxRetries on Watsonx Embeddings/LLM input types (e.g., WatsonxInputEmbeddings / WatsonxCallOptionsLLM). See the API docs for WatsonxEmbeddings / WatsonxInputEmbeddings. [1][2]
  • In the upstream IBM watsonx.ai Python SDK (the underlying client) the Embeddings class uses concurrency_limit (and batch_size, max_retries, delay_time, retry_status_codes, etc.). See the SDK Embeddings constructor docs. [3]

Sources:

  1. LangChain (community) JS/TS WatsonxEmbeddings / WatsonxInputEmbeddings docs (shows maxConcurrency). [turn2search6][turn2search2].
  2. LangChain JS Watsonx call options / params (maxConcurrency field). [turn0search1].
  3. IBM watsonx.ai Python SDK Embeddings constructor (concurrency_limit parameter). [turn2search9].
</web_search_result>

Script executed:

#!/bin/bash

# Search for langchain-ibm version in dependencies
echo "=== Searching for langchain-ibm version ==="
find . -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" | head -20

# Check pyproject.toml or similar files
if [ -f "pyproject.toml" ]; then
  echo -e "\n=== pyproject.toml content ==="
  grep -A 5 -B 5 "langchain" pyproject.toml | head -40
fi

# Also check in src or root
for f in requirements.txt requirements-dev.txt setup.py; do
  if [ -f "$f" ]; then
    echo -e "\n=== $f ==="
    grep "langchain" "$f" | head -10
  fi
done

# Also check backend directory
if [ -d "backend" ]; then
  echo -e "\n=== Checking backend for requirements ==="
  find backend -maxdepth 2 -name "*.txt" -o -name "pyproject.toml" | xargs grep -l "langchain" 2>/dev/null | head -5
fi

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

=== Searching for langchain-ibm version ===
./pyproject.toml
./src/backend/base/langflow/initial_setup/setup.py
./src/backend/base/langflow/logging/setup.py
./src/backend/base/pyproject.toml
./src/lfx/pyproject.toml

=== pyproject.toml content ===
name = "langflow"
version = "1.7.0"
description = "A Python package with a built-in web application"
requires-python = ">=3.10,<3.14"
license = "MIT"
keywords = ["nlp", "langchain", "openai", "gpt", "gui"]
readme = "README.md"
maintainers = [
    { name = "Carlos Coelho", email = "[email protected]" },
    { name = "Cristhian Zanforlin", email = "[email protected]" },
    { name = "Gabriel Almeida", email = "[email protected]" },
--
    "MarkupSafe==3.0.2",
    "boto3>=1.34.162,<2.0.0",
    "numexpr==2.10.2",
    "qianfan==0.3.5",
    "pgvector==0.3.6",
    "langchain==0.3.23",
    "elasticsearch==8.16.0",
    "pytube==15.0.0",
    "dspy-ai==2.5.41",
    "datasets>2.14.7,<4.0.0",
    "assemblyai==0.35.1",
--
    "langsmith>=0.3.42,<1.0.0",
    "yfinance==0.2.50",
    "wolframalpha==5.1.3",
    "astra-assistants[tools]>=2.2.13,<3.0.0",
    "composio==0.9.2",
    "composio-langchain==0.9.2",
    "spider-client==0.1.24",
    "nltk==3.9.1",
    "lark==1.2.2",
    "jq==1.8.0",
    "pydantic-settings>=2.2.0,<3.0.0",
    "duckduckgo_search==7.2.1",
    "opensearch-py==2.8.0",
    "langchain-google-genai==2.0.6",
    "langchain-cohere>=0.3.3,<1.0.0",
    "langchain-huggingface==0.3.1",

</shell_result>


Script executed:

#!/bin/bash

# Look at how embed_documents is called - check context around line 1274
echo "=== Context around embed_documents call in opensearch_multimodal ==="
sed -n '1270,1310p' src/lfx/src/lfx/components/elastic/opensearch_multimodal.py

# Check if there's any threading or asyncio in this file
echo -e "\n=== Threading/async patterns in opensearch_multimodal ==="
rg -n "async def|Thread|concurrent|asyncio" src/lfx/src/lfx/components/elastic/opensearch_multimodal.py | head -20

Repository: langflow-ai/langflow


<shell_result>
Repository: langflow-ai/langflow
Exit code: 0

stdout:

=== Context around embed_documents call in opensearch_multimodal ===
        logger.info(f"Generating embeddings for {len(available_models)} models in index")
        logger.info(f"Available embedding identifiers: {list(embedding_by_model.keys())}")

        for model_name in available_models:
            try:
                # Check if we have an embedding object for this model
                if model_name in embedding_by_model:
                    # Use the matching embedding object directly
                    emb_obj = embedding_by_model[model_name]
                    emb_deployment = getattr(emb_obj, "deployment", None)
                    emb_model = getattr(emb_obj, "model", None)
                    emb_model_id = getattr(emb_obj, "model_id", None)
                    emb_dimensions = getattr(emb_obj, "dimensions", None)
                    emb_available_models = getattr(emb_obj, "available_models", None)

                    logger.info(
                        f"Using embedding object for model '{model_name}': "
                        f"deployment={emb_deployment}, model={emb_model}, model_id={emb_model_id}, "
                        f"dimensions={emb_dimensions}"
                    )

                    # Check if this is a dedicated instance from available_models dict
                    if emb_available_models and isinstance(emb_available_models, dict):
                        logger.info(
                            f"Model '{model_name}' using dedicated instance from available_models dict "
                            f"(pre-configured with correct model and dimensions)"
                        )

                    # Use the embedding instance directly - no model switching needed!
                    vec = emb_obj.embed_query(q)
                    query_embeddings[model_name] = vec
                    logger.info(f"Generated embedding for model: {model_name} (actual dimensions: {len(vec)})")
                else:
                    # No matching embedding found for this model
                    logger.warning(
                        f"No matching embedding found for model '{model_name}'. "
                        f"This model will be skipped. Available models: {list(embedding_by_model.keys())}"
                    )
            except (RuntimeError, ValueError, ConnectionError, TimeoutError, AttributeError, KeyError) as e:
                logger.warning(f"Failed to generate embedding for {model_name}: {e}")


=== Threading/async patterns in opensearch_multimodal ===
7:from concurrent.futures import ThreadPoolExecutor, as_completed
858:                with ThreadPoolExecutor(max_workers=max_workers) as executor:
874:                    "Threaded embedding generation failed for model %s (attempt %s/%s), retrying in %.1fs",
1536:    async def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None) -> dict:

</shell_result>


Based on my verification, the review comment raises a valid concern that warrants action. Here's what I found:

  1. Shared api_client confirmed: All WatsonxEmbeddings instances created in the loop (lines 309-314) share the same api_client object created at line 288.

  2. Concurrent usage confirmed: opensearch_multimodal.py uses ThreadPoolExecutor (line 858) to execute embedding operations in parallel, and it directly uses the instances from the available_models dictionary (as shown in the loop at lines 1273+).

  3. No concurrency control: The WatsonxEmbeddings instances are instantiated without any concurrency_limit parameter, which LangChain supports for controlling parallel requests.

  4. IBM SDK thread-safety: Web search confirms IBM's watsonx.ai Python SDK does not provide explicit thread-safety guarantees for APIClient.

The review comment is accurate and identifies a legitimate issue that needs to be addressed. The solution would be to add a concurrency_limit parameter when instantiating WatsonxEmbeddings instances.
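
Schematically, the shared-client fan-out described above looks roughly like the sketch below (simplified names, not the component's literal code): every per-model instance drawn from available_models wraps the same APIClient, so parallel embed_query calls all converge on that one client with nothing throttling them.

from concurrent.futures import ThreadPoolExecutor, as_completed


def embed_query_per_model(available_models, query, max_workers=4):
    """Illustrative only: available_models maps model names to dedicated
    WatsonxEmbeddings-style instances that all share a single APIClient."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {
            executor.submit(emb.embed_query, query): name
            for name, emb in available_models.items()
        }
        for future in as_completed(futures):
            # Each completed future carries one model's query vector; with no
            # concurrency limit set, all of these calls hit the shared client at once.
            results[futures[future]] = future.result()
    return results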


Verify that WatsonxEmbeddings instances are configured with a concurrency_limit parameter to control concurrent API calls.

All WatsonxEmbeddings instances (lines 309–314 and 297–301) share the same api_client object. The code is used with ThreadPoolExecutor in opensearch_multimodal.py for parallel embedding generation. Without an explicit concurrency_limit on each instance, concurrent calls through the shared APIClient risk exceeding API rate limits or exposing thread-unsafe behavior. Add concurrency_limit=N (e.g., 5–10) to each WatsonxEmbeddings() constructor to safely control concurrent requests.

🤖 Prompt for AI Agents
In src/lfx/src/lfx/components/models_and_agents/embedding_model.py around lines
306-314 (and also update the earlier instantiation at lines 297-301), the
WatsonxEmbeddings instances are created without a concurrency_limit while
sharing the same api_client used concurrently by ThreadPoolExecutor; fix this by
passing a concurrency_limit parameter to each WatsonxEmbeddings(...) constructor
(e.g., concurrency_limit=5 or 10) so that parallel embedding requests through
the shared api_client are throttled and avoid thread-safety or rate-limit
issues.
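
For illustration, a minimal sketch of the change the prompt describes, mirroring the constructor arguments already used in embedding_model.py. Whether WatsonxEmbeddings accepts and forwards concurrency_limit depends on the pinned langchain-ibm version, so treat that keyword as an assumption to verify (the underlying ibm-watsonx-ai SDK documents it on its own Embeddings class). The helper name, placeholder params, and the value 5 are illustrative.

from ibm_watsonx_ai import APIClient, Credentials
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames
from langchain_ibm import WatsonxEmbeddings

EMBEDDING_CONCURRENCY_LIMIT = 5  # the review suggests a value in the 5-10 range


def build_throttled_watsonx_embeddings(api_key, url, project_id, model_names):
    """Build one dedicated instance per model, all sharing a single APIClient,
    with an explicit cap on concurrent requests through that client."""
    api_client = APIClient(Credentials(api_key=api_key, url=url))
    params = {EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 512}  # placeholder params

    return {
        model_name: WatsonxEmbeddings(
            model_id=model_name,
            params=params,
            watsonx_client=api_client,
            project_id=project_id,
            # Assumption: the pinned langchain-ibm accepts this keyword and forwards
            # it to the SDK client; if not, apply the limit on the SDK's Embeddings
            # object (or add a wrapper-level throttle) instead.
            concurrency_limit=EMBEDDING_CONCURRENCY_LIMIT,
        )
        for model_name in model_names
    }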

@github-actions github-actions bot added lgtm This PR has been approved by a maintainer enhancement New feature or request and removed enhancement New feature or request labels Nov 25, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Nov 25, 2025

Frontend Unit Test Coverage Report

Coverage Summary

Lines: 15%
Statements: 15.29% (4188/27381)
Branches: 8.49% (1778/20935)
Functions: 9.6% (579/6031)

Unit Test Results

Tests: 1638 | Skipped: 0 💤 | Failures: 0 ❌ | Errors: 0 🔥 | Time: 21.875s ⏱️

@codecov
Copy link

codecov bot commented Nov 25, 2025

Codecov Report

❌ Patch coverage is 0% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.44%. Comparing base (1065e6e) to head (7bcb66a).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
...rc/lfx/src/lfx/base/embeddings/embeddings_class.py | 0.00% | 21 Missing ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (40.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (40.04%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #10714      +/-   ##
==========================================
- Coverage   32.48%   32.44%   -0.05%     
==========================================
  Files        1366     1367       +1     
  Lines       63294    63315      +21     
  Branches     9356     9357       +1     
==========================================
- Hits        20564    20542      -22     
- Misses      41698    41740      +42     
- Partials     1032     1033       +1     
Flag | Coverage Δ
backend | 51.26% <ø> (-0.13%) ⬇️
frontend | 14.13% <ø> (ø)
lfx | 40.04% <0.00%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
...rc/lfx/src/lfx/base/embeddings/embeddings_class.py | 0.00% <0.00%> (ø)

... and 4 files with indirect coverage changes


Updated ChatInput and ChatOutput components in starter project JSONs to use the session_id from the graph if not provided, ensuring consistent session management. This change improves message storage and retrieval logic for chat flows.
@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Nov 25, 2025
Merged via the queue into main with commit f2fb7b3 Nov 25, 2025
80 of 82 checks passed
@edwinjosechittilappilly edwinjosechittilappilly deleted the opensearch-multi-embedding branch November 25, 2025 22:53

Labels

enhancement (New feature or request), lgtm (This PR has been approved by a maintainer)
