
Conversation

@deon-sanchez (Collaborator) commented Dec 12, 2025

Summary by CodeRabbit

  • New Features

    • Added live model data integration from models.dev API with intelligent caching and refresh capabilities.
    • Expanded model information including costs, token limits, modalities, knowledge cutoff, and documentation links.
    • Enhanced provider detection with support for additional model providers and their icons.
  • Improvements

    • Improved modal scrolling for better navigation in provider selection.


@deon-sanchez self-assigned this Dec 12, 2025
coderabbitai bot commented Dec 12, 2025

Walkthrough

This PR adds live model data fetching from the models.dev API with caching support, controlled by the LFX_USE_LIVE_MODEL_DATA environment variable. It expands the model metadata type system with cost, limits, and modalities structures, introduces a new models_dev_client module with cache management and API integration, and updates frontend provider icon mappings. The system maintains backward-compatible static defaults with fallback behavior when live data is unavailable.

Changes

Cohort / File(s) Change Summary
Environment Configuration
.env.example
Added LFX_USE_LIVE_MODEL_DATA environment variable with documentation describing purpose (fetch model data from models.dev API when enabled), supported providers, fallback behavior, and default value; minor formatting alignment in existing comments.
Frontend Provider Mappings
src/frontend/src/controllers/API/queries/models/use-get-model-providers.ts
Updated provider-to-icon mappings with renamed entries (e.g., "Google Generative AI" → "GoogleGenerativeAI"), new provider entries (e.g., "Google Vertex AI", "Ollama Cloud", "IBM Watsonx", "SambaNova", "Together AI", "Fireworks AI", "DeepSeek", "xAI", "Alibaba", "Cerebras", Azure/AWS variants), and preserved default "Bot" fallback.
Frontend Modal Styling
src/frontend/src/modals/modelProviderModal/index.tsx
Added overflow-y-auto to left panel container to enable vertical scrolling on content overflow.
Backend Model Metadata Types
src/lfx/src/lfx/base/models/model_metadata.py
Introduced three new TypedDicts (ModelCost, ModelLimits, ModelModalities) for structured pricing, token limits, and input/output modalities; extended ModelMetadata with 16 new optional fields (provider_id, display_name, structured_output, temperature, attachment, open_weights, cost, limits, modalities, knowledge_cutoff, release_date, last_updated, api_base, env_vars, documentation_url, model_type); updated create_model_metadata() to accept and conditionally include these fields.
Backend Models API Client (New)
src/lfx/src/lfx/base/models/models_dev_client.py
New module providing models.dev API integration with 30-second HTTP timeout, local disk caching (1-hour TTL at ~/.cache/langflow/.models_dev_cache.json), error fallback to stale cache, data transformation via transform_api_model_to_metadata(), and public APIs: fetch_models_dev_data(), get_live_models_detailed(), get_models_by_provider(), search_models(), get_provider_metadata_from_api(), clear_cache(). A minimal sketch of the cache pattern appears after this table.
Backend Model Package API
src/lfx/src/lfx/base/models/__init__.py
Expanded public exports to include model metadata types (ModelCost, ModelLimits, ModelMetadata, ModelModalities, create_model_metadata), live-model utilities (clear_cache, fetch_models_dev_data, get_live_models_detailed, get_models_by_provider, get_provider_metadata_from_api, search_models), and refresh_live_model_data; added grouping comments for organized API surface.
Backend Unified Models
src/lfx/src/lfx/base/models/unified_models.py
Introduced USE_LIVE_MODEL_DATA toggle (environment-driven), added get_static_model_provider_metadata(), get_live_models_as_groups(), and refresh_live_model_data() functions; updated get_models_detailed() to conditionally fetch live data from API with fallback to static defaults; imported cache/live-fetch utilities from models_dev_client.
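
As referenced in the models_dev_client row above, here is a minimal sketch of the disk-cache pattern, assuming the cache path, 1-hour TTL, and a timestamp-plus-data payload shape from the summary; the module's actual internals may differ.

import json
import time
from pathlib import Path
from typing import Any

CACHE_PATH = Path.home() / ".cache" / "langflow" / ".models_dev_cache.json"
CACHE_TTL_SECONDS = 3600  # 1-hour TTL per the PR description


def load_cache() -> dict[str, Any] | None:
    """Return cached data if the cache file exists and is fresh, else None."""
    if not CACHE_PATH.exists():
        return None
    try:
        payload = json.loads(CACHE_PATH.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    if time.time() - payload.get("timestamp", 0) > CACHE_TTL_SECONDS:
        return None  # expired; caller should re-fetch from the API
    return payload.get("data")


def save_cache(data: dict[str, Any]) -> None:
    """Write data plus a timestamp, creating the cache directory if needed."""
    CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
    CACHE_PATH.write_text(json.dumps({"timestamp": time.time(), "data": data}))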

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client Code
    participant UM as unified_models
    participant MDC as models_dev_client
    participant Cache as Local Cache File
    participant API as models.dev API
    
    Client->>UM: get_models_detailed()
    alt USE_LIVE_MODEL_DATA enabled
        UM->>MDC: get_live_models_detailed()
        MDC->>Cache: _load_cache()
        alt Cache valid (within 1h TTL)
            Cache-->>MDC: cached data
        else Cache missing or expired
            MDC->>API: fetch https://models.dev/api.json
            API-->>MDC: provider & model data
            MDC->>Cache: _save_cache(data)
        end
        loop For each provider & model
            MDC->>MDC: transform_api_model_to_metadata()
            Note over MDC: Map provider IDs,<br/>determine model type,<br/>structure cost/limits
        end
        MDC-->>UM: list[ModelMetadata]
    else Live data unavailable
        UM->>UM: get_static_models_detailed()
        Note over UM: Fallback to static<br/>model definitions
    end
    UM-->>Client: list[ModelMetadata]
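In code form, the diagram's resolution flow reduces to roughly the following sketch. Function names follow the PR summary; the stub bodies, signatures, and the sample static entry are assumptions.

import os

USE_LIVE_MODEL_DATA = os.getenv("LFX_USE_LIVE_MODEL_DATA", "false").lower() == "true"


def get_live_models_detailed() -> list[dict]:
    """Stub standing in for the cache-backed models.dev fetch."""
    return []  # pretend live data is unavailable


def get_static_models_detailed() -> list[dict]:
    """Stub standing in for the static model definitions."""
    return [{"provider": "OpenAI", "name": "gpt-4o"}]  # illustrative entry


def get_models_detailed() -> list[dict]:
    if USE_LIVE_MODEL_DATA:
        live = get_live_models_detailed()
        if live:
            return live
        # Live data unavailable: fall back to the static defaults
    return get_static_models_detailed()


print(get_models_detailed())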

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • models_dev_client.py: New 250+ line module with API integration, caching logic, data transformation, and multiple public/internal functions—requires verification of error handling, cache TTL logic, API contract mapping, and provider/model type determination.
  • model_metadata.py: Introduces three new TypedDicts and significantly extends ModelMetadata structure with 16 optional fields—requires careful review of type annotations, field initialization logic, and backward compatibility.
  • unified_models.py: Adds conditional live/static resolution with caching imports and new public functions—requires verification of environment variable handling, fallback paths, and interaction with models_dev_client.
  • Frontend mappings (use-get-model-providers.ts): Provider name/icon mappings are numerous and should be cross-checked for accuracy and consistency with backend provider IDs.

Suggested labels

enhancement, size:L

Suggested reviewers

  • jordanrfrazier
  • edwinjosechittilappilly
  • lucaseduoli

Pre-merge checks and finishing touches

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning, 1 inconclusive)
  • Test Coverage For New Implementations — ❌ Error: the PR introduces significant new backend functionality and frontend changes without corresponding test coverage for API caching, error handling, live-data resolution, and UI updates. Resolution: add test files for models_dev_client.py and model_metadata.py, update the unified_models.py tests, and add frontend tests for the provider icons and scrolling behavior (a pytest sketch follows this list).
  • Test Quality And Coverage — ⚠️ Warning: new backend functionality in models_dev_client.py and unified_models.py lacks test coverage despite the project's extensive testing infrastructure. Resolution: create comprehensive pytest tests for models_dev_client.py, model_metadata.py, and unified_models.py covering API interactions, caching, and error handling, plus frontend TypeScript tests for use-get-model-providers.ts.
  • Test File Naming And Structure — ❓ Inconclusive: the PR adds significant backend functionality (models_dev_client.py with caching and API logic) and frontend changes, but no test files appear in the modified-files list. Resolution: verify whether tests exist elsewhere, confirm the project's testing conventions and directory structure, and determine whether new tests should accompany the backend changes.
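
As a concrete starting point for the failed coverage check, a hedged pytest sketch along these lines could exercise the cache round-trip and the HTTP-error path. Import paths follow the PR's layout; the private helpers (_get_cache_path, _save_cache, _load_cache) are assumed from the review excerpts further down and may need adjusting to the real module.

from unittest import mock

import httpx

from lfx.base.models import models_dev_client


def test_cache_round_trip(monkeypatch, tmp_path):
    """_save_cache followed by _load_cache returns the saved payload."""
    monkeypatch.setattr(models_dev_client, "_get_cache_path", lambda: tmp_path / "cache.json")
    data = {"openai": {"name": "OpenAI"}}
    models_dev_client._save_cache(data)
    assert models_dev_client._load_cache() == data


def test_fetch_returns_empty_dict_on_http_error(monkeypatch, tmp_path):
    """With no cache on disk and a failing API, fetch should return {}."""
    monkeypatch.setattr(models_dev_client, "_get_cache_path", lambda: tmp_path / "missing.json")
    with mock.patch.object(httpx.Client, "get", side_effect=httpx.HTTPError("boom")):
        assert models_dev_client.fetch_models_dev_data(force_refresh=True) == {}
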
✅ Passed checks (4 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'feat: Get live provider and model data from model.dev' accurately describes the main change, integrating live data fetching from the models.dev API for providers and models throughout the codebase.
  • Docstring Coverage — ✅ Passed: docstring coverage is 95.65%, above the required threshold of 80.00%.
  • Excessive Mock Usage Warning — ✅ Passed: the custom check is not applicable because the PR introduces new production code modules but no test files containing mock objects.


@github-actions bot commented

Frontend Unit Test Coverage Report

Coverage Summary

Lines: 16% · Statements: 16.42% (4606/28041) · Branches: 9.73% (2106/21644) · Functions: 10.76% (664/6166)

Unit Test Results

Tests: 1803 · Skipped: 0 💤 · Failures: 0 ❌ · Errors: 0 🔥 · Time: 24.646s ⏱️

@github-actions bot added the enhancement (New feature or request) label Dec 12, 2025
codecov bot commented Dec 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.02%. Comparing base (f5e68c2) to head (841b365).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main   #11007      +/-   ##
==========================================
- Coverage   33.02%   33.02%   -0.01%     
==========================================
  Files        1388     1388              
  Lines       65544    65538       -6     
  Branches     9680     9680              
==========================================
- Hits        21648    21642       -6     
  Misses      42798    42798              
  Partials     1098     1098              
Flag Coverage Δ
frontend 15.16% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...lers/API/queries/models/use-get-model-providers.ts 84.61% <ø> (ø)
...c/frontend/src/modals/modelProviderModal/index.tsx 0.00% <ø> (ø)
src/lfx/src/lfx/base/models/model_metadata.py 100.00% <ø> (ø)
src/lfx/src/lfx/base/models/unified_models.py 9.04% <ø> (-0.44%) ⬇️

@coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (4)
src/lfx/src/lfx/base/models/unified_models.py (2)

100-113: Consider narrowing the exception type (static analysis warning).

The blind Exception catch at line 109 triggers Ruff BLE001. While the fallback behavior is correct for resilience, consider either:

  1. Catching more specific exceptions (e.g., OSError, ValueError, TimeoutError)
  2. Adding # noqa: BLE001 with a comment explaining why broad catching is intentional here
-    except Exception as e:
+    except Exception as e:  # noqa: BLE001 - Intentional broad catch for graceful fallback
         logger.debug(f"Failed to get live provider metadata: {e}")

133-155: Same BLE001 warning applies here.

Line 153 has the same blind exception catch. Add # noqa: BLE001 with justification for consistency.

-    except Exception as e:
+    except Exception as e:  # noqa: BLE001 - Intentional broad catch for graceful fallback
         logger.debug(f"Failed to fetch live models: {e}")
         return []
src/lfx/src/lfx/base/models/models_dev_client.py (2)

174-189: Use idiomatic string containment check.

Line 186 uses .find() != -1 which is not idiomatic Python. Use the in operator instead.

Apply this diff:

-    if model_data.get("id", "").lower().find("embed") != -1:
+    if "embed" in model_data.get("id", "").lower():

191-227: Consider omitting None values from TypedDict construction.

The transform functions explicitly include optional fields with None values (e.g., lines 199-203). Since these TypedDicts have total=False, it's cleaner to omit None values rather than include them, which also prevents potential serialization issues.

Example for _transform_cost:

 def _transform_cost(cost_data: dict[str, Any] | None) -> ModelCost | None:
     """Transform API cost data to ModelCost format."""
     if not cost_data:
         return None
 
-    return ModelCost(
-        input=cost_data.get("input", 0),
-        output=cost_data.get("output", 0),
-        reasoning=cost_data.get("reasoning"),
-        cache_read=cost_data.get("cache_read"),
-        cache_write=cost_data.get("cache_write"),
-        input_audio=cost_data.get("input_audio"),
-        output_audio=cost_data.get("output_audio"),
-    )
+    result = ModelCost(
+        input=cost_data.get("input", 0),
+        output=cost_data.get("output", 0),
+    )
+    # Only add optional fields if present
+    for key in ["reasoning", "cache_read", "cache_write", "input_audio", "output_audio"]:
+        if key in cost_data:
+            result[key] = cost_data[key]
+    return result

Apply similar logic to _transform_limits and _transform_modalities.
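
For illustration, the same omit-None treatment applied to _transform_limits might look like the sketch below; the "context" and "output" field names and the import path are assumptions based on the models.dev payload shape and the PR layout.

from typing import Any

from lfx.base.models.model_metadata import ModelLimits


def _transform_limits(limit_data: dict[str, Any] | None) -> ModelLimits | None:
    """Transform API limit data to ModelLimits, omitting absent fields."""
    if not limit_data:
        return None

    result = ModelLimits()
    # Only copy fields the API actually provided; assumed keys shown here.
    for key in ("context", "output"):
        if key in limit_data:
            result[key] = limit_data[key]
    return result or None  # an all-empty dict means "no limit data"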

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a54e508 and 841b365.

📒 Files selected for processing (7)
  • .env.example (2 hunks)
  • src/frontend/src/controllers/API/queries/models/use-get-model-providers.ts (1 hunks)
  • src/frontend/src/modals/modelProviderModal/index.tsx (1 hunks)
  • src/lfx/src/lfx/base/models/__init__.py (1 hunks)
  • src/lfx/src/lfx/base/models/model_metadata.py (1 hunks)
  • src/lfx/src/lfx/base/models/models_dev_client.py (1 hunks)
  • src/lfx/src/lfx/base/models/unified_models.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
src/frontend/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/frontend_development.mdc)

src/frontend/src/**/*.{ts,tsx}: Use React 18 with TypeScript for frontend development
Use Zustand for state management

Files:

  • src/frontend/src/modals/modelProviderModal/index.tsx
  • src/frontend/src/controllers/API/queries/models/use-get-model-providers.ts
src/frontend/src/**/*.{tsx,jsx,css,scss}

📄 CodeRabbit inference engine (.cursor/rules/frontend_development.mdc)

Use Tailwind CSS for styling

Files:

  • src/frontend/src/modals/modelProviderModal/index.tsx
src/frontend/src/**/*.{tsx,jsx}

📄 CodeRabbit inference engine (.cursor/rules/frontend_development.mdc)

src/frontend/src/**/*.{tsx,jsx}: Implement dark mode support using the useDarkMode hook and dark store
Use Lucide React for icon components in the application

Files:

  • src/frontend/src/modals/modelProviderModal/index.tsx
🧠 Learnings (5)
📚 Learning: 2025-07-11T22:12:46.255Z
Learnt from: namastex888
Repo: langflow-ai/langflow PR: 9018
File: src/frontend/src/modals/apiModal/codeTabs/code-tabs.tsx:244-244
Timestamp: 2025-07-11T22:12:46.255Z
Learning: In src/frontend/src/modals/apiModal/codeTabs/code-tabs.tsx, the inconsistent showLineNumbers setting between Step 1 (false) and Step 2 (true) in the API modal is intentional to prevent breaking the modal height. Step 1 uses showLineNumbers={false} to save vertical space while Step 2 uses showLineNumbers={true} for better readability of longer code.

Applied to files:

  • src/frontend/src/modals/modelProviderModal/index.tsx
📚 Learning: 2025-07-23T21:19:22.567Z
Learnt from: deon-sanchez
Repo: langflow-ai/langflow PR: 9158
File: src/backend/base/langflow/api/v1/mcp_projects.py:404-404
Timestamp: 2025-07-23T21:19:22.567Z
Learning: In langflow MCP projects configuration, prefer using dynamically computed URLs (like the `sse_url` variable) over hardcoded localhost URLs to ensure compatibility across different deployment environments.

Applied to files:

  • .env.example
📚 Learning: 2025-11-24T19:46:57.920Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/icons.mdc:0-0
Timestamp: 2025-11-24T19:46:57.920Z
Learning: Applies to src/frontend/src/icons/lazyIconImports.ts : Add icon entries to the `lazyIconsMapping` object in `src/frontend/src/icons/lazyIconImports.ts`. The key must match the backend icon name exactly (case-sensitive) and use dynamic imports: `IconName: () => import("@/icons/IconName").then((mod) => ({ default: mod.IconNameIcon }))`.

Applied to files:

  • src/frontend/src/controllers/API/queries/models/use-get-model-providers.ts
📚 Learning: 2025-11-24T19:46:57.920Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/icons.mdc:0-0
Timestamp: 2025-11-24T19:46:57.920Z
Learning: Use clear, recognizable icon names (e.g., `"AstraDB"`, `"Postgres"`, `"OpenAI"`). Always use the same icon name for the same service across backend and frontend.

Applied to files:

  • src/frontend/src/controllers/API/queries/models/use-get-model-providers.ts
📚 Learning: 2025-06-23T12:46:52.420Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/icons.mdc:0-0
Timestamp: 2025-06-23T12:46:52.420Z
Learning: The frontend icon mapping key (in 'lazyIconsMapping') must match the backend 'icon' attribute string exactly, including case sensitivity, to ensure correct icon rendering.

Applied to files:

  • src/frontend/src/controllers/API/queries/models/use-get-model-providers.ts
🧬 Code graph analysis (2)
src/lfx/src/lfx/base/models/models_dev_client.py (1)
src/lfx/src/lfx/base/models/model_metadata.py (4)
  • ModelCost (4-13)
  • ModelLimits (16-20)
  • ModelMetadata (30-69)
  • ModelModalities (23-27)
src/lfx/src/lfx/base/models/unified_models.py (1)
src/lfx/src/lfx/base/models/models_dev_client.py (3)
  • clear_cache (386-394)
  • get_live_models_detailed (307-350)
  • get_provider_metadata_from_api (354-383)
🪛 GitHub Actions: Ruff Style Check
src/lfx/src/lfx/base/models/__init__.py

[error] 20-20: RUF022 __all__ is not sorted.

🪛 GitHub Check: Ruff Style Check (3.13)
src/lfx/src/lfx/base/models/models_dev_client.py

[failure] 169-169: Ruff (BLE001)
src/lfx/src/lfx/base/models/models_dev_client.py:169:12: BLE001 Do not catch blind exception: Exception


[failure] 159-159: Ruff (TRY300)
src/lfx/src/lfx/base/models/models_dev_client.py:159:9: TRY300 Consider moving this statement to an else block

src/lfx/src/lfx/base/models/unified_models.py

[failure] 109-109: Ruff (BLE001)
src/lfx/src/lfx/base/models/unified_models.py:109:12: BLE001 Do not catch blind exception: Exception


[failure] 189-189: Ruff (PLW0602)
src/lfx/src/lfx/base/models/unified_models.py:189:12: PLW0602 Using global for MODELS_DETAILED but no assignment is done


[failure] 153-153: Ruff (BLE001)
src/lfx/src/lfx/base/models/unified_models.py:153:12: BLE001 Do not catch blind exception: Exception

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test Docker Images / Test docker images
  • GitHub Check: Run Frontend Tests / Determine Test Suites and Shard Distribution
  • GitHub Check: Run Frontend Unit Tests / Frontend Jest Unit Tests
  • GitHub Check: Test Starter Templates
  • GitHub Check: Update Component Index
  • GitHub Check: Run Ruff Check and Format
  • GitHub Check: Update Starter Projects
🔇 Additional comments (13)
src/frontend/src/modals/modelProviderModal/index.tsx (1)

255-255: LGTM! Enables scrolling for overflowing provider list.

The addition of overflow-y-auto correctly handles scenarios where the provider list exceeds the fixed height of 513px, allowing users to scroll through all available providers.

.env.example (1)

143-148: Documentation looks good.

The environment variable documentation is clear and follows the established pattern in this file. It properly documents the purpose, allowed values, default, and fallback behavior.

src/lfx/src/lfx/base/models/model_metadata.py (2)

4-28: Well-structured type definitions.

The new ModelCost, ModelLimits, and ModelModalities TypedDicts are well-designed with total=False for optional fields and clear inline documentation of units and purposes.


72-144: Clean implementation of the factory function.

The conditional assignment pattern (only adding fields when not None) is a good approach to keep the metadata dictionaries lean. The function signature is getting long but remains manageable for a configuration factory.

src/lfx/src/lfx/base/models/unified_models.py (3)

33-35: Environment variable parsing looks good.

The pattern of parsing the env var with a lowercase comparison and default to "false" is correct and defensive.
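
For reference, the pattern in question reduces to a single expression; the variable and env-var names come from the PR, though the exact line in unified_models.py may differ slightly.

import os

# Defaults to "false", compares lowercase, so "True"/"TRUE"/"true" all enable it.
USE_LIVE_MODEL_DATA = os.getenv("LFX_USE_LIVE_MODEL_DATA", "false").lower() == "true"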


158-175: Good fallback pattern with appropriate logging.

The live-to-static fallback with a warning log is a solid resilience pattern. The function cleanly encapsulates the toggle behavior.


181-200: Likely an incorrect or invalid review comment.

src/frontend/src/controllers/API/queries/models/use-get-model-providers.ts (1)

78-100: Verify icon keys match lazyIconImports.ts before merging.

The provider-to-icon mappings reference several custom icons (GoogleGenerativeAI, VertexAI, Mistral, DeepSeek, xAI, WatsonxAI, SambaNova, AWS, Azure, NVIDIA, Ollama) that must be registered in src/frontend/src/icons/lazyIconImports.ts. Icon keys are case-sensitive and must match exactly. Ensure each referenced icon key has a corresponding dynamic import entry in lazyIconImports.ts. The "Bot" fallback is acceptable for unsupported providers (Together AI, Fireworks AI, Alibaba, Cerebras).
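
One way to automate that verification is a rough cross-check script like the one below. The file paths come from this PR, but the regexes are heuristics for the assumed file shapes and will need tuning if the real syntax differs.

import re
from pathlib import Path

providers = Path("src/frontend/src/controllers/API/queries/models/use-get-model-providers.ts").read_text()
icons = Path("src/frontend/src/icons/lazyIconImports.ts").read_text()

# Assumed shape in the provider map: `"Provider Name": "IconName",`
referenced = set(re.findall(r':\s*"(\w+)"', providers))
# Assumed shape in lazyIconsMapping: `IconName: () => import(...)`
registered = set(re.findall(r"^\s*(\w+):\s*\(\)\s*=>\s*import", icons, re.MULTILINE))

missing = referenced - registered - {"Bot"}  # "Bot" is the Lucide fallback
print("Unregistered icon keys:", sorted(missing) or "none")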

src/lfx/src/lfx/base/models/models_dev_client.py (5)

1-93: LGTM: Well-structured module with comprehensive provider mappings.

The imports, constants, and provider mappings are well-organized and comprehensive.


96-120: LGTM: Cache loading logic is sound.

The cache path construction and loading with TTL validation are implemented correctly.


122-131: LGTM: Cache saving is implemented correctly.

Directory creation and error handling are appropriate.


307-351: LGTM: Query function is well-implemented.

The filtering logic and parameter handling are correct.


397-439: LGTM: Convenience and search functions are well-implemented.

The wrapper and search functions provide a clean API surface.

Comment on lines 20 to 41
__all__ = [
    # Core components
    "LCModelComponent",
    # Unified models API
    "get_model_provider_variable_mapping",
    "get_model_providers",
    "get_unified_models_detailed",
    "refresh_live_model_data",
    # Model metadata types
    "ModelCost",
    "ModelLimits",
    "ModelMetadata",
    "ModelModalities",
    "create_model_metadata",
    # Live models API (models.dev)
    "clear_live_models_cache",
    "fetch_models_dev_data",
    "get_live_models_detailed",
    "get_models_by_provider",
    "get_provider_metadata_from_api",
    "search_models",
]

⚠️ Potential issue | 🟡 Minor

Fix: __all__ is not sorted (pipeline failure).

The static analysis is failing because __all__ entries are not sorted alphabetically. While the grouping comments are helpful for organization, Ruff's RUF022 rule requires sorted exports.

Apply this diff to fix the pipeline failure:

 __all__ = [
-    # Core components
     "LCModelComponent",
-    # Unified models API
-    "get_model_provider_variable_mapping",
-    "get_model_providers",
-    "get_unified_models_detailed",
-    "refresh_live_model_data",
-    # Model metadata types
     "ModelCost",
     "ModelLimits",
     "ModelMetadata",
     "ModelModalities",
+    "clear_live_models_cache",
     "create_model_metadata",
-    # Live models API (models.dev)
-    "clear_live_models_cache",
     "fetch_models_dev_data",
     "get_live_models_detailed",
+    "get_model_provider_variable_mapping",
+    "get_model_providers",
     "get_models_by_provider",
     "get_provider_metadata_from_api",
+    "get_unified_models_detailed",
+    "refresh_live_model_data",
     "search_models",
 ]
🧰 Tools
🪛 GitHub Actions: Ruff Style Check

[error] 20-20: RUF022 __all__ is not sorted.

🪛 GitHub Check: Ruff Style Check (3.13)

[failure] 20-41: Ruff (RUF022)
src/lfx/src/lfx/base/models/__init__.py:20:11: RUF022 __all__ is not sorted

🤖 Prompt for AI Agents
In src/lfx/src/lfx/base/models/__init__.py around lines 20 to 41, the __all__
list is failing RUF022 because its string entries are not alphabetically sorted;
update the list so all exported names (ignore the inline grouping comments) are
sorted in ascending alphabetical order, keeping each entry as a quoted string
with trailing commas and preserving overall formatting/line breaks.

Comment on lines +133 to +172
def fetch_models_dev_data(*, force_refresh: bool = False) -> dict[str, Any]:
    """Fetch model data from models.dev API.

    Args:
        force_refresh: If True, bypass cache and fetch fresh data.

    Returns:
        Dictionary containing all provider and model data from the API.
    """
    # Try cache first
    if not force_refresh:
        cached = _load_cache()
        if cached is not None:
            logger.debug("Using cached models.dev data")
            return cached

    # Fetch from API
    try:
        with httpx.Client(timeout=30.0) as client:
            response = client.get(MODELS_DEV_API_URL)
            response.raise_for_status()
            data = response.json()

        # Cache the result
        _save_cache(data)
        logger.info("Successfully fetched models.dev data")
        return data

    except httpx.HTTPError as e:
        logger.warning(f"Failed to fetch models.dev data: {e}")
        # Try to return stale cache if available
        cached = _load_cache()
        if cached is not None:
            logger.info("Using stale cache due to API error")
            return cached
        return {}
    except Exception as e:
        logger.error(f"Unexpected error fetching models.dev data: {e}")
        return {}


⚠️ Potential issue | 🔴 Critical

Fix stale cache fallback logic and exception handling.

There are several issues in this function:

  1. Line 164: The stale cache fallback won't work as intended because _load_cache() checks TTL and returns None for expired cache. To retrieve stale cache, you need to read the file directly without TTL validation.

  2. Line 169: Catching bare Exception is too broad and can mask unexpected errors. Be more specific or remove this catch block to let unexpected exceptions propagate.

  3. Line 151: The timeout value 30.0 should be defined as a module-level constant alongside other configuration values.

Apply this diff to fix the stale cache logic:

     except httpx.HTTPError as e:
         logger.warning(f"Failed to fetch models.dev data: {e}")
-        # Try to return stale cache if available
-        cached = _load_cache()
-        if cached is not None:
+        # Try to return stale cache (bypass TTL check)
+        cache_path = _get_cache_path()
+        if cache_path.exists():
+            try:
+                with cache_path.open() as f:
+                    cache_data = json.load(f)
+                    cached = cache_data.get("data")
+                if cached:
+                    logger.info("Using stale cache due to API error")
+                    return cached
+            except (OSError, json.JSONDecodeError):
+                pass
-            logger.info("Using stale cache due to API error")
-            return cached
         return {}
-    except Exception as e:
-        logger.error(f"Unexpected error fetching models.dev data: {e}")
-        return {}

For the timeout constant:

 # API Configuration
 MODELS_DEV_API_URL = "https://models.dev/api.json"
 CACHE_TTL_SECONDS = 3600  # 1 hour cache TTL
+HTTP_TIMEOUT_SECONDS = 30.0  # HTTP request timeout
 CACHE_FILE_NAME = ".models_dev_cache.json"
-        with httpx.Client(timeout=30.0) as client:
+        with httpx.Client(timeout=HTTP_TIMEOUT_SECONDS) as client:

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 GitHub Check: Ruff Style Check (3.13)

[failure] 169-169: Ruff (BLE001)
src/lfx/src/lfx/base/models/models_dev_client.py:169:12: BLE001 Do not catch blind exception: Exception


[failure] 159-159: Ruff (TRY300)
src/lfx/src/lfx/base/models/models_dev_client.py:159:9: TRY300 Consider moving this statement to an else block

Comment on lines +229 to +304
def transform_api_model_to_metadata(
    provider_id: str,
    provider_data: dict[str, Any],
    model_id: str,
    model_data: dict[str, Any],
) -> ModelMetadata:
    """Transform API model data to ModelMetadata format.

    Args:
        provider_id: The provider ID from the API (e.g., "openai")
        provider_data: The provider data from the API
        model_id: The model ID from the API
        model_data: The model data from the API

    Returns:
        ModelMetadata object with transformed data
    """
    provider_name = PROVIDER_NAME_MAP.get(provider_id, provider_data.get("name", provider_id.title()))
    icon = PROVIDER_ICON_MAP.get(provider_id, "bot")  # Default to "bot" if no custom icon

    # Determine model type
    model_type = _determine_model_type(model_data)

    # Build metadata
    metadata = ModelMetadata(
        # Core identification
        provider=provider_name,
        provider_id=provider_id,
        name=model_id,
        display_name=model_data.get("name", model_id),
        icon=icon,
        # Capabilities
        tool_calling=model_data.get("tool_call", False),
        reasoning=model_data.get("reasoning", False),
        structured_output=model_data.get("structured_output", False),
        temperature=model_data.get("temperature", True),
        attachment=model_data.get("attachment", False),
        # Status flags
        preview="-preview" in model_id.lower() or "beta" in model_id.lower(),
        not_supported=provider_id not in SUPPORTED_PROVIDERS,
        deprecated=False,
        default=False,
        open_weights=model_data.get("open_weights", False),
        # Model classification
        model_type=model_type,
    )

    # Add extended metadata
    cost = _transform_cost(model_data.get("cost"))
    if cost:
        metadata["cost"] = cost

    limits = _transform_limits(model_data.get("limit"))
    if limits:
        metadata["limits"] = limits

    modalities = _transform_modalities(model_data.get("modalities"))
    if modalities:
        metadata["modalities"] = modalities

    if model_data.get("knowledge"):
        metadata["knowledge_cutoff"] = model_data["knowledge"]
    if model_data.get("release_date"):
        metadata["release_date"] = model_data["release_date"]
    if model_data.get("last_updated"):
        metadata["last_updated"] = model_data["last_updated"]

    # Provider metadata
    if provider_data.get("api"):
        metadata["api_base"] = provider_data["api"]
    if provider_data.get("env"):
        metadata["env_vars"] = provider_data["env"]
    if provider_data.get("doc"):
        metadata["documentation_url"] = provider_data["doc"]

    return metadata

⚠️ Potential issue | 🟡 Minor

Fix icon name inconsistency.

Line 247 uses lowercase "bot" as the default icon, but line 370 uses "Bot" with a capital B, and the PROVIDER_ICON_MAP consistently uses "Bot". This inconsistency could cause icon lookup failures in the frontend.

Apply this diff:

-    icon = PROVIDER_ICON_MAP.get(provider_id, "bot")  # Default to "bot" if no custom icon
+    icon = PROVIDER_ICON_MAP.get(provider_id, "Bot")  # Default to "Bot" if no custom icon
🤖 Prompt for AI Agents
In src/lfx/src/lfx/base/models/models_dev_client.py around lines 229 to 304, the
default icon is set to the lowercase string "bot" which is inconsistent with
PROVIDER_ICON_MAP and elsewhere using "Bot"; change the default icon value to
"Bot" (capital B) so the fallback matches the map and frontend lookups—update
the icon assignment to use "Bot" instead of "bot".

Comment on lines +353 to +384
@lru_cache(maxsize=1)
def get_provider_metadata_from_api() -> dict[str, dict[str, Any]]:
    """Get provider metadata from the API for all supported providers.

    Returns:
        Dictionary mapping provider names to their metadata
    """
    api_data = fetch_models_dev_data()
    if not api_data:
        return {}

    provider_metadata = {}
    for provider_id, provider_data in api_data.items():
        if provider_id not in SUPPORTED_PROVIDERS:
            continue

        provider_name = PROVIDER_NAME_MAP.get(provider_id, provider_data.get("name", provider_id.title()))
        icon = PROVIDER_ICON_MAP.get(provider_id, "Bot")

        env_vars = provider_data.get("env", [])
        variable_name = env_vars[0] if env_vars else f"{provider_id.upper()}_API_KEY"

        provider_metadata[provider_name] = {
            "icon": icon,
            "variable_name": variable_name,
            "api_base": provider_data.get("api"),
            "documentation_url": provider_data.get("doc"),
            "provider_id": provider_id,
        }

    return provider_metadata


⚠️ Potential issue | 🟠 Major

Address cache coherence issue with lru_cache.

The @lru_cache decorator creates a memory cache that won't be invalidated when clear_cache() is called. This could lead to stale provider metadata being returned from memory even after the disk cache is cleared.

Apply this diff to clear the lru_cache when clearing disk cache:

 def clear_cache() -> None:
     """Clear the models.dev cache."""
     cache_path = _get_cache_path()
     try:
         if cache_path.exists():
             cache_path.unlink()
+            get_provider_metadata_from_api.cache_clear()
             logger.info("Cleared models.dev cache")
     except OSError as e:
         logger.warning(f"Failed to clear cache: {e}")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/lfx/src/lfx/base/models/models_dev_client.py around lines 353-384, the
get_provider_metadata_from_api function is decorated with @lru_cache which
causes stale in-memory data after clearing the disk cache; update the code that
clears the disk cache (where clear_cache() or similar is implemented) to also
call get_provider_metadata_from_api.cache_clear() so the lru_cache is
invalidated when disk cache is cleared; if clear_cache() lives in another
module, import get_provider_metadata_from_api there and invoke .cache_clear() as
part of the cache-clearing routine.
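
A self-contained demonstration of the coherence problem, using illustrative names rather than the PR's: an lru_cache-wrapped reader keeps serving its memoized result even after the backing file is deleted, until cache_clear() is invoked.

import json
import tempfile
from functools import lru_cache
from pathlib import Path

cache_file = Path(tempfile.gettempdir()) / "demo_models_cache.json"
cache_file.write_text(json.dumps({"openai": {"name": "OpenAI"}}))


@lru_cache(maxsize=1)
def read_providers() -> str:
    return cache_file.read_text()


first = read_providers()   # reads the file and memoizes the result
cache_file.unlink()        # the disk-level "clear_cache()" removes the file...
second = read_providers()  # ...yet the memoized value is still returned
assert first == second
read_providers.cache_clear()  # only after this would the next call re-read disk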
