
Conversation


codeflash-ai bot commented Dec 10, 2025

⚡️ This pull request contains optimizations for PR #10953

If you approve this dependent PR, these changes will be merged into the original PR branch fix/folders_download.

This PR will be automatically closed if the original PR is merged.


📄 964% (9.64x) speedup for get_cache_service in src/backend/base/langflow/services/deps.py

⏱️ Runtime: 1.94 milliseconds → 183 microseconds (best of 160 runs)

📝 Explanation and details

The optimization implements factory instance caching to eliminate repeated object creation overhead. The key change is in get_cache_service(), which now caches a single CacheServiceFactory instance using function attributes instead of creating a new factory on every call.

What changed:

  • Added a conditional check if not hasattr(get_cache_service, "_cache_service_factory") to create the factory only once
  • Stores the factory instance as get_cache_service._cache_service_factory on the function object
  • Reuses the cached factory instance on subsequent calls

Why this is faster:
The original code created a new CacheServiceFactory() object on every call to get_cache_service(). Object instantiation in Python involves memory allocation, constructor execution, and attribute initialization. With 1025 calls in the profiler results, this meant creating 1025 factory objects where a single one suffices.

The optimization reduces this to a single factory creation (on first call) plus 1024 fast attribute lookups. The line profiler shows the factory creation time dropped from 187μs per call to just 524ns for the attribute check, with the expensive creation happening only once (36.4% of total time in one call vs 99.5% spread across all calls).
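To make the pattern concrete, here is a minimal, self-contained sketch of the function-attribute caching described above. The `CacheServiceFactory` class and the `_services` registry below are illustrative stand-ins (the real factory lives in `langflow.services.cache.factory` and the registry belongs to the service manager); this is a sketch of the technique, not a verbatim copy of `deps.py`.

```python
# Illustrative sketch of the caching pattern, not the actual langflow code.

class CacheServiceFactory:
    """Stand-in for langflow.services.cache.factory.CacheServiceFactory."""

    def __call__(self):
        # The real factory builds and returns a CacheService instance.
        return object()


_services = {}  # stand-in for the service manager's registry


def get_cache_service():
    # Create the factory once and park it on the function object; every
    # subsequent call is just a cheap hasattr/attribute lookup instead of
    # re-instantiating CacheServiceFactory.
    if not hasattr(get_cache_service, "_cache_service_factory"):
        get_cache_service._cache_service_factory = CacheServiceFactory()
    factory = get_cache_service._cache_service_factory
    # Reuse an already-built service, otherwise build it with the cached factory.
    if "CACHE_SERVICE" not in _services:
        _services["CACHE_SERVICE"] = factory()
    return _services["CACHE_SERVICE"]
```

The only behavioral difference from the original is where the factory lives: it is created lazily on the first call and reused afterwards, which is what the line-profiler numbers above reflect.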

Performance impact:

  • 963% speedup (1.94ms → 183μs runtime)
  • Most effective for workloads with frequent cache service requests
  • Particularly beneficial in service-oriented architectures where get_cache_service() is called repeatedly during request processing
  • The caching eliminates redundant factory instantiation while preserving the same service manager behavior

The optimization works well across all test scenarios, from basic single calls to large-scale operations with 1000+ service requests, making it universally beneficial without changing the API contract.
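A quick, hypothetical sanity check of the intended behavior (it assumes the singleton semantics that the generated tests below exercise):

```python
from langflow.services.deps import get_cache_service

s1 = get_cache_service()
s2 = get_cache_service()
assert s1 is s2  # repeated calls return the same cache service instance
# The factory is created once and cached on the function object:
assert hasattr(get_cache_service, "_cache_service_factory")
```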

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 1024 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

# imports
import pytest
from langflow.services.deps import get_cache_service


# Minimal stubs for dependencies to allow testing in isolation.
# In a real project, these would be actual imports.
class ServiceType:
    CACHE_SERVICE = "CACHE_SERVICE"

class CacheService:
    def __init__(self):
        self.type = "sync"

class AsyncBaseCacheService:
    def __init__(self):
        self.type = "async"

class CacheServiceFactory:
    """Factory that returns a CacheService instance"""
    def __call__(self):
        return CacheService()

class AsyncCacheServiceFactory:
    """Factory that returns an AsyncBaseCacheService instance"""
    def __call__(self):
        return AsyncBaseCacheService()

# ServiceManager and get_service_manager for dependency injection
class ServiceManager:
    def __init__(self):
        self._factories = {}
        self._services = {}

    def are_factories_registered(self):
        return bool(self._factories)

    def register_factories(self, factories):
        self._factories.update(factories)

    def get(self, service_type, default=None):
        # If service is already created, return it
        if service_type in self._services:
            return self._services[service_type]
        # If factory registered, use it
        if service_type in self._factories:
            self._services[service_type] = self._factories[service_type]()
            return self._services[service_type]
        # If default provided, use it
        if default is not None:
            service = default()
            self._services[service_type] = service
            return service
        # Not found
        raise KeyError(f"Service {service_type} not found")

    @staticmethod
    def get_factories():
        # By default, only CacheServiceFactory for CACHE_SERVICE
        return {ServiceType.CACHE_SERVICE: CacheServiceFactory()}

# Singleton pattern for service manager for test isolation
_service_manager_instance = None

def get_service_manager():
    global _service_manager_instance
    if _service_manager_instance is None:
        _service_manager_instance = ServiceManager()
    return _service_manager_instance
from langflow.services.deps import get_cache_service

# 1. Basic Test Cases

def test_returns_cache_service_instance():
    """Test that get_cache_service returns a CacheService instance by default."""
    codeflash_output = get_cache_service(); service = codeflash_output

def test_returns_same_instance_on_multiple_calls():
    """Test that get_cache_service returns the same instance on multiple calls (singleton behavior)."""
    codeflash_output = get_cache_service(); s1 = codeflash_output
    codeflash_output = get_cache_service(); s2 = codeflash_output

def test_service_type_is_cache_service():
    """Test that the returned service is registered under the correct service type."""
    service_manager = get_service_manager()
    codeflash_output = get_cache_service(); service = codeflash_output

# 2. Edge Test Cases

def test_returns_async_cache_service_when_factory_is_async():
    """Test that get_cache_service returns an AsyncBaseCacheService if the factory is changed."""
    service_manager = get_service_manager()
    # Register an async factory for CACHE_SERVICE
    service_manager.register_factories({ServiceType.CACHE_SERVICE: AsyncCacheServiceFactory()})
    codeflash_output = get_cache_service(); service = codeflash_output

def test_returns_service_when_already_registered():
    """Test that get_cache_service returns the already registered service (does not recreate)."""
    service_manager = get_service_manager()
    # Manually register a service instance
    custom_service = CacheService()
    service_manager._services[ServiceType.CACHE_SERVICE] = custom_service
    codeflash_output = get_cache_service(); service = codeflash_output


def test_factory_returns_none():
    """Test edge case where the factory returns None."""
    class NoneFactory:
        def __call__(self):
            return None
    service_manager = get_service_manager()
    service_manager.register_factories({ServiceType.CACHE_SERVICE: NoneFactory()})
    codeflash_output = get_cache_service(); service = codeflash_output

def test_service_manager_factories_registration_only_once():
    """Test that factories are only registered once, and not overwritten."""
    service_manager = get_service_manager()
    # Register a custom factory
    service_manager.register_factories({ServiceType.CACHE_SERVICE: AsyncCacheServiceFactory()})
    # Now call get_cache_service, which would normally try to register default factories
    codeflash_output = get_cache_service(); service = codeflash_output

# 3. Large Scale Test Cases

def test_many_service_types_and_factories():
    """Test that get_cache_service works when many unrelated service types/factories are registered."""
    service_manager = get_service_manager()
    # Register 500 dummy factories
    class DummyService:
        pass
    for i in range(500):
        service_manager.register_factories({f"DUMMY_{i}": lambda: DummyService()})
    # Register CacheServiceFactory for CACHE_SERVICE
    service_manager.register_factories({ServiceType.CACHE_SERVICE: CacheServiceFactory()})
    codeflash_output = get_cache_service(); service = codeflash_output
    # Ensure that dummy services are unaffected
    for i in range(0, 500, 100):
        dummy = service_manager.get(f"DUMMY_{i}")

def test_large_number_of_get_cache_service_calls():
    """Test that repeated calls to get_cache_service (up to 1000) are fast and always return the same instance."""
    service_list = []
    for _ in range(1000):
        service_list.append(get_cache_service())
    # All should be the same instance
    first = service_list[0]

def test_large_number_of_service_instances():
    """Test that the ServiceManager can handle up to 1000 unique services and still get the cache service."""
    service_manager = get_service_manager()
    class DummyService:
        pass
    # Register 999 dummy services
    for i in range(1, 1000):
        service_manager.register_factories({f"DUMMY_{i}": lambda: DummyService()})
        service_manager.get(f"DUMMY_{i}")
    # Now get the cache service
    codeflash_output = get_cache_service(); service = codeflash_output
    # Check that all dummy services are present
    for i in range(1, 1000, 200):
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

# imports
import pytest  # used for our unit tests
from langflow.services.deps import get_cache_service

# Minimal stub implementations for dependencies, so tests are self-contained
# These stubs mimic the expected interfaces and behaviors.

class CacheService:
    """Stub for synchronous cache service."""
    def __init__(self):
        self.store = {}

    def set(self, key, value):
        self.store[key] = value

    def get(self, key, default=None):
        return self.store.get(key, default)

class AsyncBaseCacheService:
    """Stub for asynchronous cache service."""
    def __init__(self):
        self.store = {}

    async def set(self, key, value):
        self.store[key] = value

    async def get(self, key, default=None):
        return self.store.get(key, default)

class ServiceType:
    """Stub for service type enumeration."""
    CACHE_SERVICE = "CACHE_SERVICE"

class CacheServiceFactory:
    """Stub for a cache service factory."""
    def __call__(self):
        # Always returns a new CacheService instance for testing
        return CacheService()

# Patch for langflow.services.cache.factory.CacheServiceFactory
CacheServiceFactoryPatch = CacheServiceFactory
from langflow.services.deps import get_cache_service

# --- Unit Tests ---

# 1. Basic Test Cases

def test_get_cache_service_returns_instance():
    """Test that get_cache_service returns a CacheService instance."""
    codeflash_output = get_cache_service(); service = codeflash_output

def test_cache_service_set_and_get():
    """Test basic set/get functionality of the cache service."""
    codeflash_output = get_cache_service(); cache = codeflash_output
    cache.set("foo", "bar")

def test_cache_service_get_default():
    """Test that cache.get returns default if key is missing."""
    codeflash_output = get_cache_service(); cache = codeflash_output

# 2. Edge Test Cases

def test_cache_service_empty_key_and_value():
    """Test storing and retrieving empty string as key and value."""
    codeflash_output = get_cache_service(); cache = codeflash_output
    cache.set("", "")

def test_cache_service_none_key():
    """Test storing and retrieving None as key."""
    codeflash_output = get_cache_service(); cache = codeflash_output
    cache.set(None, "value_for_none")

def test_cache_service_none_value():
    """Test storing and retrieving None as value."""
    codeflash_output = get_cache_service(); cache = codeflash_output
    cache.set("key_with_none", None)

def test_cache_service_overwrite_value():
    """Test overwriting a value for a given key."""
    codeflash_output = get_cache_service(); cache = codeflash_output
    cache.set("dup_key", "first")
    cache.set("dup_key", "second")

def test_cache_service_large_object():
    """Test storing and retrieving a large object."""
    codeflash_output = get_cache_service(); cache = codeflash_output
    large_obj = {"numbers": list(range(1000)), "text": "x" * 1000}
    cache.set("large", large_obj)

def test_cache_service_non_string_key():
    """Test using a tuple as a key."""
    codeflash_output = get_cache_service(); cache = codeflash_output
    tup_key = (1, 2, 3)
    cache.set(tup_key, "tuple_value")


def test_cache_service_many_keys():
    """Test storing and retrieving many keys."""
    codeflash_output = get_cache_service(); cache = codeflash_output
    for i in range(1000):
        cache.set(f"key_{i}", i)
    for i in range(1000):
        pass

def test_cache_service_memory_efficiency():
    """Test that cache does not crash or slow down with many large values."""
    codeflash_output = get_cache_service(); cache = codeflash_output
    large_value = "x" * 500
    for i in range(500):
        cache.set(f"big_{i}", large_value)
    for i in range(500):
        pass

def test_cache_service_no_cross_contamination():
    """Ensure multiple instances do not share state."""
    codeflash_output = get_cache_service(); cache1 = codeflash_output
    codeflash_output = get_cache_service(); cache2 = codeflash_output
    cache1.set("unique", "cache1")
    cache2.set("unique", "cache2")

def test_cache_service_performance_large_scale():
    """Test that large number of operations complete quickly."""
    import time
    codeflash_output = get_cache_service(); cache = codeflash_output
    start = time.time()
    for i in range(1000):
        cache.set(i, i)
    for i in range(1000):
        pass
    duration = time.time() - start
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-pr10953-2025-12-10T18.10.12` and push.

Codeflash

lucaseduoli and others added 5 commits December 10, 2025 14:40
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) label on Dec 10, 2025

coderabbitai bot commented Dec 10, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions bot added the community (Pull Request from an external contributor) label on Dec 10, 2025

codecov bot commented Dec 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.06%. Comparing base (5625819) to head (0c85129).

Additional details and impacted files


@@                  Coverage Diff                  @@
##           fix/folders_download   #10956   +/-   ##
=====================================================
  Coverage                 33.06%   33.06%           
=====================================================
  Files                      1368     1368           
  Lines                     63815    63817    +2     
  Branches                   9391     9391           
=====================================================
+ Hits                      21100    21101    +1     
- Misses                    41671    41673    +2     
+ Partials                   1044     1043    -1     
| Flag | Coverage Δ |
|------|------------|
| backend | 52.81% <100.00%> (-0.01%) ⬇️ |
| lfx | 40.02% <ø> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| src/backend/base/langflow/services/deps.py | 83.82% <100.00%> (+0.49%) ⬆️ |

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Base automatically changed from fix/folders_download to release-1.7.0 December 10, 2025 22:23