feat: Adding rate limiting #709

Vasilije1990 · 2025-04-07T19:38:14Z

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

pull-checklist · 2025-04-07T19:38:18Z

Please make sure all the checkboxes are checked:

I have tested these changes locally.
I have reviewed the code changes.
I have added end-to-end and unit tests (if applicable).
I have updated the documentation and README.md file (if necessary).
I have removed unnecessary code and debug statements.
PR title is clear and follows the convention.
I have tagged reviewers or team members for feedback.

coderabbitai · 2025-04-07T19:38:20Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes add new asynchronous and synchronous decorators for rate limiting and retry logic across multiple components, including embedding engines and API adapters. Manual retry mechanisms have been replaced with standardized decorators. Additionally, new configuration parameters and a dedicated rate limiter module have been introduced, alongside comprehensive tests validating these enhancements. An obsolete import test was removed.

Changes

File(s)	Change Summary
`cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py` `cognee/.../OllamaEmbeddingEngine.py`	Added decorators (`@embedding_sleep_and_retry_async`, `@embedding_rate_limit_async`) to automate retry and rate limiting, eliminating manual retry logic.
`cognee/infrastructure/llm/anthropic/adapter.py` `cognee/.../gemini/adapter.py` `cognee/.../generic_llm_api/adapter.py` `cognee/.../ollama/adapter.py` `cognee/.../openai/adapter.py`	Applied asynchronous and synchronous rate limiting and retry decorators to API adapter methods; included minor textual corrections in some methods.
`cognee/infrastructure/llm/config.py`	Introduced new configuration attributes for both LLM and embedding rate limiting and updated the `to_dict` method accordingly.
`cognee/infrastructure/llm/rate_limiter.py`	Added a new module with singleton classes (`LLMRateLimiter`, `EmbeddingRateLimiter`) and decorator functions to enforce rate limits and retries for both synchronous and asynchronous operations.
`cognee/tests/unit/infrastructure/test_embedding_rate_limiting.py` `cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py` `cognee/tests/unit/infrastructure/test_rate_limiting_example.py` `cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py` `cognee/tests/unit/infrastructure/test_rate_limiting_retry.py`	Added multiple test files to validate rate limiting and retry functionality under various scenarios.
`tests/import_test.py`	Removed obsolete test function `test_import_cognee`.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant AM as Adapter Method
    participant RL as Rate Limit Decorator
    participant SR as Sleep & Retry Decorator
    participant API as External API

    U->>AM: Invoke method
    AM->>RL: Check rate limit
    RL->>SR: Forward call with retry logic
    SR->>API: Execute API call
    API-->>SR: Return response or error
    alt Rate limit error
        SR->>SR: Wait and retry
        SR->>API: Retry API call
    else Success
        SR-->>AM: Return successful response
    end
    AM-->>U: Return final result

Possibly related PRs

chore: remove unused dependencies and make some optional #661 – Introduced similar decorators for rate limiting and retry logic in the OllamaEmbeddingEngine, paralleling changes in LiteLLMEmbeddingEngine.
Feat/cog 1365 unify retrievers #572 – Added decorators to the embedding engine methods, demonstrating a shared approach to handling rate limits.
Cog 813 source code chunks #383 – Modified embedding engine methods with rate limiting decorators, aligning control flow improvements across engines.

Suggested reviewers

hajdul88
borisarzentar

Poem

I'm a hopping rabbit with code so fleet,
Decorators helping my retries complete.
Rate limits and sleeps now lead the way,
Guiding my hops in a bouncy display.
Thump-thump, I cheer with CodeRabbit pride –
Bounding through changes on this joyful ride!

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (11)

cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py (1)
51-51: Remove the unused retry_count variable.

This variable is now redundant since the retry functionality has been moved to the decorator.
- self.retry_count = 0
cognee/infrastructure/llm/gemini/adapter.py (1)
10-14: Remove unused import.

The rate_limit_sync import is not used in this file and should be removed.
from cognee.infrastructure.llm.rate_limiter import (
    rate_limit_async,
-    rate_limit_sync,
    sleep_and_retry_async,
)
🧰 Tools

🪛 Ruff (0.8.2)

12-12: cognee.infrastructure.llm.rate_limiter.rate_limit_sync imported but unused

Remove unused import: cognee.infrastructure.llm.rate_limiter.rate_limit_sync

(F401)
cognee/tests/unit/infrastructure/test_embedding_rate_limiting.py (2)
4-4: Remove unused imports.

The imports lru_cache and LLMConfig are not used in this file.
import os
import time
import asyncio
-from functools import lru_cache
import logging

-from cognee.infrastructure.llm.config import LLMConfig, get_llm_config
+from cognee.infrastructure.llm.config import get_llm_config
from cognee.infrastructure.llm.rate_limiter import EmbeddingRateLimiter
Also applies to: 7-7

🧰 Tools

🪛 Ruff (0.8.2)

4-4: functools.lru_cache imported but unused

Remove unused import: functools.lru_cache

(F401)

83-84: Consider making this file a proper pytest test.

This file appears to be in a pytest directory but isn't structured as a standard pytest test. Consider:

Renaming the test function to have a test_ prefix

Removing the __main__ block, as pytest discovers and runs tests automatically

Using pytest fixtures for setup and teardown of environment variables

This would make the test more consistent with pytest conventions and easier to integrate with CI/CD pipelines.
-if __name__ == "__main__":
-    asyncio.run(test_embedding_rate_limiting())
cognee/infrastructure/llm/config.py (1)

19-24: Consider Adding Validations for Rate Limit Fields
Defining default values for rate limit configuration is good. Consider adding checks to ensure non-negative and nonzero integers when llm_rate_limit_enabled or embedding_rate_limit_enabled is True, to prevent misconfiguration.
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
75-79: Combine Nested with Statements
Static analysis (SIM117) suggests merging the nested with statements into one, reducing indentation and improving readability.

Below is a proposed refactor for lines 75-79:
-    async with aiohttp.ClientSession() as session:
-        async with session.post(
-            self.endpoint, json=payload, headers=headers, timeout=60.0
-        ) as response:
-            data = await response.json()
-            return data["embedding"]
+    async with aiohttp.ClientSession() as session, session.post(
+        self.endpoint, json=payload, headers=headers, timeout=60.0
+    ) as response:
+        data = await response.json()
+        return data["embedding"
🧰 Tools

🪛 Ruff (0.8.2)

75-78: Use a single with statement with multiple contexts instead of nested with statements

Combine with statements

(SIM117)
cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py (1)
1-9: Import section can be optimized.

The time and functools.lru_cache modules are imported but never used directly in the code.
import asyncio
-import time
import os
-from functools import lru_cache
from unittest.mock import patch
from cognee.shared.logging_utils import get_logger
from cognee.infrastructure.llm.rate_limiter import LLMRateLimiter
from cognee.infrastructure.llm.config import get_llm_config
🧰 Tools

🪛 Ruff (0.8.2)

2-2: time imported but unused

Remove unused import: time

(F401)

4-4: functools.lru_cache imported but unused

Remove unused import: functools.lru_cache

(F401)
cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py (2)
1-8: Import section contains unused imports.

The functools.lru_cache and LLMConfig imports are not used directly in the code.
import os
import time
import asyncio
-from functools import lru_cache
import logging

-from cognee.infrastructure.llm.config import LLMConfig, get_llm_config
+from cognee.infrastructure.llm.config import get_llm_config
from cognee.infrastructure.llm.rate_limiter import EmbeddingRateLimiter
🧰 Tools

🪛 Ruff (0.8.2)

4-4: functools.lru_cache imported but unused

Remove unused import: functools.lru_cache

(F401)

7-7: cognee.infrastructure.llm.config.LLMConfig imported but unused

Remove unused import: cognee.infrastructure.llm.config.LLMConfig

(F401)

13-15: Using standard logging instead of the application's logger pattern.

While this works, it's inconsistent with the logging approach used in other test files (like test_rate_limiting_realistic.py), which use get_logger().
-# Configure logging
-logging.basicConfig(level=logging.INFO)
-logger = logging.getLogger(__name__)
+from cognee.shared.logging_utils import get_logger
+
+# Get logger
+logger = get_logger()
cognee/tests/unit/infrastructure/test_rate_limiting_retry.py (1)
3-5: Remove unused imports

Ruff flagged these imports as unused. Please remove them to keep the test file clean.
- import os
- from unittest.mock import patch, MagicMock
- from functools import lru_cache
🧰 Tools

🪛 Ruff (0.8.2)

3-3: os imported but unused

Remove unused import: os

(F401)

4-4: unittest.mock.patch imported but unused

Remove unused import

(F401)

4-4: unittest.mock.MagicMock imported but unused

Remove unused import

(F401)

5-5: functools.lru_cache imported but unused

Remove unused import: functools.lru_cache

(F401)
cognee/infrastructure/llm/rate_limiter.py (1)
56-56: Remove unused import

Ruff flagged openai as imported but never used. Please remove it to keep the file clean.
- import openai
🧰 Tools

🪛 Ruff (0.8.2)

56-56: openai imported but unused

Remove unused import: openai

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ca8ef2 and bc36aee.

⛔ Files ignored due to path filters (2)

poetry.lock is excluded by !**/*.lock, !**/*.lock
pyproject.toml is excluded by !**/*.toml

📒 Files selected for processing (15)

cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py (2 hunks)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (4 hunks)
cognee/infrastructure/llm/anthropic/adapter.py (2 hunks)
cognee/infrastructure/llm/config.py (2 hunks)
cognee/infrastructure/llm/gemini/adapter.py (2 hunks)
cognee/infrastructure/llm/generic_llm_api/adapter.py (2 hunks)
cognee/infrastructure/llm/ollama/adapter.py (5 hunks)
cognee/infrastructure/llm/openai/adapter.py (6 hunks)
cognee/infrastructure/llm/rate_limiter.py (1 hunks)
cognee/tests/unit/infrastructure/test_embedding_rate_limiting.py (1 hunks)
cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py (1 hunks)
cognee/tests/unit/infrastructure/test_rate_limiting_example.py (1 hunks)
cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py (1 hunks)
cognee/tests/unit/infrastructure/test_rate_limiting_retry.py (1 hunks)
tests/import_test.py (0 hunks)

💤 Files with no reviewable changes (1)

tests/import_test.py

🧰 Additional context used

🧬 Code Definitions (11)

cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py (3)

cognee/shared/logging_utils.py (1)

get_logger (137-158)

cognee/infrastructure/llm/rate_limiter.py (3)

LLMRateLimiter (86-191)

hit_limit (133-151)

hit_limit (420-445)

cognee/infrastructure/llm/config.py (1)

get_llm_config (104-105)

cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py (1)

cognee/infrastructure/llm/rate_limiter.py (2)

embedding_rate_limit_async (531-565)

embedding_sleep_and_retry_async (640-709)

cognee/infrastructure/llm/anthropic/adapter.py (1)

cognee/infrastructure/llm/rate_limiter.py (2)

rate_limit_async (220-243)

sleep_and_retry_async (331-376)

cognee/infrastructure/llm/gemini/adapter.py (1)

cognee/infrastructure/llm/rate_limiter.py (3)

rate_limit_async (220-243)

rate_limit_sync (194-217)

sleep_and_retry_async (331-376)

cognee/infrastructure/llm/generic_llm_api/adapter.py (1)

cognee/infrastructure/llm/rate_limiter.py (2)

rate_limit_async (220-243)

sleep_and_retry_async (331-376)

cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)

cognee/infrastructure/llm/rate_limiter.py (2)

embedding_rate_limit_async (531-565)

embedding_sleep_and_retry_async (640-709)

cognee/tests/unit/infrastructure/test_rate_limiting_retry.py (1)

cognee/infrastructure/llm/rate_limiter.py (3)

sleep_and_retry_sync (283-328)

sleep_and_retry_async (331-376)

is_rate_limit_error (246-257)

cognee/tests/unit/infrastructure/test_rate_limiting_example.py (4)

cognee/shared/logging_utils.py (1)

get_logger (137-158)

cognee/modules/search/types/SearchType.py (1)

SearchType (4-13)

cognee/infrastructure/llm/rate_limiter.py (3)

LLMRateLimiter (86-191)

hit_limit (133-151)

hit_limit (420-445)

cognee/infrastructure/llm/config.py (1)

get_llm_config (104-105)

cognee/infrastructure/llm/openai/adapter.py (1)

cognee/infrastructure/llm/rate_limiter.py (4)

rate_limit_async (220-243)

rate_limit_sync (194-217)

sleep_and_retry_async (331-376)

sleep_and_retry_sync (283-328)

cognee/infrastructure/llm/rate_limiter.py (3)

cognee/shared/logging_utils.py (2)

get_logger (137-158)

error (127-128)

cognee/infrastructure/llm/config.py (1)

get_llm_config (104-105)

cognee/infrastructure/databases/exceptions/EmbeddingException.py (1)

EmbeddingException (5-14)

cognee/infrastructure/llm/ollama/adapter.py (1)

cognee/infrastructure/llm/rate_limiter.py (3)

rate_limit_async (220-243)

rate_limit_sync (194-217)

sleep_and_retry_async (331-376)

🪛 Ruff (0.8.2)

cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py

4-4: functools.lru_cache imported but unused

Remove unused import: functools.lru_cache

(F401)

7-7: cognee.infrastructure.llm.config.LLMConfig imported but unused

Remove unused import: cognee.infrastructure.llm.config.LLMConfig

(F401)

cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py

2-2: time imported but unused

Remove unused import: time

(F401)

4-4: functools.lru_cache imported but unused

Remove unused import: functools.lru_cache

(F401)

cognee/tests/unit/infrastructure/test_embedding_rate_limiting.py

4-4: functools.lru_cache imported but unused

Remove unused import: functools.lru_cache

(F401)

7-7: cognee.infrastructure.llm.config.LLMConfig imported but unused

Remove unused import: cognee.infrastructure.llm.config.LLMConfig

(F401)

cognee/infrastructure/llm/gemini/adapter.py

12-12: cognee.infrastructure.llm.rate_limiter.rate_limit_sync imported but unused

Remove unused import: cognee.infrastructure.llm.rate_limiter.rate_limit_sync

(F401)

cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py

75-78: Use a single with statement with multiple contexts instead of nested with statements

Combine with statements

(SIM117)

cognee/tests/unit/infrastructure/test_rate_limiting_retry.py

3-3: os imported but unused

Remove unused import: os

(F401)

4-4: unittest.mock.patch imported but unused

Remove unused import

(F401)

4-4: unittest.mock.MagicMock imported but unused

Remove unused import

(F401)

5-5: functools.lru_cache imported but unused

Remove unused import: functools.lru_cache

(F401)

cognee/infrastructure/llm/rate_limiter.py

56-56: openai imported but unused

Remove unused import: openai

(F401)

🔇 Additional comments (51)

cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py (2)

13-16: Good addition of rate limiting imports.

The new imports correctly bring in the necessary decorators for handling rate limiting and retry logic, which will help standardize this functionality across the codebase.

58-60: Excellent refactoring to use decorators for rate limiting and retries.

The use of decorators is a significant improvement over manual retry and rate limit handling. This approach:

Centralizes the rate limiting and retry logic

Makes the code more maintainable and consistent across the codebase

Keeps the core embedding functionality clean and focused

cognee/infrastructure/llm/anthropic/adapter.py (2)

8-8: Good addition of rate limiting imports.

The import correctly brings in the necessary decorators for handling rate limiting and retry logic.

27-29: Well-implemented decorators for rate limiting.

The application of these decorators follows the same pattern as in other adapters, providing a consistent approach to rate limiting across different API providers.

cognee/infrastructure/llm/gemini/adapter.py (1)

45-47: Well-implemented decorators for rate limiting.

The consistent application of rate limiting decorators across different adapters helps standardize the API call behavior. Note that the decorators are correctly placed below the monitoring decorator.

cognee/tests/unit/infrastructure/test_embedding_rate_limiting.py (1)

18-75: Well-structured rate limiting test.

The test effectively validates that the rate limiting functionality works as expected, with appropriate logging and assertions. The approach of setting environment variables, running tests, and then cleaning up is robust.

cognee/infrastructure/llm/generic_llm_api/adapter.py (2)

9-9: Imports for Decorators Look Good
The newly introduced imports for async rate limiting and retry logic appear correct and consistent with the rest of the file.

31-32: Chaining Decorators in Correct Order
Decorating with @rate_limit_async under @sleep_and_retry_async() ensures that rate limits are checked first, and retry attempts occur upon hitting rate limit errors.

cognee/infrastructure/llm/config.py (1)

94-99: Inclusion of Rate Limit Fields in Serialization
Thank you for adding the newly introduced fields to the to_dict method. This ensures external consumers have access to rate limit settings.

cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)

12-15: New Imports for Embedding Rate Limiting
The imports for embedding_rate_limit_async and embedding_sleep_and_retry_async are used consistently to handle rate limits and retries for embedding generation.

50-50: Applying Rate Limiting to Embedding Method
Decorating embed_text with @embedding_rate_limit_async helps manage concurrent requests, ensuring embedding API calls respect set limits.

61-61: Retry on Rate Limit Errors
Wrapping _get_embedding with @embedding_sleep_and_retry_async() cleanly centralizes the retry logic on rate limit exceptions.

cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py (8)

11-15: Well-documented test function with clear purpose.

The docstring clearly explains the test's purpose of demonstrating how rate limiting works with a smaller rate limit.

18-26: Good test setup with configuration and cache clearing.

The test properly sets up a controlled environment by:

Setting environment variables for rate limiting configuration

Clearing the configuration cache and limiter instance

Creating fresh instances for testing

This ensures the test runs with deterministic behavior.

33-61: Clever use of monkey patching for deterministic testing.

The mock implementation controls the hit_limit method behavior with predefined success/failure sequences, which makes the test deterministic and reliable. The pattern of responses clearly simulates the rate limiting window moving after wait periods.

66-84: Well-structured first batch test.

The test properly captures both successful and rate-limited requests in separate arrays, provides clear logging, and calculates meaningful metrics.

85-104: Good simulation of partial rate limit recovery.

The test properly implements waiting for a partial window reset and tests that only a portion of the capacity has regenerated.

106-125: Thorough testing of full capacity regeneration.

The test correctly verifies that after a full window wait period, all capacity has been restored and all requests succeed.

135-149: Comprehensive verification with clear success/failure messaging.

The test includes explicit pass/fail messages that clearly communicate the expected behavior and actual results for each test batch.

154-166: Well-structured async main function and proper event loop management.

The implementation follows best practices for handling async operations, including properly shutting down async generators.

cognee/tests/unit/infrastructure/test_rate_limiting_example.py (4)

1-8: Good import structure with relevant modules.

The imports include all necessary modules for testing rate limiting functionality and interacting with the cognee system.

10-69: Well-implemented rate limiting test.

The test function:

Properly configures rate limiting via environment variables

Creates a fresh limiter instance

Makes more requests (70) than the limit (60) to verify rate limiting behavior

Measures elapsed time for accurate rate calculation

Includes comprehensive result verification with tolerance (58-62 requests)

This tests both the functional correctness and the timing aspects of rate limiting.

72-109: Comprehensive end-to-end example integration.

The main() function provides a complete workflow that:

Resets the system to a clean state

Adds sample text

Processes it through cognify

Searches for insights

Runs the rate limiting test

This is valuable for demonstrating how rate limiting integrates with the broader system.

124-131: Proper async event loop setup and cleanup.

The code follows best practices for managing the asyncio event loop, including proper cleanup of async generators.

cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py (8)

18-32: Comprehensive test setup with appropriate environment configuration.

The test properly sets environment variables to enable rate limiting, configure the limits, enable mock embeddings, and disable automatic retries.

33-42: Proper cache clearing and instance verification.

The test correctly clears configuration caches and resets the rate limiter instance to ensure a clean test environment. It also logs the configuration to verify settings.

52-79: Well-structured first batch test with clear tracking.

The first batch test properly:

Sends a defined number of requests

Tracks success and failure counts

Times the batch execution

Provides detailed logging

This approach effectively tests the rate limiting functionality.

80-111: Good implementation of partial capacity recovery test.

The test correctly implements waiting for partial capacity recovery and then testing a second batch of requests to verify that only some of the capacity has regenerated.

113-144: Thorough testing of full capacity regeneration.

The test correctly verifies that after a full wait period, rate limiting capacity has been fully restored by sending another batch of requests and tracking the results.

146-154: Good test validation with clear assertions.

The test properly validates that both successful and rate-limited requests occurred, confirming that the rate limiter is working as expected.

156-161: Proper cleanup of environment variables.

The test cleans up after itself by removing all the environment variables it set, which is good practice for test isolation.

164-165: Simple and effective main block.

The code uses asyncio.run() rather than manually managing the event loop, which is a cleaner approach for simpler scripts.

cognee/infrastructure/llm/ollama/adapter.py (5)

6-10: Good addition of rate limiting imports.

The necessary rate limiter decorators are properly imported for both synchronous and asynchronous methods.

30-32: Appropriate rate limiting for asynchronous API calls.

The acreate_structured_output method is correctly decorated with both:

@sleep_and_retry_async() - to automatically retry when rate limits are hit

@rate_limit_async - to apply asynchronous rate limiting

This ensures the method respects rate limits while providing automatic retry functionality.

55-55: Added rate limiting for synchronous transcript creation.

The create_transcript method is correctly decorated with @rate_limit_sync to apply synchronous rate limiting for this method.

75-75: Added rate limiting for synchronous image transcription.

The transcribe_image method is correctly decorated with @rate_limit_sync to apply synchronous rate limiting for this method.

91-91: Fixed typographical issue in user prompt.

Changed the single quote to an apostrophe in "What's in this image?" for consistency.

cognee/tests/unit/infrastructure/test_rate_limiting_retry.py (7)

16-31: Synchronous retry test function looks solid

This approach of incrementing a counter and raising rate limit errors to trigger retries is appropriate.

34-50: Asynchronous retry test function is well-written

Good job simulating rate limit errors and testing asynchronous retry logic.

52-92: Rate limit error detection coverage is comprehensive

The variety of test messages ensures robust detection.

94-125: Synchronous retry test is thorough

Timing checks and attempt counters confirm proper exponential backoff.

127-158: Asynchronous retry test follows best practices

Nice job confirming that backoff timing works as intended in async context.

160-178: Max retries test is clearly implemented

Successfully validates that the function fails once retries are exceeded.

180-196: Main function coordinates all tests neatly

Well-structured approach to running these tests sequentially.

cognee/infrastructure/llm/openai/adapter.py (6)

13-18: Added rate limiting and retry imports

All referenced decorators are appropriately imported and used—good work.

58-59: Async rate limiting decorators

Applying @sleep_and_retry_async before @rate_limit_async properly combines exponential backoff with rate enforcement.

86-87: Synchronous rate limiting decorators

Good layering of @sleep_and_retry_sync after @observe and before @rate_limit_sync.

113-113: Applying rate_limit_sync

Enables consistent rate limiting for create_transcript.

134-135: Applying rate_limit_sync for image transcription

Matches the strategy used for other API calls, ensuring consistency.

147-147: Consistent apostrophe usage

Minor textual improvement maintains uniform style.

cognee/infrastructure/llm/rate_limiter.py (1)

1-710: Comprehensive rate limiter implementation

The synchronous/asynchronous decorators, embedding rate limiter, and backoff logic are well-structured and robust. Very thorough implementation.

🧰 Tools

🪛 Ruff (0.8.2)

56-56: openai imported but unused

Remove unused import: openai

(F401)

cognee/infrastructure/llm/gemini/adapter.py

cognee/infrastructure/llm/rate_limiter.py

cognee/infrastructure/llm/openai/adapter.py

cognee/tests/unit/infrastructure/test_embedding_rate_limiting.py

cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py

…g_rate_limiting

Vasilije1990 added 2 commits April 7, 2025 21:29

Rate limiting with tests

9350e1f

Rate limiting with tests

80cbe55

Vasilije1990 added the run-checks label Apr 7, 2025

Vasilije1990 added 5 commits April 7, 2025 22:54

Ruff format

944118e

Ruff format

1e7f8b1

Ruff format

4502bd4

Ruff format

581a963

Ruff format

bc36aee

Vasilije1990 requested review from borisarzentar and dexters1 April 8, 2025 07:47

Vasilije1990 marked this pull request as ready for review April 8, 2025 07:47

coderabbitai bot reviewed Apr 8, 2025

View reviewed changes

Merge branch 'dev' into adding_rate_limiting

113a2aa

borisarzentar reviewed Apr 11, 2025

View reviewed changes

Vasilije1990 and others added 10 commits April 14, 2025 16:45

Fix rate limiting

1825171

Merge branch 'dev' into adding_rate_limiting

48b8e0e

Fix rate limiting

bf498e1

Merge remote-tracking branch 'origin/adding_rate_limiting' into addin…

b38160f

…g_rate_limiting

Fix poetry lock

082052b

Fix poetry lock

982c291

Fix tests

4dbeeea

Fix tests

3722515

Fix tests

5148184

Fix tests

9c46a66

borisarzentar approved these changes Apr 15, 2025

View reviewed changes

Vasilije1990 merged commit 4e9ca94 into dev Apr 16, 2025
45 of 48 checks passed

Vasilije1990 deleted the adding_rate_limiting branch April 16, 2025 10:03

coderabbitai bot mentioned this pull request Dec 5, 2025

fix: Resolve issue with BAML rate limit handling #1813

Merged

16 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Adding rate limiting #709

feat: Adding rate limiting #709

Uh oh!

Vasilije1990 commented Apr 7, 2025

Uh oh!

pull-checklist bot commented Apr 7, 2025 •

edited by Vasilije1990

Loading

Uh oh!

coderabbitai bot commented Apr 7, 2025 •

edited

Loading

Review skipped

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat: Adding rate limiting #709

feat: Adding rate limiting #709

Uh oh!

Conversation

Vasilije1990 commented Apr 7, 2025

Description

DCO Affirmation

Uh oh!

pull-checklist bot commented Apr 7, 2025 • edited by Vasilije1990 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Please make sure all the checkboxes are checked:

Uh oh!

coderabbitai bot commented Apr 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested reviewers

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

pull-checklist bot commented Apr 7, 2025 •

edited by Vasilije1990

Loading

coderabbitai bot commented Apr 7, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)