-
Notifications
You must be signed in to change notification settings - Fork 954
feat: Adding rate limiting #709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Please make sure all the checkboxes are checked:
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes add new asynchronous and synchronous decorators for rate limiting and retry logic across multiple components, including embedding engines and API adapters. Manual retry mechanisms have been replaced with standardized decorators. Additionally, new configuration parameters and a dedicated rate limiter module have been introduced, alongside comprehensive tests validating these enhancements. An obsolete import test was removed. Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant AM as Adapter Method
participant RL as Rate Limit Decorator
participant SR as Sleep & Retry Decorator
participant API as External API
U->>AM: Invoke method
AM->>RL: Check rate limit
RL->>SR: Forward call with retry logic
SR->>API: Execute API call
API-->>SR: Return response or error
alt Rate limit error
SR->>SR: Wait and retry
SR->>API: Retry API call
else Success
SR-->>AM: Return successful response
end
AM-->>U: Return final result
Possibly related PRs
Suggested reviewers
Poem
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (11)
cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py (1)
51-51: Remove the unused retry_count variable.This variable is now redundant since the retry functionality has been moved to the decorator.
- self.retry_count = 0cognee/infrastructure/llm/gemini/adapter.py (1)
10-14: Remove unused import.The
rate_limit_syncimport is not used in this file and should be removed.from cognee.infrastructure.llm.rate_limiter import ( rate_limit_async, - rate_limit_sync, sleep_and_retry_async, )🧰 Tools
🪛 Ruff (0.8.2)
12-12:
cognee.infrastructure.llm.rate_limiter.rate_limit_syncimported but unusedRemove unused import:
cognee.infrastructure.llm.rate_limiter.rate_limit_sync(F401)
cognee/tests/unit/infrastructure/test_embedding_rate_limiting.py (2)
4-4: Remove unused imports.The imports
lru_cacheandLLMConfigare not used in this file.import os import time import asyncio -from functools import lru_cache import logging -from cognee.infrastructure.llm.config import LLMConfig, get_llm_config +from cognee.infrastructure.llm.config import get_llm_config from cognee.infrastructure.llm.rate_limiter import EmbeddingRateLimiterAlso applies to: 7-7
🧰 Tools
🪛 Ruff (0.8.2)
4-4:
functools.lru_cacheimported but unusedRemove unused import:
functools.lru_cache(F401)
83-84: Consider making this file a proper pytest test.This file appears to be in a pytest directory but isn't structured as a standard pytest test. Consider:
- Renaming the test function to have a
test_prefix- Removing the
__main__block, as pytest discovers and runs tests automatically- Using pytest fixtures for setup and teardown of environment variables
This would make the test more consistent with pytest conventions and easier to integrate with CI/CD pipelines.
-if __name__ == "__main__": - asyncio.run(test_embedding_rate_limiting())cognee/infrastructure/llm/config.py (1)
19-24: Consider Adding Validations for Rate Limit Fields
Defining default values for rate limit configuration is good. Consider adding checks to ensure non-negative and nonzero integers whenllm_rate_limit_enabledorembedding_rate_limit_enabledis True, to prevent misconfiguration.cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
75-79: Combine NestedwithStatements
Static analysis (SIM117) suggests merging the nestedwithstatements into one, reducing indentation and improving readability.Below is a proposed refactor for lines 75-79:
- async with aiohttp.ClientSession() as session: - async with session.post( - self.endpoint, json=payload, headers=headers, timeout=60.0 - ) as response: - data = await response.json() - return data["embedding"] + async with aiohttp.ClientSession() as session, session.post( + self.endpoint, json=payload, headers=headers, timeout=60.0 + ) as response: + data = await response.json() + return data["embedding"🧰 Tools
🪛 Ruff (0.8.2)
75-78: Use a single
withstatement with multiple contexts instead of nestedwithstatementsCombine
withstatements(SIM117)
cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py (1)
1-9: Import section can be optimized.The
timeandfunctools.lru_cachemodules are imported but never used directly in the code.import asyncio -import time import os -from functools import lru_cache from unittest.mock import patch from cognee.shared.logging_utils import get_logger from cognee.infrastructure.llm.rate_limiter import LLMRateLimiter from cognee.infrastructure.llm.config import get_llm_config🧰 Tools
🪛 Ruff (0.8.2)
2-2:
timeimported but unusedRemove unused import:
time(F401)
4-4:
functools.lru_cacheimported but unusedRemove unused import:
functools.lru_cache(F401)
cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py (2)
1-8: Import section contains unused imports.The
functools.lru_cacheandLLMConfigimports are not used directly in the code.import os import time import asyncio -from functools import lru_cache import logging -from cognee.infrastructure.llm.config import LLMConfig, get_llm_config +from cognee.infrastructure.llm.config import get_llm_config from cognee.infrastructure.llm.rate_limiter import EmbeddingRateLimiter🧰 Tools
🪛 Ruff (0.8.2)
4-4:
functools.lru_cacheimported but unusedRemove unused import:
functools.lru_cache(F401)
7-7:
cognee.infrastructure.llm.config.LLMConfigimported but unusedRemove unused import:
cognee.infrastructure.llm.config.LLMConfig(F401)
13-15: Using standard logging instead of the application's logger pattern.While this works, it's inconsistent with the logging approach used in other test files (like
test_rate_limiting_realistic.py), which useget_logger().-# Configure logging -logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(__name__) +from cognee.shared.logging_utils import get_logger + +# Get logger +logger = get_logger()cognee/tests/unit/infrastructure/test_rate_limiting_retry.py (1)
3-5: Remove unused importsRuff flagged these imports as unused. Please remove them to keep the test file clean.
- import os - from unittest.mock import patch, MagicMock - from functools import lru_cache🧰 Tools
🪛 Ruff (0.8.2)
3-3:
osimported but unusedRemove unused import:
os(F401)
4-4:
unittest.mock.patchimported but unusedRemove unused import
(F401)
4-4:
unittest.mock.MagicMockimported but unusedRemove unused import
(F401)
5-5:
functools.lru_cacheimported but unusedRemove unused import:
functools.lru_cache(F401)
cognee/infrastructure/llm/rate_limiter.py (1)
56-56: Remove unused importRuff flagged
openaias imported but never used. Please remove it to keep the file clean.- import openai🧰 Tools
🪛 Ruff (0.8.2)
56-56:
openaiimported but unusedRemove unused import:
openai(F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
poetry.lockis excluded by!**/*.lock,!**/*.lockpyproject.tomlis excluded by!**/*.toml
📒 Files selected for processing (15)
cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py(2 hunks)cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py(4 hunks)cognee/infrastructure/llm/anthropic/adapter.py(2 hunks)cognee/infrastructure/llm/config.py(2 hunks)cognee/infrastructure/llm/gemini/adapter.py(2 hunks)cognee/infrastructure/llm/generic_llm_api/adapter.py(2 hunks)cognee/infrastructure/llm/ollama/adapter.py(5 hunks)cognee/infrastructure/llm/openai/adapter.py(6 hunks)cognee/infrastructure/llm/rate_limiter.py(1 hunks)cognee/tests/unit/infrastructure/test_embedding_rate_limiting.py(1 hunks)cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py(1 hunks)cognee/tests/unit/infrastructure/test_rate_limiting_example.py(1 hunks)cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py(1 hunks)cognee/tests/unit/infrastructure/test_rate_limiting_retry.py(1 hunks)tests/import_test.py(0 hunks)
💤 Files with no reviewable changes (1)
- tests/import_test.py
🧰 Additional context used
🧬 Code Definitions (11)
cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py (3)
cognee/shared/logging_utils.py (1)
get_logger(137-158)cognee/infrastructure/llm/rate_limiter.py (3)
LLMRateLimiter(86-191)hit_limit(133-151)hit_limit(420-445)cognee/infrastructure/llm/config.py (1)
get_llm_config(104-105)
cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py (1)
cognee/infrastructure/llm/rate_limiter.py (2)
embedding_rate_limit_async(531-565)embedding_sleep_and_retry_async(640-709)
cognee/infrastructure/llm/anthropic/adapter.py (1)
cognee/infrastructure/llm/rate_limiter.py (2)
rate_limit_async(220-243)sleep_and_retry_async(331-376)
cognee/infrastructure/llm/gemini/adapter.py (1)
cognee/infrastructure/llm/rate_limiter.py (3)
rate_limit_async(220-243)rate_limit_sync(194-217)sleep_and_retry_async(331-376)
cognee/infrastructure/llm/generic_llm_api/adapter.py (1)
cognee/infrastructure/llm/rate_limiter.py (2)
rate_limit_async(220-243)sleep_and_retry_async(331-376)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
cognee/infrastructure/llm/rate_limiter.py (2)
embedding_rate_limit_async(531-565)embedding_sleep_and_retry_async(640-709)
cognee/tests/unit/infrastructure/test_rate_limiting_retry.py (1)
cognee/infrastructure/llm/rate_limiter.py (3)
sleep_and_retry_sync(283-328)sleep_and_retry_async(331-376)is_rate_limit_error(246-257)
cognee/tests/unit/infrastructure/test_rate_limiting_example.py (4)
cognee/shared/logging_utils.py (1)
get_logger(137-158)cognee/modules/search/types/SearchType.py (1)
SearchType(4-13)cognee/infrastructure/llm/rate_limiter.py (3)
LLMRateLimiter(86-191)hit_limit(133-151)hit_limit(420-445)cognee/infrastructure/llm/config.py (1)
get_llm_config(104-105)
cognee/infrastructure/llm/openai/adapter.py (1)
cognee/infrastructure/llm/rate_limiter.py (4)
rate_limit_async(220-243)rate_limit_sync(194-217)sleep_and_retry_async(331-376)sleep_and_retry_sync(283-328)
cognee/infrastructure/llm/rate_limiter.py (3)
cognee/shared/logging_utils.py (2)
get_logger(137-158)error(127-128)cognee/infrastructure/llm/config.py (1)
get_llm_config(104-105)cognee/infrastructure/databases/exceptions/EmbeddingException.py (1)
EmbeddingException(5-14)
cognee/infrastructure/llm/ollama/adapter.py (1)
cognee/infrastructure/llm/rate_limiter.py (3)
rate_limit_async(220-243)rate_limit_sync(194-217)sleep_and_retry_async(331-376)
🪛 Ruff (0.8.2)
cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py
4-4: functools.lru_cache imported but unused
Remove unused import: functools.lru_cache
(F401)
7-7: cognee.infrastructure.llm.config.LLMConfig imported but unused
Remove unused import: cognee.infrastructure.llm.config.LLMConfig
(F401)
cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py
2-2: time imported but unused
Remove unused import: time
(F401)
4-4: functools.lru_cache imported but unused
Remove unused import: functools.lru_cache
(F401)
cognee/tests/unit/infrastructure/test_embedding_rate_limiting.py
4-4: functools.lru_cache imported but unused
Remove unused import: functools.lru_cache
(F401)
7-7: cognee.infrastructure.llm.config.LLMConfig imported but unused
Remove unused import: cognee.infrastructure.llm.config.LLMConfig
(F401)
cognee/infrastructure/llm/gemini/adapter.py
12-12: cognee.infrastructure.llm.rate_limiter.rate_limit_sync imported but unused
Remove unused import: cognee.infrastructure.llm.rate_limiter.rate_limit_sync
(F401)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py
75-78: Use a single with statement with multiple contexts instead of nested with statements
Combine with statements
(SIM117)
cognee/tests/unit/infrastructure/test_rate_limiting_retry.py
3-3: os imported but unused
Remove unused import: os
(F401)
4-4: unittest.mock.patch imported but unused
Remove unused import
(F401)
4-4: unittest.mock.MagicMock imported but unused
Remove unused import
(F401)
5-5: functools.lru_cache imported but unused
Remove unused import: functools.lru_cache
(F401)
cognee/infrastructure/llm/rate_limiter.py
56-56: openai imported but unused
Remove unused import: openai
(F401)
🔇 Additional comments (51)
cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py (2)
13-16: Good addition of rate limiting imports.The new imports correctly bring in the necessary decorators for handling rate limiting and retry logic, which will help standardize this functionality across the codebase.
58-60: Excellent refactoring to use decorators for rate limiting and retries.The use of decorators is a significant improvement over manual retry and rate limit handling. This approach:
- Centralizes the rate limiting and retry logic
- Makes the code more maintainable and consistent across the codebase
- Keeps the core embedding functionality clean and focused
cognee/infrastructure/llm/anthropic/adapter.py (2)
8-8: Good addition of rate limiting imports.The import correctly brings in the necessary decorators for handling rate limiting and retry logic.
27-29: Well-implemented decorators for rate limiting.The application of these decorators follows the same pattern as in other adapters, providing a consistent approach to rate limiting across different API providers.
cognee/infrastructure/llm/gemini/adapter.py (1)
45-47: Well-implemented decorators for rate limiting.The consistent application of rate limiting decorators across different adapters helps standardize the API call behavior. Note that the decorators are correctly placed below the monitoring decorator.
cognee/tests/unit/infrastructure/test_embedding_rate_limiting.py (1)
18-75: Well-structured rate limiting test.The test effectively validates that the rate limiting functionality works as expected, with appropriate logging and assertions. The approach of setting environment variables, running tests, and then cleaning up is robust.
cognee/infrastructure/llm/generic_llm_api/adapter.py (2)
9-9: Imports for Decorators Look Good
The newly introduced imports for async rate limiting and retry logic appear correct and consistent with the rest of the file.
31-32: Chaining Decorators in Correct Order
Decorating with@rate_limit_asyncunder@sleep_and_retry_async()ensures that rate limits are checked first, and retry attempts occur upon hitting rate limit errors.cognee/infrastructure/llm/config.py (1)
94-99: Inclusion of Rate Limit Fields in Serialization
Thank you for adding the newly introduced fields to theto_dictmethod. This ensures external consumers have access to rate limit settings.cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (3)
12-15: New Imports for Embedding Rate Limiting
The imports forembedding_rate_limit_asyncandembedding_sleep_and_retry_asyncare used consistently to handle rate limits and retries for embedding generation.
50-50: Applying Rate Limiting to Embedding Method
Decoratingembed_textwith@embedding_rate_limit_asynchelps manage concurrent requests, ensuring embedding API calls respect set limits.
61-61: Retry on Rate Limit Errors
Wrapping_get_embeddingwith@embedding_sleep_and_retry_async()cleanly centralizes the retry logic on rate limit exceptions.cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py (8)
11-15: Well-documented test function with clear purpose.The docstring clearly explains the test's purpose of demonstrating how rate limiting works with a smaller rate limit.
18-26: Good test setup with configuration and cache clearing.The test properly sets up a controlled environment by:
- Setting environment variables for rate limiting configuration
- Clearing the configuration cache and limiter instance
- Creating fresh instances for testing
This ensures the test runs with deterministic behavior.
33-61: Clever use of monkey patching for deterministic testing.The mock implementation controls the
hit_limitmethod behavior with predefined success/failure sequences, which makes the test deterministic and reliable. The pattern of responses clearly simulates the rate limiting window moving after wait periods.
66-84: Well-structured first batch test.The test properly captures both successful and rate-limited requests in separate arrays, provides clear logging, and calculates meaningful metrics.
85-104: Good simulation of partial rate limit recovery.The test properly implements waiting for a partial window reset and tests that only a portion of the capacity has regenerated.
106-125: Thorough testing of full capacity regeneration.The test correctly verifies that after a full window wait period, all capacity has been restored and all requests succeed.
135-149: Comprehensive verification with clear success/failure messaging.The test includes explicit pass/fail messages that clearly communicate the expected behavior and actual results for each test batch.
154-166: Well-structured async main function and proper event loop management.The implementation follows best practices for handling async operations, including properly shutting down async generators.
cognee/tests/unit/infrastructure/test_rate_limiting_example.py (4)
1-8: Good import structure with relevant modules.The imports include all necessary modules for testing rate limiting functionality and interacting with the cognee system.
10-69: Well-implemented rate limiting test.The test function:
- Properly configures rate limiting via environment variables
- Creates a fresh limiter instance
- Makes more requests (70) than the limit (60) to verify rate limiting behavior
- Measures elapsed time for accurate rate calculation
- Includes comprehensive result verification with tolerance (58-62 requests)
This tests both the functional correctness and the timing aspects of rate limiting.
72-109: Comprehensive end-to-end example integration.The
main()function provides a complete workflow that:
- Resets the system to a clean state
- Adds sample text
- Processes it through cognify
- Searches for insights
- Runs the rate limiting test
This is valuable for demonstrating how rate limiting integrates with the broader system.
124-131: Proper async event loop setup and cleanup.The code follows best practices for managing the asyncio event loop, including proper cleanup of async generators.
cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py (8)
18-32: Comprehensive test setup with appropriate environment configuration.The test properly sets environment variables to enable rate limiting, configure the limits, enable mock embeddings, and disable automatic retries.
33-42: Proper cache clearing and instance verification.The test correctly clears configuration caches and resets the rate limiter instance to ensure a clean test environment. It also logs the configuration to verify settings.
52-79: Well-structured first batch test with clear tracking.The first batch test properly:
- Sends a defined number of requests
- Tracks success and failure counts
- Times the batch execution
- Provides detailed logging
This approach effectively tests the rate limiting functionality.
80-111: Good implementation of partial capacity recovery test.The test correctly implements waiting for partial capacity recovery and then testing a second batch of requests to verify that only some of the capacity has regenerated.
113-144: Thorough testing of full capacity regeneration.The test correctly verifies that after a full wait period, rate limiting capacity has been fully restored by sending another batch of requests and tracking the results.
146-154: Good test validation with clear assertions.The test properly validates that both successful and rate-limited requests occurred, confirming that the rate limiter is working as expected.
156-161: Proper cleanup of environment variables.The test cleans up after itself by removing all the environment variables it set, which is good practice for test isolation.
164-165: Simple and effective main block.The code uses
asyncio.run()rather than manually managing the event loop, which is a cleaner approach for simpler scripts.cognee/infrastructure/llm/ollama/adapter.py (5)
6-10: Good addition of rate limiting imports.The necessary rate limiter decorators are properly imported for both synchronous and asynchronous methods.
30-32: Appropriate rate limiting for asynchronous API calls.The
acreate_structured_outputmethod is correctly decorated with both:
@sleep_and_retry_async()- to automatically retry when rate limits are hit@rate_limit_async- to apply asynchronous rate limitingThis ensures the method respects rate limits while providing automatic retry functionality.
55-55: Added rate limiting for synchronous transcript creation.The
create_transcriptmethod is correctly decorated with@rate_limit_syncto apply synchronous rate limiting for this method.
75-75: Added rate limiting for synchronous image transcription.The
transcribe_imagemethod is correctly decorated with@rate_limit_syncto apply synchronous rate limiting for this method.
91-91: Fixed typographical issue in user prompt.Changed the single quote to an apostrophe in "What's in this image?" for consistency.
cognee/tests/unit/infrastructure/test_rate_limiting_retry.py (7)
16-31: Synchronous retry test function looks solidThis approach of incrementing a counter and raising rate limit errors to trigger retries is appropriate.
34-50: Asynchronous retry test function is well-writtenGood job simulating rate limit errors and testing asynchronous retry logic.
52-92: Rate limit error detection coverage is comprehensiveThe variety of test messages ensures robust detection.
94-125: Synchronous retry test is thoroughTiming checks and attempt counters confirm proper exponential backoff.
127-158: Asynchronous retry test follows best practicesNice job confirming that backoff timing works as intended in async context.
160-178: Max retries test is clearly implementedSuccessfully validates that the function fails once retries are exceeded.
180-196: Main function coordinates all tests neatlyWell-structured approach to running these tests sequentially.
cognee/infrastructure/llm/openai/adapter.py (6)
13-18: Added rate limiting and retry importsAll referenced decorators are appropriately imported and used—good work.
58-59: Async rate limiting decoratorsApplying
@sleep_and_retry_asyncbefore@rate_limit_asyncproperly combines exponential backoff with rate enforcement.
86-87: Synchronous rate limiting decoratorsGood layering of
@sleep_and_retry_syncafter@observeand before@rate_limit_sync.
113-113: Applying rate_limit_syncEnables consistent rate limiting for create_transcript.
134-135: Applying rate_limit_sync for image transcriptionMatches the strategy used for other API calls, ensuring consistency.
147-147: Consistent apostrophe usageMinor textual improvement maintains uniform style.
cognee/infrastructure/llm/rate_limiter.py (1)
1-710: Comprehensive rate limiter implementationThe synchronous/asynchronous decorators, embedding rate limiter, and backoff logic are well-structured and robust. Very thorough implementation.
🧰 Tools
🪛 Ruff (0.8.2)
56-56:
openaiimported but unusedRemove unused import:
openai(F401)
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.