-
Notifications
You must be signed in to change notification settings - Fork 966
Feat/cog 1491 pipeline steps in eval #641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request removes unused import statements from two corpus builder modules and updates the task retrieval functionality. The Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant C as EvalConfig
participant T as TaskGetters
participant G as Task Getter Functions
U->>C: Read configuration (task_getter_type option)
C->>T: Determine task getter based on type
alt Default
T->>G: get_default_tasks_by_indices(...)
else NoSummaries
T->>G: get_no_summary_tasks(...)
else JustChunks
T->>G: get_just_chunks_tasks(...)
end
G-->>T: Return tasks
T-->>C: Return tasks
C-->>U: Provide retrieved tasks
Suggested labels
Suggested reviewers
Poem
Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py (3)
7-19: Improve type annotations for the chunker parameterThe function effectively filters tasks by indices with proper error handling for out-of-range indices. However, the type annotation for the
chunkerparameter could be improved.- async def get_default_tasks_by_indices( - indices: List[int], chunk_size: int = None, chunker=TextChunker - ) -> List[Task]: + from typing import Type + + async def get_default_tasks_by_indices( + indices: List[int], chunk_size: Optional[int] = None, chunker: Type[TextChunker] = TextChunker + ) -> List[Task]:Also consider handling the case where an empty list of indices is provided, perhaps returning an empty list with a warning log.
21-27: Consider expanding docstring with parameter detailsThe function works correctly to filter out summarization tasks. The comment explaining the indices is helpful, but consider expanding the docstring to include parameter descriptions.
async def get_no_summary_tasks(chunk_size: int = None, chunker=TextChunker) -> List[Task]: - """Returns default tasks without summarization tasks.""" + """Returns default tasks without summarization tasks. + + Args: + chunk_size: Optional size for text chunks + chunker: Chunker class to use for text chunking + + Returns: + List of tasks excluding summarization tasks + """
29-33: Avoid duplicating index documentationThe function correctly implements the task filtering logic, but the comment about indices is duplicated from the previous function.
Consider centralizing this information in a module-level docstring or constants to avoid duplication and improve maintainability:
+ # Task indices mapping + TASK_INDICES = { + "classify": 0, + "check_permissions": 1, + "extract_chunks": 2, + "extract_graph": 3, + "summarize": 4, + "add_data_points": 5 + } + async def get_just_chunks_tasks(chunk_size: int = None, chunker=TextChunker) -> List[Task]: """Returns default tasks with only chunk extraction and data points addition.""" - # Default tasks indices: 0=classify, 1=check_permissions, 2=extract_chunks, 3=extract_graph, 4=summarize, 5=add_data_points + # Only include classify, permissions, chunk extraction and data points return await get_default_tasks_by_indices([0, 1, 2, 5], chunk_size=chunk_size, chunker=chunker)This would make the code more maintainable if task indices change in the future.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
cognee/eval_framework/corpus_builder/corpus_builder_executor.py(0 hunks)cognee/eval_framework/corpus_builder/run_corpus_builder.py(0 hunks)cognee/eval_framework/corpus_builder/task_getters/TaskGetters.py(1 hunks)cognee/eval_framework/corpus_builder/task_getters/default_task_getter.py(0 hunks)cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py(1 hunks)cognee/eval_framework/eval_config.py(1 hunks)
💤 Files with no reviewable changes (3)
- cognee/eval_framework/corpus_builder/run_corpus_builder.py
- cognee/eval_framework/corpus_builder/task_getters/default_task_getter.py
- cognee/eval_framework/corpus_builder/corpus_builder_executor.py
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: lint (ubuntu-latest, 3.11.x)
- GitHub Check: lint (ubuntu-latest, 3.10.x)
- GitHub Check: Build Cognee Backend Docker App Image
- GitHub Check: docker-compose-test
🔇 Additional comments (4)
cognee/eval_framework/eval_config.py (1)
11-13: Clean formatting and well-documented optionsThe updated formatting for
task_getter_typeimproves readability and the comment clearly indicates all the available options. This change aligns well with the new task getter types introduced in the PR.cognee/eval_framework/corpus_builder/task_getters/TaskGetters.py (2)
8-11: Imports correctly added for new task getter functionsThe import statement has been properly updated to include the new task getter functions from the newly created module.
19-20: New enum members follow consistent patternThe additions of
NO_SUMMARIESandJUST_CHUNKSenum members maintain the same structure as existing members, keeping the codebase consistent.cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py (1)
1-5: Imports look goodAll necessary imports are included for the functionality being implemented.
| DEFAULT = ("Default", get_default_tasks) | ||
| CASCADE_GRAPH = ("CascadeGraph", get_cascade_graph_tasks) | ||
| NO_SUMMARIES = ("NoSummaries", get_no_summary_tasks) | ||
| JUST_CHUNKS = ("JustChunks", get_just_chunks_tasks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe OnlyChunks, its just naming so not that important
| """Returns default tasks without summarization tasks.""" | ||
| # Default tasks indices: 0=classify, 1=check_permissions, 2=extract_chunks, 3=extract_graph, 4=summarize, 5=add_data_points | ||
| return await get_default_tasks_by_indices( | ||
| [0, 1, 2, 3, 5], chunk_size=chunk_size, chunker=chunker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I like the indices because whenever we change the default pipeline, we will have to update the index mapping. Fine for now but I think we will have to find a bit nicer solution
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin
Summary by CodeRabbit