-
Notifications
You must be signed in to change notification settings - Fork 967
Added basic profiling #255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Warning Rate limit exceeded@Vasilije1990 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 5 minutes and 40 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (2)
WalkthroughA new GitHub Actions workflow file named Changes
Sequence Diagram(s)sequenceDiagram
participant GitHub as GitHub Actions
participant Repo as Repository
participant Python as Python Environment
participant Profiler as Profiler Script
participant Comparison as Comparison Script
GitHub->>Repo: Checkout Code
GitHub->>Python: Set Up Python 3.10
GitHub->>Python: Install Poetry
GitHub->>Python: Install Dependencies
GitHub->>Python: Set Environment Variables
GitHub->>Profiler: Run Profiler on Base Branch
Profiler->>Profiler: Generate Profiling Results
GitHub->>Profiler: Run Profiler on Head Branch
Profiler->>Profiler: Generate Profiling Results
GitHub->>Comparison: Compare Profiling Results
Comparison->>GitHub: Post Results to Pull Request
Possibly related PRs
Suggested labels
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
.github/workflows/profiling.yaml (2)
29-33: Optimize dependency installationThe current setup installs all extras and then adds additional packages separately.
Consider combining the installations:
- poetry install --no-interaction --all-extras - poetry add scalene requests + poetry install --no-interaction --all-extras + poetry run pip install scalene requestsThis avoids modifying the project's dependencies while still making the tools available for CI.
70-101: Enhance profiling results analysis and presentationThe current implementation provides basic metrics but could be more informative.
Consider enhancing the comparison script:
python -c ' import json + from typing import Dict, Any + + def format_percentage(old: float, new: float) -> str: + if old == 0: + return "N/A" + change = ((new - old) / old) * 100 + return f"{change:+.2f}%" + try: with open("base_results.json") as f: base = json.load(f) with open("head_results.json") as f: head = json.load(f) - cpu_diff = head.get("total_cpu_samples_python", 0) - base.get("total_cpu_samples_python", 0) - memory_diff = head.get("malloc_samples", 0) - base.get("malloc_samples", 0) + + metrics = { + "CPU Usage (samples)": { + "base": base.get("total_cpu_samples_python", 0), + "head": head.get("total_cpu_samples_python", 0) + }, + "Memory Usage (bytes)": { + "base": base.get("malloc_samples", 0), + "head": head.get("malloc_samples", 0) + }, + "Peak Memory (MB)": { + "base": base.get("peak_memory_mb", 0), + "head": head.get("peak_memory_mb", 0) + } + } + with open("profiling_diff.txt", "w") as f: - f.write(f"CPU Usage Difference: {cpu_diff}\n") - f.write(f"Memory Usage Difference: {memory_diff} bytes\n") + f.write("### Performance Impact Summary\n\n") + f.write("| Metric | Base | Head | Change |\n") + f.write("|--------|------|------|--------|\n") + for metric, values in metrics.items(): + base_val = values["base"] + head_val = values["head"] + change = format_percentage(base_val, head_val) + f.write(f"| {metric} | {base_val} | {head_val} | {change} |\n") + + if any(values["head"] > values["base"] * 1.1 for values in metrics.values()): + f.write("\n⚠️ **Warning**: Significant performance regression detected!\n") except Exception as e: with open("profiling_diff.txt", "w") as f: f.write(f"Error comparing profiling results: {e}\n") 'This enhancement:
- Adds more performance metrics
- Shows relative changes as percentages
- Provides warnings for significant regressions
- Uses a more readable table format
🧰 Tools
🪛 actionlint (1.7.4)
91-91: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
.github/workflows/profiling.yaml(1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/profiling.yaml
12-12: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
18-18: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
37-37: shellcheck reported issue in this script: SC2086:info:1:62: Double quote to prevent globbing and word splitting
(shellcheck)
37-37: shellcheck reported issue in this script: SC2086:info:2:62: Double quote to prevent globbing and word splitting
(shellcheck)
45-45: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
59-59: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
91-91: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
.github/workflows/profiling.yaml
Outdated
| steps: | ||
| # Checkout the code from the repository with full history | ||
| - name: Checkout code | ||
| uses: actions/checkout@v3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update GitHub Actions to latest versions
Several actions are using outdated versions which may have security or compatibility issues:
- actions/checkout@v3 → v4
- actions/setup-python@v4 → v5
- actions/github-script@v6 → v7
- uses: actions/checkout@v3
+ uses: actions/checkout@v4
- uses: actions/setup-python@v4
+ uses: actions/setup-python@v5
- uses: actions/github-script@v6
+ uses: actions/github-script@v7Also applies to: 18-18, 91-91
🧰 Tools
🪛 actionlint (1.7.4)
12-12: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update to actions/checkout@v4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@borisarzentar, to update the GitHub Actions to their latest versions, you can modify the workflow file as follows:
- Change
uses: actions/checkout@v3touses: actions/checkout@v4 - Change
uses: actions/setup-python@v4touses: actions/setup-python@v5 - Change
uses: actions/github-script@v6touses: actions/github-script@v7
This will ensure your workflow uses the latest versions of these actions.
| - name: Run profiler on head branch | ||
| env: | ||
| HEAD_SHA: ${{ env.HEAD_SHA }} | ||
| run: | | ||
| echo "Profiling the head branch for code_graph_pipeline.py" | ||
| echo "Checking out head SHA: $HEAD_SHA" | ||
| git checkout $HEAD_SHA | ||
| echo "This is the working directory: $PWD" | ||
| # Ensure the script is executable | ||
| chmod +x cognee/api/v1/cognify/code_graph_pipeline.py | ||
| # Run Scalene | ||
| poetry run scalene --json --outfile head_results.json cognee/api/v1/cognify/code_graph_pipeline.py | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Reduce code duplication in profiling steps
The head branch profiling code is duplicated from the base branch profiling.
Consider creating a reusable composite action:
# .github/actions/run-profiling/action.yml
name: 'Run Profiling'
inputs:
sha:
required: true
output:
required: true
runs:
using: "composite"
steps:
- run: |
FILE_PATH="cognee/api/v1/cognify/code_graph_pipeline.py"
echo "Profiling for $FILE_PATH"
echo "Checking out SHA: ${{ inputs.sha }}"
git checkout "${{ inputs.sha }}"
if [ ! -f "$FILE_PATH" ]; then
echo "Error: $FILE_PATH not found"
exit 1
fi
chmod +x "$FILE_PATH"
poetry run scalene --json --outfile "${{ inputs.output }}" "$FILE_PATH"
shell: bashThen use it in the workflow:
- - name: Run profiler on base branch
+ - name: Run profiler on base
+ uses: ./.github/actions/run-profiling
+ with:
+ sha: ${{ env.BASE_SHA }}
+ output: base_results.json
- - name: Run profiler on head branch
+ - name: Run profiler on head
+ uses: ./.github/actions/run-profiling
+ with:
+ sha: ${{ env.HEAD_SHA }}
+ output: head_results.json🧰 Tools
🪛 actionlint (1.7.4)
59-59: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
| on: | ||
| push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix workflow trigger configuration
The current trigger configuration will fail because:
- The workflow requires pull request context for base/head SHA comparison
- Running on all pushes is inefficient and will fail for non-PR pushes
Update the trigger configuration:
on:
- push
+ pull_request:
+ types: [opened, synchronize]
+ paths:
+ - 'cognee/api/v1/cognify/code_graph_pipeline.py'📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| on: | |
| push | |
| on: | |
| pull_request: | |
| types: [opened, synchronize] | |
| paths: | |
| - 'cognee/api/v1/cognify/code_graph_pipeline.py' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Vasilije1990 coderabbit has a good point here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
| - name: Run profiler on base branch | ||
| env: | ||
| BASE_SHA: ${{ env.BASE_SHA }} | ||
| run: | | ||
| echo "Profiling the base branch for code_graph_pipeline.py" | ||
| echo "Checking out base SHA: $BASE_SHA" | ||
| git checkout $BASE_SHA | ||
| echo "This is the working directory: $PWD" | ||
| # Ensure the script is executable | ||
| chmod +x cognee/api/v1/cognify/code_graph_pipeline.py | ||
| # Run Scalene | ||
| poetry run scalene --json --outfile base_results.json cognee/api/v1/cognify/code_graph_pipeline.py | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve shell script security and error handling
The current implementation has several potential issues:
- Unquoted variables could cause word splitting
- No validation for file existence
- Hardcoded path could break if file structure changes
Improve the script:
run: |
echo "Profiling the base branch for code_graph_pipeline.py"
echo "Checking out base SHA: $BASE_SHA"
- git checkout $BASE_SHA
+ git checkout "$BASE_SHA"
echo "This is the working directory: $PWD"
+
+ FILE_PATH="cognee/api/v1/cognify/code_graph_pipeline.py"
+ if [ ! -f "$FILE_PATH" ]; then
+ echo "Error: $FILE_PATH not found"
+ exit 1
+ fi
+
# Ensure the script is executable
- chmod +x cognee/api/v1/cognify/code_graph_pipeline.py
+ chmod +x "$FILE_PATH"
# Run Scalene
- poetry run scalene --json --outfile base_results.json cognee/api/v1/cognify/code_graph_pipeline.py
+ poetry run scalene --json --outfile base_results.json "$FILE_PATH"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Run profiler on base branch | |
| env: | |
| BASE_SHA: ${{ env.BASE_SHA }} | |
| run: | | |
| echo "Profiling the base branch for code_graph_pipeline.py" | |
| echo "Checking out base SHA: $BASE_SHA" | |
| git checkout $BASE_SHA | |
| echo "This is the working directory: $PWD" | |
| # Ensure the script is executable | |
| chmod +x cognee/api/v1/cognify/code_graph_pipeline.py | |
| # Run Scalene | |
| poetry run scalene --json --outfile base_results.json cognee/api/v1/cognify/code_graph_pipeline.py | |
| - name: Run profiler on base branch | |
| env: | |
| BASE_SHA: ${{ env.BASE_SHA }} | |
| run: | | |
| echo "Profiling the base branch for code_graph_pipeline.py" | |
| echo "Checking out base SHA: $BASE_SHA" | |
| git checkout "$BASE_SHA" | |
| echo "This is the working directory: $PWD" | |
| FILE_PATH="cognee/api/v1/cognify/code_graph_pipeline.py" | |
| if [ ! -f "$FILE_PATH" ]; then | |
| echo "Error: $FILE_PATH not found" | |
| exit 1 | |
| fi | |
| # Ensure the script is executable | |
| chmod +x "$FILE_PATH" | |
| # Run Scalene | |
| poetry run scalene --json --outfile base_results.json "$FILE_PATH" |
🧰 Tools
🪛 actionlint (1.7.4)
45-45: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
.github/workflows/profiling.yaml (2)
32-32: Remove trailing whitespaceThere is a trailing space at the end of line 32.
- poetry add scalene + poetry add scalene🧰 Tools
🪛 yamllint (1.35.1)
[error] 32-32: trailing spaces
(trailing-spaces)
70-101: Improve profiling results comparison and presentationThe current implementation could be enhanced for better maintainability and user experience.
- Move the Python comparison script to a separate file
- Enhance the results presentation with more metrics
- Add percentage changes for better context
- name: Compare profiling results + run: python .github/scripts/compare_profiles.py - run: | - python -c ' - import json - try: - with open("base_results.json") as f: - base = json.load(f) - with open("head_results.json") as f: - head = json.load(f) - cpu_diff = head.get("total_cpu_samples_python", 0) - base.get("total_cpu_samples_python", 0) - memory_diff = head.get("malloc_samples", 0) - base.get("malloc_samples", 0) - with open("profiling_diff.txt", "w") as f: - f.write(f"CPU Usage Difference: {cpu_diff}\n") - f.write(f"Memory Usage Difference: {memory_diff} bytes\n") - except Exception as e: - with open("profiling_diff.txt", "w") as f: - f.write(f"Error comparing profiling results: {e}\n") - 'Would you like me to generate the enhanced comparison script with additional metrics and better error handling?
🧰 Tools
🪛 actionlint (1.7.4)
91-91: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
.github/workflows/profiling.yaml(1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/profiling.yaml
12-12: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
18-18: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
37-37: shellcheck reported issue in this script: SC2086:info:1:62: Double quote to prevent globbing and word splitting
(shellcheck)
37-37: shellcheck reported issue in this script: SC2086:info:2:62: Double quote to prevent globbing and word splitting
(shellcheck)
45-45: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
59-59: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
91-91: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 yamllint (1.35.1)
.github/workflows/profiling.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
🔇 Additional comments (3)
.github/workflows/profiling.yaml (3)
3-4:
Fix workflow trigger configuration
The current trigger configuration will fail because:
- The workflow requires pull request context for base/head SHA comparison
- Running on all pushes is inefficient and will fail for non-PR pushes
Update the trigger configuration:
on:
- push
+ pull_request:
+ types: [opened, synchronize]
+ paths:
+ - 'cognee/api/v1/cognify/code_graph_pipeline.py'12-12:
Update GitHub Actions to latest versions
Several actions are using outdated versions which may have security or compatibility issues:
- uses: actions/checkout@v3
+ uses: actions/checkout@v4
- uses: actions/setup-python@v4
+ uses: actions/setup-python@v5
- uses: actions/github-script@v6
+ uses: actions/github-script@v7Also applies to: 18-18, 91-91
🧰 Tools
🪛 actionlint (1.7.4)
12-12: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
42-68: 🛠️ Refactor suggestion
Reduce code duplication in profiling steps
The head branch profiling code is duplicated from the base branch profiling. Consider creating a reusable composite action as suggested in the previous review.
🧰 Tools
🪛 actionlint (1.7.4)
45-45: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
59-59: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
| - name: Set environment variables | ||
| run: | | ||
| echo "BASE_SHA=${{ github.event.pull_request.base.sha }}" >> $GITHUB_ENV | ||
| echo "HEAD_SHA=${{ github.event.pull_request.head.sha }}" >> $GITHUB_ENV | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve environment variables setup
The current setup lacks validation and has potential shell scripting issues.
- name: Set environment variables
run: |
+ if [[ -z "${{ github.event.pull_request }}" ]]; then
+ echo "Error: No pull request context found"
+ exit 1
+ fi
- echo "BASE_SHA=${{ github.event.pull_request.base.sha }}" >> $GITHUB_ENV
- echo "HEAD_SHA=${{ github.event.pull_request.head.sha }}" >> $GITHUB_ENV
+ echo "BASE_SHA=${{ github.event.pull_request.base.sha }}" >> "$GITHUB_ENV"
+ echo "HEAD_SHA=${{ github.event.pull_request.head.sha }}" >> "$GITHUB_ENV"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Set environment variables | |
| run: | | |
| echo "BASE_SHA=${{ github.event.pull_request.base.sha }}" >> $GITHUB_ENV | |
| echo "HEAD_SHA=${{ github.event.pull_request.head.sha }}" >> $GITHUB_ENV | |
| - name: Set environment variables | |
| run: | | |
| if [[ -z "${{ github.event.pull_request }}" ]]; then | |
| echo "Error: No pull request context found" | |
| exit 1 | |
| fi | |
| echo "BASE_SHA=${{ github.event.pull_request.base.sha }}" >> "$GITHUB_ENV" | |
| echo "HEAD_SHA=${{ github.event.pull_request.head.sha }}" >> "$GITHUB_ENV" | |
🧰 Tools
🪛 actionlint (1.7.4)
37-37: shellcheck reported issue in this script: SC2086:info:1:62: Double quote to prevent globbing and word splitting
(shellcheck)
37-37: shellcheck reported issue in this script: SC2086:info:2:62: Double quote to prevent globbing and word splitting
(shellcheck)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (1)
.github/workflows/profiling.yaml (1)
98-129: Enhance profiling results analysis and visualizationThe current comparison logic could be improved to provide more meaningful insights:
- Add more comprehensive metrics:
python -c ' import json +import statistics + try: with open("base_results.json") as f: base = json.load(f) with open("head_results.json") as f: head = json.load(f) - cpu_diff = head.get("total_cpu_samples_python", 0) - base.get("total_cpu_samples_python", 0) - memory_diff = head.get("malloc_samples", 0) - base.get("malloc_samples", 0) + # Calculate comprehensive metrics + metrics = { + "cpu_time": { + "diff": head["total_cpu_samples_python"] - base["total_cpu_samples_python"], + "percent": ((head["total_cpu_samples_python"] / base["total_cpu_samples_python"]) - 1) * 100 + }, + "memory": { + "diff": head["malloc_samples"] - base["malloc_samples"], + "percent": ((head["malloc_samples"] / base["malloc_samples"]) - 1) * 100 + } + } + + # Define regression thresholds + REGRESSION_THRESHOLD = 10 # 10% degradation threshold + with open("profiling_diff.txt", "w") as f: - f.write(f"CPU Usage Difference: {cpu_diff}\n") - f.write(f"Memory Usage Difference: {memory_diff} bytes\n") + f.write("### Performance Impact Summary\n\n") + for metric, data in metrics.items(): + f.write(f"#### {metric.title()}\n") + f.write(f"- Absolute change: {data['diff']:.2f}\n") + f.write(f"- Relative change: {data['percent']:.1f}%\n") + if abs(data['percent']) > REGRESSION_THRESHOLD: + f.write(f"⚠️ **Performance regression detected!**\n") + f.write("\nNote: Regressions are flagged at >10% degradation\n") except Exception as e: with open("profiling_diff.txt", "w") as f: f.write(f"Error comparing profiling results: {e}\n") '
- Consider adding historical trend analysis by storing results in a database or artifact.
- Add visualization using GitHub-compatible Markdown charts.
🧰 Tools
🪛 actionlint (1.7.4)
119-119: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
.github/workflows/profiling.yaml(1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/profiling.yaml
16-16: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
22-22: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
35-35: shellcheck reported issue in this script: SC2086:info:1:140: Double quote to prevent globbing and word splitting
(shellcheck)
46-46: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
65-65: shellcheck reported issue in this script: SC2086:info:1:62: Double quote to prevent globbing and word splitting
(shellcheck)
65-65: shellcheck reported issue in this script: SC2086:info:2:62: Double quote to prevent globbing and word splitting
(shellcheck)
73-73: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
87-87: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
119-119: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🔇 Additional comments (2)
.github/workflows/profiling.yaml (2)
16-18: Update GitHub Actions to latest versions
Several actions are using outdated versions which may have security or compatibility issues:
- actions/checkout@v3 → v4
- actions/setup-python@v4 → v5
- actions/github-script@v6 → v7
🧰 Tools
🪛 actionlint (1.7.4)
16-16: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
70-96: Reduce code duplication and improve shell script security
The profiling steps contain duplicated code and shell script security issues. Consider creating a reusable composite action and improving shell script security as suggested in previous comments.
🧰 Tools
🪛 actionlint (1.7.4)
73-73: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
87-87: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
.github/workflows/profiling.yaml
Outdated
| on: | ||
| pull_request_target: | ||
| types: | ||
| - opened | ||
| - reopened | ||
| - synchronize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical Security Risk: Unsafe usage of pull_request_target
The current trigger configuration using pull_request_target poses a significant security risk as it:
- Runs with repository secrets and permissions
- Checks out and executes untrusted code from pull requests
- Lacks path filters to limit scope
Implement these security measures:
on:
- pull_request_target:
+ pull_request:
types:
- opened
- reopened
- synchronize
+ paths:
+ - 'cognee/api/v1/cognify/code_graph_pipeline.py'If pull_request_target is absolutely necessary for token permissions:
- Never checkout or run code from the PR
- Use explicit commit SHA references
- Implement additional security validations
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| on: | |
| pull_request_target: | |
| types: | |
| - opened | |
| - reopened | |
| - synchronize | |
| on: | |
| pull_request: | |
| types: | |
| - opened | |
| - reopened | |
| - synchronize | |
| paths: | |
| - 'cognee/api/v1/cognify/code_graph_pipeline.py' |
| - name: Install Poetry | ||
| uses: snok/[email protected] | ||
| with: | ||
| virtualenvs-create: true | ||
| virtualenvs-in-project: true | ||
| installer-parallel: true | ||
|
|
||
| - name: Install dependencies | ||
| run: | | ||
| poetry install --no-interaction --all-extras | ||
| poetry run pip install pyinstrument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve dependency management security and efficiency
The current setup could be more secure and efficient:
- name: Install Poetry
uses: snok/[email protected]
with:
virtualenvs-create: true
virtualenvs-in-project: true
installer-parallel: true
+ version: 1.7.1 # Pin to specific version
- name: Install dependencies
run: |
+ # Verify poetry.lock is in sync
+ poetry lock --check
- poetry install --no-interaction --all-extras
+ # Install only needed dependencies
+ poetry install --no-interaction --only main,dev
poetry run pip install pyinstrument📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Install Poetry | |
| uses: snok/[email protected] | |
| with: | |
| virtualenvs-create: true | |
| virtualenvs-in-project: true | |
| installer-parallel: true | |
| - name: Install dependencies | |
| run: | | |
| poetry install --no-interaction --all-extras | |
| poetry run pip install pyinstrument | |
| - name: Install Poetry | |
| uses: snok/[email protected] | |
| with: | |
| virtualenvs-create: true | |
| virtualenvs-in-project: true | |
| installer-parallel: true | |
| version: 1.7.1 # Pin to specific version | |
| - name: Install dependencies | |
| run: | | |
| # Verify poetry.lock is in sync | |
| poetry lock --check | |
| # Install only needed dependencies | |
| poetry install --no-interaction --only main,dev | |
| poetry run pip install pyinstrument |
| - name: Set is_maintainer variable | ||
| run: | | ||
| echo "is_maintainer=${{ steps.check_permissions.outputs.result == 'admin' || steps.check_permissions.outputs.result == 'maintainer' }}" >> $GITHUB_ENV | ||
|
|
||
| - name: Stop if not a maintainer | ||
| if: env.is_maintainer != 'true' | ||
| run: | | ||
| echo "User ${{ github.event.sender.login }} is not a maintainer. Exiting." | ||
| exit 0 # Use exit 0 to mark the job as successful but stop execution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve shell script security and error handling
The maintainer check implementation has potential security issues:
- name: Set is_maintainer variable
run: |
- echo "is_maintainer=${{ steps.check_permissions.outputs.result == 'admin' || steps.check_permissions.outputs.result == 'maintainer' }}" >> $GITHUB_ENV
+ is_maintainer="${{ steps.check_permissions.outputs.result == 'admin' || steps.check_permissions.outputs.result == 'maintainer' }}"
+ echo "is_maintainer=${is_maintainer}" >> "$GITHUB_ENV"
- name: Stop if not a maintainer
if: env.is_maintainer != 'true'
run: |
+ echo "::warning::Profiling skipped - User ${{ github.event.sender.login }} is not a maintainer"
echo "User ${{ github.event.sender.login }} is not a maintainer. Exiting."
exit 0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Set is_maintainer variable | |
| run: | | |
| echo "is_maintainer=${{ steps.check_permissions.outputs.result == 'admin' || steps.check_permissions.outputs.result == 'maintainer' }}" >> $GITHUB_ENV | |
| - name: Stop if not a maintainer | |
| if: env.is_maintainer != 'true' | |
| run: | | |
| echo "User ${{ github.event.sender.login }} is not a maintainer. Exiting." | |
| exit 0 # Use exit 0 to mark the job as successful but stop execution | |
| - name: Set is_maintainer variable | |
| run: | | |
| is_maintainer="${{ steps.check_permissions.outputs.result == 'admin' || steps.check_permissions.outputs.result == 'maintainer' }}" | |
| echo "is_maintainer=${is_maintainer}" >> "$GITHUB_ENV" | |
| - name: Stop if not a maintainer | |
| if: env.is_maintainer != 'true' | |
| run: | | |
| echo "::warning::Profiling skipped - User ${{ github.event.sender.login }} is not a maintainer" | |
| echo "User ${{ github.event.sender.login }} is not a maintainer. Exiting." | |
| exit 0 # Use exit 0 to mark the job as successful but stop execution |
🧰 Tools
🪛 actionlint (1.7.4)
35-35: shellcheck reported issue in this script: SC2086:info:1:140: Double quote to prevent globbing and word splitting
(shellcheck)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
.github/workflows/profiling.yaml(1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/profiling.yaml
12-12: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
18-18: the runner of "actions/github-script@v6" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
31-31: shellcheck reported issue in this script: SC2086:info:1:140: Double quote to prevent globbing and word splitting
(shellcheck)
42-42: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
61-61: shellcheck reported issue in this script: SC2086:info:1:62: Double quote to prevent globbing and word splitting
(shellcheck)
61-61: shellcheck reported issue in this script: SC2086:info:2:62: Double quote to prevent globbing and word splitting
(shellcheck)
69-69: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
83-83: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
🔇 Additional comments (5)
.github/workflows/profiling.yaml (5)
3-4:
Fix workflow trigger configuration
The current trigger configuration will fail because:
- The workflow requires pull request context for base/head SHA comparison
- Running on all pushes is inefficient and will fail for non-PR pushes
on:
- push
+ pull_request:
+ types: [opened, synchronize]
+ paths:
+ - 'cognee/api/v1/cognify/code_graph_pipeline.py'12-12:
Update GitHub Actions to latest versions
Several actions are using outdated versions which may have security or compatibility issues.
- uses: actions/checkout@v3
+ uses: actions/checkout@v4
- uses: actions/setup-python@v4
+ uses: actions/setup-python@v5
- uses: actions/github-script@v6
+ uses: actions/github-script@v7Also applies to: 18-18, 42-42
🧰 Tools
🪛 actionlint (1.7.4)
12-12: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
30-38:
Improve permission check implementation
The current implementation has security concerns and lacks proper error handling.
- name: Set is_maintainer variable
run: |
- echo "is_maintainer=${{ steps.check_permissions.outputs.result == 'admin' || steps.check_permissions.outputs.result == 'maintainer' }}" >> $GITHUB_ENV
+ is_maintainer="${{ steps.check_permissions.outputs.result == 'admin' || steps.check_permissions.outputs.result == 'maintainer' }}"
+ echo "is_maintainer=${is_maintainer}" >> "$GITHUB_ENV"
- name: Stop if not a maintainer
if: env.is_maintainer != 'true'
run: |
+ echo "::warning::Profiling skipped - User ${{ github.event.sender.login }} is not a maintainer"
echo "User ${{ github.event.sender.login }} is not a maintainer. Exiting."
exit 0🧰 Tools
🪛 actionlint (1.7.4)
31-31: shellcheck reported issue in this script: SC2086:info:1:140: Double quote to prevent globbing and word splitting
(shellcheck)
46-56: 🛠️ Refactor suggestion
Improve dependency management security and efficiency
- name: Install Poetry
uses: snok/[email protected]
with:
virtualenvs-create: true
virtualenvs-in-project: true
installer-parallel: true
+ version: 1.7.1 # Pin to specific version
- name: Install dependencies
run: |
+ # Verify poetry.lock is in sync
+ poetry lock --check
- poetry install --no-interaction --all-extras
+ # Install only needed dependencies
+ poetry install --no-interaction --only main,dev
poetry run pip install pyinstrument66-91:
Fix inconsistency between comments and implementation
The comments mention Scalene but the implementation uses pyinstrument. This inconsistency should be addressed.
Also, consider creating a reusable composite action to reduce code duplication:
# .github/actions/run-profiling/action.yml
name: 'Run Profiling'
inputs:
sha:
required: true
output:
required: true
runs:
using: "composite"
steps:
- run: |
FILE_PATH="cognee/api/v1/cognify/code_graph_pipeline.py"
echo "Profiling for $FILE_PATH"
echo "Checking out SHA: ${{ inputs.sha }}"
git checkout "${{ inputs.sha }}"
if [ ! -f "$FILE_PATH" ]; then
echo "Error: $FILE_PATH not found"
exit 1
fi
chmod +x "$FILE_PATH"
poetry run pyinstrument --renderer json -o "${{ inputs.output }}" "$FILE_PATH"
shell: bash🧰 Tools
🪛 actionlint (1.7.4)
69-69: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
83-83: shellcheck reported issue in this script: SC2086:info:3:14: Double quote to prevent globbing and word splitting
(shellcheck)
| - name: Compare profiling results | ||
| run: | | ||
| python -c ' | ||
| import json | ||
| try: | ||
| with open("base_results.json") as f: | ||
| base = json.load(f) | ||
| with open("head_results.json") as f: | ||
| head = json.load(f) | ||
| cpu_diff = head.get("total_cpu_samples_python", 0) - base.get("total_cpu_samples_python", 0) | ||
| memory_diff = head.get("malloc_samples", 0) - base.get("malloc_samples", 0) | ||
| results = [ | ||
| f"CPU Usage Difference: {cpu_diff}", | ||
| f"Memory Usage Difference: {memory_diff} bytes" | ||
| ] | ||
| with open("profiling_diff.txt", "w") as f: | ||
| f.write("\\n".join(results) + "\\n") | ||
| print("\\n".join(results)) # Print results to terminal | ||
| except Exception as e: | ||
| error_message = f"Error comparing profiling results: {e}" | ||
| with open("profiling_diff.txt", "w") as f: | ||
| f.write(error_message + "\\n") | ||
| print(error_message) # Print error to terminal | ||
| ' | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Move Python comparison logic to a separate script
The embedded Python script should be moved to a separate file for better maintainability and testing.
Create a new file scripts/compare_profiles.py:
import json
import sys
from typing import Dict, Any
def compare_profiles(base_file: str, head_file: str) -> Dict[str, Any]:
try:
with open(base_file) as f:
base = json.load(f)
with open(head_file) as f:
head = json.load(f)
return {
'cpu_diff': head.get('total_cpu_samples_python', 0) - base.get('total_cpu_samples_python', 0),
'memory_diff': head.get('malloc_samples', 0) - base.get('malloc_samples', 0)
}
except Exception as e:
return {'error': str(e)}
if __name__ == '__main__':
results = compare_profiles('base_results.json', 'head_results.json')
with open('profiling_diff.txt', 'w') as f:
if 'error' in results:
f.write(f"Error comparing profiling results: {results['error']}\n")
else:
f.write(f"CPU Usage Difference: {results['cpu_diff']}\n")
f.write(f"Memory Usage Difference: {results['memory_diff']} bytes\n")Then update the workflow:
- name: Compare profiling results
run: |
- python -c '...'
+ python scripts/compare_profiles.py| # Post results to the pull request | ||
| # - name: Post profiling results to PR | ||
| # uses: actions/github-script@v6 | ||
| # with: | ||
| # script: | | ||
| # const fs = require('fs'); | ||
| # const diff = fs.readFileSync('profiling_diff.txt', 'utf-8'); | ||
| # github.rest.issues.createComment({ | ||
| # issue_number: context.issue.number, | ||
| # owner: context.repo.owner, | ||
| # repo: context.repo.repo, | ||
| # body: `### Profiling Results for code_graph_pipeline.py\n\`\`\`\n${diff || 'No differences found.'}\n\`\`\`` | ||
| # }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enable PR comment creation for visibility
The PR comment creation is currently commented out. This should be enabled to make profiling results visible to reviewers.
-# - name: Post profiling results to PR
-# uses: actions/github-script@v6
+ - name: Post profiling results to PR
+ if: github.event_name == 'pull_request'
+ uses: actions/github-script@v7
+ with:
+ script: |
+ const fs = require('fs');
+ const diff = fs.readFileSync('profiling_diff.txt', 'utf-8');
+ const body = `### Profiling Results for code_graph_pipeline.py
+
+ \`\`\`
+ ${diff || 'No differences found.'}
+ \`\`\`
+
+ <details>
+ <summary>About this profiling</summary>
+
+ - Base SHA: \`${{ env.BASE_SHA }}\`
+ - Head SHA: \`${{ env.HEAD_SHA }}\`
+ - Tool: pyinstrument
+ </details>`;
+
+ await github.rest.issues.createComment({
+ issue_number: context.issue.number,
+ owner: context.repo.owner,
+ repo: context.repo.repo,
+ body
+ });📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Post results to the pull request | |
| # - name: Post profiling results to PR | |
| # uses: actions/github-script@v6 | |
| # with: | |
| # script: | | |
| # const fs = require('fs'); | |
| # const diff = fs.readFileSync('profiling_diff.txt', 'utf-8'); | |
| # github.rest.issues.createComment({ | |
| # issue_number: context.issue.number, | |
| # owner: context.repo.owner, | |
| # repo: context.repo.repo, | |
| # body: `### Profiling Results for code_graph_pipeline.py\n\`\`\`\n${diff || 'No differences found.'}\n\`\`\`` | |
| # }); | |
| - name: Post profiling results to PR | |
| if: github.event_name == 'pull_request' | |
| uses: actions/github-script@v7 | |
| with: | |
| script: | | |
| const fs = require('fs'); | |
| const diff = fs.readFileSync('profiling_diff.txt', 'utf-8'); | |
| const body = `### Profiling Results for code_graph_pipeline.py | |
| \`\`\` | |
| ${diff || 'No differences found.'} | |
| \`\`\` | |
| <details> | |
| <summary>About this profiling</summary> | |
| - Base SHA: \`${{ env.BASE_SHA }}\` | |
| - Head SHA: \`${{ env.HEAD_SHA }}\` | |
| - Tool: pyinstrument | |
| </details>`; | |
| await github.rest.issues.createComment({ | |
| issue_number: context.issue.number, | |
| owner: context.repo.owner, | |
| repo: context.repo.repo, | |
| body | |
| }); |
| on: | ||
| push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Vasilije1990 coderabbit has a good point here.
.github/workflows/profiling.yaml
Outdated
| steps: | ||
| # Checkout the code from the repository with full history | ||
| - name: Checkout code | ||
| uses: actions/checkout@v3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update to actions/checkout@v4
| print(error_message) # Print error to terminal | ||
| ' | ||
|
|
||
| # Post results to the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can upload this diff result as artifact -> https://github.com/actions/upload-artifact
|
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 9573981 | Triggered | Generic Password | ea879b2 | cognee/tests/test_deduplication.py | View secret |
| 9573981 | Triggered | Generic Password | ea879b2 | .github/workflows/test_deduplication.yml | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Summary by CodeRabbit