Cog 577 add unit test task #197
Changes from all commits
dce894b
86e726d
d7ffef1
fbd0115
f326a4d
3cc1138
1889071
7d3f222
7d3657a
ade1fd2
a57530e
501d210
636bfae
c42bb51
adbf3df
e6bdb67
6f4ba20
c385783
2be1127
d7d8460
949dd50
d01061d
a541125
7a6cf53
8107709
2d74590
83995fa
826de0e
8a59cad
58a733c
af88870
aa1480c
cd80525
49bc07d
Four files were deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| import os | ||
|
|
||
| import pytest | ||
|
|
||
|
|
||
| @pytest.fixture(autouse=True, scope="session") | ||
| def copy_cognee_db_to_target_location(): | ||
| os.makedirs("cognee/.cognee_system/databases/", exist_ok=True) | ||
| os.system( | ||
| "cp cognee/tests/integration/run_toy_tasks/data/cognee_db cognee/.cognee_system/databases/cognee_db" | ||
| ) | ||
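The fixture above shells out to `cp`, which is unavailable on some platforms and fails silently, because `os.system` only returns an exit status. A minimal sketch of a portable alternative follows, assuming `cognee_db` is a single file rather than a directory. The helper name `copy_db` and the temporary paths are hypothetical and not part of this PR.

```python
import shutil
import tempfile
from pathlib import Path


def copy_db(source: Path, target_dir: Path) -> Path:
    """Copy `source` into `target_dir`, creating the directory if needed."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    # Raises FileNotFoundError if `source` is missing, whereas
    # os.system("cp ...") only returns a non-zero exit status.
    shutil.copyfile(source, target)
    return target


# Demonstration in a throwaway directory (all paths here are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data" / "cognee_db"
    src.parent.mkdir(parents=True)
    src.write_bytes(b"placeholder database contents")
    copied = copy_db(src, Path(tmp) / ".cognee_system" / "databases")
    copied_ok = copied.read_bytes() == src.read_bytes()
```

Inside the session-scoped fixture, the body could then reduce to a single `copy_db(...)` call with the two paths from the diff, so a missing source file would fail the test session immediately.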
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| import os | ||
| import uuid | ||
|
|
||
| from cognee.modules.data.processing.document_types.PdfDocument import PdfDocument | ||
|
|
||
| GROUND_TRUTH = [ | ||
| {"word_count": 879, "len_text": 5622, "cut_type": "sentence_end"}, | ||
| {"word_count": 951, "len_text": 6384, "cut_type": "sentence_end"}, | ||
| ] | ||
|
|
||
|
|
||
| def test_PdfDocument(): | ||
| test_file_path = os.path.join( | ||
| os.sep, | ||
| *(os.path.dirname(__file__).split(os.sep)[:-2]), | ||
| "test_data", | ||
| "artificial-intelligence.pdf", | ||
| ) | ||
0xideas marked this conversation as resolved.
|
||
| pdf_doc = PdfDocument( | ||
| id=uuid.uuid4(), name="Test document.pdf", raw_data_location=test_file_path | ||
| ) | ||
|
|
||
| for ground_truth, paragraph_data in zip( | ||
| GROUND_TRUTH, pdf_doc.read(chunk_size=1024) | ||
| ): | ||
|
Comment on lines +23 to +25
Contributor
💡 Codebase verification
Chunk size not adequately tested
Tests do not verify different chunk sizes, which may affect the results.
🔗 Analysis chain
Verify chunk size handling
The test uses a fixed chunk size of 1024 bytes but doesn't verify how different chunk sizes might affect the results. Let's verify if the chunk size is documented or tested elsewhere:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for other tests or documentation of chunk size handling
rg -g '!*.pyc' -g '!*.pdf' "chunk_size" ./cognee/
Length of output: 7381
Script:
#!/bin/bash
# Search for different chunk_size values in test files
rg -g 'tests/**/*.py' 'chunk_size\s*=\s*(?!1024)\d+' ./cognee/
Length of output: 313
Script:
#!/bin/bash
# Search for different chunk_size values in test files, excluding 1024
rg -g 'tests/**/*.py' 'chunk_size\s*=\s*\d+' ./cognee/ | grep -v 'chunk_size\s*=\s*1024'
Length of output: 90
||
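One way to address the review comment is to run the same checks over several chunk sizes and assert properties that must hold regardless of size. The sketch below is illustrative only: `read_in_chunks` is a hypothetical stand-in for `PdfDocument.read`, not cognee's chunker, and it assumes `chunk_size` counts words. That assumption is an inference from the ground truth in this diff, where the word counts (879, 951) stay below 1024 while the text lengths exceed it.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Chunk:
    text: str
    word_count: int


def read_in_chunks(text: str, chunk_size: int) -> Iterator[Chunk]:
    """Hypothetical stand-in for PdfDocument.read: at most chunk_size words per chunk."""
    words = text.split()
    for start in range(0, len(words), chunk_size):
        piece = words[start:start + chunk_size]
        yield Chunk(text=" ".join(piece), word_count=len(piece))


def check_chunk_size(text: str, chunk_size: int) -> int:
    """Assert size-independent invariants and return the number of chunks."""
    chunks = list(read_in_chunks(text, chunk_size))
    # No chunk may exceed the requested size.
    assert all(c.word_count <= chunk_size for c in chunks)
    # Chunking must neither drop nor reorder words.
    assert " ".join(c.text for c in chunks).split() == text.split()
    return len(chunks)


SAMPLE = "word " * 2500
for size in (64, 512, 1024):
    check_chunk_size(SAMPLE, size)
```

In the real test suite the loop could become a `pytest.mark.parametrize("chunk_size", [...])` decorator on the test function, with one ground-truth list per size.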
| assert ( | ||
| ground_truth["word_count"] == paragraph_data.word_count | ||
| ), f'{ground_truth["word_count"] = } != {paragraph_data.word_count = }' | ||
| assert ground_truth["len_text"] == len( | ||
| paragraph_data.text | ||
| ), f'{ground_truth["len_text"] = } != {len(paragraph_data.text) = }' | ||
| assert ( | ||
| ground_truth["cut_type"] == paragraph_data.cut_type | ||
| ), f'{ground_truth["cut_type"] = } != {paragraph_data.cut_type = }' | ||
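The test builds the path to the PDF by splitting `os.path.dirname(__file__)` on `os.sep` and dropping the last two components. As a sketch, the same location can be expressed with `pathlib`, which avoids the manual split and join. The function names and the example module path below are hypothetical. The comparison only shows that the two constructions agree for an absolute path shaped like the one in the test.

```python
import os
from pathlib import Path


def data_path_os(module_file: str, filename: str) -> str:
    """Path construction as written in the diff."""
    return os.path.join(
        os.sep,
        *(os.path.dirname(module_file).split(os.sep)[:-2]),
        "test_data",
        filename,
    )


def data_path_pathlib(module_file: str, filename: str) -> Path:
    """Same location: parents[2] is two levels above the module's directory."""
    return Path(module_file).parents[2] / "test_data" / filename


# Illustrative module location; the real test file's path is not shown in the diff.
example_module = os.path.join(
    os.sep, "repo", "cognee", "tests", "unit", "documents", "PdfDocument_test.py"
)
same_path = str(data_path_pathlib(example_module, "artificial-intelligence.pdf")) == (
    data_path_os(example_module, "artificial-intelligence.pdf")
)
```

In the test itself, `module_file` would be `__file__`.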