Generate optimized SQL query to improve performance of find_chunks_by_dedup_key method #9318
base: master
Conversation
Walkthrough: Switched MDStore chunk lookup from MongoDB to PostgreSQL JSONB queries and updated MapServer to pass base64 dedup keys (strings) instead of Buffers.

Changes
Sequence Diagram(s): (mermaid diagram omitted)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/server/object_services/md_store.js (1)
1540-1543: Update JSDoc type to match actual parameter type.

The JSDoc indicates `@param {nb.DBBuffer[]} dedup_keys`, but the implementation now expects base64 strings (as passed from `map_server.js`). Update the type to reflect the actual usage:

```diff
 /**
  * @param {nb.Bucket} bucket
- * @param {nb.DBBuffer[]} dedup_keys
+ * @param {string[]} dedup_keys - base64 encoded dedup keys
  * @returns {Promise<nb.ChunkSchemaDB[]>}
  */
```
🧹 Nitpick comments (2)
src/server/object_services/map_server.js (1)
88-93: Consider using `filter` + `map` for a more functional approach.

The implementation is correct and aligns with the PostgreSQL path that expects base64 strings. The optional chaining (`chunk?.digest_b64`) provides null safety. A more concise alternative using functional patterns:

```diff
-    const dedup_keys = [];
-    chunks.forEach(chunk => {
-        if (chunk?.digest_b64) {
-            dedup_keys.push(chunk.digest_b64);
-        }
-    });
+    const dedup_keys = chunks
+        .map(chunk => chunk?.digest_b64)
+        .filter(Boolean);
```

This is optional and the current implementation works correctly.
src/test/integration_tests/db/test_md_store.js (1)
402-418: Good edge case coverage for empty dedup_key array.

This test ensures that passing an empty array returns an empty result without errors, which validates the `FALSE AND data ? 'dedup_key'` branch in the SQL query. Consider adding a test case for chunks that don't have a `dedup_key` field to ensure they're properly excluded from results.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/sdk/nb.d.ts (1 hunks)
- src/server/object_services/map_server.js (1 hunks)
- src/server/object_services/md_store.js (2 hunks)
- src/test/integration_tests/db/test_md_store.js (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*
⚙️ CodeRabbit configuration file
src/test/**/*.*: Ensure that the PR includes tests for the changes.
Files:
src/test/integration_tests/db/test_md_store.js
🧠 Learnings (2)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.
🧬 Code graph analysis (3)
src/server/object_services/md_store.js (1)
src/util/postgres_client.js (6)
require (13-13), require (14-14), require (26-26), query (326-326), query (331-331), res (258-258)
src/test/integration_tests/db/test_md_store.js (1)
src/server/object_services/md_store.js (2)
config (29-29), assert (8-8)
src/server/object_services/map_server.js (1)
src/test/integration_tests/db/test_md_store.js (2)
chunk (386-394), chunk (405-413)
🔇 Additional comments (3)
src/sdk/nb.d.ts (1)
771-771: LGTM!

The addition of the `schema` property to the `DBCollection` interface is necessary to support the `decode_json` usage in `find_chunks_by_dedup_key`. The `any` type is consistent with the existing patterns in this interface.

src/test/integration_tests/db/test_md_store.js (1)
383-400: Test coverage looks good for the PostgreSQL path.

The test correctly validates that:
- The result is an array
- At least one chunk is returned
- The frag ID matches the inserted chunk

The bucket mock structure `{ _id, system: { _id } }` aligns with the expected parameters in `find_chunks_by_dedup_key`.

src/server/object_services/md_store.js (1)
1545-1567: SQL query implementation looks correct with parameterized queries.

The implementation:
- Uses `ANY($3)` for array membership, which is more efficient than `IN` for PostgreSQL
- Uses parameterized queries, preventing SQL injection
- Properly handles the empty `dedup_keys` array case with `FALSE AND data ? 'dedup_key'`
- Sorts by `_id DESC` to utilize the primary key index

Two observations:
- Silent error handling: Returning an empty array on error could mask legitimate issues. Consider logging at a higher severity or re-throwing certain errors:

```js
} catch (err) {
    dbg.error('Error while finding chunks by dedup_key. error is ', err);
    // Consider: throw err; or at least return based on error type
    return [];
}
```

- Null handling: Line 1556 checks `data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb`. This handles both missing keys and JSON null values, which is appropriate for JSONB columns.
Signed-off-by: Karthik P S <[email protected]>
20920f6 to 3eaf530
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/server/object_services/md_store.js (1)
1539-1543: Update JSDoc to reflect the changed parameter type.

The JSDoc declares `dedup_keys` as `nb.DBBuffer[]`, but the PR changes this to `string[]` (base64-encoded). The implementation at line 1551 passes dedup_keys directly to a SQL query expecting string values for the ANY operator. Update the JSDoc to match the actual signature:

```diff
 /**
  * @param {nb.Bucket} bucket
- * @param {nb.DBBuffer[]} dedup_keys
+ * @param {string[]} dedup_keys - Base64-encoded deduplication keys
  * @returns {Promise<nb.ChunkSchemaDB[]>}
  */
```
♻️ Duplicate comments (1)
src/server/object_services/md_store.js (1)
16-16: Import path convention.

The import path includes the `.js` extension. Per Node.js conventions, the extension can be omitted:

```diff
-const { decode_json } = require('../../util/postgres_client.js');
+const { decode_json } = require('../../util/postgres_client');
```
🧹 Nitpick comments (1)
src/sdk/nb.d.ts (1)
773-773: Schema field addition looks good.

The new `schema: any` field enables PostgreSQL JSON decoding in md_store.js. While `any` provides no type safety, it's a pragmatic choice for now. For future improvement, consider typing this field more specifically, perhaps as a generic parameter based on the collection's schema type.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/sdk/nb.d.ts (1 hunks)
- src/server/object_services/map_server.js (1 hunks)
- src/server/object_services/md_store.js (2 hunks)
- src/test/integration_tests/db/test_md_store.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/test/integration_tests/db/test_md_store.js
- src/server/object_services/map_server.js
🧰 Additional context used
🧬 Code graph analysis (1)
src/server/object_services/md_store.js (1)
src/util/postgres_client.js (3)
query (326-326), query (331-331), res (258-258)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Noobaa Image
- GitHub Check: run-jest-unit-tests
- GitHub Check: run-package-lock-validation
🔇 Additional comments (2)
src/server/object_services/md_store.js (2)
1545-1556: Query construction is well-implemented.

The SQL query correctly:
- Uses parameterized queries ($1, $2, $3) to prevent SQL injection
- Employs the ANY operator for array comparison (per PR objectives)
- Sorts by the `_id` primary key column to leverage indexing (per PR objectives)
- Follows patterns consistent with other queries in the codebase

These changes align with the stated performance goals of reducing query size and improving execution time.
1549-1554: Verify the dedup_keys parameter is properly formatted as an array for the ANY() operator.

The code uses PostgreSQL's `ANY($3)` operator, which requires the bound parameter to be a proper array type. When using node-postgres, the parameter must be passed as an array object. Ensure that `dedup_keys` is consistently an array before being pushed to `values` at line 1551, including edge cases where dedup_keys might be an empty array (which the code already handles) or single-item arrays. The JSONB operators (`data ? 'dedup_key'` and `data ->>`) are standard PostgreSQL syntax and should work correctly.
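To illustrate the reviewer's point, a small normalization guard (hypothetical, not present in the PR) would make the array invariant explicit before the value is bound to `ANY($3)`:

```javascript
// Hypothetical guard: ensure the value bound to ANY($3) is always a real
// JS array (node-postgres serializes JS arrays to PostgreSQL array params).
function normalize_dedup_keys(dedup_keys) {
    if (dedup_keys === undefined || dedup_keys === null) return [];
    return Array.isArray(dedup_keys) ? dedup_keys : [dedup_keys];
}

console.log(normalize_dedup_keys('abc'));      // [ 'abc' ]
console.log(normalize_dedup_keys(['a', 'b'])); // [ 'a', 'b' ]
console.log(normalize_dedup_keys());           // []
```

With callers always supplying arrays (as map_server.js does here), the guard is belt-and-suspenders, but it documents the invariant at the query boundary.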
liranmauda
left a comment
On test_encryption.js: Is there any test that should run? If not, we can do it in line 33 (so you will not have to wrap the whole test). Also, we could add a comment explaining that this test should not run if the DB is not SQL:

```js
if (config.DB_TYPE !== 'postgres') return;
```
Signed-off-by: Karthik P S <[email protected]>
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/server/object_services/md_store.js (2 hunks)
- src/test/integration_tests/internal/test_encryption.js (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/server/object_services/md_store.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*
⚙️ CodeRabbit configuration file
src/test/**/*.*: Ensure that the PR includes tests for the changes.
Files:
src/test/integration_tests/internal/test_encryption.js
🧠 Learnings (2)
📚 Learning: 2025-11-19T15:03:42.260Z
Learnt from: shirady
Repo: noobaa/noobaa-core PR: 9291
File: src/server/common_services/auth_server.js:548-554
Timestamp: 2025-11-19T15:03:42.260Z
Learning: In src/server/common_services/auth_server.js, account objects are loaded directly from system_store (e.g., system_store.data.get_by_id()), so account.owner is an object ID reference with an ._id property, not a string. This differs from s3_rest.js where account.owner is a string due to RPC serialization.
Applied to files:
src/test/integration_tests/internal/test_encryption.js
🧬 Code graph analysis (1)
src/test/integration_tests/internal/test_encryption.js (3)
src/server/object_services/md_store.js (1)
- config (29-29)

src/test/integration_tests/db/test_md_store.js (3)
- config (12-12), mocha (10-10), i (202-202)

src/server/object_services/map_server.js (1)
- config (11-11)
🪛 Biome (2.1.2)
src/test/integration_tests/internal/test_encryption.js
[error] 36-36: Illegal return statement outside of a function
(parse)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Noobaa Image
- GitHub Check: run-package-lock-validation
- GitHub Check: run-jest-unit-tests
🔇 Additional comments (1)
src/test/integration_tests/internal/test_encryption.js (1)
138-164: Account lifecycle and minor test tweaks look consistent.

The changes around:
- Creating 20 accounts and validating their master keys and access-key encryption (`create accounts and compare account access keys succefully`).
- Using those accounts across external-connection / namespace-resource tests.
- Regenerating creds for all non-coretest accounts.
- Deleting all those accounts at the end of the suite.
- Minor formatting/wrapping adjustments (e.g., `pool_by_name(..., namespace_resource_name)` and the `disable_master_key` calls).

are consistent with the existing test patterns and help ensure we don't leak test accounts/resources into later suites. With the DB-type guard fixed, this flow should give good coverage of account/master-key behavior on PostgreSQL without introducing new functional risk.

No changes needed here from my side.

Also applies to: 220-220, 455-471, 473-478, 594-595, 617-618
```js
// @ts-ignore
// Do not run below tests if DB is not PostgreSQL
if (config.DB_TYPE !== 'postgres') return;
```
Fix illegal top‑level return and use Mocha‑level skipping instead
The guard:

```js
// @ts-ignore
// Do not run below tests if DB is not PostgreSQL
if (config.DB_TYPE !== 'postgres') return;
```

is at file scope, which is a syntax error in Node/JS (`return` is only allowed inside functions). This matches the Biome parse error on line 36. As written, this file will fail to parse, so the tests will never run.
Suggest moving the DB-type guard into Mocha context and removing the // @ts-ignore, for example:
```diff
 config.MIN_CHUNK_AGE_FOR_DEDUP = 0;
-
-// @ts-ignore
-// Do not run below tests if DB is not PostgreSQL
-if (config.DB_TYPE !== 'postgres') return;
+// Run these tests only when DB is PostgreSQL
+const IS_POSTGRES = config.DB_TYPE === 'postgres';
@@
 mocha.describe('Encryption tests', function() {
+    mocha.before(function() {
+        if (!IS_POSTGRES) this.skip();
+    });
@@
 mocha.describe('Rotation tests', function() {
+    mocha.before(function() {
+        if (!IS_POSTGRES) this.skip();
+    });
```

This keeps the suite PostgreSQL-only, avoids top-level control flow, and gives Mocha proper "skipped" semantics instead of a parse error. You can then drop the `// @ts-ignore` entirely.
🤖 Prompt for AI Agents
In src/test/integration_tests/internal/test_encryption.js around lines 34–37,
remove the top-level "if (config.DB_TYPE !== 'postgres') return;" and the "//
@ts-ignore" (top-level return is illegal and causes parse errors); instead, keep
the test file but move the DB-type guard into the Mocha suite by placing a
before() hook (or the first statement inside the top-level describe) that checks
config.DB_TYPE !== 'postgres' and calls this.skip() (or uses the suite-level
skip mechanism) so the suite is properly skipped under Mocha when not running
against PostgreSQL.
It was pseudo-code; the actual code should be something like:

```js
mocha.describe('Encryption tests', function() {
    before(function() {
        if (config.DB_TYPE !== 'postgres') {
            this.skip();
        }
    });
    ...
});
```

On any top-level describe.
@liranmauda, thanks for the clarification! I understand now—you want to use a before hook inside each top-level describe to skip the tests when the DB is not PostgreSQL.
Here's how to apply this pattern to both top-level describes in the file:
```js
config.MIN_CHUNK_AGE_FOR_DEDUP = 0;

mocha.describe('Encryption tests', function() {
    mocha.before(function() {
        if (config.DB_TYPE !== 'postgres') {
            this.skip();
        }
    });
    const { rpc_client, EMAIL, SYSTEM } = coretest;
    let response_account;
    // ... rest of the tests
});

// ... (other code)

mocha.describe('Rotation tests', function() {
    mocha.before(function() {
        if (config.DB_TYPE !== 'postgres') {
            this.skip();
        }
    });
    const { rpc_client, EMAIL, SYSTEM } = coretest;
    let accounts;
    let buckets;
    // ... rest of the tests
});
```

This removes the illegal top-level return statement (and the `// @ts-ignore`) and properly skips both test suites when not running against PostgreSQL. Each suite will show as "skipped" in the test output rather than causing a parse error.
Skipping the tests using a `before` hook would still run all the `after` hooks. This would throw errors, specifically in the `unpopulate_system` function call, as the initialization would not have taken place. I would have to repeat the same skip statements in each `after` hook. Wouldn't a single return or conditional `if` be better?
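One way to reconcile both concerns (no illegal top-level `return`, but also no `after` hooks firing for a skipped suite) is to pick the suite registrar up front: `describe.skip` is standard Mocha API, and a pending suite's hooks do not run. The stand-in below replaces Mocha only so the example is runnable on its own; treat it as a sketch of the pattern, not the actual test file:

```javascript
// Choose mocha.describe vs mocha.describe.skip before registering the suite,
// so a skipped suite never executes its before/after hooks at all.
function suite_registrar(db_type, describe) {
    return db_type === 'postgres' ? describe : describe.skip;
}

// Minimal stand-in for Mocha's describe API, recording which path each suite takes:
const calls = [];
const fake_describe = (name, fn) => calls.push(['run', name]);
fake_describe.skip = (name, fn) => calls.push(['skip', name]);

suite_registrar('postgres', fake_describe)('Encryption tests', function() {});
suite_registrar('mongodb', fake_describe)('Rotation tests', function() {});
console.log(calls); // one 'run' entry, one 'skip' entry
```

In the real test file this would look like `const describe_if_pg = config.DB_TYPE === 'postgres' ? mocha.describe : mocha.describe.skip;`, avoiding both the parse error and the `unpopulate_system` failures in `after` hooks.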
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
jackyalbo
left a comment
Also, can you share a before-and-after comparison in this PR description?
```js
if (dedup_keys.length) {
    query += ` AND (data ->> 'dedup_key' = ANY($3) AND data ? 'dedup_key')`;
    values.push(dedup_keys);
} else {
```
I don't think we're supposed to have empty dedup keys.

```js
if (!dedup_keys.length) return;
```

We can avoid running the query in this case; just return at the beginning.
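A minimal sketch of that early-exit shape (the query execution itself is abstracted into a callback here; names are illustrative, not the md_store signature):

```javascript
// Illustrative: return immediately for an empty key list so no SQL runs.
async function find_chunks_by_dedup_key(dedup_keys, run_query) {
    if (!dedup_keys.length) return []; // nothing can match; skip the DB round-trip
    return run_query(dedup_keys);
}

// The query callback is only reached when there are keys to look up:
find_chunks_by_dedup_key([], () => { throw new Error('should not run'); })
    .then(rows => console.log(rows.length)); // 0
```

Returning `[]` (rather than `undefined`) keeps callers that iterate the result working unchanged.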
Result: Tested with 1M+ rows in datachunks, with the query filtering 100 dedup_keys.
```
nbcore=# select count(*) from datachunks;
  count
---------
 1088000
```
**Current query:**
```
EXPLAIN ANALYZE SELECT * FROM datachunks
WHERE (data->>'system' = '693124b23742600022f30a64'
  AND data->>'bucket' = '693124b23742600022f30a6c'
  AND (data->>'dedup_key' IN ('YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
       'YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
       ....(removed for readability)....
       'zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE=') AND data ? 'dedup_key')
  AND (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb))
ORDER BY data->>'_id' DESC;

Sort  (cost=169.30..169.30 rows=1 width=1454) (actual time=0.448..0.452 rows=100 loops=1)
  Sort Key: ((data ->> '_id'::text)) DESC
  Sort Method: quicksort  Memory: 169kB
  ->  Index Scan using idx_btree_datachunks_dedup_key_system_bucket on datachunks
      (cost=0.55..169.29 rows=1 width=1454) (actual time=0.030..0.375 rows=100 loops=1)
        Index Cond: (((data ->> 'dedup_key'::text) = ANY ('{YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,wntv6uTo3IhN1BwepJqGTe493C0pB88j4SMsB92UrvQ=, ....(removed for readability)...., aO3VxeMSCz/041jhXpKEc+CPFTCO++Q9Q9WdaTF093k=,zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE=}'::text[])) AND ((data ->> 'system'::text) = '693124b23742600022f30a64'::text) AND ((data ->> 'bucket'::text) = '693124b23742600022f30a6c'::text))
        Filter: (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb))
Planning Time: 0.125 ms
Execution Time: 0.471 ms
(8 rows)
```
**With changes:**
```
EXPLAIN ANALYZE SELECT * FROM datachunks
WHERE (data->>'system' = '693124b23742600022f30a64'
  AND data->>'bucket' = '693124b23742600022f30a6c'
  AND (data->>'dedup_key' = ANY(ARRAY['YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
       'YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
       ....(removed for readability)....
       'zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE='])
    AND data ? 'dedup_key')
  AND (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb))
ORDER BY _id DESC;

Sort  (cost=169.29..169.30 rows=1 width=1422) (actual time=0.433..0.437 rows=100 loops=1)
  Sort Key: _id DESC
  Sort Method: quicksort  Memory: 167kB
  ->  Index Scan using idx_btree_datachunks_dedup_key_system_bucket on datachunks
      (cost=0.55..169.28 rows=1 width=1422) (actual time=0.030..0.368 rows=100 loops=1)
        Index Cond: (((data ->> 'dedup_key'::text) = ANY ('{YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,wntv6uTo3IhN1BwepJqGTe493C0pB88j4SMsB92UrvQ=,z8R8VzzFUaX+10H+JDzb/5qGh10IPUuGbVInOuuTz3Q=,FzlSOeTfPmi1upqzMe6ZfhlYa+V0MzB2nKQqPZ9pj9Q=, ....(removed for readability)...., aO3VxeMSCz/041jhXpKEc+CPFTCO++Q9Q9WdaTF093k=,zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE=}'::text[])) AND ((data ->> 'system'::text) = '693124b23742600022f30a64'::text) AND ((data ->> 'bucket'::text) = '693124b23742600022f30a6c'::text))
        Filter: (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb))
Planning Time: 0.119 ms
Execution Time: 0.450 ms
```
```js
const dedup_keys = [];
chunks.forEach(chunk => {
    if (chunk?.digest_b64) {
        dedup_keys.push(chunk.digest_b64);
    }
});
```
```diff
-const dedup_keys = [];
-chunks.forEach(chunk => {
-    if (chunk?.digest_b64) {
-        dedup_keys.push(chunk.digest_b64);
-    }
-});
+const dedup_keys = chunks.map(chunk => chunk.digest_b64).filter(Boolean);
```
```js
    await this.load_blocks_for_chunks(chunks);
    return chunks;
    const values = [];
    let query = `SELECT * FROM ${this._chunks.name} WHERE (data ->> 'system' = $1 AND data ->> 'bucket' = $2`;
```
I'm thinking maybe we should use @Neon-White's format for these queries, making them more readable. @dannyzaken WDYT?
noobaa-core/src/server/object_services/md_store.js
Lines 307 to 317 in dae43d2
```js
const query = `
    UPDATE ${table_name}
    SET data = jsonb_set(data, '{deleted}', to_jsonb($1::text), true)
    WHERE
    ${sql_and_conditions(
        `data->>'bucket' = '${bucket_id}'`,
        `data->>'deleted' IS NULL`,
        `data->>'upload_started' IS NOT NULL`,
        `(EXTRACT(EPOCH FROM NOW()) - ${convert_mongoid_to_timestamp_sql("data->>'upload_started'")}) / 86400 > ${days_after_initiation}`,
        sql_condition0, sql_condition1, sql_condition2, sql_condition3,
    )};`;
```
for example
Formatted the query as suggested.
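For readers following along, the reformatted dedup-key query presumably ends up close to the shape below. The `sql_and_conditions` here is a local stand-in for the helper quoted above, so treat the exact output as an assumption:

```javascript
// Local stand-in for the sql_and_conditions helper from md_store.js:
// joins non-empty conditions with AND, one per line.
function sql_and_conditions(...conds) {
    return conds.filter(Boolean).join(' AND\n        ');
}

const query = `
    SELECT * FROM datachunks
    WHERE
        ${sql_and_conditions(
            `data ->> 'system' = $1`,
            `data ->> 'bucket' = $2`,
            `data ->> 'dedup_key' = ANY($3)`,
            `data ? 'dedup_key'`,
            `(data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)`,
        )}
    ORDER BY _id DESC;`;

console.log(query.includes('= ANY($3)')); // true
```

Building the WHERE clause from a flat list of conditions keeps each predicate on its own line and makes adding or removing filters a one-line change.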
Signed-off-by: Karthik P S <[email protected]>
```js
    return chunks;
} catch (err) {
    dbg.error('Error while finding chunks by dedup_key. error is ', err);
    return [];
```
We should not catch all errors here and ignore them, as it can cause silent failures that are hidden in the logs. It's a good idea to log the error and rethrow it.
I'm rethrowing the error now rather than ignoring it.
```js
// @ts-ignore
// Do not run below tests if DB is not PostgreSQL
if (config.DB_TYPE !== 'postgres') return;
```
@karthikps97 @liranmauda, why is this needed?
These tests are executed as part of both the PostgreSQL tests and the MongoDB tests. We are skipping this for the MongoDB run, as it will fail since executeSQL is not implemented by the mongo client.
```js
    assert(chunksArr[0].frags[0]?._id?.toString() === chunk.frags[0]._id.toString());
});

mocha.it('find_chunks_by_dedup_key empty dedup_key array passed', async () => {
```
Why does this specific case need to be tested? We don't expect an empty array, IIRC
We return early from the function in case an empty dedup_keys array is passed. So, added this test case.
```js
    return md_store.delete_chunks_by_ids(_.map(chunks, '_id'));
});

mocha.it('find_chunks_by_dedup_key()', async () => {
```
Worth adding more tests for different cases, e.g., test that we return an empty array when there is no match.
Added one test for this scenario.
Signed-off-by: Karthik P S <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/server/object_services/md_store.js (1)
1560-1564: Fix incomplete optional chaining to prevent runtime errors.

Line 1562 uses incomplete optional chaining. If `res` or `res.rows` is undefined, calling `.map()` will throw a TypeError. This issue was flagged in a previous review but remains unresolved.

🔎 Proposed fix

```diff
 const res = await this._chunks.executeSQL(query, values);
-const chunks = res.rows.map(row => decode_json(this._chunks.schema, row.data));
+const chunks = res?.rows?.map(row => decode_json(this._chunks.schema, row.data)) ?? [];
 await this.load_blocks_for_chunks(chunks);
 return chunks;
```

This ensures `chunks` is always an array, even if the query returns no result or an unexpected response.
🧹 Nitpick comments (1)
src/server/object_services/md_store.js (1)
1547-1557: Consider adding a LIMIT clause to prevent unbounded result sets.

The query lacks a `LIMIT` clause, which could return a very large number of rows if many chunks match the dedup keys. This could lead to memory pressure or performance degradation. Consider adding a reasonable limit (e.g., 1000 or 10000) or accepting a `limit` parameter if callers need control over the result set size.

🔎 Example with LIMIT clause

```diff
 const query = `
     SELECT * FROM ${this._chunks.name}
     WHERE (data ->> 'system' = $1 AND data ->> 'bucket' = $2
         AND (data ->> 'dedup_key' = ANY($3) AND data ? 'dedup_key')
         AND (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb))
-    ORDER BY _id DESC;
+    ORDER BY _id DESC
+    LIMIT 10000;
 `;
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/server/object_services/md_store.js
- src/test/integration_tests/db/test_md_store.js
🚧 Files skipped from review as they are similar to previous changes (1)
- src/test/integration_tests/db/test_md_store.js
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
📚 Learning: 2025-08-08T13:12:46.728Z
Applied to files:
src/server/object_services/md_store.js
📚 Learning: 2025-11-13T07:56:23.620Z
Learnt from: shirady
Repo: noobaa/noobaa-core PR: 9281
File: src/server/system_services/account_server.js:1053-1058
Timestamp: 2025-11-13T07:56:23.620Z
Learning: In noobaa-core, account_server.js is only used in containerized deployments, not in NSFS/NC deployments. NSFS/NC deployments have separate account management code in src/manage_nsfs/ directory. Therefore, account_server.js only processes accounts from account_schema.js where owner is an objectid reference, never from nsfs_account_schema.js where owner is a string.
Applied to files:
src/server/object_services/md_store.js
📚 Learning: 2025-11-19T15:03:42.260Z
Learnt from: shirady
Repo: noobaa/noobaa-core PR: 9291
File: src/server/common_services/auth_server.js:548-554
Timestamp: 2025-11-19T15:03:42.260Z
Learning: In src/server/common_services/auth_server.js, account objects are loaded directly from system_store (e.g., system_store.data.get_by_id()), so account.owner is an object ID reference with an ._id property, not a string. This differs from s3_rest.js where account.owner is a string due to RPC serialization.
Applied to files:
src/server/object_services/md_store.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.
Applied to files:
src/server/object_services/md_store.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-package-lock-validation
- GitHub Check: Build Noobaa Image
- GitHub Check: run-jest-unit-tests
🔇 Additional comments (2)
src/server/object_services/md_store.js (2)
1545-1545: LGTM on the early return! The early return for empty or missing `dedup_keys` prevents unnecessary query execution and aligns with the feedback from past reviews.

1565-1568: Error handling looks appropriate. Logging the error before rethrowing provides valuable debugging information while still propagating the error to callers, which aligns with feedback from previous reviews.
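The log-and-rethrow pattern that comment endorses looks roughly like this generic sketch (`run_query` stands in for the DB call, and `console.error` for NooBaa's logger — both are assumptions here):

```javascript
// Generic log-and-rethrow: the error is recorded with local context first,
// then propagated unchanged so callers still see the original failure.
async function find_chunks(run_query, keys) {
    try {
        return await run_query(keys);
    } catch (err) {
        console.error('find_chunks_by_dedup_key failed:', err.message);
        throw err; // rethrow as-is to preserve the original stack trace
    }
}

// Failure path: logged here, then surfaces to the caller.
find_chunks(async () => { throw new Error('boom'); }, [])
    .catch(err => console.log('caller saw:', err.message)); // caller saw: boom
```

Rethrowing the original `err` rather than wrapping it keeps the stack trace intact for callers while still leaving a breadcrumb at the site of the failure.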
Describe the Problem
The Mongo-to-Postgres query converter uses the IN operator for arrays. In PostgreSQL, using the ANY operator with a single array parameter performs better: the query text stays the same size no matter how many keys are passed, and the constant statement shape allows plan caching. The sort was also being applied to the '_id' JSON field inside the data column, even though the table already has a native _id column (the primary key) that can be used instead.
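The size difference can be demonstrated by generating the WHERE-clause text both ways. A standalone sketch, not the converter's actual code (parameter numbering starts at $3 on the assumption that $1/$2 are taken by system and bucket, as in the query above):

```javascript
// IN needs one placeholder per key, so the SQL text grows with the input
// and each distinct key count produces a distinct statement to plan.
function in_clause(keys) {
    const placeholders = keys.map((_, i) => `$${i + 3}`).join(', ');
    return `data ->> 'dedup_key' IN (${placeholders})`;
}

// ANY binds the whole array as a single parameter ($3), so the SQL text
// is constant and a cached plan can be reused for any number of keys.
function any_clause() {
    return `data ->> 'dedup_key' = ANY($3)`;
}

const keys = ['a', 'b', 'c'];
console.log(in_clause(keys)); // data ->> 'dedup_key' IN ($3, $4, $5)
console.log(any_clause());    // data ->> 'dedup_key' = ANY($3)
```

With thousands of dedup keys per lookup, the IN form also inflates the bytes sent to the server on every call, while the ANY form ships the keys once as a bound array value.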
Explain the Changes
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit
Improvements
Tests