
@karthikps97 karthikps97 (Member) commented Nov 27, 2025

Describe the Problem

The Mongo-to-Postgres query converter uses the IN operator for arrays, but in PostgreSQL the ANY operator performs better: the query text stays small and the plan can be cached. The sort is also applied on the '_id' JSON field of the data column, even though an _id column (the primary key) is already present.

Explain the Changes

  1. Refactored the find_chunks_by_dedup_key method to generate its own SQL queries (a sketch of the query shape follows below). Improvements are:
  • Using ANY instead of IN improves query planning time for larger arrays. (PostgreSQL internally converts IN to ANY, but that requires an additional parsing step.)
  • Sorting on _id instead of data._id improves query execution time because it uses the primary key index.
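
Roughly, the query shape changes as follows (a sketch only; variable names here are illustrative, the real implementation is in md_store.js, and a full EXPLAIN ANALYZE comparison is in the review thread below):

```js
// Before (sketch): the converter expands the array into an IN list, so the
// SQL text grows with the number of keys and cannot be reused as one
// prepared statement.
//   ... WHERE data->>'dedup_key' IN ('key1', 'key2' /*, one literal per key */)
//   ORDER BY data->>'_id' DESC
//
// After (sketch): a single array parameter bound via ANY, sorted on the
// primary key column.
const query = `
    SELECT * FROM datachunks
    WHERE data ->> 'system' = $1
      AND data ->> 'bucket' = $2
      AND data ->> 'dedup_key' = ANY($3)
    ORDER BY _id DESC`;
const values = [system_id, bucket_id, dedup_keys]; // dedup_keys: string[]
```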

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • Improvements

    • Chunk deduplication retrieval now uses a SQL-backed path for broader database support and improved resilience.
    • Collection objects expose schema metadata for easier inspection.
  • Tests

    • Added integration tests for deduplicated chunk lookup, non-existent-key and empty-key behavior (Postgres-only).
    • Encryption test suite now skips when not running against Postgres.


@coderabbitai coderabbitai bot commented Nov 27, 2025

Walkthrough

Switched MDStore chunk lookup from MongoDB to PostgreSQL JSONB queries, updated MapServer to pass base64 dedup keys (strings) instead of Buffers, added schema to the DBCollection TypeScript interface, and added PostgreSQL-gated integration tests for dedup lookup.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Type definitions<br>`src/sdk/nb.d.ts` | Added public `schema: any` property to the `DBCollection` interface. |
| Deduplication key handling<br>`src/server/object_services/map_server.js` | `GetMapping.find_dups` now collects `chunk.digest_b64` (base64 strings) into `dedup_keys` and filters falsy values; dedup key type moved from `Buffer[]` to `string[]`. |
| Database implementation migration<br>`src/server/object_services/md_store.js` | Replaced MongoDB-based `find_chunks_by_dedup_key` with a PostgreSQL implementation: added `decode_json` import, constructs and executes SQL filtering by system, bucket, JSONB dedup_key membership and `deleted IS NULL`, decodes rows to chunk objects, calls `load_blocks_for_chunks`, and returns chunks; errors are logged and rethrown. |
| PostgreSQL-gated integration tests<br>`src/test/integration_tests/db/test_md_store.js`, `src/test/integration_tests/internal/test_encryption.js` | Added three PostgreSQL-only tests for `find_chunks_by_dedup_key` (success, nonexistent key, empty key array); added `DB_TYPE` guard to encryption tests to skip when not PostgreSQL. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Map as MapServer (GetMapping)
    participant MD as MDStore
    participant PG as PostgreSQL
    participant Blocks as Block Loader
    Map->>MD: find_chunks_by_dedup_key(bucket, [dedup_key_b64,...])
    MD->>PG: execute SQL (filter by system, bucket, JSONB dedup_key IN list, deleted IS NULL)
    PG-->>MD: rows (data JSONB)
    MD->>MD: decode_json(row.data) → chunk objects
    MD->>Blocks: load_blocks_for_chunks(chunks)
    Blocks-->>MD: chunks with blocks
    MD-->>Map: return chunks array
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus areas:
    • src/server/object_services/md_store.js: SQL construction/parameterization, JSONB membership condition, decode_json usage, parity with previous Mongo behavior, and error propagation.
    • src/server/object_services/map_server.js: callers' assumptions about dedup key types.
    • src/test/integration_tests/db/test_md_store.js: test setup, teardown, and PostgreSQL gating correctness.

Possibly related PRs

Suggested reviewers

  • dannyzaken
  • jackyalbo

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately describes the main change: refactoring find_chunks_by_dedup_key to use optimized SQL queries for better performance, which is the primary focus across md_store.js, map_server.js, and the related tests. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment


@karthikps97 karthikps97 marked this pull request as ready for review December 2, 2025 06:37

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/server/object_services/md_store.js (1)

1540-1543: Update JSDoc type to match actual parameter type.

The JSDoc indicates @param {nb.DBBuffer[]} dedup_keys but the implementation now expects base64 strings (as passed from map_server.js). Update the type to reflect the actual usage:

```diff
 /**
  * @param {nb.Bucket} bucket
- * @param {nb.DBBuffer[]} dedup_keys
+ * @param {string[]} dedup_keys - base64 encoded dedup keys
  * @returns {Promise<nb.ChunkSchemaDB[]>}
  */
```
🧹 Nitpick comments (2)
src/server/object_services/map_server.js (1)

88-93: Consider using filter + map for a more functional approach.

The implementation is correct and aligns with the PostgreSQL path that expects base64 strings. The optional chaining (chunk?.digest_b64) provides null safety.

A more concise alternative using functional patterns:

```diff
-            const dedup_keys = [];
-            chunks.forEach(chunk => {
-                if (chunk?.digest_b64) {
-                    dedup_keys.push(chunk.digest_b64);
-                }
-            });
+            const dedup_keys = chunks
+                .map(chunk => chunk?.digest_b64)
+                .filter(Boolean);
```

This is optional and the current implementation works correctly.

src/test/integration_tests/db/test_md_store.js (1)

402-418: Good edge case coverage for empty dedup_key array.

This test ensures that passing an empty array returns an empty result without errors, which validates the FALSE AND data ? 'dedup_key' branch in the SQL query.

Consider adding a test case for chunks that don't have a dedup_key field to ensure they're properly excluded from results.
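
A rough sketch of such a test in the suite's style (the chunk-building and insert helpers here are illustrative placeholders, not the suite's actual helpers):

```js
mocha.it('find_chunks_by_dedup_key excludes chunks without dedup_key', async () => {
    // build and insert a chunk document that has no dedup_key field at all
    const chunk = make_test_chunk({ bucket });    // placeholder helper
    delete chunk.dedup_key;
    await insert_test_chunks(md_store, [chunk]);  // placeholder helper
    const res = await md_store.find_chunks_by_dedup_key(bucket, ['some-dedup-key-b64']);
    // the dedup_key-less chunk must never appear in the results
    assert(Array.isArray(res));
    assert(res.every(c => String(c._id) !== String(chunk._id)));
});
```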

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1382037 and 20920f6.

📒 Files selected for processing (4)
  • src/sdk/nb.d.ts (1 hunks)
  • src/server/object_services/map_server.js (1 hunks)
  • src/server/object_services/md_store.js (2 hunks)
  • src/test/integration_tests/db/test_md_store.js (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/integration_tests/db/test_md_store.js
🧠 Learnings (2)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.

Applied to files:

  • src/server/object_services/md_store.js
🧬 Code graph analysis (3)
src/server/object_services/md_store.js (1)
src/util/postgres_client.js (6)
  • require (13-13)
  • require (14-14)
  • require (26-26)
  • query (326-326)
  • query (331-331)
  • res (258-258)
src/test/integration_tests/db/test_md_store.js (1)
src/server/object_services/md_store.js (2)
  • config (29-29)
  • assert (8-8)
src/server/object_services/map_server.js (1)
src/test/integration_tests/db/test_md_store.js (2)
  • chunk (386-394)
  • chunk (405-413)
🔇 Additional comments (3)
src/sdk/nb.d.ts (1)

771-771: LGTM!

The addition of the schema property to the DBCollection interface is necessary to support the decode_json usage in find_chunks_by_dedup_key. The any type is consistent with the existing patterns in this interface.

src/test/integration_tests/db/test_md_store.js (1)

383-400: Test coverage looks good for the PostgreSQL path.

The test correctly validates that:

  1. The result is an array
  2. At least one chunk is returned
  3. The frag ID matches the inserted chunk

The bucket mock structure { _id, system: { _id } } aligns with the expected parameters in find_chunks_by_dedup_key.

src/server/object_services/md_store.js (1)

1545-1567: SQL query implementation looks correct with parameterized queries.

The implementation:

  • Uses ANY($3) for array membership, which is more efficient than IN for PostgreSQL
  • Uses parameterized queries preventing SQL injection
  • Properly handles the empty dedup_keys array case with FALSE AND data ? 'dedup_key'
  • Sorts by _id DESC to utilize the primary key index

Two observations:

  1. Silent error handling: Returning an empty array on error could mask legitimate issues. Consider logging at a higher severity or re-throwing certain errors:

```js
} catch (err) {
    dbg.error('Error while finding chunks by dedup_key. error is ', err);
    // Consider: throw err; or at least return based on error type
    return [];
}
```

  2. Null handling: Line 1556 checks `data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb`. This handles both missing keys and JSON null values, which is appropriate for JSONB columns.
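
A quick way to see why both conditions are needed (a minimal sketch against any pg client; `client` is assumed to be an open connection):

```js
// The -> operator distinguishes a missing key from a JSON null:
//   {"x":1}          -> 'deleted'  => SQL NULL     (IS NULL matches)
//   {"deleted":null} -> 'deleted'  => jsonb 'null' (IS NULL does NOT match)
const { rows } = await client.query(`
    SELECT ('{"x":1}'::jsonb -> 'deleted') IS NULL                  AS missing_key,
           ('{"deleted":null}'::jsonb -> 'deleted') IS NULL         AS json_null_is_sql_null,
           ('{"deleted":null}'::jsonb -> 'deleted') = 'null'::jsonb AS json_null_matches
`);
// rows[0] => { missing_key: true, json_null_is_sql_null: false, json_null_matches: true }
```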

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/server/object_services/md_store.js (1)

1539-1543: Update JSDoc to reflect the changed parameter type.

The JSDoc declares dedup_keys as nb.DBBuffer[], but the PR changes this to string[] (base64-encoded). The implementation at line 1551 passes dedup_keys directly to a SQL query expecting string values for the ANY operator.

Update the JSDoc to match the actual signature:

```diff
 /**
  * @param {nb.Bucket} bucket
- * @param {nb.DBBuffer[]} dedup_keys
+ * @param {string[]} dedup_keys - Base64-encoded deduplication keys
  * @returns {Promise<nb.ChunkSchemaDB[]>}
  */
```
♻️ Duplicate comments (1)
src/server/object_services/md_store.js (1)

16-16: Import path convention.

The import path includes the .js extension. Per Node.js conventions, the extension can be omitted:

```diff
-const { decode_json } = require('../../util/postgres_client.js');
+const { decode_json } = require('../../util/postgres_client');
```
🧹 Nitpick comments (1)
src/sdk/nb.d.ts (1)

773-773: Schema field addition looks good.

The new schema: any field enables PostgreSQL JSON decoding in md_store.js. While any provides no type safety, it's a pragmatic choice for now.

For future improvement, consider typing this field more specifically, perhaps as a generic parameter based on the collection's schema type.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20920f6 and 3eaf530.

📒 Files selected for processing (4)
  • src/sdk/nb.d.ts (1 hunks)
  • src/server/object_services/map_server.js (1 hunks)
  • src/server/object_services/md_store.js (2 hunks)
  • src/test/integration_tests/db/test_md_store.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/test/integration_tests/db/test_md_store.js
  • src/server/object_services/map_server.js
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.

Applied to files:

  • src/server/object_services/md_store.js
🧬 Code graph analysis (1)
src/server/object_services/md_store.js (1)
src/util/postgres_client.js (3)
  • query (326-326)
  • query (331-331)
  • res (258-258)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: run-package-lock-validation
🔇 Additional comments (2)
src/server/object_services/md_store.js (2)

1545-1556: Query construction is well-implemented.

The SQL query correctly:

  • Uses parameterized queries ($1, $2, $3) to prevent SQL injection
  • Employs the ANY operator for array comparison (per PR objectives)
  • Sorts by the _id primary key column to leverage indexing (per PR objectives)
  • Follows patterns consistent with other queries in the codebase

These changes align with the stated performance goals of reducing query size and improving execution time.


1549-1554: Verify the dedup_keys parameter is properly formatted as an array for the ANY() operator.

The code uses PostgreSQL's ANY($3) operator, which requires the bound parameter to be a proper array type. When using node-postgres, the parameter must be passed as an array object. Ensure that dedup_keys is consistently an array before being pushed to values at line 1551—this includes edge cases where dedup_keys might be an empty array (which the code already handles) or single-item arrays. The JSONB operators (data ? 'dedup_key' and data ->>) are standard PostgreSQL syntax and should work correctly.
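
For reference, node-postgres serializes a JS array parameter into a Postgres array, so ANY($n) receives a proper text[] for empty, single-item, and multi-item inputs alike. A minimal standalone sketch (generic pg client; connection settings come from the PG* environment variables; table name as used elsewhere in this thread):

```js
const { Client } = require('pg');

async function find_by_dedup_keys(dedup_keys /* string[] */) {
    const client = new Client();
    await client.connect();
    try {
        // the JS array binds as a Postgres array, which is what ANY($1) expects
        const res = await client.query(
            `SELECT * FROM datachunks WHERE data ->> 'dedup_key' = ANY($1)`,
            [dedup_keys]
        );
        return res.rows;
    } finally {
        await client.end();
    }
}
```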

@liranmauda liranmauda (Contributor) left a comment

On test_encryption.js: is there any test that should run? If not, we can do it at line 33 (so you will not have to wrap the whole test).
Also, we could add a comment explaining that this test should not run if the DB is not PostgreSQL.

```js
if (config.DB_TYPE !== 'postgres') return;
```

Signed-off-by: Karthik P S <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3eaf530 and 9295259.

📒 Files selected for processing (2)
  • src/server/object_services/md_store.js (2 hunks)
  • src/test/integration_tests/internal/test_encryption.js (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/server/object_services/md_store.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/integration_tests/internal/test_encryption.js
🧠 Learnings (2)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
📚 Learning: 2025-11-19T15:03:42.260Z
Learnt from: shirady
Repo: noobaa/noobaa-core PR: 9291
File: src/server/common_services/auth_server.js:548-554
Timestamp: 2025-11-19T15:03:42.260Z
Learning: In src/server/common_services/auth_server.js, account objects are loaded directly from system_store (e.g., system_store.data.get_by_id()), so account.owner is an object ID reference with an ._id property, not a string. This differs from s3_rest.js where account.owner is a string due to RPC serialization.

Applied to files:

  • src/test/integration_tests/internal/test_encryption.js
🧬 Code graph analysis (1)
src/test/integration_tests/internal/test_encryption.js (3)
src/server/object_services/md_store.js (1)
  • config (29-29)
src/test/integration_tests/db/test_md_store.js (3)
  • config (12-12)
  • mocha (10-10)
  • i (202-202)
src/server/object_services/map_server.js (1)
  • config (11-11)
🪛 Biome (2.1.2)
src/test/integration_tests/internal/test_encryption.js

[error] 36-36: Illegal return statement outside of a function

(parse)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (1)
src/test/integration_tests/internal/test_encryption.js (1)

138-164: Account lifecycle and minor test tweaks look consistent

The changes around:

  • Creating 20 accounts and validating their master keys and access-key encryption (create accounts and compare account access keys succefully).
  • Using those accounts across external-connection / namespace-resource tests.
  • Regenerating creds for all non-coretest accounts.
  • Deleting all those accounts at the end of the suite.
  • Minor formatting/wrapping adjustments (e.g., pool_by_name(..., namespace_resource_name) and the disable_master_key calls).

are consistent with the existing test patterns and help ensure we don’t leak test accounts/resources into later suites. With the DB-type guard fixed, this flow should give good coverage of account/master-key behavior on PostgreSQL without introducing new functional risk.

No changes needed here from my side.

Also applies to: 220-220, 455-471, 473-478, 594-595, 617-618

Comment on lines +34 to +37

```js
// @ts-ignore
// Do not run below tests if DB is not PostgreSQL
if (config.DB_TYPE !== 'postgres') return;
```

@coderabbitai coderabbitai bot commented Dec 9, 2025

⚠️ Potential issue | 🔴 Critical

Fix illegal top-level return and use Mocha-level skipping instead

The guard:

```js
// @ts-ignore
// Do not run below tests if DB is not PostgreSQL
if (config.DB_TYPE !== 'postgres') return;
```

is at file scope, which is a syntax error in Node/JS (return is only allowed inside functions). This matches the Biome parse error on Line 36. As written, this file will fail to parse, so the tests will never run.

Suggest moving the DB-type guard into Mocha context and removing the // @ts-ignore, for example:

```diff
 config.MIN_CHUNK_AGE_FOR_DEDUP = 0;
-
-// @ts-ignore
-// Do not run below tests if DB is not PostgreSQL
-if (config.DB_TYPE !== 'postgres') return;
+// Run these tests only when DB is PostgreSQL
+const IS_POSTGRES = config.DB_TYPE === 'postgres';
@@
-mocha.describe('Encryption tests', function() {
+mocha.describe('Encryption tests', function() {
+    mocha.before(function() {
+        if (!IS_POSTGRES) this.skip();
+    });
@@
-mocha.describe('Rotation tests', function() {
+mocha.describe('Rotation tests', function() {
+    mocha.before(function() {
+        if (!IS_POSTGRES) this.skip();
+    });
```

This keeps the suite PostgreSQL-only, avoids top-level control flow, and gives Mocha proper “skipped” semantics instead of a parse error. You can then drop the // @ts-ignore entirely.

🧰 Tools
🪛 Biome (2.1.2)

[error] 36-36: Illegal return statement outside of a function

(parse)

🤖 Prompt for AI Agents

```
In src/test/integration_tests/internal/test_encryption.js around lines 34–37,
remove the top-level "if (config.DB_TYPE !== 'postgres') return;" and the
"// @ts-ignore" (top-level return is illegal and causes parse errors); instead,
keep the test file but move the DB-type guard into the Mocha suite by placing a
before() hook (or the first statement inside the top-level describe) that checks
config.DB_TYPE !== 'postgres' and calls this.skip() (or uses the suite-level
skip mechanism) so the suite is properly skipped under Mocha when not running
against PostgreSQL.
```

@liranmauda liranmauda (Contributor) replied

It was pseudo-code; the actual code should be something like:

```js
mocha.describe('Encryption tests', function() {
    before(function() {
        if (config.DB_TYPE !== 'postgres') {
            this.skip();
        }
        // ...
    });
```

Do this in any top-level describe.

@coderabbitai coderabbitai bot replied

@liranmauda, thanks for the clarification! I understand now—you want to use a before hook inside each top-level describe to skip the tests when the DB is not PostgreSQL.

Here's how to apply this pattern to both top-level describes in the file:

```js
config.MIN_CHUNK_AGE_FOR_DEDUP = 0;

mocha.describe('Encryption tests', function() {
    mocha.before(function() {
        if (config.DB_TYPE !== 'postgres') {
            this.skip();
        }
    });

    const { rpc_client, EMAIL, SYSTEM } = coretest;
    let response_account;
    // ... rest of the tests
});

// ... (other code)

mocha.describe('Rotation tests', function() {
    mocha.before(function() {
        if (config.DB_TYPE !== 'postgres') {
            this.skip();
        }
    });

    const { rpc_client, EMAIL, SYSTEM } = coretest;
    let accounts;
    let buckets;
    // ... rest of the tests
});
```

This removes the illegal top-level return statement (and the // @ts-ignore) and properly skips both test suites when not running against PostgreSQL. Each suite will show as "skipped" in the test output rather than causing a parse error.

@karthikps97 karthikps97 (Member, Author) commented Dec 9, 2025

Skipping the test using a before hook would still run all the "after" hooks. This would throw errors, specifically in the unpopulate_system call, as the initialization would not have taken place. I would have to repeat the same skip statements in each 'after' hook (roughly as sketched below). Wouldn't a single return or conditional if be better?
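
e.g. every teardown would need a repeated guard, roughly:

```js
mocha.after(async function() {
    // the same guard again, otherwise unpopulate_system runs against a
    // system that was never populated and throws
    if (config.DB_TYPE !== 'postgres') return;
    await unpopulate_system();
});
```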

@coderabbitai coderabbitai bot replied

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@jackyalbo jackyalbo (Contributor) left a comment

Also, can you share a before-and-after comparison in this PR description?

```js
if (dedup_keys.length) {
    query += ` AND (data ->> 'dedup_key' = ANY($3) AND data ? 'dedup_key')`;
    values.push(dedup_keys);
} else {
```
Contributor commented

I don't think we're supposed to have empty dedup keys.

```js
if (!dedup_keys.length) return;
```

We can avoid running the query in this case; just return at the beginning.

@karthikps97 karthikps97 (Member, Author) commented Dec 11, 2025

Result: Tested with 1M+ rows in datachunks, with the query filtering 100 dedup_keys.

```
nbcore=# select count(*) from datachunks;
  count
---------
 1088000
```

**Current query:**

```sql
EXPLAIN ANALYZE SELECT * FROM datachunks WHERE (data->>'system'='693124b23742600022f30a64' and data->>'bucket'='693124b23742600022f30a6c' and (data->>'dedup_key' IN ('YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
'YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
....(removed for readability)....
'zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE=') and data ? 'dedup_key') and (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)) ORDER BY data->>'_id' DESC;
```

```
Sort  (cost=169.30..169.30 rows=1 width=1454) (actual time=0.448..0.452 rows=100 loops=1)
   Sort Key: ((data ->> '_id'::text)) DESC
   Sort Method: quicksort  Memory: 169kB
   ->  Index Scan using idx_btree_datachunks_dedup_key_system_bucket on datachunks  (cost=0.55..169.29 rows=1 width=1454) (actual time=0.030..0.375 rows=100 loops=1)
         Index Cond: (((data ->> 'dedup_key'::text) = ANY ('{YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,wntv6uTo3IhN1BwepJqGTe493C0pB88j4SMsB92UrvQ=,
....(removed for readability)....
aO3VxeMSCz/041jhXpKEc+CPFTCO++Q9Q9WdaTF093k=,zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE=}'::text[])) AND ((data ->> 'system'::text) = '693124b23742600022f30a64'::text) AND ((data ->> 'bucket'::text) = '693124b23742600022f30a6c'::text))
         Filter: (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb))
 Planning Time: 0.125 ms
 Execution Time: 0.471 ms
(8 rows)
```


**With changes:**

```sql
EXPLAIN ANALYZE SELECT * FROM datachunks WHERE (data->>'system'='693124b23742600022f30a64' and data->>'bucket'='693124b23742600022f30a6c' and (data->>'dedup_key'=ANY(ARRAY['YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
'YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
....(removed for readability)....
'zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE='])
and data ? 'dedup_key') and (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)) ORDER BY _id DESC;
```

```
Sort  (cost=169.29..169.30 rows=1 width=1422) (actual time=0.433..0.437 rows=100 loops=1)
   Sort Key: _id DESC
   Sort Method: quicksort  Memory: 167kB
   ->  Index Scan using idx_btree_datachunks_dedup_key_system_bucket on datachunks  (cost=0.55..169.28 rows=1 width=1422) (actual time=0.030..0.368 rows=100 loops=1)
         Index Cond: (((data ->> 'dedup_key'::text) = ANY ('{YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,wntv6uTo3IhN1BwepJqGTe493C0pB88j4SMsB92UrvQ=,z8R8VzzFUaX+10H+JDzb/5qGh10IPUuGbVInOuuTz3Q=,FzlSOeTfPmi1upqzMe6ZfhlYa+V0MzB2nKQqPZ9pj9Q=,
....(removed for readability)....
aO3VxeMSCz/041jhXpKEc+CPFTCO++Q9Q9WdaTF093k=,zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE=}'::text[])) AND ((data ->> 'system'::text) = '693124b23742600022f30a64'::text) AND ((data ->> 'bucket'::text) = '693124b23742600022f30a6c'::text))
         Filter: (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb))
 Planning Time: 0.119 ms
 Execution Time: 0.450 ms
```

Comment on lines 88 to 93

```js
const dedup_keys = [];
chunks.forEach(chunk => {
    if (chunk?.digest_b64) {
        dedup_keys.push(chunk.digest_b64);
    }
});
```
Contributor commented

Suggested change

```diff
-const dedup_keys = [];
-chunks.forEach(chunk => {
-    if (chunk?.digest_b64) {
-        dedup_keys.push(chunk.digest_b64);
-    }
-});
+const dedup_keys = chunks.map(chunk => chunk.digest_b64).filter(Boolean);
```

```js
await this.load_blocks_for_chunks(chunks);
return chunks;
const values = [];
let query = `SELECT * FROM ${this._chunks.name} WHERE (data ->> 'system' = $1 AND data ->> 'bucket' = $2`;
```
Contributor commented

I'm thinking maybe we should work with @Neon-White's format for these queries, making them more readable. @dannyzaken WDYT?

@jackyalbo jackyalbo (Contributor) commented Dec 9, 2025

```js
const query = `
    UPDATE ${table_name}
    SET data = jsonb_set(data, '{deleted}', to_jsonb($1::text), true)
    WHERE
    ${sql_and_conditions(
        `data->>'bucket' = '${bucket_id}'`,
        `data->>'deleted' IS NULL`,
        `data->>'upload_started' IS NOT NULL`,
        `(EXTRACT(EPOCH FROM NOW()) - ${convert_mongoid_to_timestamp_sql("data->>'upload_started'")}) / 86400 > ${days_after_initiation}`,
        sql_condition0, sql_condition1, sql_condition2, sql_condition3,
    )};`;
```

for example

karthikps97 (Member, Author) replied

Formatted the query as suggested.

Signed-off-by: Karthik P S <[email protected]>
@karthikps97 karthikps97 requested review from dannyzaken and removed request for jackyalbo December 18, 2025 08:54
```js
        return chunks;
    } catch (err) {
        dbg.error('Error while finding chunks by dedup_key. error is ', err);
        return [];
```
Member commented

We should not catch all errors here and ignore them, as it can cause silent failures that are hidden in the logs. It's a good idea to log the error and rethrow it.

karthikps97 (Member, Author) replied

I'm rethrowing the error now rather than ignoring.
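
i.e. the tail of the method now looks roughly like (sketch, mirroring the snippets above):

```js
    const res = await this._chunks.executeSQL(query, values);
    const chunks = res.rows.map(row => decode_json(this._chunks.schema, row.data));
    await this.load_blocks_for_chunks(chunks);
    return chunks;
} catch (err) {
    dbg.error('Error while finding chunks by dedup_key. error is ', err);
    throw err; // propagate instead of returning []
}
```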


```js
// @ts-ignore
// Do not run below tests if DB is not PostgreSQL
if (config.DB_TYPE !== 'postgres') return;
```
Member commented

@karthikps97 @liranmauda, why is this needed?

karthikps97 (Member, Author) replied

These tests are executed as part of both the PostgreSQL and MongoDB test runs. We skip them on the MongoDB run because they would fail: executeSQL is not implemented by the Mongo client.

```js
    assert(chunksArr[0].frags[0]?._id?.toString() === chunk.frags[0]._id.toString());
});

mocha.it('find_chunks_by_dedup_key empty dedup_key array passed', async () => {
```
Member commented

Why does this specific case need to be tested? We don't expect an empty array, IIRC

@karthikps97 karthikps97 (Member, Author) commented Dec 23, 2025

We return early from the function when an empty dedup_keys array is passed, so I added this test case.

```js
    return md_store.delete_chunks_by_ids(_.map(chunks, '_id'));
});

mocha.it('find_chunks_by_dedup_key()', async () => {
```
Member commented

Worth adding more tests for different cases, e.g. that we return an empty array when there is no match.

karthikps97 (Member, Author) replied

Added one test for this scenario.

Signed-off-by: Karthik P S <[email protected]>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Dec 23, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/server/object_services/md_store.js (1)

1560-1564: Fix incomplete optional chaining to prevent runtime errors.

Line 1562 uses incomplete optional chaining. If res or res.rows is undefined, calling .map() will throw a TypeError. This issue was flagged in a previous review but remains unresolved.

🔎 Proposed fix

```diff
-        const res = await this._chunks.executeSQL(query, values);
-        const chunks = res.rows.map(row => decode_json(this._chunks.schema, row.data));
-        await this.load_blocks_for_chunks(chunks);
-        return chunks;
+        const res = await this._chunks.executeSQL(query, values);
+        const chunks = res?.rows?.map(row => decode_json(this._chunks.schema, row.data)) ?? [];
+        await this.load_blocks_for_chunks(chunks);
+        return chunks;
```

This ensures chunks is always an array, even if the query returns no result or an unexpected response.

🧹 Nitpick comments (1)
src/server/object_services/md_store.js (1)

1547-1557: Consider adding a LIMIT clause to prevent unbounded result sets.

The query lacks a LIMIT clause, which could return a very large number of rows if many chunks match the dedup keys. This could lead to memory pressure or performance degradation.

Consider adding a reasonable limit (e.g., 1000 or 10000) or accepting a limit parameter if callers need control over the result set size.

🔎 Example with LIMIT clause

```diff
 const query = `
     SELECT *
     FROM ${this._chunks.name}
     WHERE
         (data ->> 'system' = $1
         AND data ->> 'bucket' = $2
         AND (data ->> 'dedup_key' = ANY($3)
         AND data ? 'dedup_key')
         AND (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb))
-    ORDER BY _id DESC;
+    ORDER BY _id DESC
+    LIMIT 10000;
     `;
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 612408d and 8e11ad5.

📒 Files selected for processing (2)
  • src/server/object_services/md_store.js
  • src/test/integration_tests/db/test_md_store.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/test/integration_tests/db/test_md_store.js
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.

Applied to files:

  • src/server/object_services/md_store.js
📚 Learning: 2025-11-13T07:56:23.620Z
Learnt from: shirady
Repo: noobaa/noobaa-core PR: 9281
File: src/server/system_services/account_server.js:1053-1058
Timestamp: 2025-11-13T07:56:23.620Z
Learning: In noobaa-core, account_server.js is only used in containerized deployments, not in NSFS/NC deployments. NSFS/NC deployments have separate account management code in src/manage_nsfs/ directory. Therefore, account_server.js only processes accounts from account_schema.js where owner is an objectid reference, never from nsfs_account_schema.js where owner is a string.

Applied to files:

  • src/server/object_services/md_store.js
📚 Learning: 2025-11-19T15:03:42.260Z
Learnt from: shirady
Repo: noobaa/noobaa-core PR: 9291
File: src/server/common_services/auth_server.js:548-554
Timestamp: 2025-11-19T15:03:42.260Z
Learning: In src/server/common_services/auth_server.js, account objects are loaded directly from system_store (e.g., system_store.data.get_by_id()), so account.owner is an object ID reference with an ._id property, not a string. This differs from s3_rest.js where account.owner is a string due to RPC serialization.

Applied to files:

  • src/server/object_services/md_store.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/server/object_services/md_store.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-package-lock-validation
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (2)
src/server/object_services/md_store.js (2)

1545-1545: LGTM on the early return!

The early return for empty or missing dedup_keys prevents unnecessary query execution and aligns with the feedback from past reviews.


1565-1568: Error handling looks appropriate.

Logging the error before rethrowing provides valuable debugging information while still propagating the error to callers, which aligns with feedback from previous reviews.

