
@karthikps97 karthikps97 (Member) commented Nov 27, 2025

Describe the Problem

The Mongo-to-Postgres query converter uses the IN operator for arrays, but in PostgreSQL the ANY operator performs better: the query text stays small and the plan can be cached. The sort is also applied on the '_id' JSON field of the data column, even though an _id column (the primary key) is already present.

Explain the Changes

  1. Refactored the find_chunks_by_dedup_key method to generate its own SQL queries (a sketch of the query shape follows below). Improvements are:
  • Using ANY instead of IN improves query planning time for larger arrays. (PostgreSQL internally converts IN to ANY, but that requires an additional parsing step.)
  • Sorting on _id instead of data._id improves query execution time because it uses the primary key index.
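
Roughly, the query shape changes as follows (a sketch only; variable names here are illustrative, the real implementation is in md_store.js, and a full EXPLAIN ANALYZE comparison is in the review thread below):

```js
// Before (sketch): the converter expands the array into an IN list, so the
// SQL text grows with the number of keys and cannot be reused as one
// prepared statement.
//   ... WHERE data->>'dedup_key' IN ('key1', 'key2' /*, one literal per key */)
//   ORDER BY data->>'_id' DESC
//
// After (sketch): a single array parameter bound via ANY, sorted on the
// primary key column.
const query = `
    SELECT * FROM datachunks
    WHERE data ->> 'system' = $1
      AND data ->> 'bucket' = $2
      AND data ->> 'dedup_key' = ANY($3)
    ORDER BY _id DESC`;
const values = [system_id, bucket_id, dedup_keys]; // dedup_keys: string[]
```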

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • Improvements

    • Chunk deduplication retrieval now uses a SQL-backed path for broader database support and improved resilience.
    • Collection objects expose schema metadata for easier inspection.
  • Tests

    • Added integration tests for deduplicated chunk lookup, non-existent-key and empty-key behavior (Postgres-only).
    • Encryption test suite now skips when not running against Postgres.


@coderabbitai coderabbitai bot commented Nov 27, 2025

Walkthrough

Switched MDStore chunk lookup from MongoDB to PostgreSQL JSONB queries, updated MapServer to pass base64 dedup keys (strings) instead of Buffers, added schema to the DBCollection TypeScript interface, and added PostgreSQL-gated integration tests for dedup lookup.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Type definitions<br>`src/sdk/nb.d.ts` | Added public `schema: any` property to the `DBCollection` interface. |
| Deduplication key handling<br>`src/server/object_services/map_server.js` | `GetMapping.find_dups` now collects `chunk.digest_b64` (base64 strings) into `dedup_keys` and filters falsy values; dedup key type moved from `Buffer[]` to `string[]`. |
| Database implementation migration<br>`src/server/object_services/md_store.js` | Replaced MongoDB-based `find_chunks_by_dedup_key` with a PostgreSQL implementation: added `decode_json` import, constructs and executes SQL filtering by system, bucket, JSONB dedup_key membership and `deleted IS NULL`, decodes rows to chunk objects, calls `load_blocks_for_chunks`, and returns chunks; errors are logged and rethrown. |
| PostgreSQL-gated integration tests<br>`src/test/integration_tests/db/test_md_store.js`, `src/test/integration_tests/internal/test_encryption.js` | Added three PostgreSQL-only tests for `find_chunks_by_dedup_key` (success, nonexistent key, empty key array); added `DB_TYPE` guard to encryption tests to skip when not PostgreSQL. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Map as MapServer (GetMapping)
    participant MD as MDStore
    participant PG as PostgreSQL
    participant Blocks as Block Loader
    Map->>MD: find_chunks_by_dedup_key(bucket, [dedup_key_b64,...])
    MD->>PG: execute SQL (filter by system, bucket, JSONB dedup_key IN list, deleted IS NULL)
    PG-->>MD: rows (data JSONB)
    MD->>MD: decode_json(row.data) → chunk objects
    MD->>Blocks: load_blocks_for_chunks(chunks)
    Blocks-->>MD: chunks with blocks
    MD-->>Map: return chunks array
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus areas:
    • src/server/object_services/md_store.js: SQL construction/parameterization, JSONB membership condition, decode_json usage, parity with previous Mongo behavior, and error propagation.
    • src/server/object_services/map_server.js: callers' assumptions about dedup key types.
    • src/test/integration_tests/db/test_md_store.js: test setup, teardown, and PostgreSQL gating correctness.

Possibly related PRs

Suggested reviewers

  • dannyzaken
  • jackyalbo

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately describes the main change: refactoring find_chunks_by_dedup_key to use optimized SQL queries for better performance, which is the primary focus across md_store.js, map_server.js, and the related tests. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment


@karthikps97 karthikps97 marked this pull request as ready for review December 2, 2025 06:37

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/server/object_services/md_store.js (1)

1540-1543: Update JSDoc type to match actual parameter type.

The JSDoc indicates @param {nb.DBBuffer[]} dedup_keys but the implementation now expects base64 strings (as passed from map_server.js). Update the type to reflect the actual usage:

```diff
 /**
  * @param {nb.Bucket} bucket
- * @param {nb.DBBuffer[]} dedup_keys
+ * @param {string[]} dedup_keys - base64 encoded dedup keys
  * @returns {Promise<nb.ChunkSchemaDB[]>}
  */
```
🧹 Nitpick comments (2)
src/server/object_services/map_server.js (1)

88-93: Consider using filter + map for a more functional approach.

The implementation is correct and aligns with the PostgreSQL path that expects base64 strings. The optional chaining (chunk?.digest_b64) provides null safety.

A more concise alternative using functional patterns:

```diff
-            const dedup_keys = [];
-            chunks.forEach(chunk => {
-                if (chunk?.digest_b64) {
-                    dedup_keys.push(chunk.digest_b64);
-                }
-            });
+            const dedup_keys = chunks
+                .map(chunk => chunk?.digest_b64)
+                .filter(Boolean);
```

This is optional and the current implementation works correctly.

src/test/integration_tests/db/test_md_store.js (1)

402-418: Good edge case coverage for empty dedup_key array.

This test ensures that passing an empty array returns an empty result without errors, which validates the FALSE AND data ? 'dedup_key' branch in the SQL query.

Consider adding a test case for chunks that don't have a dedup_key field to ensure they're properly excluded from results.
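
A rough sketch of such a test in the suite's style (the chunk-building and insert helpers here are illustrative placeholders, not the suite's actual helpers):

```js
mocha.it('find_chunks_by_dedup_key excludes chunks without dedup_key', async () => {
    // build and insert a chunk document that has no dedup_key field at all
    const chunk = make_test_chunk({ bucket });    // placeholder helper
    delete chunk.dedup_key;
    await insert_test_chunks(md_store, [chunk]);  // placeholder helper
    const res = await md_store.find_chunks_by_dedup_key(bucket, ['some-dedup-key-b64']);
    // the dedup_key-less chunk must never appear in the results
    assert(Array.isArray(res));
    assert(res.every(c => String(c._id) !== String(chunk._id)));
});
```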

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1382037 and 20920f6.

📒 Files selected for processing (4)
  • src/sdk/nb.d.ts (1 hunks)
  • src/server/object_services/map_server.js (1 hunks)
  • src/server/object_services/md_store.js (2 hunks)
  • src/test/integration_tests/db/test_md_store.js (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/integration_tests/db/test_md_store.js
🧠 Learnings (2)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.

Applied to files:

  • src/server/object_services/md_store.js
🧬 Code graph analysis (3)
src/server/object_services/md_store.js (1)
src/util/postgres_client.js (6)
  • require (13-13)
  • require (14-14)
  • require (26-26)
  • query (326-326)
  • query (331-331)
  • res (258-258)
src/test/integration_tests/db/test_md_store.js (1)
src/server/object_services/md_store.js (2)
  • config (29-29)
  • assert (8-8)
src/server/object_services/map_server.js (1)
src/test/integration_tests/db/test_md_store.js (2)
  • chunk (386-394)
  • chunk (405-413)
🔇 Additional comments (3)
src/sdk/nb.d.ts (1)

771-771: LGTM!

The addition of the schema property to the DBCollection interface is necessary to support the decode_json usage in find_chunks_by_dedup_key. The any type is consistent with the existing patterns in this interface.

src/test/integration_tests/db/test_md_store.js (1)

383-400: Test coverage looks good for the PostgreSQL path.

The test correctly validates that:

  1. The result is an array
  2. At least one chunk is returned
  3. The frag ID matches the inserted chunk

The bucket mock structure { _id, system: { _id } } aligns with the expected parameters in find_chunks_by_dedup_key.

src/server/object_services/md_store.js (1)

1545-1567: SQL query implementation looks correct with parameterized queries.

The implementation:

  • Uses ANY($3) for array membership, which is more efficient than IN for PostgreSQL
  • Uses parameterized queries preventing SQL injection
  • Properly handles the empty dedup_keys array case with FALSE AND data ? 'dedup_key'
  • Sorts by _id DESC to utilize the primary key index

Two observations:

  1. Silent error handling: Returning an empty array on error could mask legitimate issues. Consider logging at a higher severity or re-throwing certain errors:

```js
} catch (err) {
    dbg.error('Error while finding chunks by dedup_key. error is ', err);
    // Consider: throw err; or at least return based on error type
    return [];
}
```

  2. Null handling: Line 1556 checks `data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb`. This handles both missing keys and JSON null values, which is appropriate for JSONB columns.
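
A quick way to see why both conditions are needed (a minimal sketch against any pg client; `client` is assumed to be an open connection):

```js
// The -> operator distinguishes a missing key from a JSON null:
//   {"x":1}          -> 'deleted'  => SQL NULL     (IS NULL matches)
//   {"deleted":null} -> 'deleted'  => jsonb 'null' (IS NULL does NOT match)
const { rows } = await client.query(`
    SELECT ('{"x":1}'::jsonb -> 'deleted') IS NULL                  AS missing_key,
           ('{"deleted":null}'::jsonb -> 'deleted') IS NULL         AS json_null_is_sql_null,
           ('{"deleted":null}'::jsonb -> 'deleted') = 'null'::jsonb AS json_null_matches
`);
// rows[0] => { missing_key: true, json_null_is_sql_null: false, json_null_matches: true }
```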

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/server/object_services/md_store.js (1)

1539-1543: Update JSDoc to reflect the changed parameter type.

The JSDoc declares dedup_keys as nb.DBBuffer[], but the PR changes this to string[] (base64-encoded). The implementation at line 1551 passes dedup_keys directly to a SQL query expecting string values for the ANY operator.

Update the JSDoc to match the actual signature:

```diff
 /**
  * @param {nb.Bucket} bucket
- * @param {nb.DBBuffer[]} dedup_keys
+ * @param {string[]} dedup_keys - Base64-encoded deduplication keys
  * @returns {Promise<nb.ChunkSchemaDB[]>}
  */
```
♻️ Duplicate comments (1)
src/server/object_services/md_store.js (1)

16-16: Import path convention.

The import path includes the .js extension. Per Node.js conventions, the extension can be omitted:

```diff
-const { decode_json } = require('../../util/postgres_client.js');
+const { decode_json } = require('../../util/postgres_client');
```
🧹 Nitpick comments (1)
src/sdk/nb.d.ts (1)

773-773: Schema field addition looks good.

The new schema: any field enables PostgreSQL JSON decoding in md_store.js. While any provides no type safety, it's a pragmatic choice for now.

For future improvement, consider typing this field more specifically, perhaps as a generic parameter based on the collection's schema type.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20920f6 and 3eaf530.

📒 Files selected for processing (4)
  • src/sdk/nb.d.ts (1 hunks)
  • src/server/object_services/map_server.js (1 hunks)
  • src/server/object_services/md_store.js (2 hunks)
  • src/test/integration_tests/db/test_md_store.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/test/integration_tests/db/test_md_store.js
  • src/server/object_services/map_server.js
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.

Applied to files:

  • src/server/object_services/md_store.js
🧬 Code graph analysis (1)
src/server/object_services/md_store.js (1)
src/util/postgres_client.js (3)
  • query (326-326)
  • query (331-331)
  • res (258-258)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: run-package-lock-validation
🔇 Additional comments (2)
src/server/object_services/md_store.js (2)

1545-1556: Query construction is well-implemented.

The SQL query correctly:

  • Uses parameterized queries ($1, $2, $3) to prevent SQL injection
  • Employs the ANY operator for array comparison (per PR objectives)
  • Sorts by the _id primary key column to leverage indexing (per PR objectives)
  • Follows patterns consistent with other queries in the codebase

These changes align with the stated performance goals of reducing query size and improving execution time.


1549-1554: Verify the dedup_keys parameter is properly formatted as an array for the ANY() operator.

The code uses PostgreSQL's ANY($3) operator, which requires the bound parameter to be a proper array type. When using node-postgres, the parameter must be passed as an array object. Ensure that dedup_keys is consistently an array before being pushed to values at line 1551—this includes edge cases where dedup_keys might be an empty array (which the code already handles) or single-item arrays. The JSONB operators (data ? 'dedup_key' and data ->>) are standard PostgreSQL syntax and should work correctly.
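
For reference, node-postgres serializes a JS array parameter into a Postgres array, so ANY($n) receives a proper text[] for empty, single-item, and multi-item inputs alike. A minimal standalone sketch (generic pg client; connection settings come from the PG* environment variables; table name as used elsewhere in this thread):

```js
const { Client } = require('pg');

async function find_by_dedup_keys(dedup_keys /* string[] */) {
    const client = new Client();
    await client.connect();
    try {
        // the JS array binds as a Postgres array, which is what ANY($1) expects
        const res = await client.query(
            `SELECT * FROM datachunks WHERE data ->> 'dedup_key' = ANY($1)`,
            [dedup_keys]
        );
        return res.rows;
    } finally {
        await client.end();
    }
}
```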

@liranmauda liranmauda (Contributor) left a comment

On test_encryption.js: is there any test that should run? If not, we can do it at line 33 (so you will not have to wrap the whole test).
Also, we could add a comment explaining that this test should not run if the DB is not PostgreSQL.

```js
if (config.DB_TYPE !== 'postgres') return;
```

Signed-off-by: Karthik P S <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3eaf530 and 9295259.

📒 Files selected for processing (2)
  • src/server/object_services/md_store.js (2 hunks)
  • src/test/integration_tests/internal/test_encryption.js (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/server/object_services/md_store.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/integration_tests/internal/test_encryption.js
🧠 Learnings (2)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
📚 Learning: 2025-11-19T15:03:42.260Z
Learnt from: shirady
Repo: noobaa/noobaa-core PR: 9291
File: src/server/common_services/auth_server.js:548-554
Timestamp: 2025-11-19T15:03:42.260Z
Learning: In src/server/common_services/auth_server.js, account objects are loaded directly from system_store (e.g., system_store.data.get_by_id()), so account.owner is an object ID reference with an ._id property, not a string. This differs from s3_rest.js where account.owner is a string due to RPC serialization.

Applied to files:

  • src/test/integration_tests/internal/test_encryption.js
🧬 Code graph analysis (1)
src/test/integration_tests/internal/test_encryption.js (3)
src/server/object_services/md_store.js (1)
  • config (29-29)
src/test/integration_tests/db/test_md_store.js (3)
  • config (12-12)
  • mocha (10-10)
  • i (202-202)
src/server/object_services/map_server.js (1)
  • config (11-11)
🪛 Biome (2.1.2)
src/test/integration_tests/internal/test_encryption.js

[error] 36-36: Illegal return statement outside of a function

(parse)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (1)
src/test/integration_tests/internal/test_encryption.js (1)

138-164: Account lifecycle and minor test tweaks look consistent

The changes around:

  • Creating 20 accounts and validating their master keys and access-key encryption (create accounts and compare account access keys succefully).
  • Using those accounts across external-connection / namespace-resource tests.
  • Regenerating creds for all non-coretest accounts.
  • Deleting all those accounts at the end of the suite.
  • Minor formatting/wrapping adjustments (e.g., pool_by_name(..., namespace_resource_name) and the disable_master_key calls).

are consistent with the existing test patterns and help ensure we don’t leak test accounts/resources into later suites. With the DB-type guard fixed, this flow should give good coverage of account/master-key behavior on PostgreSQL without introducing new functional risk.

No changes needed here from my side.

Also applies to: 220-220, 455-471, 473-478, 594-595, 617-618

Comment on lines +34 to +37

```js
// @ts-ignore
// Do not run below tests if DB is not PostgreSQL
if (config.DB_TYPE !== 'postgres') return;
```

@coderabbitai coderabbitai bot commented Dec 9, 2025

⚠️ Potential issue | 🔴 Critical

Fix illegal top-level return and use Mocha-level skipping instead

The guard:

```js
// @ts-ignore
// Do not run below tests if DB is not PostgreSQL
if (config.DB_TYPE !== 'postgres') return;
```

is at file scope, which is a syntax error in Node/JS (return is only allowed inside functions). This matches the Biome parse error on Line 36. As written, this file will fail to parse, so the tests will never run.

Suggest moving the DB-type guard into Mocha context and removing the // @ts-ignore, for example:

```diff
 config.MIN_CHUNK_AGE_FOR_DEDUP = 0;
-
-// @ts-ignore
-// Do not run below tests if DB is not PostgreSQL
-if (config.DB_TYPE !== 'postgres') return;
+// Run these tests only when DB is PostgreSQL
+const IS_POSTGRES = config.DB_TYPE === 'postgres';
@@
-mocha.describe('Encryption tests', function() {
+mocha.describe('Encryption tests', function() {
+    mocha.before(function() {
+        if (!IS_POSTGRES) this.skip();
+    });
@@
-mocha.describe('Rotation tests', function() {
+mocha.describe('Rotation tests', function() {
+    mocha.before(function() {
+        if (!IS_POSTGRES) this.skip();
+    });
```

This keeps the suite PostgreSQL-only, avoids top-level control flow, and gives Mocha proper “skipped” semantics instead of a parse error. You can then drop the // @ts-ignore entirely.

🧰 Tools
🪛 Biome (2.1.2)

[error] 36-36: Illegal return statement outside of a function

(parse)

🤖 Prompt for AI Agents

```
In src/test/integration_tests/internal/test_encryption.js around lines 34–37,
remove the top-level "if (config.DB_TYPE !== 'postgres') return;" and the
"// @ts-ignore" (top-level return is illegal and causes parse errors); instead,
keep the test file but move the DB-type guard into the Mocha suite by placing a
before() hook (or the first statement inside the top-level describe) that checks
config.DB_TYPE !== 'postgres' and calls this.skip() (or uses the suite-level
skip mechanism) so the suite is properly skipped under Mocha when not running
against PostgreSQL.
```

@liranmauda liranmauda (Contributor) replied

It was pseudo-code; the actual code should be something like:

```js
mocha.describe('Encryption tests', function() {
    before(function() {
        if (config.DB_TYPE !== 'postgres') {
            this.skip();
        }
        // ...
    });
```

Do this in any top-level describe.

@coderabbitai coderabbitai bot replied

@liranmauda, thanks for the clarification! I understand now—you want to use a before hook inside each top-level describe to skip the tests when the DB is not PostgreSQL.

Here's how to apply this pattern to both top-level describes in the file:

```js
config.MIN_CHUNK_AGE_FOR_DEDUP = 0;

mocha.describe('Encryption tests', function() {
    mocha.before(function() {
        if (config.DB_TYPE !== 'postgres') {
            this.skip();
        }
    });

    const { rpc_client, EMAIL, SYSTEM } = coretest;
    let response_account;
    // ... rest of the tests
});

// ... (other code)

mocha.describe('Rotation tests', function() {
    mocha.before(function() {
        if (config.DB_TYPE !== 'postgres') {
            this.skip();
        }
    });

    const { rpc_client, EMAIL, SYSTEM } = coretest;
    let accounts;
    let buckets;
    // ... rest of the tests
});
```

This removes the illegal top-level return statement (and the // @ts-ignore) and properly skips both test suites when not running against PostgreSQL. Each suite will show as "skipped" in the test output rather than causing a parse error.

@karthikps97 karthikps97 (Member, Author) commented Dec 9, 2025

Skipping the test using a before hook would still run all the "after" hooks. This would throw errors, specifically in the unpopulate_system call, as the initialization would not have taken place. I would have to repeat the same skip statements in each 'after' hook (roughly as sketched below). Wouldn't a single return or conditional if be better?
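
e.g. every teardown would need a repeated guard, roughly:

```js
mocha.after(async function() {
    // the same guard again, otherwise unpopulate_system runs against a
    // system that was never populated and throws
    if (config.DB_TYPE !== 'postgres') return;
    await unpopulate_system();
});
```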

@coderabbitai coderabbitai bot replied

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@jackyalbo jackyalbo (Contributor) left a comment

Also, can you share a before-and-after comparison in this PR description?

```js
if (dedup_keys.length) {
    query += ` AND (data ->> 'dedup_key' = ANY($3) AND data ? 'dedup_key')`;
    values.push(dedup_keys);
} else {
```
Contributor commented

I don't think we're supposed to have empty dedup keys.

```js
if (!dedup_keys.length) return;
```

We can avoid running the query in this case; just return at the beginning.

@karthikps97 karthikps97 (Member, Author) commented Dec 11, 2025

Result: Tested with 1M+ rows in datachunks, with the query filtering 100 dedup_keys.

```
nbcore=# select count(*) from datachunks;
  count
---------
 1088000
```

**Current query:**

```sql
EXPLAIN ANALYZE SELECT * FROM datachunks WHERE (data->>'system'='693124b23742600022f30a64' and data->>'bucket'='693124b23742600022f30a6c' and (data->>'dedup_key' IN ('YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
'YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
....(removed for readability)....
'zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE=') and data ? 'dedup_key') and (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)) ORDER BY data->>'_id' DESC;
```

```
Sort  (cost=169.30..169.30 rows=1 width=1454) (actual time=0.448..0.452 rows=100 loops=1)
   Sort Key: ((data ->> '_id'::text)) DESC
   Sort Method: quicksort  Memory: 169kB
   ->  Index Scan using idx_btree_datachunks_dedup_key_system_bucket on datachunks  (cost=0.55..169.29 rows=1 width=1454) (actual time=0.030..0.375 rows=100 loops=1)
         Index Cond: (((data ->> 'dedup_key'::text) = ANY ('{YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,wntv6uTo3IhN1BwepJqGTe493C0pB88j4SMsB92UrvQ=,
....(removed for readability)....
aO3VxeMSCz/041jhXpKEc+CPFTCO++Q9Q9WdaTF093k=,zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE=}'::text[])) AND ((data ->> 'system'::text) = '693124b23742600022f30a64'::text) AND ((data ->> 'bucket'::text) = '693124b23742600022f30a6c'::text))
         Filter: (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb))
 Planning Time: 0.125 ms
 Execution Time: 0.471 ms
(8 rows)
```


**With changes:**

```sql
EXPLAIN ANALYZE SELECT * FROM datachunks WHERE (data->>'system'='693124b23742600022f30a64' and data->>'bucket'='693124b23742600022f30a6c' and (data->>'dedup_key'=ANY(ARRAY['YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
'YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=',
....(removed for readability)....
'zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE='])
and data ? 'dedup_key') and (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)) ORDER BY _id DESC;
```

```
Sort  (cost=169.29..169.30 rows=1 width=1422) (actual time=0.433..0.437 rows=100 loops=1)
   Sort Key: _id DESC
   Sort Method: quicksort  Memory: 167kB
   ->  Index Scan using idx_btree_datachunks_dedup_key_system_bucket on datachunks  (cost=0.55..169.28 rows=1 width=1422) (actual time=0.030..0.368 rows=100 loops=1)
         Index Cond: (((data ->> 'dedup_key'::text) = ANY ('{YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,YLE95BklbH7n0c5/0i4OlEn2v6QPsweb8IcZf8BmXHc=,wntv6uTo3IhN1BwepJqGTe493C0pB88j4SMsB92UrvQ=,z8R8VzzFUaX+10H+JDzb/5qGh10IPUuGbVInOuuTz3Q=,FzlSOeTfPmi1upqzMe6ZfhlYa+V0MzB2nKQqPZ9pj9Q=,
....(removed for readability)....
aO3VxeMSCz/041jhXpKEc+CPFTCO++Q9Q9WdaTF093k=,zsViCcfiavZaRGB4G5fbGq6DzPccEIprV8JbESaB8zE=}'::text[])) AND ((data ->> 'system'::text) = '693124b23742600022f30a64'::text) AND ((data ->> 'bucket'::text) = '693124b23742600022f30a6c'::text))
         Filter: (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb))
 Planning Time: 0.119 ms
 Execution Time: 0.450 ms
```

Comment on lines 88 to 93

```js
const dedup_keys = [];
chunks.forEach(chunk => {
    if (chunk?.digest_b64) {
        dedup_keys.push(chunk.digest_b64);
    }
});
```
Contributor commented

Suggested change

```diff
-const dedup_keys = [];
-chunks.forEach(chunk => {
-    if (chunk?.digest_b64) {
-        dedup_keys.push(chunk.digest_b64);
-    }
-});
+const dedup_keys = chunks.map(chunk => chunk.digest_b64).filter(Boolean);
```

```js
await this.load_blocks_for_chunks(chunks);
return chunks;
const values = [];
let query = `SELECT * FROM ${this._chunks.name} WHERE (data ->> 'system' = $1 AND data ->> 'bucket' = $2`;
```
Contributor commented

I'm thinking maybe we should work with @Neon-White's format for these queries, making them more readable. @dannyzaken WDYT?

@jackyalbo jackyalbo (Contributor) commented Dec 9, 2025

```js
const query = `
    UPDATE ${table_name}
    SET data = jsonb_set(data, '{deleted}', to_jsonb($1::text), true)
    WHERE
    ${sql_and_conditions(
        `data->>'bucket' = '${bucket_id}'`,
        `data->>'deleted' IS NULL`,
        `data->>'upload_started' IS NOT NULL`,
        `(EXTRACT(EPOCH FROM NOW()) - ${convert_mongoid_to_timestamp_sql("data->>'upload_started'")}) / 86400 > ${days_after_initiation}`,
        sql_condition0, sql_condition1, sql_condition2, sql_condition3,
    )};`;
```

for example

karthikps97 (Member, Author) replied

Formatted the query as suggested.

Signed-off-by: Karthik P S <[email protected]>
@karthikps97 karthikps97 requested review from dannyzaken and removed request for jackyalbo December 18, 2025 08:54
```js
        return chunks;
    } catch (err) {
        dbg.error('Error while finding chunks by dedup_key. error is ', err);
        return [];
```
Member commented

We should not catch all errors here and ignore them, as it can cause silent failures that are hidden in the logs. It's a good idea to log the error and rethrow it.

karthikps97 (Member, Author) replied

I'm rethrowing the error now rather than ignoring.
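
i.e. the tail of the method now looks roughly like (sketch, mirroring the snippets above):

```js
    const res = await this._chunks.executeSQL(query, values);
    const chunks = res.rows.map(row => decode_json(this._chunks.schema, row.data));
    await this.load_blocks_for_chunks(chunks);
    return chunks;
} catch (err) {
    dbg.error('Error while finding chunks by dedup_key. error is ', err);
    throw err; // propagate instead of returning []
}
```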


```js
// @ts-ignore
// Do not run below tests if DB is not PostgreSQL
if (config.DB_TYPE !== 'postgres') return;
```
Member commented

@karthikps97 @liranmauda, why is this needed?

karthikps97 (Member, Author) replied

These tests are executed as part of both the PostgreSQL and MongoDB test runs. We skip them on the MongoDB run because they would fail: executeSQL is not implemented by the Mongo client.

```js
    assert(chunksArr[0].frags[0]?._id?.toString() === chunk.frags[0]._id.toString());
});

mocha.it('find_chunks_by_dedup_key empty dedup_key array passed', async () => {
```
Member commented

Why does this specific case need to be tested? We don't expect an empty array, IIRC

@karthikps97 karthikps97 (Member, Author) commented Dec 23, 2025

We return early from the function when an empty dedup_keys array is passed, so I added this test case.

```js
    return md_store.delete_chunks_by_ids(_.map(chunks, '_id'));
});

mocha.it('find_chunks_by_dedup_key()', async () => {
```
Member commented

Worth adding more tests for different cases, e.g. that we return an empty array when there is no match.

karthikps97 (Member, Author) replied

Added one test for this scenario.

Signed-off-by: Karthik P S <[email protected]>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Dec 23, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/server/object_services/md_store.js (1)

1560-1564: Fix incomplete optional chaining to prevent runtime errors.

Line 1562 uses incomplete optional chaining. If res or res.rows is undefined, calling .map() will throw a TypeError. This issue was flagged in a previous review but remains unresolved.

🔎 Proposed fix

```diff
-        const res = await this._chunks.executeSQL(query, values);
-        const chunks = res.rows.map(row => decode_json(this._chunks.schema, row.data));
-        await this.load_blocks_for_chunks(chunks);
-        return chunks;
+        const res = await this._chunks.executeSQL(query, values);
+        const chunks = res?.rows?.map(row => decode_json(this._chunks.schema, row.data)) ?? [];
+        await this.load_blocks_for_chunks(chunks);
+        return chunks;
```

This ensures chunks is always an array, even if the query returns no result or an unexpected response.

🧹 Nitpick comments (1)
src/server/object_services/md_store.js (1)

1547-1557: Consider adding a LIMIT clause to prevent unbounded result sets.

The query lacks a LIMIT clause, which could return a very large number of rows if many chunks match the dedup keys. This could lead to memory pressure or performance degradation.

Consider adding a reasonable limit (e.g., 1000 or 10000) or accepting a limit parameter if callers need control over the result set size.

🔎 Example with LIMIT clause

```diff
 const query = `
     SELECT *
     FROM ${this._chunks.name}
     WHERE
         (data ->> 'system' = $1
         AND data ->> 'bucket' = $2
         AND (data ->> 'dedup_key' = ANY($3)
         AND data ? 'dedup_key')
         AND (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb))
-    ORDER BY _id DESC;
+    ORDER BY _id DESC
+    LIMIT 10000;
     `;
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 612408d and 8e11ad5.

📒 Files selected for processing (2)
  • src/server/object_services/md_store.js
  • src/test/integration_tests/db/test_md_store.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/test/integration_tests/db/test_md_store.js
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.

Applied to files:

  • src/server/object_services/md_store.js
📚 Learning: 2025-11-13T07:56:23.620Z
Learnt from: shirady
Repo: noobaa/noobaa-core PR: 9281
File: src/server/system_services/account_server.js:1053-1058
Timestamp: 2025-11-13T07:56:23.620Z
Learning: In noobaa-core, account_server.js is only used in containerized deployments, not in NSFS/NC deployments. NSFS/NC deployments have separate account management code in src/manage_nsfs/ directory. Therefore, account_server.js only processes accounts from account_schema.js where owner is an objectid reference, never from nsfs_account_schema.js where owner is a string.

Applied to files:

  • src/server/object_services/md_store.js
📚 Learning: 2025-11-19T15:03:42.260Z
Learnt from: shirady
Repo: noobaa/noobaa-core PR: 9291
File: src/server/common_services/auth_server.js:548-554
Timestamp: 2025-11-19T15:03:42.260Z
Learning: In src/server/common_services/auth_server.js, account objects are loaded directly from system_store (e.g., system_store.data.get_by_id()), so account.owner is an object ID reference with an ._id property, not a string. This differs from s3_rest.js where account.owner is a string due to RPC serialization.

Applied to files:

  • src/server/object_services/md_store.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/server/object_services/md_store.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-package-lock-validation
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (2)
src/server/object_services/md_store.js (2)

1545-1545: LGTM on the early return!

The early return for empty or missing dedup_keys prevents unnecessary query execution and aligns with the feedback from past reviews.


1565-1568: Error handling looks appropriate.

Logging the error before rethrowing provides valuable debugging information while still propagating the error to callers, which aligns with feedback from previous reviews.

