Skip to content

Conversation

@guymguym
Copy link
Member

@guymguym guymguym commented Jul 23, 2025

Describe the Problem

This PR refactors NSFS read/write flows into file_reader and file_writer modules to prepare for more advanced IO paths such as RDMA.

Explain the Changes

  1. The read loop logic in NamespaceFS.read_object_stream() was refactored to FileReader.read_into_stream.
  2. FileWriter was slightly refactored to expose a non Stream API too.
  3. stream_utils.pipeline can now pass an abort signal too, and added some duck typing checks.
  4. stream_utils.wait_finished was removed as stream.promises.finished is standard.

Issues: Fixed #xxx / Gap #xxx

  1. NA

Testing Instructions:

  1. Added unit tests test_file_reader/writer.
  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • New Features

    • New NSFS config options for upload/download memory thresholds and bucket-path existence checks; added option to ignore an additional list-entry error.
  • Bug Fixes

    • Renamed a configuration constant for consistency and improved abort/error handling during streaming.
  • Refactor

    • Streaming subsystem reworked to use reader/writer abstractions and native stream primitives for more reliable, cancellable I/O.
  • Tests

    • Added comprehensive FileReader unit tests covering streaming, buffered reads, ranges, and abort cases.
  • Documentation

    • Docs updated to document the new NSFS_CHECK_BUCKET_PATH_EXISTS option.

@coderabbitai
Copy link

coderabbitai bot commented Jul 23, 2025

Walkthrough

Adds FileReader and FileWriter refactors, new native_fs_utils helpers (use_file, is_sparse_file, warmup_sparse_file), updates NSFS config flags, replaces promisified stream helpers with stream.promises APIs, and adds unit tests for FileReader. Adjusts callsites to use new streaming abstractions and tighter abort/error handling.

Changes

Cohort / File(s) Change Summary
Config
config.js
Added NSFS_CHECK_BUCKET_PATH_EXISTS = true; renamed NFSF_UPLOAD_STREAM_MEM_THRESHOLDNSFS_UPLOAD_STREAM_MEM_THRESHOLD; added NSFS_DOWNLOAD_STREAM_MEM_THRESHOLD = 8 * 1024 * 1024; added NSFS_LIST_IGNORE_ENTRY_ON_EINVAL = true.
NamespaceFS & SDK
src/sdk/namespace_fs.js, src/sdk/object_io.js
Replaced manual buffer/pipeline code with FileReader/FileWriter usage; wired abort signaling; switched wait logic to stream.promises.finished; minor logging/format adjustments.
FileReader & tests
src/util/file_reader.js, src/test/unit_tests/internal/test_file_reader.test.js
New FileReader class (extends stream.Readable) with read_into_stream, buffer-pool support, backpressure handling, sparse-file warmup and stats; added unit tests validating full and ranged reads.
FileWriter & hashing
src/util/file_writer.js, src/tools/file_writer_hashing.js
FileWriter refactor: constructor signature changed (removed large_buf_size, renamed NSFS threshold), added write_entire_stream, write_buffers, finalize; hashing tool now uses write_entire_stream; mkdir now uses { recursive: true }.
Native FS utilities
src/util/native_fs_utils.js
Added use_file, is_sparse_file, warmup_sparse_file; standardized error creation via error_utils; exported the new helpers.
Stream utilities & callsites
src/util/stream_utils.js, src/test/unrelated/stream_pipe_to_multiple_targets.js, src/test/unrelated/for_await_stream.js
Removed wait_finished export and util.promisify usage; pipeline signature changed to pipeline(streams, reuse_last_stream = false, options = {}) and now uses stream.promises.pipeline; callsites replaced with stream.promises.finished.
Bucketspace FS
src/sdk/bucketspace_fs.js
Increased debug log level when logging bucket name (log0 → log1); no functional change.
Other tests/tools
src/test/unrelated/*, src/tools/*
Replaced promisified finished/wait helpers with stream.promises.finished; removed unused util import and wait_finished aliases.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant NamespaceFS
    participant FileReader
    participant FileWriter
    participant NativeFS

    Client->>NamespaceFS: GET object (read_object_stream)
    NamespaceFS->>FileReader: create reader(path, range, pool, signal)
    FileReader->>NativeFS: open/read (warmup_sparse_file if sparse)
    FileReader-->>NamespaceFS: push chunks (honor backpressure)
    NamespaceFS->>Client: pipe/forward stream (await stream.promises.finished)

    alt Upload flow
      Client->>NamespaceFS: PUT object (upload_stream)
      NamespaceFS->>FileWriter: create writer(target_file, opts)
      Client->>FileWriter: pipe source stream
      FileWriter->>NativeFS: write_entire_stream -> write_buffers -> native write
      FileWriter-->>NamespaceFS: finalize / error
      NamespaceFS->>Client: ack / error
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus areas:
    • src/util/file_reader.js — buffer-pool correctness, backpressure, abort semantics, stats updates.
    • src/util/file_writer.js & src/tools/file_writer_hashing.js — write_entire_stream, batching/_write_all_buffers, MD5 update/finalize, offset handling.
    • src/sdk/namespace_fs.js — integration with reader/writer and abort/retry interactions.
    • src/util/stream_utils.js — pipeline signature change and callsite impacts.

Suggested reviewers

  • tangledbytes
  • jackyalbo

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 69.23% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately and specifically describes the main refactoring effort: extracting and improving file_reader and file_writer modules from NamespaceFS.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
src/tools/file_writer_hashing.js (1)

74-74: Consider adding a proper interface for target_file

While the /**@type {any}*/ cast works, it bypasses type safety. Consider defining an interface that both nb.NativeFile and TargetHash can implement to maintain type safety.

src/util/file_reader.js (1)

147-148: Consider supporting abort signal in buffer pool

The TODO comment correctly identifies that the buffer pool and semaphore should support abort signals to avoid blocking on aborted requests.

Would you like me to open an issue to track adding abort signal support to the buffer pool implementation?

src/util/native_fs_utils.js (2)

82-82: Minor style inconsistency

The for loop style was changed from for (;;) to for (; ;). While both are valid, this seems like an unnecessary cosmetic change that doesn't improve readability.

Consider keeping the original for (;;) style for infinite loops as it's more common and concise.

Also applies to: 169-169, 527-527


823-825: Consider buffer size flexibility

The warmup buffer is hardcoded to 4096 bytes. Consider making this configurable or at least defining it as a constant.

+const WARMUP_BUFFER_SIZE = 4096;
+
 async function warmup_sparse_file(fs_context, file, file_path, stat, pos) {
     dbg.log0('warmup_sparse_file', {
         file_path, pos, size: stat.size, blocks: stat.blocks,
     });
     if (!warmup_buffer) {
-        warmup_buffer = nb_native().fs.dio_buffer_alloc(4096);
+        warmup_buffer = nb_native().fs.dio_buffer_alloc(WARMUP_BUFFER_SIZE);
     }
     await file.read(fs_context, warmup_buffer, 0, 1, pos);
 }
src/sdk/namespace_fs.js (2)

2898-2899: Consider using optional chaining for cleaner code.

The current null check works correctly, but optional chaining would be more concise and idiomatic:

-        return (stat && stat.xattr[XATTR_VERSION_ID]) || 'null';
+        return stat?.xattr?.[XATTR_VERSION_ID] || 'null';

2626-2627: Valid performance optimization opportunity identified.

The TODO comment correctly identifies a potential optimization. The stat call in _load_bucket could potentially be avoided or cached in certain scenarios, which would reduce filesystem operations and improve performance.

Would you like me to analyze the usage patterns of _load_bucket and suggest specific optimization strategies?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24e11b7 and 21420b0.

📒 Files selected for processing (7)
  • config.js (1 hunks)
  • src/sdk/namespace_fs.js (32 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/tools/file_writer_hashing.js (2 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit Configuration File

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unit_tests/internal/test_file_reader.test.js
🧬 Code Graph Analysis (2)
src/tools/file_writer_hashing.js (1)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
src/util/file_writer.js (4)
src/util/native_fs_utils.js (2)
  • dbg (4-4)
  • config (10-10)
config.js (2)
  • require (17-17)
  • config (7-7)
src/util/file_reader.js (2)
  • config (5-5)
  • stream (4-4)
src/tools/file_writer_hashing.js (2)
  • config (7-7)
  • stream (10-10)
🪛 Biome (1.9.4)
src/test/unit_tests/internal/test_file_reader.test.js

[error] 1-2: Redundant use strict directive.

The entire contents of JavaScript modules are automatically in strict mode, with no statement needed to initiate it.
Safe fix: Remove the redundant use strict directive.

(lint/suspicious/noRedundantUseStrict)

src/sdk/namespace_fs.js

[error] 2898-2898: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit Configuration File

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unit_tests/internal/test_file_reader.test.js
🧬 Code Graph Analysis (2)
src/tools/file_writer_hashing.js (1)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
src/util/file_writer.js (4)
src/util/native_fs_utils.js (2)
  • dbg (4-4)
  • config (10-10)
config.js (2)
  • require (17-17)
  • config (7-7)
src/util/file_reader.js (2)
  • config (5-5)
  • stream (4-4)
src/tools/file_writer_hashing.js (2)
  • config (7-7)
  • stream (10-10)
🪛 Biome (1.9.4)
src/test/unit_tests/internal/test_file_reader.test.js

[error] 1-2: Redundant use strict directive.

The entire contents of JavaScript modules are automatically in strict mode, with no statement needed to initiate it.
Safe fix: Remove the redundant use strict directive.

(lint/suspicious/noRedundantUseStrict)

src/sdk/namespace_fs.js

[error] 2898-2898: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (28)
config.js (2)

988-988: Good typo fix!

Renaming NFSF_UPLOAD_STREAM_MEM_THRESHOLD to NSFS_UPLOAD_STREAM_MEM_THRESHOLD improves consistency with the NSFS naming convention used throughout the codebase.


989-989: LGTM!

Adding NSFS_DOWNLOAD_STREAM_MEM_THRESHOLD with the same 8 MB value as the upload threshold provides symmetry for streaming operations.

src/tools/file_writer_hashing.js (2)

78-78: Nice simplification!

Using write_entire_stream method eliminates the need for manual stream pipeline management and improves code readability.


115-115: Consistent refactoring

Good to see the same simplification pattern applied in both hash_target and file_target functions.

src/test/unit_tests/internal/test_file_reader.test.js (3)

1-2: Keep the 'use strict' directive

While modules are automatically in strict mode, keeping 'use strict' in test files is a common practice for consistency and clarity.


14-133: Excellent test coverage!

The test suite comprehensively covers:

  • Multiple files in the test directory
  • Various edge cases (start/end boundaries)
  • Both stream interfaces (Readable and read_into_stream)
  • Comparison with Node.js native fs.createReadStream for correctness verification
  • Proper resource cleanup with file closing

The test structure using describe_read_cases helper is clean and maintainable.


83-86: Good correctness verification

Comparing FileReader output with Node.js native fs.createReadStream ensures the implementation matches expected behavior for the same byte ranges.

src/util/file_reader.js (4)

12-25: Excellent documentation and design rationale!

The class documentation clearly explains the performance motivation for the dual interface approach (standard Readable vs optimized read_into_stream with buffer pooling).


132-134: Good defensive programming

Preventing Transform streams from being used with read_into_stream is important since they might retain buffer references after the write callback, which would break the buffer pool recycling.


217-222: Smart optimization for sparse files

The sparse file warming logic helps avoid holding large buffers while waiting for high-latency reads from archived or remote storage.


199-212: Comprehensive statistics collection

The histogram of buffer sizes using log2 scale is a clever way to track buffer usage patterns efficiently.

src/util/native_fs_utils.js (3)

102-150: Excellent resource management pattern!

The use_file function provides a clean, safe pattern for file operations with:

  • Guaranteed file closure in the finally block
  • Detailed error logging at each stage
  • Generic return type support via template

This pattern helps prevent file descriptor leaks.


409-409: Good error standardization

Using error_utils.new_error_code to create standardized errors with proper error codes improves error handling consistency.


789-827: Well-documented sparse file utilities

The is_sparse_file and warmup_sparse_file functions are well-implemented with:

  • Clear documentation explaining various sparse file scenarios
  • Lazy allocation of the warmup buffer
  • Single-byte read optimization to trigger data recall

These utilities effectively support the FileReader's sparse file optimization.

src/util/file_writer.js (9)

7-7: LGTM: Import path correction.

The debug module import path has been correctly updated from a relative path to the current directory, which aligns with the typical project structure.


17-27: LGTM: Improved constructor with better type annotations.

The constructor improvements are well-designed:

  • Specific native types (nb.NativeFile, nb.NativeFSContext) provide better type safety
  • Optional parameters are properly marked
  • Removed unused large_buf_size parameter reduces complexity
  • Corrected config constant NSFS_UPLOAD_STREAM_MEM_THRESHOLD aligns with the renamed constant

The parameter reordering and type refinements enhance code maintainability.


41-48: LGTM: Clean stream handling abstraction.

The write_entire_stream method provides a clean abstraction for writing entire streams:

  • Uses stream.promises.pipeline for proper error handling and backpressure
  • stream.promises.finished ensures the stream is fully processed
  • Proper async/await pattern with optional abort signal support

This simplifies stream handling compared to manual pipeline management.


50-62: LGTM: Efficient concurrent buffer processing.

The write_buffers method effectively parallelizes MD5 calculation and file writing:

  • Uses Promise.all to run MD5 update and buffer writing concurrently
  • Conditional MD5 processing only when enabled
  • Proper separation of concerns with stats update

This design improves performance by avoiding sequential processing of independent operations.


64-72: LGTM: Clean MD5 finalization.

The finalize method properly handles MD5 digest calculation:

  • Conditional execution only when MD5 is enabled
  • Proper async/await pattern
  • Stores hex string format for consistent usage

The implementation is clean and follows expected patterns.


104-105: LGTM: Improved documentation clarity.

The comment updates better describe the IOV_MAX batching behavior, making the implementation intent clearer for future maintainers.


125-126: LGTM: Enhanced method documentation.

The updated documentation clearly describes the offset and total bytes tracking behavior, improving code readability.


138-149: LGTM: Improved code reuse in _writev.

The refactoring to use the new write_buffers method eliminates code duplication and centralizes the buffer writing logic. This improves maintainability and ensures consistent behavior across different entry points.


158-163: LGTM: Simplified _final implementation.

The refactoring to use the new finalize method reduces code duplication and provides a cleaner separation of concerns. The stream interface methods now delegate to the more focused public methods.

src/sdk/namespace_fs.js (5)

11-11: Good addition of stream module import for proper stream handling.

The addition of the stream module import is necessary for using stream.promises.finished in the refactored code.


1026-1144: Excellent refactoring to use FileReader for stream handling.

The refactoring of read_object_stream to use FileReader brings several improvements:

  • Cleaner separation of concerns with buffer management delegated to FileReader
  • Proper abort signal handling throughout the operation
  • Better stream completion detection using stream.promises.finished
  • Improved statistics tracking with detailed metrics

The implementation correctly handles edge cases like empty directory content and maintains backward compatibility.


965-965: Well-implemented abort signal handling throughout the codebase.

The addition of object_sdk.throw_if_aborted() checks at strategic points ensures operations can be cancelled promptly:

  • Before and after retry attempts
  • Before file operations
  • Before stream completion waits

The abort signal is also properly propagated to FileReader and file_writer.write_entire_stream, ensuring cancellation works through the entire call chain.

Also applies to: 991-992, 1045-1045, 1055-1056, 1084-1084, 1098-1098, 1137-1138


1607-1610: Correct propagation of abort signal to file writer.

Good addition of the abort signal to file_writer.write_entire_stream. This ensures upload operations can be properly cancelled when the abort controller is triggered.


1780-1791: Clean refactoring of multipart listing logic.

The refactored code is more readable with clearer separation of concerns:

  • Filtering logic is separated from transformation logic
  • Part number extraction is more explicit
  • Consistent use of async/await pattern

The functionality remains unchanged while improving code clarity.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
src/test/unit_tests/internal/test_file_reader.test.js (1)

1-2: Remove redundant 'use strict' directive.

JavaScript modules are automatically in strict mode. Apply this fix:

/* Copyright (C) 2020 NooBaa */
-'use strict';
src/util/file_writer.js (1)

41-48: Consider removing the redundant finished call.

The stream.promises.pipeline already waits for the pipeline to complete before resolving. The additional stream.promises.finished call appears redundant unless there's a specific reason to wait for the 'finish' event separately.

If the extra wait is not needed, simplify to:

 async write_entire_stream(source_stream, options) {
     await stream.promises.pipeline(source_stream, this, options);
-    await stream.promises.finished(this, options);
 }
src/sdk/namespace_fs.js (5)

867-867: Inconsistent loop syntax standardization.

While standardizing loop syntax from for(;;) to for(; ;) improves consistency, not all infinite loops in the file have been updated. This creates inconsistency within the same file.

Consider updating all infinite loops for consistency:

#!/bin/bash
# Search for remaining instances of for(;;) pattern
rg "for\s*\(\s*;\s*;\s*\)" --type js

1027-1032: JSDoc parameter documentation is incomplete.

The JSDoc comment is missing the @returns documentation and parameter descriptions are empty.

Complete the JSDoc documentation:

-    /**
-     * 
-     * @param {*} params
-     * @param {nb.ObjectSDK} object_sdk
-     * @param {nb.S3Response|stream.Writable} res
-     * @returns 
-     */
+    /**
+     * Reads object data and streams it to the response
+     * @param {Object} params - Object read parameters including bucket, key, range
+     * @param {nb.ObjectSDK} object_sdk - SDK instance with abort controller
+     * @param {nb.S3Response|stream.Writable} res - Response stream to write data to
+     * @returns {Promise<null>} Returns null to indicate response was handled
+     */

2627-2627: Address the TODO comment.

The TODO comment indicates a potential unnecessary stat call that could impact performance.

Would you like me to analyze whether this stat call is necessary and potentially optimize it? I can examine the usage patterns and determine if we can avoid this filesystem operation.


2898-2898: Consider using optional chaining for better null safety.

The static analysis tool correctly identifies that optional chaining would be safer here.

Apply optional chaining for safer property access:

-        return (stat && stat.xattr[XATTR_VERSION_ID]) || 'null';
+        return stat?.xattr?.[XATTR_VERSION_ID] || 'null';

3409-3409: Use BigInt literal for better readability.

Instead of using BigInt(0), use the BigInt literal syntax for consistency with modern JavaScript.

-                { mtimeNsBigint: BigInt(0), name: undefined });
+                { mtimeNsBigint: 0n, name: undefined });
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24e11b7 and 21420b0.

📒 Files selected for processing (7)
  • config.js (1 hunks)
  • src/sdk/namespace_fs.js (32 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/tools/file_writer_hashing.js (2 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit Configuration File

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unit_tests/internal/test_file_reader.test.js
🧬 Code Graph Analysis (1)
src/tools/file_writer_hashing.js (1)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
🪛 Biome (1.9.4)
src/test/unit_tests/internal/test_file_reader.test.js

[error] 1-2: Redundant use strict directive.

The entire contents of JavaScript modules are automatically in strict mode, with no statement needed to initiate it.
Safe fix: Remove the redundant use strict directive.

(lint/suspicious/noRedundantUseStrict)

src/sdk/namespace_fs.js

[error] 2898-2898: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit Configuration File

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unit_tests/internal/test_file_reader.test.js
🧬 Code Graph Analysis (1)
src/tools/file_writer_hashing.js (1)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
🪛 Biome (1.9.4)
src/test/unit_tests/internal/test_file_reader.test.js

[error] 1-2: Redundant use strict directive.

The entire contents of JavaScript modules are automatically in strict mode, with no statement needed to initiate it.
Safe fix: Remove the redundant use strict directive.

(lint/suspicious/noRedundantUseStrict)

src/sdk/namespace_fs.js

[error] 2898-2898: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (27)
config.js (1)

988-989: LGTM! Good typo fix and symmetric configuration.

The correction from NFSF_ to NSFS_ fixes an important typo, and adding the corresponding download threshold creates symmetry with the upload configuration. The 8MB threshold is reasonable for stream memory management.

src/tools/file_writer_hashing.js (2)

74-78: Good refactoring to use FileWriter's centralized stream handling.

The change from manual pipeline management to write_entire_stream simplifies the code and aligns with the broader refactoring effort. The type cast is acceptable here since TargetHash implements the necessary writev method for testing purposes.


115-115: Consistent refactoring applied.

Good to see the same simplification pattern applied here as well.

src/util/file_reader.js (3)

4-224: Well-designed FileReader implementation with efficient buffer management.

The dual-mode approach (standard Readable stream and optimized read_into_stream) provides flexibility. The buffer pooling, abort signal handling, and sparse file support are all well implemented. Good documentation explaining the design rationale.


132-134: Good defensive check against Transform streams.

This prevents a subtle bug where Transform streams could hold onto buffers after the write callback, breaking the buffer pool's assumptions.


156-160: Proper buffer cleanup on EOF.

Good attention to detail in setting buffer_pool_cleanup = null before calling the callback to prevent double-cleanup.

src/test/unit_tests/internal/test_file_reader.test.js (1)

14-133: Excellent test coverage for FileReader.

The test suite thoroughly validates both reading modes against Node.js native streams, covering edge cases and various byte ranges. Good use of parameterized tests to avoid duplication.

src/util/native_fs_utils.js (4)

102-150: Excellent resource management pattern implementation.

The use_file function properly implements the RAII pattern with comprehensive error handling and logging at each stage. The finally block ensures files are always closed, preventing resource leaks.


789-827: Smart sparse file handling implementation.

Good documentation explaining the rationale. The warmup approach of reading just 1 byte is efficient for triggering recalls without holding large buffers. The lazy allocation of the warmup buffer is a nice optimization.


409-409: Good use of centralized error handling.

Using error_utils.new_error_code ensures consistent error object structure across the codebase.


82-82: Minor formatting consistency improvement.

The loop syntax changes from for(;;) to for(; ;) improve formatting consistency without affecting functionality.

Also applies to: 169-169, 527-527

src/util/file_writer.js (8)

17-27: LGTM! Type refinements improve code clarity.

The parameter type annotations using specific native types (nb.NativeFile, nb.NativeFSContext) and making parameters optional improves the API documentation and flexibility.


50-62: Excellent parallel processing implementation!

The method efficiently handles MD5 calculation and file writing in parallel using Promise.all, which improves performance. The conditional MD5 update and stats tracking are properly implemented.


64-72: LGTM! Clean finalization logic.

The method properly encapsulates the MD5 finalization logic with appropriate null checking and hex conversion.


137-155: Good refactoring to reduce code duplication!

The delegation to write_buffers eliminates duplicate logic and improves maintainability.


157-169: LGTM! Consistent refactoring pattern.

The delegation to finalize follows the same pattern as _writev, improving code organization and maintainability.


7-7: debug_module Import Path Verified

The import path update is correct—src/util/debug_module.js exists alongside file_writer.js, so require('./debug_module') will resolve successfully. No further action is needed.


26-26: All FileWriter calls have been updated—large_buf_size removed

No instantiations pass large_buf_size anymore:

  • Checked src/tools/file_writer_hashing.js (two calls) – only target_file, fs_context, namespace_resource_id
  • Checked src/sdk/namespace_fs.js – only target_file, fs_context, offset, md5_enabled, stats
  • No occurrences of large_buf_size in the codebase

The removal is safe and callers have been updated.


27-27: NSFS_UPLOAD_STREAM_MEM_THRESHOLD rename verified across codebase

The new constant is defined in config.js (config.NSFS_UPLOAD_STREAM_MEM_THRESHOLD = 8 * 1024 * 1024;) and no occurrences of NFSF_UPLOAD_STREAM_MEM_THRESHOLD remain.

src/sdk/namespace_fs.js (8)

11-11: Good addition of stream module import.

The stream module import is properly added to support the new stream.promises.finished usage throughout the file.


31-31: Proper integration of FileReader module.

The FileReader import follows the existing pattern and is correctly positioned among other utility imports.


1035-1056: Excellent abort signal handling implementation.

The addition of abort signal checks at critical points (lines 1045, 1055) and passing the signal to stream operations improves the cancellation behavior and prevents unnecessary work when requests are aborted.


1104-1140: Well-structured FileReader integration.

The refactoring to use FileReader improves code modularity and maintainability. The implementation properly:

  • Passes all necessary context including abort signal
  • Handles statistics collection
  • Provides detailed logging with performance metrics
  • Ensures proper cleanup with file closure

1136-1138: Proper use of stream.promises.finished with abort signal.

Using stream.promises.finished with the abort signal is the correct approach for waiting for stream completion while respecting cancellation. This replaces the custom utility and uses Node.js native functionality.


1607-1609: Clean abstraction using file_writer.write_entire_stream.

The refactoring to use file_writer.write_entire_stream with abort signal support improves code clarity and properly handles stream cancellation.


1780-1791: Improved readability in multipart listing.

The refactored code using filter and map chaining is more readable and functional. The logic is clearer and easier to maintain.


2552-2552: Centralized restore status calculation.

Good refactoring to use Glacier.get_restore_status for consistent restore status calculation across the codebase.

Copy link
Contributor

@liranmauda liranmauda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link

This PR had no activity for too long - it will now be labeled stale. Update it to prevent it from getting closed.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/tools/file_writer_hashing.js (1)

115-121: Avoid writing undefined MD5 to xattr when MD5 is disabled.

If NSFS_CALCULATE_MD5=false, file_writer.digest is undefined, and replacexattr may persist an undefined value.

Apply:

-            if (XATTR) {
-                await target_file.replacexattr(
-                    DEFAULT_FS_CONFIG,
-                    assign_md5_to_fs_xattr(file_writer.digest, {})
-                );
-            }
+            if (XATTR && file_writer.digest) {
+                await target_file.replacexattr(
+                    DEFAULT_FS_CONFIG,
+                    assign_md5_to_fs_xattr(file_writer.digest, {})
+                );
+            }
♻️ Duplicate comments (2)
src/util/file_writer.js (1)

41-48: Drop the extra finished(); pipeline already awaits completion.

stream.promises.pipeline resolves when the Writable finishes; awaiting finished(this) is redundant.

Apply:

-    async write_entire_stream(source_stream, options) {
-        await stream.promises.pipeline(source_stream, this, options);
-        await stream.promises.finished(this, options);
-    }
+    async write_entire_stream(source_stream, options) {
+        await stream.promises.pipeline(source_stream, this, options);
+    }

Regarding prior concern “special handling moving from stream_utils.pipeline”: no extra handling needed here since there’s no reuse of the destination stream.

src/sdk/namespace_fs.js (1)

1020-1026: Fixed event payload to use params.key.

This resolves the undefined variable reported earlier for object_name. Thanks for addressing it.

Also applies to: 1094-1100, 1164-1168

🧹 Nitpick comments (4)
src/tools/file_writer_hashing.js (1)

74-79: Type duck-typing is fine here; consider an explicit minimal interface for clarity.

Casting target to any works at runtime; defining a tiny typedef with writev(fs_context, buffers) improves readability.

src/util/native_fs_utils.js (1)

895-903: Guard warmup read position for zero-length or out-of-range.

Reading 1 byte at pos when stat.size===0 or pos>=stat.size will read 0; add a safe early return/pos clamp.

Apply:

 async function warmup_sparse_file(fs_context, file, file_path, stat, pos) {
   dbg.log0('warmup_sparse_file', {
     file_path, pos, size: stat.size, blocks: stat.blocks,
   });
+  if (!stat.size) return;
+  const safe_pos = Math.min(Math.max(0, pos || 0), stat.size - 1);
   if (!warmup_buffer) {
     warmup_buffer = nb_native().fs.dio_buffer_alloc(4096);
   }
-  await file.read(fs_context, warmup_buffer, 0, 1, pos);
+  await file.read(fs_context, warmup_buffer, 0, 1, safe_pos);
 }
src/sdk/namespace_fs.js (1)

1151-1157: Avoid divide-by-zero in avg_buffer metric.

num_buffers can be 0 (e.g., empty objects). Guard the division.

Apply:

-                avg_buffer: file_reader.num_bytes / file_reader.num_buffers,
+                avg_buffer: file_reader.num_buffers ? (file_reader.num_bytes / file_reader.num_buffers) : 0,
src/sdk/object_io.js (1)

437-439: Remove redundant await stream.promises.finished(uploader) after pipeline.

The custom stream_utils.pipeline() wrapper resolves only after calling resume() on the last stream, ensuring full completion of the uploader stream. The subsequent await finished(uploader) at line 439 is unnecessary. Additionally, the workaround in stream_utils.pipeline() (referenced in src/util/stream_utils.js lines 46-49) addresses Node.js issue #40685, which involves deadlocks when waiting for finish on the last pipeline stream—removing this redundant await eliminates that potential risk.

-        await stream.promises.finished(uploader);
+        // pipeline() already waits for completion and calls resume() on the last stream
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21420b0 and c15e04e.

📒 Files selected for processing (10)
  • config.js (2 hunks)
  • src/sdk/namespace_fs.js (34 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (2 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/test/unit_tests/internal/test_file_reader.test.js
  • config.js
  • src/util/file_reader.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unrelated/stream_pipe_to_multiple_targets.js
🧠 Learnings (1)
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/sdk/namespace_fs.js
🧬 Code graph analysis (4)
src/tools/file_writer_hashing.js (1)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
src/util/native_fs_utils.js (2)
src/sdk/namespace_fs.js (6)
  • error_utils (17-17)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • path (8-8)
src/sdk/bucketspace_fs.js (6)
  • require (15-23)
  • require (24-24)
  • require (25-25)
  • require (27-27)
  • require (30-30)
  • path (6-6)
src/util/file_writer.js (3)
src/sdk/namespace_fs.js (5)
  • dbg (13-13)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
src/sdk/object_io.js (4)
  • dbg (8-8)
  • require (19-19)
  • require (20-20)
  • require (21-21)
src/util/native_fs_utils.js (1)
  • dbg (4-4)
src/sdk/namespace_fs.js (2)
src/util/file_reader.js (4)
  • stream (4-4)
  • config (5-5)
  • native_fs_utils (8-8)
  • nb_native (6-6)
src/util/stream_utils.js (2)
  • stream (5-5)
  • err (40-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (9)
src/util/native_fs_utils.js (2)

116-150: Nice, scoped open/close helper.

use_file encapsulates open/scope/close with solid error logs. LGTM.


876-879: Sparse check helper looks good.

Using stat.blocks*512 < size as a heuristic is reasonable and documented.

src/util/file_writer.js (1)

56-63: Good split: MD5 update in parallel to I/O with stats update post-write.

This is clear and efficient. LGTM.

src/sdk/namespace_fs.js (4)

1113-1150: FileReader integration and abort checks look solid.

Abort checks around open/loop and awaiting stream.promises.finished(res,{signal}) are appropriate. LGTM.


1699-1701: Good: persist create params with explicit umask-applied mode.

Using get_umasked_mode(config.BASE_MODE_FILE) here is correct.


1799-1811: Multipart listing object composition looks correct.

num/size/etag/last_modified mapping from xattr/stat is consistent. LGTM.


1037-1150: Add comprehensive tests for abort scenarios and MD5 configurations in streaming paths.

The analysis confirms your concerns are valid. Test coverage is insufficient:

  • FileReader tests create AbortController (lines 69, 106) but never trigger abort during execution
  • write_entire_stream supports abort signal (stream.promises.pipeline/finished in src/util/file_writer.js:45-47), but tests don't exercise it
  • FileWriter tests via file_writer_hashing.js cover concurrent scenarios but lack abort+MD5 combinations
  • No destroyed cache/hub semantics tests found matching stream_pipe_to_multiple_targets.js patterns

Before merge, add tests for:

  1. read_object_stream abort mid-transfer (verify cleanup and error handling)
  2. write_entire_stream abort with force_md5_etag on/off (verify digest handling)
  3. Stream abort with cache/hub destroyed semantics
src/test/unrelated/stream_pipe_to_multiple_targets.js (1)

146-148: Node.js version and semantics confirmed—code is correct.

Node 24.3.0 fully supports stream.promises.finished (required v15+). Destroyed cache paths are handled correctly: Promise.allSettled prevents rejection when cache.destroy() is called (line 107), and the test explicitly checks cache.destroyed (line 150) to branch behavior. The implementation matches the pattern used in production code elsewhere (file_writer.js, object_io.js, namespace_fs.js).

src/util/stream_utils.js (1)

29-56: Removal of wait_finished verified—no lingering references.

Confirmed that wait_finished has been successfully removed from exports and no files in the codebase reference it. All existing imports of stream_utils use valid exported functions (wait_drain, get_tap_stream, pipeline). The code change is clean with no breaking dependencies.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/util/file_writer.js (1)

141-155: Blocking: implement _write (Writable requires it).

Without _write, single-chunk writes will fail at runtime. Delegate to write_buffers.

Apply:

 class FileWriter extends stream.Writable {
@@
-    async _writev(chunks, callback) {
+    async _writev(chunks, callback) {
         try {
             let size = 0;
             const buffers = chunks.map(it => {
                 size += it.chunk.length;
                 return it.chunk;
             });
             await this.write_buffers(buffers, size);
             return callback();
         } catch (err) {
             console.error('FileWriter._writev: failed', err);
             return callback(err);
         }
     }
+
+    async _write(chunk, _encoding, callback) {
+        try {
+            const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, _encoding || 'utf8');
+            await this.write_buffers([buf], buf.length);
+            return callback();
+        } catch (err) {
+            console.error('FileWriter._write: failed', err);
+            return callback(err);
+        }
+    }
♻️ Duplicate comments (1)
src/test/unit_tests/internal/test_file_reader.test.js (1)

93-131: Nice addition of read_into_stream tests.

This addresses prior feedback and validates pooling/backpressure paths.

🧹 Nitpick comments (6)
src/util/file_reader.js (2)

131-191: Backpressure and buffer cleanup path LGTM; add early-destroy guard.

Flow handles drain and cleanup correctly. Add a cheap fast-fail if target_stream is destroyed to stop earlier.

-        if (target_stream instanceof stream.Transform) {
+        if (target_stream instanceof stream.Transform) {
             throw new Error('FileReader read_into_stream must be called with a Writable stream, not a Transform stream');
         }
+        if (target_stream.destroyed) this.signal.throwIfAborted();

80-101: Readable path is fine; minor: prefer pooled buffer for parity.

AllocUnsafe per-chunk is OK; if parity with pool is desired, consider borrowing from the smallest pool size when available.

src/test/unit_tests/internal/test_file_reader.test.js (2)

14-53: Add an abort scenario test.

Simulate an abort mid-read to assert early termination and no hangs.

+    it('aborts mid-read', async () => {
+        await native_fs_utils.use_file({
+            fs_context, bucket_path: __filename, open_path: __filename,
+            scope: async file => {
+                const stat = await file.stat(fs_context);
+                const ac = new AbortController();
+                const fr = new FileReader({
+                    fs_context, file, file_path: __filename, stat,
+                    start: 0, end: Infinity, signal: ac.signal, multi_buffer_pool,
+                });
+                const writable = buffer_utils.write_stream();
+                const p = fr.read_into_stream(writable);
+                setTimeout(() => ac.abort(), 1); // abort quickly
+                await assert.rejects(p, { name: 'AbortError' });
+            }
+        });
+    });

10-11: Avoid importing multi_buffer_pool from NamespaceFS in unit tests.

Pull MultiSizeBuffersPool directly to decouple the test from nsfs internals and speed up test init.

-const { multi_buffer_pool } = require('../../../sdk/namespace_fs');
+const buffer_utils = require('../../../util/buffer_utils');
+const multi_buffer_pool = new buffer_utils.MultiSizeBuffersPool({ /* sizes from config */ });

Based on learnings.

src/util/file_writer.js (1)

45-48: pipeline already waits for finish; drop redundant finished().

stream.promises.pipeline resolves on destination finish. The extra finished() is unnecessary.

-    async write_entire_stream(source_stream, options) {
-        await stream.promises.pipeline(source_stream, this, options);
-        await stream.promises.finished(this, options);
-    }
+    async write_entire_stream(source_stream, options) {
+        await stream.promises.pipeline(source_stream, this, options);
+    }
src/sdk/namespace_fs.js (1)

1030-1160: read_object_stream refactor to FileReader looks solid; add minor guards.

  • Good: early abort checks, Glacier state checks, sparse warmup via FileReader, finished() wait.
  • Optional: avoid NaN in avg_buffer when empty reads.
-                avg_buffer: file_reader.num_bytes / file_reader.num_buffers,
+                avg_buffer: file_reader.num_buffers ? (file_reader.num_bytes / file_reader.num_buffers) : 0,
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c15e04e and 69a20c3.

📒 Files selected for processing (11)
  • config.js (2 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (34 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (2 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/tools/file_writer_hashing.js
  • src/test/unrelated/stream_pipe_to_multiple_targets.js
  • src/util/stream_utils.js
  • config.js
  • src/sdk/object_io.js
  • src/util/native_fs_utils.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unit_tests/internal/test_file_reader.test.js
🧠 Learnings (1)
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/sdk/namespace_fs.js
🧬 Code graph analysis (5)
src/sdk/bucketspace_fs.js (1)
src/util/debug_module.js (2)
  • name (103-103)
  • name (432-432)
src/util/file_writer.js (4)
src/sdk/bucketspace_fs.js (6)
  • dbg (37-37)
  • require (15-23)
  • require (24-24)
  • require (25-25)
  • require (27-27)
  • require (30-30)
src/sdk/namespace_fs.js (5)
  • dbg (13-13)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
src/sdk/object_io.js (4)
  • dbg (8-8)
  • require (19-19)
  • require (20-20)
  • require (21-21)
src/util/native_fs_utils.js (1)
  • dbg (4-4)
src/util/file_reader.js (2)
src/sdk/namespace_fs.js (8)
  • stream (11-11)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • config (14-14)
  • nb_native (24-24)
  • multi_buffer_pool (33-54)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (39-39)
src/test/unit_tests/internal/test_file_reader.test.js (3)
src/sdk/namespace_fs.js (7)
  • fs (7-7)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • buffer_utils (18-18)
  • native_fs_utils (21-21)
src/util/native_fs_utils.js (5)
  • fs (5-5)
  • fs_context (605-610)
  • file (124-124)
  • file (815-815)
  • data (182-182)
src/util/file_reader.js (1)
  • native_fs_utils (8-8)
src/sdk/namespace_fs.js (2)
src/util/file_reader.js (4)
  • stream (4-4)
  • config (5-5)
  • native_fs_utils (8-8)
  • nb_native (6-6)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (39-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: run-package-lock-validation
  • GitHub Check: Build Noobaa Image
🔇 Additional comments (8)
src/sdk/bucketspace_fs.js (1)

436-436: Log level bump looks fine; keep consistency with setters.

Getter logs moved to log1 while setters still use log0; this asymmetry is OK if intentional to reduce noise. If not, align both sides.

src/sdk/namespace_fs.js (6)

1603-1626: Upload path switched to write_entire_stream — correct; depends on FileWriter _write fix.

Once _write is added, this simplifies uploads and honors abort signal.

Add a quick smoke test (manual or unit) that uploads small and large bodies to exercise both _write and _writev paths.


1699-1701: Explicit file mode on create_object_upload is fine.

Ensures predictable perms under umask; no change needed.


1799-1811: list_multiparts mapping is clear and efficient.

Simple, readable extraction of num/etag/size/timestamp. LGTM.


2316-2327: _get_file_path tweak keeps normalization and directory handling; good.

Continues to guard against './' traversal and appends .folder for dir-objects. LGTM.


2572-2589: restore_status wiring is a nice addition.

Exposes Glacier restore status on object info; aligns with read-object behavior.


2644-2646: Make bucket stat check optional via config — good for perf tuning.

The guard allows perf experiments; keep TODO as discussed.

src/util/file_writer.js (1)

130-135: Offset initialization is correctly handled—no changes needed.

The code properly initializes this.offset from the constructor parameter (line 30). When undefined, writev correctly treats it as "use current position." The condition at line 133 (if (this.offset >= 0)) distinguishes between positional writes (offset ≥ 0) and append mode (offset < 0 or undefined), which is the intended behavior per the accompanying comment. No defensive default is required.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/sdk/object_io.js (1)

437-438: Redundant await on uploader completion.

Line 437 calls stream_utils.pipeline(transforms) which already waits for all streams in the pipeline (including uploader) to finish. The additional await stream.promises.finished(uploader) on line 438 is redundant and adds unnecessary overhead.

Consider removing line 438:

     await stream_utils.pipeline(transforms);
-    await stream.promises.finished(uploader);
src/tools/file_writer_hashing.js (1)

74-74: Consider using proper typing instead of any cast.

The cast to any circumvents type safety. If TargetHash is intended to be compatible with the target_file parameter, consider adding an interface or using a union type to make this explicit.

src/util/native_fs_utils.js (1)

70-76: Consider adding JSDoc for the reformatted function.

While the reformatting improves readability, the open_file function lacks JSDoc documentation describing its parameters and behavior. This would be helpful given its importance in the codebase.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69a20c3 and e9a7e8d.

📒 Files selected for processing (12)
  • config.js (2 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (34 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/for_await_stream.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (2 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js
  • src/sdk/bucketspace_fs.js
  • src/util/file_reader.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unrelated/for_await_stream.js
  • src/test/unit_tests/internal/test_file_reader.test.js
🧠 Learnings (2)
📚 Learning: 2025-08-08T13:10:36.141Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1314-1317
Timestamp: 2025-08-08T13:10:36.141Z
Learning: In src/server/system_services/pool_server.js (and config usage), the constant config.INTERNAL_STORAGE_POOL_NAME has been removed from the system. Future logic should not depend on this constant and should instead use config.DEFAULT_POOL_NAME or structural markers (e.g., pool.resource_type === 'INTERNAL' or pool.mongo_info) to identify internal/mongo pools.

Applied to files:

  • config.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/sdk/namespace_fs.js
🧬 Code graph analysis (4)
src/tools/file_writer_hashing.js (1)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
src/sdk/namespace_fs.js (1)
src/util/file_reader.js (1)
  • stream (4-4)
src/test/unit_tests/internal/test_file_reader.test.js (3)
src/sdk/namespace_fs.js (7)
  • fs (7-7)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • buffer_utils (18-18)
  • native_fs_utils (21-21)
src/util/native_fs_utils.js (5)
  • fs (5-5)
  • fs_context (605-610)
  • file (124-124)
  • file (815-815)
  • data (182-182)
src/util/file_reader.js (1)
  • native_fs_utils (8-8)
src/util/file_writer.js (3)
src/sdk/object_io.js (5)
  • stream_utils (14-14)
  • require (19-19)
  • require (20-20)
  • require (21-21)
  • buffers (785-785)
src/util/file_reader.js (1)
  • stream_utils (7-7)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
🔇 Additional comments (12)
src/test/unrelated/for_await_stream.js (1)

69-69: LGTM! Clean migration to native stream API.

The change from promisified stream.finished to stream.promises.finished is correct and follows modern Node.js best practices.

src/tools/file_writer_hashing.js (1)

78-78: LGTM! Simplified streaming with write_entire_stream.

The refactor from pipeline-based streaming to file_writer.write_entire_stream() is cleaner and more maintainable. The implementation correctly handles both hash targets and file targets.

Also applies to: 115-115

config.js (2)

830-830: LGTM! New configuration constants are well-defined.

The new constants are clearly named and have appropriate default values:

  • NSFS_CHECK_BUCKET_PATH_EXISTS: Enables bucket path existence validation
  • NSFS_DOWNLOAD_STREAM_MEM_THRESHOLD: Sets 8MB threshold for download streaming
  • NSFS_LIST_IGNORE_ENTRY_ON_EINVAL: Handles GPFS stat issues on list operations

Also applies to: 1014-1014, 1019-1019


1013-1013: Constant rename verification successful—all references updated.

The old constant name NFSF_UPLOAD_STREAM_MEM_THRESHOLD has been completely replaced with NSFS_UPLOAD_STREAM_MEM_THRESHOLD across the codebase. The search found zero remaining references to the old name, and the only usage in src/util/file_writer.js:28 has been correctly updated to use the new constant name.

src/test/unit_tests/internal/test_file_reader.test.js (1)

1-133: Excellent test coverage for FileReader!

The test suite is comprehensive and well-structured:

  • Tests both the stream.Readable interface and read_into_stream method
  • Covers edge cases (boundaries, offsets, large values)
  • Validates against Node.js native fs.createReadStream for correctness
  • Uses dynamic test discovery across all files in the directory

The past review comment requesting tests for read_into_stream() has been fully addressed (lines 93-131).

As per coding guidelines.

src/util/stream_utils.js (2)

28-28: LGTM! Clean migration to native stream.promises API.

Using stream.promises.pipeline directly is cleaner and more standard than maintaining a promisified wrapper.


37-50: Improved pipeline with better error handling and flexibility.

The changes enhance the pipeline function:

  • Added optional options parameter for pipeline configuration (line 37)
  • Safer error handling by checking strm.destroy existence before calling (line 42)
  • Correctly passes options to the underlying native pipeline (line 50)

The default values for new parameters maintain backward compatibility for most use cases.

src/util/native_fs_utils.js (3)

102-150: Well-designed use_file helper with proper resource cleanup.

The use_file function provides a clean RAII-style pattern for file operations:

  • Opens file with proper error handling (lines 127-132)
  • Executes scope with the file handle (lines 134-138)
  • Ensures file is closed in finally block even on errors (lines 140-147)
  • Returns the scope result (line 149)

This pattern will help prevent file descriptor leaks across the codebase.


865-878: Good sparse file detection with documented limitations.

The is_sparse_file function uses the standard approach (blocks * 512 < size) to detect sparse files. The extensive JSDoc comment correctly explains the limitations:

  • False positives during async writes
  • Archived/tiered files
  • Cached remote storage files

This is appropriate for its intended use case (optimizing buffer pool usage) as noted in the documentation.


880-903: Smart warmup strategy with lazy buffer allocation.

The warmup_sparse_file function efficiently handles sparse files:

  • Lazily allocates a single shared 4KB buffer (lines 899-901)
  • Performs a tiny 1-byte read to trigger recall/cache load (line 902)
  • Logs useful debugging info (lines 896-898)

This prevents buffer pool exhaustion while waiting for high-latency archive/cache operations.

src/sdk/namespace_fs.js (1)

1110-1135: Confirm range upper bound semantics

Line 1119: FileReader’s public contract treats end as an exclusive bound. Today our S3 path hands params.end around as an inclusive byte index (e.g., Range: bytes=0-0). Passing it through unchanged would lop off the last requested byte. Please double-check that we normalize the header to an exclusive bound before invoking FileReader, or adjust here to add one.

src/util/file_writer.js (1)

42-49: Ensure pipeline closes the FileWriter

Line 47: With stream_utils.pipeline receiving end_stream = false, I’m worried the helper might leave the FileWriter open, skipping _final() and leaving digest unset. Can we confirm the helper still drains and ends the destination in this mode (or flip the flag if not)?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/util/file_reader.js (1)

44-74: Validate and clamp range parameters (start/end).

The constructor sets this.start, this.end, and this.pos without validating or clamping the input ranges. This can produce negative or inverted ranges that would pass invalid values into warmup_sparse_file and read operations.

Based on past review feedback, apply these guards:

  • Coerce and clamp start to max(0, Number.isFinite(start) ? start : 0)
  • Coerce end to Number.isFinite(end) ? end : Infinity
  • Clamp end to min(stat.size, max(this.start, requested_end)) to prevent inversion
  • This ensures pos (initialized to this.start) is non-negative and within [0, stat.size]

Apply this diff:

-        this.start = start || 0;
-        this.end = Math.min(stat.size, end ?? Infinity);
+        this.start = Math.max(0, Number.isFinite(start) ? start : 0);
+        const requested_end = Number.isFinite(end) ? end : Infinity;
+        this.end = Math.min(stat.size, Math.max(this.start, requested_end));
         this.pos = this.start;
🧹 Nitpick comments (3)
src/sdk/namespace_fs.js (2)

1058-1063: Clarify the necessity of waiting for stream completion.

The comment on line 1058-1059 suggests uncertainty about whether stream.promises.finished is needed after res.end(). In fact, waiting for finished is important even for empty streams to ensure the response pipeline completes and any errors in the writable stream are caught. Consider updating the comment to reflect this reasoning for clarity.

-                        // since we don't write anything to the stream waiting for finished might not be needed,
-                        // added just in case there is a delay.
+                        // Wait for the response to finish to ensure the stream pipeline completes
+                        // and any downstream errors are propagated.

2643-2645: Verify or remove TODO comment.

The TODO comment suggests this check can be skipped for performance tests and mentions potential caching. If this is a temporary note for development/testing, it should be clarified or removed before production. If caching is planned, consider opening a tracking issue.

src/util/native_fs_utils.js (1)

102-150: Solid scoped file utility with proper cleanup.

The use_file function provides a good pattern for scoped file operations with guaranteed cleanup. The try-catch-finally structure ensures the file is closed even on errors.

The error handling with separate try-catch blocks for open/scope/close is thorough, though the error logging and rethrowing in each catch could potentially be simplified if the calling code handles errors appropriately. The current approach is defensively safe.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9a7e8d and ad03de9.

📒 Files selected for processing (12)
  • config.js (9 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (34 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/for_await_stream.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (2 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js
  • src/tools/file_writer_hashing.js
  • src/test/unrelated/for_await_stream.js
  • src/test/unit_tests/internal/test_file_reader.test.js
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-30T10:20:09.402Z
Learnt from: liranmauda
Repo: noobaa/noobaa-core PR: 9174
File: config.js:1044-1046
Timestamp: 2025-07-30T10:20:09.402Z
Learning: The NooBaa config.js file has a built-in environment variable override system via `load_config_env_overrides()` function. Any config property can be overridden using environment variables prefixed with `CONFIG_JS_`. For example, `CONFIG_JS_NC_HEALTH_BUCKETS_COUNT_LIMIT=10000` would override `config.NC_HEALTH_BUCKETS_COUNT_LIMIT`. The system handles type conversion automatically and provides consistent override patterns across all config values.

Applied to files:

  • config.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/sdk/namespace_fs.js
🧬 Code graph analysis (6)
src/sdk/object_io.js (1)
src/util/stream_utils.js (1)
  • stream (4-4)
src/sdk/bucketspace_fs.js (1)
src/util/debug_module.js (2)
  • name (103-103)
  • name (432-432)
src/sdk/namespace_fs.js (2)
src/util/file_reader.js (2)
  • stream (4-4)
  • nb_native (6-6)
src/util/file_writer.js (1)
  • stream (4-4)
src/util/file_reader.js (3)
src/sdk/namespace_fs.js (9)
  • stream (11-11)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • config (14-14)
  • nb_native (24-24)
  • native_fs_utils (21-21)
  • multi_buffer_pool (33-54)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (40-40)
src/util/native_fs_utils.js (7)
  • config (10-10)
  • nb_native (12-12)
  • nb_native (726-726)
  • file (124-124)
  • file (815-815)
  • buffer_pool_cleanup (163-163)
  • data (182-182)
src/util/native_fs_utils.js (2)
src/sdk/namespace_fs.js (6)
  • error_utils (17-17)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • path (8-8)
src/sdk/bucketspace_fs.js (6)
  • require (15-23)
  • require (24-24)
  • require (25-25)
  • require (27-27)
  • require (30-30)
  • path (6-6)
src/util/file_writer.js (3)
src/sdk/object_io.js (5)
  • stream_utils (14-14)
  • require (19-19)
  • require (20-20)
  • require (21-21)
  • buffers (785-785)
src/util/file_reader.js (1)
  • stream_utils (7-7)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (26)
src/util/file_reader.js (5)

80-101: LGTM!

The _read implementation correctly handles EOF, full reads, and partial reads with proper error propagation. Buffer allocation and slicing logic is sound.


109-118: LGTM!

The read_into_buffer method correctly sequences sparse-file warmup, abort checking, file reading, and stats updates.


131-192: LGTM! Robust buffer pool and backpressure handling.

The read_into_stream implementation demonstrates careful resource management:

  • Guards against Transform streams (which could retain buffers)
  • Handles backpressure with drain promises
  • Ensures buffer cleanup in finally block
  • Multiple abort signal checks for responsiveness

197-212: LGTM! Stats method correctly updated.

The stats update now correctly uses update_nsfs_read_stats for read operations (line 206). This addresses the past review feedback that flagged the incorrect use of update_nsfs_write_stats.


217-222: LGTM!

The _warmup_sparse_file helper properly gates the operation with config flags and sparse-file detection before calling the native utility.

src/sdk/bucketspace_fs.js (1)

436-436: LGTM! Appropriate log level adjustment.

Reducing verbosity from log0 to log1 for lifecycle configuration reads is appropriate—read operations don't require the highest log level. This aligns with the broader logging refinements across the PR.

src/sdk/object_io.js (1)

438-438: LGTM! Modernization to native stream API.

Migrating from stream_utils.wait_finished to the native stream.promises.finished API removes an unnecessary wrapper and aligns with the broader refactor that eliminates wait_finished from stream_utils.js.

config.js (8)

830-830: LGTM! Feature flag for bucket path validation.

The new NSFS_CHECK_BUCKET_PATH_EXISTS constant provides a runtime control for bucket path existence checks, enabling flexible validation behavior across the NSFS stack.


1013-1014: LGTM! Typo fix and symmetric threshold addition.

Line 1013 corrects the typo NFSF_UPLOAD_STREAM_MEM_THRESHOLDNSFS_UPLOAD_STREAM_MEM_THRESHOLD. Line 1014 adds NSFS_DOWNLOAD_STREAM_MEM_THRESHOLD (8 MB) to mirror the upload threshold, providing symmetric memory management for streaming operations.


1182-1238: LGTM! Appropriate log level refinements.

The changes from console.warn to console.log for config loading messages (lines 1182, 1210, 1215, 1218, 1225, 1231) correctly treat informational logs as non-warnings. Error cases (line 1238) still appropriately use console.warn.


1243-1259: LGTM! Enhanced error handling and documentation.

The JSDoc addition (lines 1243-1245) improves type clarity. The updated error handling (lines 1251-1255) now distinguishes between ENOENT (file not found) and other errors with appropriate messaging.


1264-1275: LGTM! Direct file read and improved logging.

Using fs.readFileSync directly (line 1268) is appropriate for this synchronous config initialization. The added logging (lines 1270, 1272) improves visibility into which config directory was selected.


1280-1288: LGTM! Modernized property check and clearer logging.

Using Object.hasOwn (line 1283) is preferred over hasOwnProperty. The improved log message (line 1284) clarifies that environment variables are being updated from config.


1322-1351: LGTM! Enhanced logging for config loading.

The updated log messages (lines 1327, 1341-1343, 1348) now include config paths and action context, significantly improving observability during config initialization. Using console.warn for missing config (line 1348) is appropriate.


1355-1377: LGTM! Consistent logging improvements.

The updated messages (lines 1371, 1374) align with the broader logging refinements throughout the config loading paths.

src/util/stream_utils.js (1)

28-56: LGTM! Successful migration to native stream API.

The updates modernize pipeline to use stream.promises.pipeline (line 28) with several improvements:

  • Default parameter values (line 37) improve API usability
  • Defensive check for strm.destroy existence (line 42) prevents errors on streams without destroy
  • Options parameter (line 50) provides flexibility for custom pipeline behavior

The removal of the wait_finished export aligns with callers migrating to stream.promises.finished directly.

src/sdk/namespace_fs.js (3)

967-998: Good abort handling in retry loop.

The placement of object_sdk.throw_if_aborted() at the start of the retry loop (line 969) and before the delay (line 995) ensures prompt cancellation on abort, preventing unnecessary retries.


1119-1157: LGTM! Correct streaming pattern with FileReader.

The implementation correctly follows the FileReader API contract:

  1. file_reader.read_into_stream(res) pipes data without calling end() (as documented in the FileReader summary)
  2. res.end() is explicitly called to signal completion
  3. await stream.promises.finished(res, { signal }) ensures the response completes and propagates errors, with abort signal support

The metrics logging after completion provides valuable observability.


1602-1632: Good abort signal propagation to FileWriter.

The abort signal is correctly extracted and passed to file_writer.write_entire_stream, enabling prompt cancellation of upload operations.

src/util/native_fs_utils.js (3)

70-99: LGTM! Improved readability with multiline signature.

The reformatted function signature with explicit default parameters improves readability, and the updated log messages with "native_fs_utils" provide clearer context.


865-903: Good sparse file utilities, but clarify documentation reference.

The is_sparse_file and warmup_sparse_file functions provide useful optimizations for handling sparse/archived files.

However, the documentation on line 870 references _fail_if_archived_or_sparse_file, which appears to be commented out in namespace_fs.js (lines 2845-2857). Consider updating the documentation to reflect the current usage or removing the stale reference.

The lazy allocation of warmup_buffer is a good optimization for memory efficiency.


410-412: Good use of standardized error creation.

Replacing plain Error with error_utils.new_error_code('EEXIST', ...) provides consistent error handling with proper error codes.

src/util/file_writer.js (4)

7-40: LGTM! Improved constructor clarity and imports.

The updated constructor signature with explicit parameter types improves readability, and using config.NSFS_UPLOAD_STREAM_MEM_THRESHOLD for highWaterMark centralizes configuration.


50-72: Good concurrent execution of MD5 and write operations.

The write_buffers method efficiently runs MD5 calculation and file writes in parallel using Promise.all, improving performance while maintaining correctness since these operations don't interfere with each other.

The finalize method cleanly separates MD5 digest computation.


142-169: LGTM! Good refactoring for separation of concerns.

Extracting the logic into write_buffers and finalize methods improves code organization and makes the business logic reusable outside the stream interface. The stream methods (_writev and _final) now cleanly delegate to these extracted methods.


42-48: stream_utils.pipeline second parameter is correctly used.

The second parameter false to stream_utils.pipeline controls whether the stream will be reused. When false, the implementation may call resume() on the last stream—but only if the stream is readable (line 53: if (last_stream.readable)). Since FileWriter extends stream.Writable and is not readable, resume() will not be called. The parameter choice is correct: false indicates the FileWriter instance won't be reused after the write operation completes.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad03de9 and 7d5b3a3.

📒 Files selected for processing (12)
  • config.js (9 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (34 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/for_await_stream.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (2 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/sdk/bucketspace_fs.js
  • src/test/unrelated/stream_pipe_to_multiple_targets.js
  • src/test/unrelated/for_await_stream.js
  • src/sdk/object_io.js
  • src/tools/file_writer_hashing.js
  • src/test/unit_tests/internal/test_file_reader.test.js
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/sdk/namespace_fs.js
📚 Learning: 2025-07-30T10:20:09.402Z
Learnt from: liranmauda
Repo: noobaa/noobaa-core PR: 9174
File: config.js:1044-1046
Timestamp: 2025-07-30T10:20:09.402Z
Learning: The NooBaa config.js file has a built-in environment variable override system via `load_config_env_overrides()` function. Any config property can be overridden using environment variables prefixed with `CONFIG_JS_`. For example, `CONFIG_JS_NC_HEALTH_BUCKETS_COUNT_LIMIT=10000` would override `config.NC_HEALTH_BUCKETS_COUNT_LIMIT`. The system handles type conversion automatically and provides consistent override patterns across all config values.

Applied to files:

  • config.js
🧬 Code graph analysis (3)
src/sdk/namespace_fs.js (2)
src/util/file_reader.js (4)
  • stream (4-4)
  • config (5-5)
  • native_fs_utils (8-8)
  • nb_native (6-6)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (40-40)
src/util/file_reader.js (2)
src/sdk/namespace_fs.js (8)
  • stream (11-11)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • config (14-14)
  • nb_native (24-24)
  • multi_buffer_pool (33-54)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (40-40)
src/util/file_writer.js (2)
src/sdk/object_io.js (4)
  • stream_utils (14-14)
  • require (19-19)
  • require (20-20)
  • require (21-21)
src/util/file_reader.js (1)
  • stream_utils (7-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (7)
src/util/native_fs_utils.js (7)

13-13: LGTM: error_utils import added.

The import supports consistent error creation patterns used later in the file.


70-76: LGTM: Improved function signature formatting.

The multiline parameter formatting enhances readability for this function with multiple parameters.


82-93: LGTM: Consistent logging and loop syntax.

The logging updates correctly identify the module as "native_fs_utils" and the loop syntax standardization improves consistency.


102-150: LGTM: Excellent resource management abstraction.

The use_file function implements a clean try-finally pattern that guarantees file closure after executing the scoped operation. The error handling is appropriate, logging failures at each stage before re-throwing.


411-411: LGTM: Consistent error creation.

Using error_utils.new_error_code aligns with the codebase's error handling conventions.


865-878: LGTM: Well-documented sparse file detection.

The is_sparse_file function includes excellent documentation explaining the edge cases and limitations. The implementation correctly checks if allocated blocks are less than the file size.


910-910: LGTM: New utilities properly exported.

The new functions use_file, is_sparse_file, and warmup_sparse_file are correctly exported and available for external use.

Also applies to: 950-951

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/util/file_writer.js (1)

142-155: Implement _write; relying solely on _writev is unsafe.

Writable must support single-chunk writes. Without _write, Node can call the base stub and throw ERR_METHOD_NOT_IMPLEMENTED.

Apply:

 class FileWriter extends stream.Writable {
@@
     async _writev(chunks, callback) {
@@
     }
+
+    /**
+     * Implements single-chunk write for Writable.
+     * @param {Buffer} chunk
+     * @param {BufferEncoding} _encoding
+     * @param {(error?: Error|null) => void} callback
+     */
+    async _write(chunk, _encoding, callback) {
+        try {
+            await this.write_buffers([chunk], chunk.length);
+            return callback();
+        } catch (err) {
+            console.error('FileWriter._write: failed', err);
+            return callback(err);
+        }
+    }
♻️ Duplicate comments (2)
src/util/file_reader.js (2)

62-64: Validate and clamp start/end; prevent inverted/negative ranges.

Negative or inverted ranges can set pos < 0 or end < start. Clamp both and coerce to numbers.

Apply:

-        this.start = start || 0;
-        this.end = Math.min(stat.size, end ?? Infinity);
-        this.pos = this.start;
+        const start_num = Number.isFinite(start) ? Number(start) : 0;
+        this.start = Math.max(0, start_num);
+        const requested_end = Number.isFinite(end) ? Number(end) : Infinity;
+        const end_num = Math.min(stat.size, requested_end);
+        this.end = Math.max(this.start, end_num);
+        this.pos = this.start;

171-178: Don’t clear buffer_pool_cleanup before write; avoid buffer leak on sync throw.

If target_stream.write throws synchronously, the buffer isn’t released. Keep cleanup until write succeeds, and release on the exception path.

-                const data = buffer.subarray(0, nread);
-                buffer_pool_cleanup = null; // cleanup is now in the socket responsibility
-                const write_ok = target_stream.write(data, null, callback);
+                const data = buffer.subarray(0, nread);
+                let write_ok;
+                try {
+                    write_ok = target_stream.write(data, null, callback);
+                    buffer_pool_cleanup = null; // cleanup is now in the socket responsibility
+                } catch (err) {
+                    buffer_pool_cleanup = null;
+                    callback();
+                    throw err;
+                }
                 if (!write_ok) {
🧹 Nitpick comments (2)
src/util/file_reader.js (1)

98-100: Prefer this.destroy(err) in streams over emit('error').

Destroying the stream is the idiomatic way to error a Readable.

-        } catch (err) {
-            this.emit('error', err);
-        }
+        } catch (err) {
+            this.destroy(err);
+        }
src/sdk/namespace_fs.js (1)

1151-1157: Guard avg_buffer against divide-by-zero in empty reads.

For zero-length ranges, num_buffers can be 0, producing NaN.

-                avg_buffer: file_reader.num_bytes / file_reader.num_buffers,
+                avg_buffer: file_reader.num_buffers ? (file_reader.num_bytes / file_reader.num_buffers) : 0,
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d5b3a3 and 3e8d1c0.

📒 Files selected for processing (12)
  • config.js (9 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (34 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/for_await_stream.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (2 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/tools/file_writer_hashing.js
  • src/sdk/object_io.js
  • src/sdk/bucketspace_fs.js
  • src/test/unrelated/stream_pipe_to_multiple_targets.js
  • src/util/native_fs_utils.js
  • src/test/unit_tests/internal/test_file_reader.test.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unrelated/for_await_stream.js
🧠 Learnings (2)
📚 Learning: 2025-07-30T10:20:09.402Z
Learnt from: liranmauda
Repo: noobaa/noobaa-core PR: 9174
File: config.js:1044-1046
Timestamp: 2025-07-30T10:20:09.402Z
Learning: The NooBaa config.js file has a built-in environment variable override system via `load_config_env_overrides()` function. Any config property can be overridden using environment variables prefixed with `CONFIG_JS_`. For example, `CONFIG_JS_NC_HEALTH_BUCKETS_COUNT_LIMIT=10000` would override `config.NC_HEALTH_BUCKETS_COUNT_LIMIT`. The system handles type conversion automatically and provides consistent override patterns across all config values.

Applied to files:

  • config.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • config.js
  • src/sdk/namespace_fs.js
🧬 Code graph analysis (4)
config.js (2)
src/cmd/manage_nsfs.js (10)
  • config (18-18)
  • data (125-139)
  • data (214-214)
  • data (422-422)
  • data (492-512)
  • data (564-564)
  • data (761-761)
  • data (849-849)
  • data (914-914)
  • config_root (65-65)
src/test/integration_tests/nc/cli/test_nc_health.js (2)
  • fs (12-12)
  • config_root (60-60)
src/util/file_writer.js (2)
src/sdk/object_io.js (5)
  • stream_utils (14-14)
  • require (19-19)
  • require (20-20)
  • require (21-21)
  • config (9-9)
src/util/file_reader.js (2)
  • stream_utils (7-7)
  • config (5-5)
src/util/file_reader.js (1)
src/sdk/namespace_fs.js (7)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • config (14-14)
  • nb_native (24-24)
  • multi_buffer_pool (33-54)
src/sdk/namespace_fs.js (1)
src/util/file_reader.js (2)
  • stream (4-4)
  • nb_native (6-6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (8)
src/util/file_reader.js (1)

206-212: Correct stats method for reads.

Using update_nsfs_read_stats for read operations is correct. LGTM.

src/test/unrelated/for_await_stream.js (1)

69-69: Use of stream.promises.finished is correct.

Change aligns tests with native Promise-based streams.

src/sdk/namespace_fs.js (4)

1119-1131: Good integration of FileReader with abort, buffer pool, and stats.

Instantiation passes fs_context, stat, pool, signal, and stats correctly. LGTM.


1147-1150: Waiting on finished(res) is correct in both paths.

Ensures errors on the writable propagate before returning. LGTM.

Also applies to: 1058-1062


2645-2651: Config gate for bucket path stat is a nice performance lever.

Honors NSFS_CHECK_BUCKET_PATH_EXISTS; clear and safe.


1-3743: Tests exist for reader/writer paths — coverage confirmed.

  • FileReader.read_into_stream is covered by unit tests in src/test/unit_tests/internal/test_file_reader.test.js (includes range and abort cases).
  • FileWriter.write_entire_stream is exercised by FileWriter unit tests (src/test/unit_tests/internal/test_file_writer.test.js) and by NSFS upload/read tests that exercise upload_object, upload_multipart and complete_object_upload (src/test/unit_tests/nsfs/test_namespace_fs.js and src/test/unit_tests/nsfs/test_namespace_fs_mpu.js).
config.js (1)

1183-1183: Stop logging raw config/env values; redact or remove to avoid secret leakage.

Current logs print entire config objects and env override values. This can expose tokens/keys in logs.

Apply minimal safe logging:

-        console.log('load_config_local: LOADED', local_config);
+        console.log('load_config_local: LOADED');
@@
-                console.log(`config: load config from env ${conf_name}=${n} (number)`);
+                console.log(`config: load config from env ${conf_name} (number)`);
@@
-                    console.log(`config: load config from env ${conf_name}=true (bool)`);
+                    console.log(`config: load config from env ${conf_name} (bool=true)`);
@@
-                    console.log(`config: load config from env ${conf_name}=false (bool)`);
+                    console.log(`config: load config from env ${conf_name} (bool=false)`);
@@
-                console.log(`config: load config from env ${conf_name}=${val} (string)`);
+                console.log(`config: load config from env ${conf_name} (string)`);
@@
-                console.log(`config: load config from env ${conf_name}=${val} (object of type ${Array.isArray(prev_val) ? 'array' : 'json'})`);
+                console.log(`config: load config from env ${conf_name} (object of type ${Array.isArray(prev_val) ? 'array' : 'json'})`);
@@
-        console.log(`nsfs: config.json= ${util.inspect(config_data)}`);
-        console.log(`nsfs: merged config.json= ${util.inspect(merged_config)}`);
+        console.log('nsfs: config.json loaded');
+        console.log('nsfs: merged config.json loaded');

Optionally, introduce a small redaction helper if you still want key/value logs.

Based on learnings

Also applies to: 1211-1232, 1342-1344

⛔ Skipped due to learnings
Learnt from: Neon-White
Repo: noobaa/noobaa-core PR: 9196
File: tools/ibm_runner_helpers/run_containerized_warp_on_cloud_runner.sh:2-5
Timestamp: 2025-09-18T07:12:14.304Z
Learning: In the NooBaa project's IBM Cloud Warp testing infrastructure, debug logs with `set -x` are kept unmasked even when containing webhook URLs or other secrets because the logs are only accessible within the provisioned VM by permitted team members, and the debugging value outweighs secret exposure concerns in this controlled internal environment.
src/util/stream_utils.js (1)

28-38: Pipeline refactor looks good; verify no lingering wait_finished imports/usages.

Since wait_finished was removed and native finished/pipeline are used, ensure no stale references remain.

Also applies to: 58-60

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
src/util/file_reader.js (2)

62-65: Validate and clamp start/end to avoid negative or inverted ranges

Coerce to numbers; clamp start >= 0; clamp end to [start, stat.size]. Prevents negative pos and ensures safe reads.

-        this.start = start || 0;
-        this.end = Math.min(stat.size, end ?? Infinity);
+        const s = Number.isFinite(start) ? Number(start) : 0;
+        const e = Number.isFinite(end) ? Number(end) : Infinity;
+        this.start = Math.max(0, s);
+        this.end = Math.min(stat.size, Math.max(this.start, e));
         this.pos = this.start;

171-178: Buffer leak on synchronous write() throw — keep cleanup handle until write succeeds

If target_stream.write() throws synchronously, buffer_pool_cleanup was already nulled, so the buffer isn’t returned to the pool. Wrap write() in try/catch and null the handle only after a successful call.

-                const data = buffer.subarray(0, nread);
-                buffer_pool_cleanup = null; // cleanup is now in the socket responsibility
-                const write_ok = target_stream.write(data, null, callback);
+                const data = buffer.subarray(0, nread);
+                let write_ok;
+                try {
+                    write_ok = target_stream.write(data, null, callback);
+                    buffer_pool_cleanup = null; // cleanup is now in the socket responsibility
+                } catch (err) {
+                    // return buffer to pool before rethrowing
+                    const cleanup = buffer_pool_cleanup;
+                    buffer_pool_cleanup = null;
+                    try { cleanup && cleanup(); } finally { throw err; }
+                }
src/util/native_fs_utils.js (1)

895-903: Add missing JSDoc param and make lazy buffer init race-safe (benign race today)

  • Document @param {string} file_path used in the log.
  • Optionally use local-then-assign to avoid brief double-allocation under concurrent calls.
- * @param {nb.NativeFSStats} stat
- * @param {number} pos
+ * @param {nb.NativeFSStats} stat
+ * @param {number} pos
+ * @param {string} file_path
 ...
-    if (!warmup_buffer) {
-        warmup_buffer = nb_native().fs.dio_buffer_alloc(4096);
-    }
+    if (!warmup_buffer) {
+        const buf = nb_native().fs.dio_buffer_alloc(4096);
+        // best-effort single assignment; harmless if two callers race
+        warmup_buffer = warmup_buffer || buf;
+    }
🧹 Nitpick comments (4)
src/util/stream_utils.js (1)

12-14: JSDoc: document the options parameter

wait_drain accepts options (e.g., AbortSignal) but JSDoc omits it. Add @param {import('events').EventOptions} [options] (or {Abortable}) for clarity.

src/util/file_writer.js (2)

56-63: MD5 update + write in parallel is fine; consider single-call MD5 update for fewer awaits

Running MD5Async.update alongside writev is okay. If MD5Async supports batching (single call with concatenated buffers), prefer that to reduce per-buffer async overhead. Otherwise, keep as-is.


142-169: Use dbg.error instead of console.error for consistency and log routing

Unify logging with the module’s dbg to respect log levels/config.

-            console.error('FileWriter._writev: failed', err);
+            dbg.error('FileWriter._writev: failed', err);
...
-            console.error('FileWriter._final: failed', err);
+            dbg.error('FileWriter._final: failed', err);
src/sdk/namespace_fs.js (1)

1151-1157: Guard avg_buffer against zero division in logs

When no buffers were read (e.g., zero-length object), num_buffers is 0 and avg_buffer becomes NaN. Guard for clarity.

-                avg_buffer: file_reader.num_bytes / file_reader.num_buffers,
+                avg_buffer: file_reader.num_buffers ? (file_reader.num_bytes / file_reader.num_buffers) : 0,
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e8d1c0 and ff37946.

📒 Files selected for processing (12)
  • config.js (2 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (34 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/for_await_stream.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (2 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • src/sdk/bucketspace_fs.js
  • src/sdk/object_io.js
  • src/test/unrelated/for_await_stream.js
  • src/test/unit_tests/internal/test_file_reader.test.js
  • src/test/unrelated/stream_pipe_to_multiple_targets.js
  • src/tools/file_writer_hashing.js
  • config.js
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/sdk/namespace_fs.js
🧬 Code graph analysis (3)
src/util/file_reader.js (3)
src/sdk/namespace_fs.js (9)
  • stream (11-11)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • config (14-14)
  • nb_native (24-24)
  • native_fs_utils (21-21)
  • multi_buffer_pool (33-54)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (40-40)
src/util/native_fs_utils.js (8)
  • config (10-10)
  • nb_native (12-12)
  • nb_native (726-726)
  • fs_context (605-610)
  • file (124-124)
  • file (815-815)
  • buffer_pool_cleanup (163-163)
  • data (182-182)
src/sdk/namespace_fs.js (1)
src/util/file_reader.js (2)
  • stream (4-4)
  • nb_native (6-6)
src/util/file_writer.js (3)
src/sdk/object_io.js (4)
  • stream_utils (14-14)
  • require (19-19)
  • require (20-20)
  • require (21-21)
src/util/file_reader.js (1)
  • stream_utils (7-7)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (1)
src/util/stream_utils.js (1)

33-55: Unable to verify review comment due to inaccessible codebase files

Cannot locate src/util/stream_utils.js, .nvmrc, or package.json engines field to verify Node version baseline and code implementation. The array-form signature for stream.promises.pipeline is supported starting in Node.js v15.0.0, but without access to the repository's actual baseline and implementation, the specific concerns cannot be confirmed.

Please manually verify:

  1. Your repository's minimum Node version (check .nvmrc, package.json engines, or CI config)
  2. Whether resume() is indeed called after the pipeline promise resolves and if this deadlock mitigation is still necessary for your Node baseline
  3. The full implementation of wait_drain() to confirm JSDoc completeness

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/util/file_reader.js (1)

62-63: Validate and clamp start/end range parameters.

Lines 62-63 don't validate the start and end parameters, which can lead to negative or inverted ranges being passed to file operations. This issue was flagged in a previous review and remains unaddressed.

Apply this fix to ensure valid ranges:

-        this.start = start || 0;
-        this.end = Math.min(stat.size, end ?? Infinity);
+        this.start = Math.max(0, Number.isFinite(start) ? start : 0);
+        const requested_end = Number.isFinite(end) ? end : stat.size;
+        this.end = Math.min(stat.size, Math.max(this.start, requested_end));
         this.pos = this.start;

This ensures:

  • start is non-negative and a valid number
  • end doesn't exceed file size
  • end >= start (preventing inverted ranges)
  • pos is initialized to a safe value
🧹 Nitpick comments (3)
src/sdk/object_io.js (1)

438-438: Migration correct; consider removing redundant wait.

The migration to stream.promises.finished is correct. However, the inline comment is accurate: this call is redundant since stream_utils.pipeline (line 437) already waits for all streams to finish.

If you'd like to remove the redundancy:

     await stream_utils.pipeline(transforms);
-    await stream.promises.finished(uploader); // probably redundant and pipeline already does it
src/tools/file_writer_hashing.js (1)

74-79: Type cast to any suggests API mismatch.

The /**@type {any}*/ cast on line 74 works around a type incompatibility between TargetHash and the expected FileWriter target type. While functional, this bypasses type safety.

Consider one of these approaches:

  1. Update TargetHash to implement the expected interface/type that FileWriter expects
  2. Update FileWriter to accept a union type that includes TargetHash's interface
  3. Document why the cast is necessary if the mismatch is intentional

The simplified write_entire_stream call on line 78 is an improvement over the previous two-step pipeline/wait_finished pattern.

src/sdk/namespace_fs.js (1)

1151-1156: Guard avg_buffer calculation for empty reads.

file_reader.num_buffers can stay at 0 for zero-byte objects, so file_reader.num_bytes / file_reader.num_buffers turns into NaN, making the log noisy. A simple guard keeps the metric meaningful without affecting hot paths.

-                num_buffers: file_reader.num_buffers,
-                avg_buffer: file_reader.num_bytes / file_reader.num_buffers,
+                num_buffers: file_reader.num_buffers,
+                avg_buffer: file_reader.num_buffers
+                    ? file_reader.num_bytes / file_reader.num_buffers
+                    : 0,
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff37946 and 37f7624.

📒 Files selected for processing (12)
  • config.js (2 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (34 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/for_await_stream.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (3 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/sdk/bucketspace_fs.js
  • src/test/unit_tests/internal/test_file_reader.test.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unrelated/for_await_stream.js
  • src/test/unrelated/stream_pipe_to_multiple_targets.js
🧠 Learnings (3)
📚 Learning: 2025-07-30T10:20:09.402Z
Learnt from: liranmauda
Repo: noobaa/noobaa-core PR: 9174
File: config.js:1044-1046
Timestamp: 2025-07-30T10:20:09.402Z
Learning: The NooBaa config.js file has a built-in environment variable override system via `load_config_env_overrides()` function. Any config property can be overridden using environment variables prefixed with `CONFIG_JS_`. For example, `CONFIG_JS_NC_HEALTH_BUCKETS_COUNT_LIMIT=10000` would override `config.NC_HEALTH_BUCKETS_COUNT_LIMIT`. The system handles type conversion automatically and provides consistent override patterns across all config values.

Applied to files:

  • config.js
📚 Learning: 2025-06-17T12:59:51.543Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9042
File: src/util/cloud_utils.js:183-194
Timestamp: 2025-06-17T12:59:51.543Z
Learning: In the set_noobaa_s3_connection function in src/util/cloud_utils.js, internal NooBaa S3 endpoints (filtered by 'api': 's3', 'kind': 'INTERNAL') will always be non-secure, so tls: false is the correct setting despite the conditional https:// scheme logic.

Applied to files:

  • src/test/unrelated/stream_pipe_to_multiple_targets.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/sdk/namespace_fs.js
🧬 Code graph analysis (5)
src/util/file_reader.js (2)
src/sdk/namespace_fs.js (8)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • config (14-14)
  • nb_native (24-24)
  • native_fs_utils (21-21)
  • multi_buffer_pool (33-54)
src/util/native_fs_utils.js (6)
  • config (10-10)
  • nb_native (12-12)
  • nb_native (726-726)
  • file (124-124)
  • file (815-815)
  • buffer_pool_cleanup (163-163)
src/util/file_writer.js (2)
src/sdk/object_io.js (5)
  • stream_utils (14-14)
  • require (19-19)
  • require (20-20)
  • require (21-21)
  • buffers (785-785)
src/util/file_reader.js (1)
  • stream_utils (7-7)
src/util/native_fs_utils.js (1)
src/sdk/namespace_fs.js (5)
  • error_utils (17-17)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
src/tools/file_writer_hashing.js (1)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
src/sdk/namespace_fs.js (4)
src/sdk/object_io.js (1)
  • stream (6-6)
src/util/file_reader.js (3)
  • stream (4-4)
  • native_fs_utils (8-8)
  • nb_native (6-6)
src/util/file_writer.js (1)
  • stream (4-4)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (40-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (13)
src/test/unrelated/stream_pipe_to_multiple_targets.js (1)

145-146: LGTM! Clean migration to native stream promises.

The replacement of promisified stream.finished with stream.promises.finished is functionally equivalent and aligns with the broader refactor across the codebase.

src/util/stream_utils.js (2)

28-28: LGTM! Migration to native stream promises.

Replacing the promisified pipeline with stream.promises.pipeline is the correct approach and aligns with the broader refactor.


37-55: LGTM! Enhanced pipeline API with proper safeguards.

The updated signature with optional parameters maintains backward compatibility while enabling customization. The addition of the strm.destroy existence check (line 42) is good defensive programming.

src/test/unrelated/for_await_stream.js (1)

69-69: LGTM! Consistent migration to native stream promises.

The change aligns with the broader refactor and is functionally equivalent.

config.js (2)

830-830: LGTM! New feature flag added.

The new NSFS_CHECK_BUCKET_PATH_EXISTS configuration enables bucket path validation.


1013-1014: Config variable renames verified—all references updated correctly.

Both NFSF_UPLOAD_STREAM_MEM_THRESHOLD and NFSF_DOWNLOAD_STREAM_MEM_THRESHOLD have been successfully renamed to NSFS_UPLOAD_STREAM_MEM_THRESHOLD and NSFS_DOWNLOAD_STREAM_MEM_THRESHOLD respectively. No orphaned references remain in the codebase. The new names are properly used in src/util/file_writer.js and src/util/file_reader.js.

src/tools/file_writer_hashing.js (2)

96-96: LGTM! Defensive directory creation.

Using { recursive: true } ensures the target directory exists without failing if it's already present.


115-115: LGTM! Cleaner streaming with write_entire_stream.

The new API simplifies the code by replacing the separate pipeline and wait_finished calls.

src/util/file_reader.js (5)

80-101: LGTM! Clean Readable stream implementation.

The _read implementation correctly handles the readable stream contract, properly allocating buffers and pushing data or EOF signals.


109-118: LGTM! Efficient read with sparse file support.

The read_into_buffer method correctly coordinates warmup, abort checking, and stats tracking.


131-192: LGTM! Well-designed buffer pool streaming with proper cleanup.

The read_into_stream implementation correctly:

  • Prevents misuse with Transform streams (lines 132-134)
  • Manages buffer pool lifecycle (cleanup invoked in write callback or finally block)
  • Handles backpressure with drain promises
  • Transfers cleanup responsibility to the socket after successful write (lines 173-174)

The fix for buffer cleanup ordering (write before nulling cleanup) ensures the finally block can still clean up if write() throws synchronously.


197-212: LGTM! Correct stats tracking for reads.

The use of update_nsfs_read_stats (line 206) is correct for read operations. The histogram and one-time counting logic is well-designed.


217-222: LGTM! Conditional sparse file warmup.

The warmup logic is properly gated by configuration and file characteristics.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/util/file_writer.js (1)

50-62: Consider explicitly handling the conditional Promise in Promise.all.

The pattern this.MD5Async && this._update_md5(buffers, size) in the Promise.all array will evaluate to undefined when MD5 is disabled. While Promise.all handles undefined values (they resolve immediately), this is implicit and may be unclear to future maintainers.

For clarity, consider using a ternary or filtering:

-        await Promise.all([
-            this.MD5Async && this._update_md5(buffers, size),
-            this._write_all_buffers(buffers, size),
-        ]);
+        const tasks = [this._write_all_buffers(buffers, size)];
+        if (this.MD5Async) tasks.push(this._update_md5(buffers, size));
+        await Promise.all(tasks);

Or with a ternary:

-        await Promise.all([
-            this.MD5Async && this._update_md5(buffers, size),
-            this._write_all_buffers(buffers, size),
-        ]);
+        await Promise.all([
+            this.MD5Async ? this._update_md5(buffers, size) : Promise.resolve(),
+            this._write_all_buffers(buffers, size),
+        ]);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37f7624 and f8b89b2.

📒 Files selected for processing (3)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/util/file_reader.js
🧰 Additional context used
🧬 Code graph analysis (1)
src/util/file_writer.js (2)
src/util/file_reader.js (1)
  • stream_utils (7-7)
src/util/stream_utils.js (1)
  • err (38-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (10)
src/util/stream_utils.js (3)

31-35: LGTM! Good API enhancement with backwards compatibility.

The updated signature properly defaults reuse_last_stream to false and adds an options parameter for enhanced control (e.g., abort signals). The JSDoc accurately reflects these changes.


40-40: Good defensive check before destroying.

The added !strm.destroyed guard prevents attempting to destroy an already-destroyed stream, avoiding potential errors.


48-52: Migration to native stream.promises.pipeline is solid.

The shift from a promisified custom pipeline to the native stream.promises.pipeline aligns with modern Node.js practices. The conditional .resume() logic correctly avoids resuming when reuse_last_stream is true, and the check for 'readable' in last ensures it only applies to readable streams.

src/util/file_writer.js (7)

7-8: Import paths updated correctly.

The relative imports now use ./ prefix consistently for local modules, which is clearer and aligns with the module structure.


18-28: Good type safety improvements and typo fix.

The constructor now uses proper TypeScript-style JSDoc types (nb.NativeFile, nb.NativeFSContext) and correctly marks optional parameters. The fix from NFSF_UPLOAD_STREAM_MEM_THRESHOLD to NSFS_UPLOAD_STREAM_MEM_THRESHOLD corrects what appears to be a typo, aligning with the NSFS naming convention used throughout the codebase.


42-48: Good public API addition for streaming.

The write_entire_stream method provides a clean interface for piping a source stream through the FileWriter. The use of stream_utils.pipeline with reuse_last_stream=false is correct, ensuring the FileWriter stream completes properly after the pipeline finishes.

Based on learnings (past review comment shows this approach was validated).


64-72: Clean extraction of finalization logic.

Extracting the MD5 finalization into a public finalize() method improves reusability and testability. The conditional check ensures it's only called when MD5 is enabled.


146-159: Good refactor delegating to write_buffers.

The _writev method now delegates to the extracted write_buffers method, reducing code duplication and improving maintainability. The error handling remains consistent.


167-175: New _write implementation follows the delegation pattern.

The single-chunk write correctly delegates to write_buffers with a single-element array, maintaining consistency with _writev and reusing the same logic path.


181-189: Good refactor delegating to finalize.

The _final method now delegates to the extracted finalize() method, improving code reuse and making the finalization logic accessible outside the stream lifecycle.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/util/file_reader.js (1)

44-74: Range params: confirm upstream sanitization (or minimally clamp here)

Constructor trusts start/end. If upstream always normalizes, fine; otherwise consider minimal clamping to avoid edge surprises.

Proposed minimal guard if needed:

-        this.start = start || 0;
-        this.end = Math.min(stat.size, end ?? Infinity);
+        this.start = Math.max(0, Number(start) || 0);
+        const req_end = Number.isFinite(end) ? Number(end) : Infinity;
+        this.end = Math.min(stat.size, Math.max(this.start, req_end));
🧹 Nitpick comments (4)
src/util/native_fs_utils.js (1)

882-907: Trim log noise and skip warmup at EOF

Reduce log level and avoid a 1‑byte read when pos >= stat.size. Also keep the shared buffer approach.

Apply:

-    dbg.log0('warmup_sparse_file', {
+    dbg.log1('warmup_sparse_file', {
         file_path, pos, size: stat.size, blocks: stat.blocks,
     });
+    if (pos >= stat.size) return;
@@
-    warmup_buffer ||= nb_native().fs.dio_buffer_alloc(4096);
-    await file.read(fs_context, warmup_buffer, 0, 1, pos);
+    warmup_buffer ||= nb_native().fs.dio_buffer_alloc(4096);
+    await file.read(fs_context, warmup_buffer, 0, 1, pos);
src/util/file_writer.js (2)

42-49: Drop redundant finished() after pipeline

pipeline already resolves when the destination finishes. The extra await is unnecessary.

-    await stream_utils.pipeline([source_stream, this], false, options);
-    await stream.promises.finished(this); // probably redundant and pipeline already does it
+    await stream_utils.pipeline([source_stream, this], false, options);

57-63: Avoid undefined entries in Promise.all

Filter out the MD5 branch when disabled; keeps Promise.all inputs clean.

-        await Promise.all([
-            this.MD5Async && this._update_md5(buffers, size),
-            this._write_all_buffers(buffers, size),
-        ]);
+        const tasks = [this._write_all_buffers(buffers, size)];
+        if (this.MD5Async) tasks.push(this._update_md5(buffers, size));
+        await Promise.all(tasks);
src/util/file_reader.js (1)

131-195: Safer write signature and early exit on destroyed destination

  • Writable.write with Buffer doesn’t need an encoding arg; pass (data, callback).
  • If target is already destroyed, abort immediately instead of continuing reads.
-        if (target_stream instanceof stream.Transform) {
+        if (target_stream instanceof stream.Transform) {
             throw new Error('FileReader read_into_stream must be called with a Writable stream, not a Transform stream');
         }
-        // cheap fast-fail if target_stream is destroyed to stop earlier
-        if (target_stream.destroyed) this.signal.throwIfAborted();
+        // Fast-fail on a destroyed destination
+        if (target_stream.destroyed) throw new Error('FileReader target stream already destroyed');
@@
-                const write_ok = target_stream.write(data, null, callback);
+                const write_ok = target_stream.write(data, callback);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8b89b2 and 58c3260.

📒 Files selected for processing (12)
  • config.js (2 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (26 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/for_await_stream.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (3 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (7 hunks)
  • src/util/native_fs_utils.js (7 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • src/sdk/object_io.js
  • src/sdk/bucketspace_fs.js
  • src/util/stream_utils.js
  • config.js
  • src/test/unrelated/stream_pipe_to_multiple_targets.js
  • src/tools/file_writer_hashing.js
  • src/test/unit_tests/internal/test_file_reader.test.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unrelated/for_await_stream.js
🧠 Learnings (1)
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/sdk/namespace_fs.js
🧬 Code graph analysis (4)
src/util/file_writer.js (3)
src/sdk/object_io.js (6)
  • stream_utils (14-14)
  • require (19-19)
  • require (20-20)
  • require (21-21)
  • config (9-9)
  • stream (6-6)
src/util/file_reader.js (3)
  • stream_utils (7-7)
  • config (5-5)
  • stream (4-4)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (38-38)
src/util/file_reader.js (4)
src/sdk/namespace_fs.js (9)
  • stream (11-11)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • config (14-14)
  • nb_native (24-24)
  • native_fs_utils (21-21)
  • multi_buffer_pool (33-54)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (38-38)
src/util/native_fs_utils.js (8)
  • config (10-10)
  • nb_native (12-12)
  • nb_native (726-726)
  • fs_context (605-610)
  • file (124-124)
  • file (815-815)
  • buffer_pool_cleanup (163-163)
  • data (182-182)
src/test/unit_tests/internal/test_file_reader.test.js (2)
  • native_fs_utils (8-8)
  • fs_context (12-12)
src/util/native_fs_utils.js (2)
src/sdk/namespace_fs.js (5)
  • error_utils (17-17)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
src/sdk/bucketspace_fs.js (5)
  • require (15-23)
  • require (24-24)
  • require (25-25)
  • require (27-27)
  • require (30-30)
src/sdk/namespace_fs.js (2)
src/util/file_reader.js (1)
  • stream (4-4)
src/util/file_writer.js (1)
  • stream (4-4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (18)
src/util/native_fs_utils.js (5)

70-100: open_file retry flow and dir creation look correct

Logic for 'wt'/'w' path creation and ENOENT retries is sound; debug tags are consistent.


102-150: Scoped file helper is solid

Good try/catch/finally discipline; defaults flow through open_file; close errors downgraded to warn.


865-879: Sparse check is OK

Using stat.blocks*512 < stat.size is the right heuristic on Linux.


411-411: Good: standardized error construction

Switch to error_utils.new_error_code('EEXIST', ...) improves consistency.


914-915: Export surface looks right

Exposing use_file, is_sparse_file, and warmup_sparse_file aligns with the new FileReader usage.

Also applies to: 954-955

src/util/file_writer.js (1)

27-40: Stream writer refactor looks good

Constructor thresholds, IOV_MAX batching, MD5 finalize, and Writable overrides are correct.

Also applies to: 93-137, 142-190

src/util/file_reader.js (1)

80-101: Reader core looks correct

Readable flow, pooled path, stats updates, sparse warmup hook, and export are all good.

Also applies to: 103-119, 196-214, 216-224, 441-444

src/sdk/namespace_fs.js (11)

969-996: Abort points added — good

Strategic object_sdk.throw_if_aborted() calls before/after retries reduce wasted work on canceled requests.


1056-1062: Empty directory content short‑circuit is correct

Ending the response and waiting for finished with a signal is the right behavior.


1110-1112: Range parsing: confirm negative/invalid values are sanitized

If callers can pass negative start or end < start, consider clamping or rejecting; otherwise, confirm higher layers sanitize.


1119-1131: FileReader integration and completion wait look good

Using FileReader.read_into_stream with pool, then res.end() and stream.promises.finished(res, { signal }) is appropriate.

Also applies to: 1133-1146, 1147-1150


1605-1626: Upload path: signal‑aware streaming is correct

Passing AbortSignal to write_entire_stream and reusing read_object_stream for copy_source is clean.


1699-1701: Good: create_object_upload uses umasked mode

More consistent permissions handling.


1799-1811: Multipart listing map/filter is fine

Parsing part number, size, etag, mtime is straightforward.


2122-2130: Return version_id on tag delete and set prefixed keys on put

Behavior matches S3 expectations.


2523-2525: Version entry key helper OK

Consistent path construction for .versions entries.


2571-2589: Restore status derivation looks correct

Using file path from _get_file_path is consistent with directory objects handling.


2645-2646: Bucket existence guard behind config

Reasonable toggle; ensure defaults suit production.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/util/file_writer.js (1)

46-49: Consider removing redundant finished() call.

Line 48's stream.promises.finished(this) is likely unnecessary since stream_utils.pipeline already waits for all streams to finish. The inline comment acknowledges this redundancy. Unless there's a specific edge case you're guarding against, removing it would simplify the code.

If you prefer to keep it for extra safety, consider documenting why:

-        await stream.promises.finished(this); // probably redundant and pipeline already does it
+        // Explicitly wait for finalization as a defensive measure
+        await stream.promises.finished(this);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58c3260 and bde7855.

📒 Files selected for processing (12)
  • config.js (2 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (26 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/for_await_stream.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (3 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (7 hunks)
  • src/util/native_fs_utils.js (7 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • src/test/unrelated/for_await_stream.js
  • src/test/unrelated/stream_pipe_to_multiple_targets.js
  • src/sdk/object_io.js
  • src/sdk/bucketspace_fs.js
  • src/test/unit_tests/internal/test_file_reader.test.js
  • src/sdk/namespace_fs.js
  • config.js
🧰 Additional context used
🧬 Code graph analysis (4)
src/util/file_reader.js (1)
src/sdk/namespace_fs.js (7)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • config (14-14)
  • nb_native (24-24)
  • multi_buffer_pool (33-54)
src/tools/file_writer_hashing.js (1)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
src/util/native_fs_utils.js (2)
src/sdk/namespace_fs.js (5)
  • error_utils (17-17)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
src/sdk/bucketspace_fs.js (5)
  • require (15-23)
  • require (24-24)
  • require (25-25)
  • require (27-27)
  • require (30-30)
src/util/file_writer.js (2)
src/util/file_reader.js (3)
  • stream_utils (7-7)
  • config (5-5)
  • stream (4-4)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (38-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (22)
src/util/stream_utils.js (2)

36-43: LGTM – Defensive check before destroy.

The added strm.destroy check on line 40 prevents calling destroy on streams that lack the method, improving robustness.


48-52: LGTM – Clean migration to native stream.promises.pipeline.

The refactor correctly delegates to Node.js's native pipeline implementation and preserves the conditional resume workaround with proper readable checks.

src/util/file_reader.js (6)

44-74: LGTM – Constructor properly initializes reader state.

The constructor sets up all necessary fields for file reading with sensible defaults. The start/end handling was previously discussed and deemed appropriate for this level.


80-101: LGTM – Standard Readable stream implementation.

The _read method correctly handles EOF, partial reads, and errors. The per-chunk allocation is intentional here, with the optimized read_into_stream method avoiding this overhead.


109-118: LGTM – Clean buffered read with abort support.

The method properly sequences warmup, abort checking, and the actual read, updating position and stats as needed.


131-194: LGTM – Well-structured buffer-pooled streaming with proper lifecycle management.

The method correctly:

  • Rejects Transform streams (line 132-134) to ensure buffers aren't held
  • Manages buffer pool cleanup across all paths (fixed per previous review)
  • Handles backpressure with drain promises
  • Checks abort signal at critical points

The buffer cleanup timing on lines 175-176 is now correct—if write() throws, cleanup happens in the finally block.


199-214: LGTM – Correct stats tracking for read operations.

The stats are properly updated with update_nsfs_read_stats (corrected from the previous review), and the histogram tracking is well-implemented.


219-224: LGTM – Conditional warmup with abort support.

The method properly gates warmup behind config and sparse-file checks, with abort signal verification before the expensive operation.

src/util/native_fs_utils.js (5)

70-100: LGTM – Formatting and logging improvements.

The multiline function signature improves readability, and the log messages now correctly reference native_fs_utils.


116-150: LGTM – Clean resource management pattern.

The use_file helper properly ensures file closure in all paths, with appropriate error logging. Swallowing close errors in the finally block is reasonable since the primary operation has already completed or failed.


411-411: LGTM – Consistent error code usage.

Replacing the generic Error with error_utils.new_error_code('EEXIST', ...) ensures consistent error handling across the codebase.


876-878: LGTM – Standard sparse file detection.

The implementation uses the standard Unix formula blocks * 512 < size, and the comprehensive JSDoc correctly explains the limitations for archive/cache scenarios.


896-907: LGTM – Efficient sparse file warmup with shared buffer.

The lazy buffer allocation and 1-byte read are appropriate for triggering file recall/cache loading. The shared buffer is safe since the data isn't examined (confirmed in previous review that dio_buffer_alloc is synchronous).

src/tools/file_writer_hashing.js (3)

61-92: LGTM – Simplified with write_entire_stream.

The refactor correctly uses the new write_entire_stream method, eliminating manual pipeline management. The any cast on line 74 is appropriate for the test mock that implements a compatible interface.


96-96: LGTM – Robust directory creation.

Adding recursive: true ensures the entire path is created, improving robustness when the parent directories don't exist.


94-145: LGTM – Consistent refactor to write_entire_stream.

The file_target function correctly uses the new streaming API, consistent with the hash_target changes.

src/util/file_writer.js (6)

27-40: LGTM – Improved type annotations and consistent config usage.

The constructor now uses precise nb.NativeFile and nb.NativeFSContext types, and the highWaterMark correctly references NSFS_UPLOAD_STREAM_MEM_THRESHOLD.


57-63: LGTM – Well-structured write_buffers with parallel MD5.

The method cleanly separates concerns and runs MD5 calculation in parallel with file writes, optimizing throughput.


68-73: LGTM – Clean MD5 finalization.

The finalize method appropriately extracts and documents the digest calculation step.


110-123: LGTM – Proper IOV_MAX handling.

The method correctly batches buffers to respect platform limits, with accurate size calculation for each batch.


131-136: LGTM – Clean file write with offset management.

The method properly handles both positioned writes (offset >= 0) and appends (offset < 0), with appropriate debug logging.


147-190: LGTM – Clean delegation to extracted methods.

The Writable stream methods (_writev, _write, _final) now properly delegate to the extracted logic with consistent error handling and logging.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/sdk/namespace_fs.js (1)

1798-1810: Avoid unconstrained Promise.all when listing multipart parts.

Promise.all over all part entries can spike concurrent fs.stat calls on large MPU folders. Use the existing P.map (or P.map_with_concurrency) to cap parallelism.

Apply:

-            const multiparts = await Promise.all(entries
-                .filter(e => e.name.startsWith('part-'))
-                .map(async e => {
+            const part_entries = entries.filter(e => e.name.startsWith('part-'));
+            const multiparts = await P.map_with_concurrency(32, part_entries, async e => {
                     const num = Number(e.name.slice('part-'.length));
                     const part_path = path.join(params.mpu_path, e.name);
                     const stat = await nb_native().fs.stat(fs_context, part_path);
                     return {
                         num,
                         size: Number(stat.xattr[XATTR_PART_SIZE]),
                         etag: this._get_etag(stat),
                         last_modified: new Date(stat.mtime),
                     };
-                })
-            );
+            });
♻️ Duplicate comments (1)
src/util/file_reader.js (1)

62-65: Clamp start/end to avoid negative offsets into native read.

Negative start leads to this.pos < 0, and a subsequent native read at a negative position. Clamp here to guarantee safe bounds, even if upstream validates most cases.

Apply:

-        this.start = start || 0;
-        this.end = Math.min(stat.size, end ?? Infinity);
+        const s = Number.isFinite(start) ? start : 0;
+        const e = Number.isFinite(end) ? end : Infinity;
+        this.start = Math.max(0, s);
+        this.end = Math.min(stat.size, Math.max(this.start, e));
         this.pos = this.start;

If you prefer to keep FileReader lean, then clamp at the call site (NamespaceFS.read_object_stream) before constructing FileReader.

🧹 Nitpick comments (4)
config.js (2)

830-831: Make bucket-path stat check toggle explicit in docs/tests.

NSFS_CHECK_BUCKET_PATH_EXISTS defaults true and is honored by NamespaceFS._load_bucket. Please document in config docs and add a smoke test that flips it to false for perf runs, to avoid accidental prod disable.

Would you add/update a short section in README/config docs noting the ENV override name? Based on learnings.


1019-1019: Ignoring EINVAL on list: add light telemetry.

Defaulting NSFS_LIST_IGNORE_ENTRY_ON_EINVAL to true is pragmatic for GPFS (.snapshots). Consider incrementing a counter or log at log1 once per dir to aid diagnosing hidden entries without flooding logs.

src/util/file_reader.js (1)

147-151: Abort-aware buffer acquisition.

Nice TODO. Today, if waiting on the pool/semaphore, abort can’t preempt the wait. Consider propagating AbortSignal into MultiSizeBuffersPool.get_buffer()/semaphore to fail fast on cancellation.

src/sdk/namespace_fs.js (1)

2642-2645: Footgun guard: NSFS_CHECK_BUCKET_PATH_EXISTS skip.

The early return is useful for perf tests; ensure integration/e2e tests cover both true/false. Please add a brief doc note and consider a log1 on disable to surface the behavior at startup.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bde7855 and 6deba71.

📒 Files selected for processing (12)
  • config.js (2 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (26 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/for_await_stream.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (3 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (7 hunks)
  • src/util/native_fs_utils.js (7 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js
  • src/sdk/object_io.js
  • src/test/unit_tests/internal/test_file_reader.test.js
  • src/util/stream_utils.js
  • src/test/unrelated/for_await_stream.js
  • src/tools/file_writer_hashing.js
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/sdk/namespace_fs.js
📚 Learning: 2025-07-30T10:20:09.402Z
Learnt from: liranmauda
Repo: noobaa/noobaa-core PR: 9174
File: config.js:1044-1046
Timestamp: 2025-07-30T10:20:09.402Z
Learning: The NooBaa config.js file has a built-in environment variable override system via `load_config_env_overrides()` function. Any config property can be overridden using environment variables prefixed with `CONFIG_JS_`. For example, `CONFIG_JS_NC_HEALTH_BUCKETS_COUNT_LIMIT=10000` would override `config.NC_HEALTH_BUCKETS_COUNT_LIMIT`. The system handles type conversion automatically and provides consistent override patterns across all config values.

Applied to files:

  • config.js
📚 Learning: 2025-08-08T13:10:36.141Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1314-1317
Timestamp: 2025-08-08T13:10:36.141Z
Learning: In src/server/system_services/pool_server.js (and config usage), the constant config.INTERNAL_STORAGE_POOL_NAME has been removed from the system. Future logic should not depend on this constant and should instead use config.DEFAULT_POOL_NAME or structural markers (e.g., pool.resource_type === 'INTERNAL' or pool.mongo_info) to identify internal/mongo pools.

Applied to files:

  • config.js
🧬 Code graph analysis (5)
src/sdk/namespace_fs.js (3)
src/util/file_reader.js (1)
  • stream (4-4)
src/util/file_writer.js (1)
  • stream (4-4)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (38-38)
src/sdk/bucketspace_fs.js (2)
src/sdk/namespace_fs.js (1)
  • dbg (13-13)
src/util/debug_module.js (2)
  • name (103-103)
  • name (432-432)
src/util/file_reader.js (3)
src/sdk/namespace_fs.js (10)
  • stream (11-11)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • nb_native (24-24)
  • native_fs_utils (21-21)
  • stat (185-185)
  • stat (331-331)
  • stat (366-366)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (38-38)
src/util/native_fs_utils.js (8)
  • nb_native (12-12)
  • nb_native (726-726)
  • fs_context (605-610)
  • file (124-124)
  • file (815-815)
  • stat (457-457)
  • stat (505-505)
  • stat (654-654)
src/util/file_writer.js (3)
src/sdk/object_io.js (5)
  • stream_utils (14-14)
  • require (19-19)
  • require (20-20)
  • require (21-21)
  • stream (6-6)
src/util/file_reader.js (2)
  • stream_utils (7-7)
  • stream (4-4)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (38-38)
src/util/native_fs_utils.js (2)
src/sdk/namespace_fs.js (5)
  • error_utils (17-17)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
src/sdk/bucketspace_fs.js (5)
  • require (15-23)
  • require (24-24)
  • require (25-25)
  • require (27-27)
  • require (30-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: mint-nc-tests / Mint NC Tests
  • GitHub Check: mint-tests / Mint Tests
  • GitHub Check: ceph-s3-tests / Ceph S3 Tests
  • GitHub Check: warp-nc-tests / Warp NC Tests
  • GitHub Check: warp-tests / Warp Tests
  • GitHub Check: ceph-nsfs-s3-tests / NSFS Ceph S3 Tests
  • GitHub Check: run-unit-tests-postgres / Unit Tests with Postgres
  • GitHub Check: run-unit-tests / Unit Tests
  • GitHub Check: run-nc-unit-tests / Non Containerized Unit Tests
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (14)
src/sdk/bucketspace_fs.js (1)

436-436: LGTM: reduced verbosity at lifecycle read.

Raising to log1 is appropriate here and keeps logs cleaner during normal operation.

src/util/file_reader.js (1)

173-181: Good: buffer release ordering prevents leaks on sync write errors.

Calling write() before nulling buffer_pool_cleanup ensures finally can release on synchronous throw; callback handles normal release.

config.js (1)

1013-1015: Verified: no stale references found; implementation is correct.

The search confirms zero references to the old typo name (NFSF_UPLOAD_STREAM_MEM_THRESHOLD). The new constants are properly defined in config.js and correctly used in file_writer.js and file_reader.js. No temporary alias or additional cleanup is needed.

src/util/native_fs_utils.js (7)

13-13: LGTM: Error utilities import added.

The addition of error_utils aligns with the standardized error handling used elsewhere in the codebase (e.g., namespace_fs.js, bucketspace_fs.js).


70-100: LGTM: Improved formatting and logging consistency.

The multiline signature improves readability, and the updated log messages with the native_fs_utils: prefix provide better context for debugging.


102-150: LGTM: Excellent resource management pattern.

The use_file helper provides a clean, scoped file access pattern with guaranteed cleanup. The error handling is comprehensive:

  • Open failures are logged and propagated
  • Scope failures are logged and propagated
  • Close failures are logged but don't suppress scope errors (correct behavior)

This pattern will reduce boilerplate and prevent resource leaks across the codebase.


411-411: LGTM: Standardized error creation.

Using error_utils.new_error_code('EEXIST', ...) is more consistent with the codebase's error handling patterns.


865-878: LGTM: Well-documented heuristic for sparse file detection.

The extensive JSDoc clearly explains the limitations of this heuristic (post-write/pre-sync, archive tier, remote cache) and its intended use case. The formula stat.blocks * 512 < stat.size is the standard POSIX approach for detecting sparse files.


880-907: LGTM: Clever warmup pattern with proper documentation.

The shared warmup_buffer approach is safe because:

  1. The 1-byte read is only to trigger recall/cache load (line 906)
  2. The buffer content is never examined
  3. JavaScript's single-threaded execution with no await point makes line 905's lazy allocation atomic

The extensive JSDoc explains the rationale: avoid tying up large buffers from the pool while waiting for high-latency archive/cache operations. Based on the past review discussion, the concurrency concerns have been properly addressed.


914-914: LGTM: New utilities properly exported.

All three new helpers (use_file, is_sparse_file, warmup_sparse_file) are correctly exported for use across the codebase.

Also applies to: 954-955

src/util/file_writer.js (4)

7-8: LGTM: Import corrections.

The stream_utils import is needed for the new write_entire_stream method, and the debug_module path is corrected from ../util/debug_module to ./debug_module (since file_writer.js is already in src/util).


18-28: LGTM: Constructor refactored appropriately.

The parameter changes align with the PR objectives:

  • Removed unused large_buf_size parameter
  • Fixed config constant typo: NFSF_UPLOAD_STREAM_MEM_THRESHOLDNSFS_UPLOAD_STREAM_MEM_THRESHOLD
  • Refined JSDoc types for better clarity

51-73: LGTM: Clean extraction of reusable methods.

The new write_buffers and finalize methods effectively extract common logic from the stream methods (_writev, _write, _final), making the code more modular and exposing a non-stream API as intended by the PR objectives.

The parallel execution of MD5 update and file writing (line 58-61) is a nice optimization.


143-190: LGTM: Stream methods properly delegate to extracted helpers.

All three stream methods (_writev, _write, _final) now cleanly delegate to the extracted public methods (write_buffers, finalize), with proper error handling and logging. This refactoring:

  • Reduces code duplication
  • Maintains correct stream semantics
  • Enables non-stream usage of the same logic

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/util/native_fs_utils.js (1)

882-895: Add the missing JSDoc parameter for file_path.

In the past review, you acknowledged fixing the missing @param {string} file_path in the JSDoc, but it's still absent from the current code. The parameter is used on line 897 for logging.

Apply this diff:

 /**
  * Our buffer pool keeps large buffers and we want to avoid spending
  * all our large buffers and then have them waiting for high latency calls
  * such as reading from archive/on-demand cache files.
  * Instead, we detect the case where a file is "sparse",
  * and then use just a small buffer to wait for a tiny read,
  * which will recall the file from archive or load from remote into cache,
  * and once it returns we can continue to the full fledged read.
  * @param {nb.NativeFSContext} fs_context
  * @param {nb.NativeFile} file
+ * @param {string} file_path
  * @param {nb.NativeFSStats} stat
  * @param {number} pos
  */
🧹 Nitpick comments (1)
src/util/file_writer.js (1)

46-49: Clarify or remove the uncertain finished() call.

The comment "probably redundant and pipeline already does it" indicates uncertainty. Looking at stream_utils.pipeline, it awaits stream.promises.pipeline(streams, options) which should already wait for all streams to finish. The extra stream.promises.finished(this) call is likely redundant.

If you're confident the extra finished() is needed for a specific edge case, document why. Otherwise, remove it:

 async write_entire_stream(source_stream, options) {
     await stream_utils.pipeline([source_stream, this], false, options);
-    await stream.promises.finished(this); // probably redundant and pipeline already does it
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6deba71 and a48aca8.

📒 Files selected for processing (12)
  • config.js (2 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (26 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/for_await_stream.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (3 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (7 hunks)
  • src/util/native_fs_utils.js (7 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/test/unit_tests/internal/test_file_reader.test.js
  • src/test/unrelated/for_await_stream.js
  • src/sdk/bucketspace_fs.js
  • src/sdk/object_io.js
  • config.js
  • src/sdk/namespace_fs.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unrelated/stream_pipe_to_multiple_targets.js
🧠 Learnings (1)
📚 Learning: 2025-06-17T12:59:51.543Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9042
File: src/util/cloud_utils.js:183-194
Timestamp: 2025-06-17T12:59:51.543Z
Learning: In the set_noobaa_s3_connection function in src/util/cloud_utils.js, internal NooBaa S3 endpoints (filtered by 'api': 's3', 'kind': 'INTERNAL') will always be non-secure, so tls: false is the correct setting despite the conditional https:// scheme logic.

Applied to files:

  • src/test/unrelated/stream_pipe_to_multiple_targets.js
🧬 Code graph analysis (4)
src/util/file_reader.js (4)
src/sdk/namespace_fs.js (9)
  • stream (11-11)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • config (14-14)
  • nb_native (24-24)
  • native_fs_utils (21-21)
  • multi_buffer_pool (33-54)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (38-38)
src/util/native_fs_utils.js (8)
  • config (10-10)
  • nb_native (12-12)
  • nb_native (726-726)
  • fs_context (605-610)
  • file (124-124)
  • file (815-815)
  • buffer_pool_cleanup (163-163)
  • data (182-182)
src/test/unit_tests/internal/test_file_reader.test.js (2)
  • native_fs_utils (8-8)
  • fs_context (12-12)
src/tools/file_writer_hashing.js (1)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
src/util/file_writer.js (1)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (38-38)
src/util/native_fs_utils.js (2)
src/sdk/namespace_fs.js (5)
  • error_utils (17-17)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
src/sdk/bucketspace_fs.js (5)
  • require (15-23)
  • require (24-24)
  • require (25-25)
  • require (27-27)
  • require (30-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (9)
src/test/unrelated/stream_pipe_to_multiple_targets.js (1)

145-146: LGTM! Clean migration to native stream promises API.

The replacement of custom wait_finished with stream.promises.finished aligns perfectly with the broader PR refactor towards native Node.js streaming APIs.

src/util/stream_utils.js (2)

31-35: LGTM! Updated signature improves API clarity.

The explicit default parameters and updated JSDoc make the function signature clearer and easier to use.


48-52: LGTM! Safe property checks before resume.

The duck-typing check 'readable' in last && last.readable before calling resume() prevents errors if the last stream isn't actually a Readable stream.

src/tools/file_writer_hashing.js (1)

74-78: LGTM! Cleaner streaming with write_entire_stream.

Replacing the two-step pipeline + wait_finished pattern with a single write_entire_stream call simplifies the code and aligns with the FileWriter refactor.

src/util/file_reader.js (2)

26-74: LGTM! Well-designed FileReader with dual streaming modes.

The class design is excellent:

  • Standard Readable interface via _read() for easy integration
  • Optimized read_into_stream() using buffer pools for high throughput
  • Clear documentation explaining the performance trade-offs

174-176: Fix the buffer cleanup ordering to prevent leaks.

In the past review thread, you mentioned swapping lines 175 and 176 to fix the buffer leak issue. However, the current code still has the problematic order where buffer_pool_cleanup = null comes after the write() call. If target_stream.write() throws synchronously (e.g., stream already destroyed), the finally block won't be able to release the buffer.

Apply this diff to swap the lines:

 const data = buffer.subarray(0, nread);
+buffer_pool_cleanup = null; // cleanup is now in the socket responsibility
 const write_ok = target_stream.write(data, null, callback);
-buffer_pool_cleanup = null; // cleanup is now in the socket responsibility

Now if write() throws, buffer_pool_cleanup is still set and the finally block will invoke it.

Likely an incorrect or invalid review comment.

src/util/native_fs_utils.js (2)

102-150: LGTM! Useful helper for scoped file operations.

The use_file helper ensures proper file closure even on errors, following a clean resource management pattern similar to Python's with statement or C#'s using.


865-907: LGTM! Smart optimization for sparse file reads.

The sparse file detection and warmup strategy is a clever optimization:

  • is_sparse_file uses block count to detect potential archive/on-demand files
  • warmup_sparse_file triggers a small read to avoid tying up large buffers during slow recalls
  • Module-level buffer sharing is safe since the sync allocation can't race and the buffer content isn't examined

Good performance engineering!

src/util/file_writer.js (1)

51-73: LGTM! Clean separation of stream and non-stream APIs.

The new write_buffers and finalize methods provide a non-stream API that's reused by the Writable stream methods. This design makes the class more flexible and easier to test.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/util/native_fs_utils.js (1)

93-98: Minor typo in log; keep code polish high.

"concurrennt" → "concurrent".

Apply:

-            // case of concurrennt deletion of the dir_path
+            // case of concurrent deletion of the dir_path
♻️ Duplicate comments (1)
src/sdk/namespace_fs.js (1)

1110-1112: Clamp start/end before constructing FileReader.

Prevent negative/inverted ranges and bound end ≤ stat.size.

-            const start = Number(params.start) || 0;
-            const end = isNaN(Number(params.end)) ? Infinity : Number(params.end);
+            let start = Number(params.start);
+            let end = Number(params.end);
+            start = Number.isFinite(start) ? Math.max(0, start) : 0;
+            end = Number.isFinite(end) ? end : Infinity;
+            end = Math.min(stat.size, Math.max(start, end));

Also applies to: 1119-1131

🧹 Nitpick comments (9)
src/util/native_fs_utils.js (3)

70-76: Optionally create parent dirs for 'w'/'w+' as well.*

Some call sites open with 'w*'/'w+' under existing dirs; if a future caller passes a path with a missing parent, this will fail. Consider treating any write mode that starts with 'w' like 'w'/'wt' for dir creation.

-    const should_create_path_dirs = (open_mode === 'wt' || open_mode === 'w') && dir_path !== bucket_path;
+    const should_create_path_dirs = (
+        (open_mode === 'wt' || open_mode === 'w' || (typeof open_mode === 'string' && open_mode.startsWith('w')))
+        && dir_path !== bucket_path
+    );

Also applies to: 81-89


102-150: use_file: good utility; make defaults explicit and harden close.

Explicit defaulting improves readability; also ensure open_mode defaults to read when omitted.

-async function use_file({
-    fs_context,
-    bucket_path,
-    open_path,
-    open_mode,
-    file_permissions,
-    scope,
-}) {
+async function use_file({
+    fs_context,
+    bucket_path,
+    open_path,
+    open_mode = config.NSFS_OPEN_READ_MODE,
+    file_permissions = config.BASE_MODE_FILE,
+    scope,
+}) {

880-907: Warmup: reduce noise, avoid magic number, and guard by config.

Switch log0→log1 to avoid chatty logs, use XS buffer size from config, and make warmup conditional on config.NSFS_BUF_WARMUP_SPARSE_FILE_READS.

-async function warmup_sparse_file(fs_context, file, file_path, stat, pos) {
-    dbg.log0('warmup_sparse_file', {
+async function warmup_sparse_file(fs_context, file, file_path, stat, pos) {
+    if (!config.NSFS_BUF_WARMUP_SPARSE_FILE_READS) return;
+    dbg.log1('warmup_sparse_file', {
         file_path, pos, size: stat.size, blocks: stat.blocks,
     });
     // sync call to alloc will init warmup_buffer once and for all
     // we read just one byte from the file and do not use the data
     // so the buffer can be shared among all async calls. 
     // it might be unnoticeably better if each thread will 
     // allocate its own static buffer.
-    warmup_buffer ||= nb_native().fs.dio_buffer_alloc(4096);
+    warmup_buffer ||= nb_native().fs.dio_buffer_alloc(config.NSFS_BUF_SIZE_XS);
     await file.read(fs_context, warmup_buffer, 0, 1, pos);
 }
docs/NooBaaNonContainerized/ConfigFileCustomizations.md (2)

535-547: Fix spelling and add language to the fenced block.

  • "existance" → "existence".
  • Add json language to the fenced example.
-### 37. Trigger Check bucket path exists -
+### 37. Trigger Check bucket path exists -
 * <u>Key</u>: `NSFS_CHECK_BUCKET_PATH_EXISTS`    
 * <u>Type</u>: Boolean   
 * <u>Default</u>: true  
-* <u>Description</u>: Enable/Disable bucket path existance checks. This is EXPERIMENTAL and NOT recommended for production! When disabled, will reduce some latency on object operations, but calls to non existing bucket paths will result with unexpected behavior (e.g. could return NO_SUCH_OBJECT instead of NO_SUCH_BUCKET).
+* <u>Description</u>: Enable/Disable bucket path existence checks. This is EXPERIMENTAL and NOT recommended for production! When disabled, will reduce some latency on object operations, but calls to non-existing bucket paths will result in unexpected behavior (e.g., could return NO_SUCH_OBJECT instead of NO_SUCH_BUCKET).
 * <u>Steps</u>:  
-    ```
+    ```json
     1. Open /path/to/config_dir/config.json file.
     2. Set the config key -
     Example:
     "NSFS_CHECK_BUCKET_PATH_EXISTS": false
     ```

535-547: Mention ENV override for completeness.

Document that this can also be toggled via the standard ENV override system.

     "NSFS_CHECK_BUCKET_PATH_EXISTS": false
     ```
+
+Also supported via environment variable overrides:
+
+```sh
+export CONFIG_JS_NSFS_CHECK_BUCKET_PATH_EXISTS=false
+```
+Based on learnings
config.js (1)

827-831: NSFS_CHECK_BUCKET_PATH_EXISTS: good; consider adding to ENV docs.

Config is wired and used in NamespaceFS._load_bucket. Ensure docs reflect that ENV override is supported (CONFIG_JS_NSFS_CHECK_BUCKET_PATH_EXISTS).

src/sdk/namespace_fs.js (2)

1133-1157: Wait-finish pattern: OK with FileReader; tiny guard on avg.

If you later change logging semantics, consider guarding division to avoid NaN in edge cases; otherwise fine to keep as-is.

-                avg_buffer: file_reader.num_bytes / file_reader.num_buffers,
+                avg_buffer: file_reader.num_buffers ? (file_reader.num_bytes / file_reader.num_buffers) : 0,

1798-1811: Limit concurrency when listing parts.

Using Promise.all on every part can spike fds/CPU for large MPU (up to 10k parts). Use P.map_with_concurrency.

-            const multiparts = await Promise.all(entries
-                .filter(e => e.name.startsWith('part-'))
-                .map(async e => {
+            const multiparts = await P.map_with_concurrency(64,
+                entries.filter(e => e.name.startsWith('part-')),
+                async e => {
                     const num = Number(e.name.slice('part-'.length));
                     const part_path = path.join(params.mpu_path, e.name);
                     const stat = await nb_native().fs.stat(fs_context, part_path);
                     return {
                         num,
                         size: Number(stat.xattr[XATTR_PART_SIZE]),
                         etag: this._get_etag(stat),
                         last_modified: new Date(stat.mtime),
                     };
-                })
-            );
+                });
src/util/file_writer.js (1)

46-50: Consider removing the redundant finished() call.

stream_utils.pipeline already awaits stream.promises.pipeline() and handles completion. The explicit finished() call at line 49 is redundant. While this defensive approach isn't harmful, it adds unnecessary overhead.

Apply this diff to remove the redundancy:

     async write_entire_stream(source_stream, options) {
         await stream_utils.pipeline([source_stream, this], false, options);
-        // Explicitly wait for finish as a defensive measure although pipeline should do it
-        await stream.promises.finished(this);
     }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a48aca8 and 237aa34.

📒 Files selected for processing (13)
  • config.js (2 hunks)
  • docs/NooBaaNonContainerized/ConfigFileCustomizations.md (1 hunks)
  • src/sdk/bucketspace_fs.js (1 hunks)
  • src/sdk/namespace_fs.js (26 hunks)
  • src/sdk/object_io.js (1 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/test/unrelated/for_await_stream.js (1 hunks)
  • src/test/unrelated/stream_pipe_to_multiple_targets.js (1 hunks)
  • src/tools/file_writer_hashing.js (3 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (7 hunks)
  • src/util/native_fs_utils.js (7 hunks)
  • src/util/stream_utils.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/tools/file_writer_hashing.js
  • src/sdk/object_io.js
  • src/sdk/bucketspace_fs.js
  • src/test/unit_tests/internal/test_file_reader.test.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unrelated/stream_pipe_to_multiple_targets.js
  • src/test/unrelated/for_await_stream.js
🧠 Learnings (4)
📚 Learning: 2025-06-17T12:59:51.543Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9042
File: src/util/cloud_utils.js:183-194
Timestamp: 2025-06-17T12:59:51.543Z
Learning: In the set_noobaa_s3_connection function in src/util/cloud_utils.js, internal NooBaa S3 endpoints (filtered by 'api': 's3', 'kind': 'INTERNAL') will always be non-secure, so tls: false is the correct setting despite the conditional https:// scheme logic.

Applied to files:

  • src/test/unrelated/stream_pipe_to_multiple_targets.js
📚 Learning: 2025-07-30T10:20:09.402Z
Learnt from: liranmauda
Repo: noobaa/noobaa-core PR: 9174
File: config.js:1044-1046
Timestamp: 2025-07-30T10:20:09.402Z
Learning: The NooBaa config.js file has a built-in environment variable override system via `load_config_env_overrides()` function. Any config property can be overridden using environment variables prefixed with `CONFIG_JS_`. For example, `CONFIG_JS_NC_HEALTH_BUCKETS_COUNT_LIMIT=10000` would override `config.NC_HEALTH_BUCKETS_COUNT_LIMIT`. The system handles type conversion automatically and provides consistent override patterns across all config values.

Applied to files:

  • config.js
  • docs/NooBaaNonContainerized/ConfigFileCustomizations.md
📚 Learning: 2025-11-10T09:40:34.683Z
Learnt from: guymguym
Repo: noobaa/noobaa-core PR: 9153
File: src/test/unrelated/for_await_stream.js:69-69
Timestamp: 2025-11-10T09:40:34.683Z
Learning: In the noobaa-core repository, Travis CI (.travis.yml) is unused. The active CI system is GitHub Actions, and the project uses Node.js 22.11.0 as specified in .nvmrc.

Applied to files:

  • src/test/unrelated/for_await_stream.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/sdk/namespace_fs.js
🧬 Code graph analysis (4)
src/util/native_fs_utils.js (2)
src/sdk/namespace_fs.js (5)
  • error_utils (17-17)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
src/sdk/bucketspace_fs.js (5)
  • require (15-23)
  • require (24-24)
  • require (25-25)
  • require (27-27)
  • require (30-30)
src/util/file_reader.js (1)
src/sdk/namespace_fs.js (7)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
  • config (14-14)
  • nb_native (24-24)
  • multi_buffer_pool (33-54)
src/sdk/namespace_fs.js (2)
src/util/file_reader.js (1)
  • stream (4-4)
src/util/file_writer.js (1)
  • stream (4-4)
src/util/file_writer.js (2)
src/util/file_reader.js (3)
  • stream_utils (8-8)
  • config (6-6)
  • stream (4-4)
src/util/stream_utils.js (2)
  • stream (4-4)
  • err (38-38)
🪛 LanguageTool
docs/NooBaaNonContainerized/ConfigFileCustomizations.md

[grammar] ~539-~539: Ensure spelling is correct
Context: ...ription: Enable/Disable bucket path existance checks. This is EXPERIMENTAL and NOT re...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
docs/NooBaaNonContainerized/ConfigFileCustomizations.md

541-541: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: run-package-lock-validation
🔇 Additional comments (14)
src/test/unrelated/for_await_stream.js (1)

69-69: LGTM! Migration to native stream.promises API is correct.

The change from a promisified stream.finished to stream.promises.finished() is semantically equivalent and aligns with the broader refactor to use native promise-based stream APIs across the codebase. Node.js 22.11.0 (per .nvmrc) fully supports this API.

src/test/unrelated/stream_pipe_to_multiple_targets.js (1)

145-146: LGTM! Correct migration to stream.promises API.

The replacement of wait_finished() with stream.promises.finished() is semantically equivalent and correctly handles completion of both the hub and cache streams within Promise.allSettled(). This aligns with the PR's goal to remove stream_utils.wait_finished in favor of native APIs.

src/sdk/namespace_fs.js (1)

2643-2645: Bucket stat gating: good.

Short-circuiting _load_bucket by config.NSFS_CHECK_BUCKET_PATH_EXISTS looks right. Consider a short TTL cache if you re-enable the check later.

config.js (1)

1013-1015: Backward-compatibility alias suggestion lacks evidence of usage in codebase.

The suggested alias for NFSF_UPLOAD_STREAM_MEM_THRESHOLD is not required to prevent internal breakage—no references to the old NFSF name exist anywhere in the codebase. The alias would only be beneficial if external consumers (outside this repository) reference the old name; however, this cannot be verified from the codebase alone. If supporting external API compatibility is a goal, add the alias; otherwise, it is unnecessary.

src/util/file_reader.js (4)

63-67: Range validation looks good.

The clamping and assertion approach provides clear error messages while ensuring start and end remain within valid bounds. The defensive programming here prevents negative positions from being passed to warmup/read operations.


113-122: LGTM!

The read path correctly handles sparse file warmup, abort signal checking, and position/stats tracking.


177-180: Buffer cleanup ordering is now correct.

Good fix! By calling write() before nulling buffer_pool_cleanup, the finally block can properly clean up the buffer if write() throws synchronously. This addresses the buffer leak concern from the previous review.


212-212: Stats tracking correctly uses read-specific method.

The use of update_nsfs_read_stats is appropriate for read operations. Good catch and fix from the previous review!

src/util/stream_utils.js (2)

31-35: LGTM!

The addition of default parameters improves API ergonomics while maintaining backward compatibility.


40-48: Good modernization of the pipeline implementation.

The defensive check for the destroy method and migration to stream.promises.pipeline improve code robustness and clarity.

src/util/file_writer.js (4)

7-8: LGTM!

The import additions and constant rename align well with the refactored streaming API.

Also applies to: 27-28


52-74: LGTM!

The new write_buffers and finalize methods provide a clean non-streaming API. Running MD5 updates and file writes in parallel is an efficient design choice.


111-137: LGTM!

The IOV_MAX-aware batching logic correctly splits buffer arrays to avoid system call limits, and offset tracking properly handles both positioned and append-mode writes.


148-191: LGTM!

The delegation of _writev, _write, and _final to the new public methods (write_buffers, finalize) ensures uniform MD5 calculation, stats tracking, and error handling across all write paths.

@guymguym guymguym changed the title nsfs refactor file_reader and file_writer NSFS refactor file_reader and file_writer Nov 10, 2025
@guymguym guymguym merged commit d74ec6e into noobaa:master Nov 10, 2025
29 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants