
Cache service/block foyer cache #21511

Closed
vishwasgarg18 wants to merge 20 commits into opensearch-project:main from vishwasgarg18:Cache-Service/block-foyer-cache

Conversation

@vishwasgarg18
Contributor

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

vishwasgarg18 and others added 5 commits April 21, 2026 13:35
Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

PR Reviewer Guide 🔍

(Review updated until commit 3ac8ecb)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Core: Introduce UnifiedCacheService and BlockCacheProvider SPI

Relevant files:

  • server/src/main/java/org/opensearch/index/store/remote/filecache/UnifiedCacheService.java
  • server/src/main/java/org/opensearch/node/Node.java
  • server/src/main/java/org/opensearch/node/NodeService.java
  • server/src/main/java/org/opensearch/monitor/fs/WarmFsService.java
  • server/src/main/java/org/opensearch/monitor/fs/FsServiceProvider.java
  • server/src/main/java/org/opensearch/plugins/BlockCacheProvider.java
  • server/src/main/java/org/opensearch/plugins/BlockCacheStats.java
  • server/src/main/java/org/opensearch/index/store/remote/filecache/AggregateFileCacheStats.java

Sub-PR theme: Foyer Plugin: Block cache settings, stats, and FFM bridge

Relevant files:

  • sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/BlockCacheFoyerPlugin.java
  • sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBlockCache.java
  • sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBlockCacheSettings.java
  • sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerAggregatedStats.java
  • sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBridge.java
  • sandbox/plugins/block-cache-foyer/src/main/rust/src/foyer/ffm.rs
  • sandbox/plugins/block-cache-foyer/src/main/rust/src/stats.rs
  • sandbox/plugins/block-cache-foyer/src/main/rust/src/foyer_cache.rs
  • sandbox/plugins/block-cache-foyer/src/main/rust/src/range_cache.rs
  • sandbox/plugins/block-cache-foyer/src/main/rust/src/mod.rs

⚡ Recommended focus areas for review

Incomplete Close

The close() method closes block caches but does not close or release the FileCache. If FileCache implements Closeable or holds resources (file handles, off-heap memory), those will be leaked when the node shuts down.

public synchronized void close() throws IOException {
    if (!blockCaches.isEmpty()) {
        IOUtils.closeWhileHandlingException(blockCaches);
        logger.info("Block caches closed ({})", blockCaches.size());
    }
}
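One way to address this observation is to close the FileCache alongside the block caches. A minimal sketch, assuming the FileCache (or a stand-in for it) can expose a Closeable handle — the class and field names here are illustrative, not the PR's actual code:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class UnifiedCacheServiceSketch implements Closeable {
    final List<Closeable> blockCaches = new ArrayList<>();
    final Closeable fileCache; // hypothetical stand-in for the real FileCache

    UnifiedCacheServiceSketch(Closeable fileCache) {
        this.fileCache = fileCache;
    }

    @Override
    public synchronized void close() {
        // Close block caches first, then the file cache, swallowing
        // individual failures so one bad cache cannot block node shutdown.
        for (Closeable c : blockCaches) {
            try {
                c.close();
            } catch (IOException ignored) {
                // best-effort close during shutdown
            }
        }
        try {
            fileCache.close();
        } catch (IOException ignored) {
            // best-effort close during shutdown
        }
    }
}
```

If the real FileCache does not implement Closeable, the equivalent would be an explicit release method invoked from the same place.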
Double FsProbe Call

computeTotalBudgetBytes is called once in Node.java to compute totalBudgetBytes, and then create() calls calculateFileCacheSize which internally calls FsProbe.getTotalSize again. This means the SSD size is probed twice and could theoretically return different values between the two calls (e.g. if the filesystem is remounted), leading to an inconsistency between the budget passed to plugins and the budget used inside create().

    // Step 1: Read total SSD capacity from the fileCacheNodePath.
    NodeEnvironment.NodePath fileCacheNodePath = nodeEnvironment.fileCacheNodePath();
    long totalSSDBytes = ExceptionsHelper.catchAsRuntimeException(() -> FsProbe.getTotalSize(fileCacheNodePath));

    // Step 2: Compute the total warm-cache SSD budget from node.search.cache.size.
    String warmCacheSizeRaw = Node.NODE_SEARCH_CACHE_SIZE_SETTING.get(settings);
    long totalBudgetBytes = calculateFileCacheSize(warmCacheSizeRaw, totalSSDBytes);

    // Step 3: Partition the budget — plugins told Node.java how much they need.
    long fileCacheBytes = totalBudgetBytes - blockCacheBytes;

    // Step 4: Validate before creating anything.
    validate(fileCacheBytes, blockCacheBytes, totalSSDBytes);

    // Step 5: Create FileCache.
    FileCache fileCache = FileCacheFactory.createConcurrentLRUFileCache(fileCacheBytes);
    fileCacheNodePath.fileCacheReservedSize = new ByteSizeValue(fileCacheBytes, ByteSizeUnit.BYTES);
    restoreFileCacheFromDisk(settings, fileCacheNodePath, fileCache);

    // Block caches are provided by plugins via addBlockCache() after createComponents().
    return new UnifiedCacheService(fileCache);
}
Unsafe Arc Borrow

In foyer_snapshot_stats, Arc::from_raw is used to borrow the cache and then std::mem::forget is called to prevent the drop. This pattern is fragile: if cache.stats.snapshot() panics, the forget is never reached and the Arc is double-freed. A safer pattern would be Arc::increment_strong_count + Arc::from_raw or using ManuallyDrop.

let cache = Arc::from_raw(ptr as *const FoyerCache);
let single = cache.stats.snapshot();
// Prevent the Arc from being dropped — we don't own it.
std::mem::forget(cache);
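The ManuallyDrop alternative mentioned above can be sketched as follows. FoyerCache here is a hypothetical stand-in for the real struct; the point is that the strong count is never decremented, even if the body panics:

```rust
use std::mem::ManuallyDrop;
use std::sync::Arc;

// Hypothetical stand-in for the real cache behind the raw pointer.
struct FoyerCache {
    hits: u64,
}

// Borrow the Arc behind `ptr` without taking ownership. ManuallyDrop
// suppresses the drop unconditionally, so a panic between from_raw and
// the end of the function cannot decrement a strong count we don't own.
unsafe fn snapshot_hits(ptr: *const FoyerCache) -> u64 {
    let cache = ManuallyDrop::new(Arc::from_raw(ptr));
    cache.hits
}
```

The caller that originally produced the pointer via `Arc::into_raw` remains responsible for eventually reclaiming it with `Arc::from_raw`.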
Missing Zero Validation

calculateFileCacheSize can return 0 if warmCacheSizeRaw is "0" or "0%" on a warm node, but the old initializeFileCache explicitly threw a SettingsException for a zero cache size. The new validate() method only checks fileCacheBytes <= 0 after subtracting block cache bytes, so a zero total budget (e.g. misconfigured node.search.cache.size=0) would produce a confusing error about block cache allocation rather than a clear message about the missing cache size setting.

String warmCacheSizeRaw = Node.NODE_SEARCH_CACHE_SIZE_SETTING.get(settings);
long totalBudgetBytes = calculateFileCacheSize(warmCacheSizeRaw, totalSSDBytes);

// Step 3: Partition the budget — plugins told Node.java how much they need.
long fileCacheBytes = totalBudgetBytes - blockCacheBytes;

// Step 4: Validate before creating anything.
validate(fileCacheBytes, blockCacheBytes, totalSSDBytes);
Setting Namespace Conflict

Settings like block_cache.size, block_cache.block_size, and block_cache.io_engine use very generic top-level keys. If another plugin or future core feature uses the same key prefix, there will be a conflict at registration time. Consider using a plugin-specific namespace such as foyer.block_cache.size.

public static final Setting<String> CACHE_SIZE_SETTING = new Setting<>(
    "block_cache.size",
    "25%",
    value -> {
        try {
            RatioValue ratio = RatioValue.parseRatioValue(value);
            if (ratio.getAsRatio() < 0 || ratio.getAsRatio() >= 1.0) {
                throw new IllegalArgumentException(
                    "[block_cache.size] must be in [0%, 100%); got: " + value);
            }
            return value;
        } catch (Exception e) {
            throw new IllegalArgumentException(
                "[block_cache.size] must be a percentage (e.g. 25%) or ratio (e.g. 0.25); got: " + value, e);
        }
    },
    Setting.Property.NodeScope
);


/**
 * Block size for the Foyer disk tier.
 *
 * <p>Must be &ge; the largest entry ever put into the cache. DataFusion reads
 * Parquet row groups of up to 64&nbsp;MB; Lucene blocks are also up to 64&nbsp;MB.
 * A block size smaller than an entry causes a silent drop — the put succeeds but
 * the entry is not stored, resulting in a cache miss on the next read.
 *
 * <p>Default: 64&nbsp;MB. Range: [1&nbsp;MB, 256&nbsp;MB].
 *
 * <p>Configure in {@code opensearch.yml}:
 * <pre>{@code
 * block_cache.block_size: 64mb
 * }</pre>
 */
public static final Setting<ByteSizeValue> BLOCK_SIZE_SETTING = Setting.byteSizeSetting(
    "block_cache.block_size",
    new ByteSizeValue(64, ByteSizeUnit.MB),
    new ByteSizeValue(1, ByteSizeUnit.MB),
    new ByteSizeValue(256, ByteSizeUnit.MB),
    Setting.Property.NodeScope
);

/**
 * I/O engine for the Foyer disk tier.
 *
 * <ul>
 *   <li>{@code auto} (default) — selects io_uring on Linux &ge;&nbsp;5.1,
 *       falls back to psync otherwise.</li>
 *   <li>{@code io_uring} — force io_uring regardless of kernel detection.
 *       Fails at startup if io_uring is unavailable (e.g. blocked by seccomp
 *       or AppArmor in locked-down container environments).</li>
 *   <li>{@code psync} — force synchronous pread/pwrite. Use when io_uring is
 *       restricted or when predictable syscall-level profiling is needed.</li>
 * </ul>
 *
 * <p>Configure in {@code opensearch.yml}:
 * <pre>{@code
 * block_cache.io_engine: auto
 * }</pre>
 */
public static final Setting<String> IO_ENGINE_SETTING = new Setting<>("block_cache.io_engine", "auto", value -> {
    if (!Set.of("auto", "io_uring", "psync").contains(value)) {
        throw new IllegalArgumentException("[block_cache.io_engine] must be one of: auto, io_uring, psync; got: " + value);
    }
    return value;
}, Setting.Property.NodeScope);

@github-actions
Contributor

github-actions Bot commented May 6, 2026

PR Code Suggestions ✨

Latest suggestions up to 3ac8ecb

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Check native return code instead of unreachable array length

The out segment was allocated with exactly bufferSize longs via arena.allocateArray,
so out.toArray(ValueLayout.JAVA_LONG) will always return an array of exactly
bufferSize elements. The length check can never be false and silently masks a real
problem: if the native side writes fewer values than expected, the unwritten slots
will just be zero rather than triggering the error. The real guard should be that
the native return value (currently ignored) is checked for errors instead.

sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBridge.java [161-171]

-long[] result = out.toArray(ValueLayout.JAVA_LONG);
-if (result.length != bufferSize) {
-    logger.error(
-        "foyer_snapshot_stats: expected {} values but got {}; "
-        + "BlockCacheStats.Field and Rust snapshot() are out of sync",
-        bufferSize, result.length
-    );
-    return new long[bufferSize];
+try (var call = new NativeCall()) {
+    long rc = (long) call.invoke(FOYER_SNAPSHOT_STATS, ptr, out);
+    if (rc < 0) {
+        logger.error(
+            "foyer_snapshot_stats returned error code {}; returning zero stats",
+            rc
+        );
+        return new long[bufferSize];
+    }
 }
+return out.toArray(ValueLayout.JAVA_LONG);
Suggestion importance[1-10]: 7


Why: The out segment is allocated with exactly bufferSize longs, so toArray() will always return exactly bufferSize elements — the length check is dead code. The native return code from FOYER_SNAPSHOT_STATS is currently discarded, which means actual native errors are silently ignored. Checking the return code is the correct guard here.

Medium
Fix overflow validation to compare against budget not raw SSD

The create() method computes fileCacheBytes = totalBudgetBytes - blockCacheBytes,
where totalBudgetBytes is already a fraction of totalSSDBytes. The final check
fileCacheBytes + blockCacheBytes > totalSSDBytes compares the budget-derived
allocation against the raw SSD size, which will almost always be true (e.g. 80%
budget means the sum is always ≤ 80% of SSD). The correct check should compare
against totalBudgetBytes, not totalSSDBytes, or the method signature should receive
totalBudgetBytes as well.

server/src/main/java/org/opensearch/index/store/remote/filecache/UnifiedCacheService.java [121-153]

-static void validate(long fileCacheBytes, long blockCacheBytes, long totalSSDBytes) {
+static void validate(long fileCacheBytes, long blockCacheBytes, long totalBudgetBytes, long totalSSDBytes) {
     if (totalSSDBytes <= 0) {
-        ...
+        throw new IllegalArgumentException(
+            "Unable to determine SSD capacity; got: " + totalSSDBytes
+                + ". Ensure the filecache path is mounted and readable."
+        );
     }
     if (blockCacheBytes < 0) {
-        ...
+        throw new IllegalArgumentException(
+            "Block cache allocation must be >= 0; got: " + blockCacheBytes
+                + ". Check block cache plugin configuration."
+        );
     }
-
     if (fileCacheBytes <= 0) {
-        ...
+        throw new IllegalArgumentException(
+            "After allocating " + new ByteSizeValue(blockCacheBytes)
+                + " to block cache(s), no SSD budget remains for FileCache. "
+                + "Reduce block cache allocation or increase node.search.cache.size."
+        );
     }
-
-    if (fileCacheBytes + blockCacheBytes > totalSSDBytes) {
-        ...
+    if (fileCacheBytes + blockCacheBytes > totalBudgetBytes) {
+        throw new IllegalArgumentException(
+            "Warm node SSD allocation exceeds configured budget. "
+                + "file_cache=" + new ByteSizeValue(fileCacheBytes)
+                + ", block_cache=" + new ByteSizeValue(blockCacheBytes)
+                + ", budget=" + new ByteSizeValue(totalBudgetBytes)
+                + ". Reduce node.search.cache.size or block cache allocation."
+        );
     }
-
 }
Suggestion importance[1-10]: 6


Why: The validate() method compares fileCacheBytes + blockCacheBytes > totalSSDBytes, but both values are derived from totalBudgetBytes (a fraction of totalSSDBytes), so this check will almost never trigger. The comparison should be against totalBudgetBytes to correctly detect over-allocation within the configured budget.

Low
General
Fix exception swallowing in settings validator

When the RatioValue.parseRatioValue call succeeds but the range check fails, the
thrown IllegalArgumentException is caught by the outer catch (Exception e) block and
re-wrapped with a misleading "must be a percentage or ratio" message, hiding the
real "must be in [0%, 100%)" error. The range-check exception should not be caught
by the outer handler.

sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBlockCacheSettings.java [56-68]

 value -> {
+    RatioValue ratio;
     try {
-        RatioValue ratio = RatioValue.parseRatioValue(value);
-        if (ratio.getAsRatio() < 0 || ratio.getAsRatio() >= 1.0) {
-            throw new IllegalArgumentException(
-                "[block_cache.size] must be in [0%, 100%); got: " + value);
-        }
-        return value;
+        ratio = RatioValue.parseRatioValue(value);
     } catch (Exception e) {
         throw new IllegalArgumentException(
             "[block_cache.size] must be a percentage (e.g. 25%) or ratio (e.g. 0.25); got: " + value, e);
     }
+    if (ratio.getAsRatio() < 0 || ratio.getAsRatio() >= 1.0) {
+        throw new IllegalArgumentException(
+            "[block_cache.size] must be in [0%, 100%); got: " + value);
+    }
+    return value;
 },
Suggestion importance[1-10]: 7


Why: When parseRatioValue succeeds but the range check fails, the IllegalArgumentException is caught by the outer catch (Exception e) block and re-wrapped with a misleading parse-error message. Separating the parse exception from the range-check exception ensures the correct error message is surfaced to the operator.

Medium
Reset stats counter after cache clear completes

key_index.clear() and stats.used_bytes.store(0) are executed before
self.inner.clear().await, so if the Foyer clear completes after new entries are
inserted concurrently, used_bytes will be reset to zero while the cache actually
contains data. The store to used_bytes should happen after the inner.clear()
completes to avoid a race where subsequent put() calls increment used_bytes before
the reset.

sandbox/plugins/block-cache-foyer/src/main/rust/src/foyer_cache.rs [324-328]

 async fn clear(&self) {
     self.key_index.clear();
+    let _ = self.inner.clear().await;
     self.stats.used_bytes.store(0, Ordering::Relaxed);
-    let _ = self.inner.clear().await;
 }
Suggestion importance[1-10]: 5


Why: Resetting used_bytes before inner.clear().await creates a race where concurrent put() calls can increment used_bytes between the reset and the actual clear, leaving the counter at zero while the cache contains data. Moving the reset after the await reduces (though doesn't fully eliminate) this window.

Low

Previous suggestions

Suggestions up to commit 5cacfcf
Category | Suggestion | Impact
Possible issue
Check native function return code for errors

The native foyer_snapshot_stats returns a status code (0 on success, <0 on error),
but the Java code never checks this return value. A negative return from the native
call indicates an error (invalid pointer or null output buffer), and the stats
buffer may contain garbage. The return value of call.invoke(FOYER_SNAPSHOT_STATS,
ptr, out) should be checked and an error logged or zeros returned if it is negative.

sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBridge.java [155-177]

 public static long[] snapshotStats(long ptr) {
     final int bufferSize = BlockCacheStats.Field.COUNT * 2;
     try (Arena arena = Arena.ofConfined()) {
         var out = arena.allocateArray(ValueLayout.JAVA_LONG, bufferSize);
         try (var call = new NativeCall()) {
-            call.invoke(FOYER_SNAPSHOT_STATS, ptr, out);
+            long rc = call.invoke(FOYER_SNAPSHOT_STATS, ptr, out);
+            if (rc < 0) {
+                logger.error("foyer_snapshot_stats returned error code: {}", rc);
+                return new long[bufferSize];
+            }
         }
         long[] result = out.toArray(ValueLayout.JAVA_LONG);
         if (result.length != bufferSize) {
-            ...
+            logger.error(
+                "foyer_snapshot_stats: expected {} values but got {}; "
+                + "BlockCacheStats.Field and Rust snapshot() are out of sync",
+                bufferSize, result.length
+            );
             return new long[bufferSize];
         }
         return result;
     } catch (Exception e) {
         logger.warn("foyer_snapshot_stats failed: {}", e.getMessage());
         return new long[bufferSize];
     }
 }
Suggestion importance[1-10]: 7


Why: The native foyer_snapshot_stats returns a status code that is never checked in the Java code. If the native call returns a negative error code, the stats buffer may contain uninitialized data, leading to corrupted stats being silently returned. Checking the return value is important for correctness.

Medium
Fix exception swallowing in settings validator

The outer catch (Exception e) will also catch the IllegalArgumentException thrown by
the inner range check, wrapping it in a second IllegalArgumentException with a
misleading message about format. The range-check exception should be re-thrown
directly, or the catch should only handle OpenSearchParseException from
parseRatioValue.

sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBlockCacheSettings.java [56-68]

 value -> {
+    RatioValue ratio;
     try {
-        RatioValue ratio = RatioValue.parseRatioValue(value);
-        if (ratio.getAsRatio() < 0 || ratio.getAsRatio() >= 1.0) {
-            throw new IllegalArgumentException(
-                "[block_cache.size] must be in [0%, 100%); got: " + value);
-        }
-        return value;
+        ratio = RatioValue.parseRatioValue(value);
     } catch (Exception e) {
         throw new IllegalArgumentException(
             "[block_cache.size] must be a percentage (e.g. 25%) or ratio (e.g. 0.25); got: " + value, e);
     }
+    if (ratio.getAsRatio() < 0 || ratio.getAsRatio() >= 1.0) {
+        throw new IllegalArgumentException(
+            "[block_cache.size] must be in [0%, 100%); got: " + value);
+    }
+    return value;
 },
Suggestion importance[1-10]: 6


Why: The outer catch (Exception e) wraps the inner IllegalArgumentException from the range check with a misleading format error message. The suggested fix correctly separates parse errors from range validation errors, improving error message clarity for operators.

Low
Fix race condition in cache clear stats reset

There is a race condition in clear(): used_bytes is reset to zero before
self.inner.clear() completes. Concurrent put() calls between the store(0) and the
actual Foyer clear could increment used_bytes, and those increments would then be
lost when Foyer evicts those entries via on_leave. The used_bytes reset should
happen after self.inner.clear().await completes.

sandbox/plugins/block-cache-foyer/src/main/rust/src/foyer_cache.rs [324-328]

 async fn clear(&self) {
     self.key_index.clear();
+    let _ = self.inner.clear().await;
     self.stats.used_bytes.store(0, Ordering::Relaxed);
-    let _ = self.inner.clear().await;
 }
Suggestion importance[1-10]: 6


Why: Resetting used_bytes before inner.clear() completes creates a race where concurrent put() calls can increment used_bytes and then have those increments lost when Foyer evicts the entries. Moving the reset after the clear reduces (though doesn't fully eliminate) this window, improving stats accuracy.

Low
Prevent integer overflow in cache size validation

The validation check fileCacheBytes + blockCacheBytes > totalSSDBytes can overflow
silently when both values are large longs. Use Long.MAX_VALUE safe arithmetic or
subtract instead of add to avoid integer overflow. Replace the addition with
fileCacheBytes > totalSSDBytes - blockCacheBytes.

server/src/main/java/org/opensearch/index/store/remote/filecache/UnifiedCacheService.java [131-139]

 static void validate(long fileCacheBytes, long blockCacheBytes, long totalSSDBytes) {
     if (totalSSDBytes <= 0) {
-        ...
+        throw new IllegalArgumentException(
+            "Unable to determine SSD capacity; got: " + totalSSDBytes
+                + ". Ensure the filecache path is mounted and readable."
+        );
     }
     if (blockCacheBytes < 0) {
-        ...
+        throw new IllegalArgumentException("blockCacheBytes must be >= 0; got: " + blockCacheBytes);
     }
-    if (fileCacheBytes + blockCacheBytes > totalSSDBytes) {
-        ...
+    if (blockCacheBytes > totalSSDBytes || fileCacheBytes > totalSSDBytes - blockCacheBytes) {
+        throw new IllegalArgumentException(
+            "Warm node SSD allocation exceeds available capacity. "
+                + "file_cache=" + new ByteSizeValue(fileCacheBytes)
+                + ", block_cache=" + new ByteSizeValue(blockCacheBytes)
+                + ", total SSD=" + new ByteSizeValue(totalSSDBytes)
+                + ". Reduce block_cache.size or node.search.cache.size."
+        );
     }
     if (fileCacheBytes <= 0) {
-        ...
+        throw new IllegalArgumentException(
+            "After allocating " + new ByteSizeValue(blockCacheBytes)
+                + " to block_cache, no SSD remains for FileCache. Reduce block_cache.size."
+        );
     }
 }
Suggestion importance[1-10]: 5


Why: The overflow concern is valid in theory for very large long values, but in practice fileCacheBytes and blockCacheBytes represent SSD sizes which are far below Long.MAX_VALUE. The fix is correct and improves robustness, but the practical impact is low.

Low
Suggestions up to commit 99f22c0
Category | Suggestion | Impact
Possible issue
Check native call return value for errors

The NativeCall is closed (via try-with-resources) before out.toArray() is called,
but out is a segment allocated in the Arena. The Arena is still open at that point
so this is safe. However, the native call's return value (which indicates
success/error per the FFM descriptor) is never checked — a negative return value
indicating an error is silently ignored, and the method returns whatever
(potentially uninitialized) data is in the buffer. The return value of
call.invoke(FOYER_SNAPSHOT_STATS, ptr, out) should be checked.

sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBridge.java [150-177]

 public static long[] snapshotStats(long ptr) {
     final int bufferSize = BlockCacheStats.Field.COUNT * 2;
     try (Arena arena = Arena.ofConfined()) {
         var out = arena.allocateArray(ValueLayout.JAVA_LONG, bufferSize);
         try (var call = new NativeCall()) {
-            call.invoke(FOYER_SNAPSHOT_STATS, ptr, out);
+            long rc = call.invoke(FOYER_SNAPSHOT_STATS, ptr, out);
+            if (rc < 0) {
+                logger.warn("foyer_snapshot_stats returned error code: {}", rc);
+                return new long[bufferSize];
+            }
         }
         long[] result = out.toArray(ValueLayout.JAVA_LONG);
         if (result.length != bufferSize) {
-            ...
+            logger.error(
+                "foyer_snapshot_stats: expected {} values but got {}; "
+                + "BlockCacheStats.Field and Rust snapshot() are out of sync",
+                bufferSize, result.length
+            );
             return new long[bufferSize];
         }
         return result;
     } catch (Exception e) {
         logger.warn("foyer_snapshot_stats failed: {}", e.getMessage());
         return new long[bufferSize];
     }
 }
Suggestion importance[1-10]: 7


Why: The native foyer_snapshot_stats function returns 0 on success and < 0 on error, but the return value from call.invoke() is never checked. This means errors from the native side are silently ignored and potentially uninitialized buffer data could be returned as valid stats. Checking the return code is an important correctness fix.

Medium
Fix exception swallowing in settings validator

The outer catch (Exception e) will also catch the IllegalArgumentException thrown by
the inner range check, wrapping it in a second IllegalArgumentException with a
misleading message about format rather than the actual range violation. The
range-check exception should be re-thrown directly, or the catch block should only
catch OpenSearchParseException (or the specific parse exception type thrown by
parseRatioValue).

sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBlockCacheSettings.java [56-68]

 value -> {
+    RatioValue ratio;
     try {
-        RatioValue ratio = RatioValue.parseRatioValue(value);
-        if (ratio.getAsRatio() < 0 || ratio.getAsRatio() >= 1.0) {
-            throw new IllegalArgumentException(
-                "[block_cache.size] must be in [0%, 100%); got: " + value);
-        }
-        return value;
+        ratio = RatioValue.parseRatioValue(value);
     } catch (Exception e) {
         throw new IllegalArgumentException(
             "[block_cache.size] must be a percentage (e.g. 25%) or ratio (e.g. 0.25); got: " + value, e);
     }
+    if (ratio.getAsRatio() < 0 || ratio.getAsRatio() >= 1.0) {
+        throw new IllegalArgumentException(
+            "[block_cache.size] must be in [0%, 100%); got: " + value);
+    }
+    return value;
 },
Suggestion importance[1-10]: 7


Why: The outer catch (Exception e) catches the IllegalArgumentException thrown by the range check and wraps it with a misleading format-error message, hiding the actual range violation. The fix correctly separates parse errors from range validation errors, improving debuggability for operators configuring the setting.

Medium
Reset stats counter after cache clear completes

In clear(), used_bytes is reset to zero before self.inner.clear() completes. If
concurrent put() calls are in flight, they may add to used_bytes after the reset but
before the inner cache is actually cleared, leaving used_bytes with a stale positive
value while the cache is empty. The used_bytes reset should happen after
self.inner.clear().await completes.

sandbox/plugins/block-cache-foyer/src/main/rust/src/foyer_cache.rs [324-328]

 async fn clear(&self) {
     self.key_index.clear();
+    let _ = self.inner.clear().await;
     self.stats.used_bytes.store(0, Ordering::Relaxed);
-    let _ = self.inner.clear().await;
 }
Suggestion importance[1-10]: 6

__

Why: Resetting used_bytes before inner.clear().await completes creates a race condition where concurrent put() calls could increment used_bytes after the reset but before the cache is actually cleared, leaving a stale positive value. Moving the reset after the await is a valid correctness improvement.

Low
General
Improve validation error message for negative file cache bytes

The validation check fileCacheBytes + blockCacheBytes > totalSSDBytes may silently
pass when fileCacheBytes is negative (i.e., blockCacheBytes > totalBudgetBytes),
because the sum could still be ≤ totalSSDBytes. The fileCacheBytes <= 0 check
catches this after the fact, but the error message is misleading when
blockCacheBytes exceeds the total budget. Add an explicit check that blockCacheBytes
<= totalBudgetBytes before computing fileCacheBytes.

server/src/main/java/org/opensearch/index/store/remote/filecache/UnifiedCacheService.java [121-146]

 static void validate(long fileCacheBytes, long blockCacheBytes, long totalSSDBytes) {
     if (totalSSDBytes <= 0) {
-        ...
+        throw new IllegalArgumentException(
+            "Unable to determine SSD capacity; got: " + totalSSDBytes
+                + ". Ensure the filecache path is mounted and readable."
+        );
     }
     if (blockCacheBytes < 0) {
-        ...
+        throw new IllegalArgumentException("blockCacheBytes must be >= 0; got: " + blockCacheBytes);
     }
     if (fileCacheBytes + blockCacheBytes > totalSSDBytes) {
-        ...
+        throw new IllegalArgumentException(
+            "Warm node SSD allocation exceeds available capacity. "
+                + "file_cache=" + new ByteSizeValue(fileCacheBytes)
+                + ", block_cache=" + new ByteSizeValue(blockCacheBytes)
+                + ", total SSD=" + new ByteSizeValue(totalSSDBytes)
+                + ". Reduce block_cache.size or node.search.cache.size."
+        );
     }
     if (fileCacheBytes <= 0) {
         throw new IllegalArgumentException(
             "After allocating " + new ByteSizeValue(blockCacheBytes)
-                + " to block_cache, no SSD remains for FileCache. Reduce block_cache.size."
+                + " to block_cache, no SSD remains for FileCache (fileCacheBytes=" + fileCacheBytes
+                + "). Reduce block_cache.size."
         );
     }
 }
Suggestion importance[1-10]: 4


Why: The suggestion points out a valid edge case where fileCacheBytes could be negative and the error message would be misleading. However, the fileCacheBytes <= 0 check does catch this case, and the improved code only adds the fileCacheBytes value to the error message rather than adding a new explicit check for blockCacheBytes > totalBudgetBytes. The improvement is minor and the improved_code doesn't actually add the suggested explicit check.

Low
Suggestions up to commit aeb286b
Category | Suggestion | Impact
Possible issue
Guard stats access after cache is closed

foyerStats() is called from stats() without checking whether the cache has been
closed. If stats() is called after close(), cachePtr is a dangling native handle and
the FFM call will cause undefined behaviour or a JVM crash. Add a guard that returns
a zeroed snapshot when the cache is closed.

sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBlockCache.java [110-113]

 public FoyerAggregatedStats foyerStats() {
+    if (closed.get()) {
+        long[] zeros = new long[FoyerAggregatedStats.Field.COUNT * 2];
+        return FoyerAggregatedStats.snapshot(zeros, diskBytes);
+    }
     long[] raw = FoyerBridge.snapshotStats(cachePtr);
     return FoyerAggregatedStats.snapshot(raw, diskBytes);
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid safety concern — calling foyerStats() after close() would use a dangling cachePtr handle, potentially causing a JVM crash. However, the improved_code references FoyerAggregatedStats.Field which is a private enum, making the suggested code uncompilable as-is.

Medium
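The close-guard pattern this suggestion describes can be sketched in plain Java without any FFM machinery. All names here (`NativeHandleGuard`, `snapshotFromNative`) are illustrative stand-ins, not the PR's actual API; the point is only that a flipped flag short-circuits any path that would dereference the stale handle:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: once close() flips the flag, stats() returns a zeroed
// snapshot instead of touching the (now invalid) native handle.
public class NativeHandleGuard {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private long nativePtr = 0xDEADBEEFL; // stand-in for a real native handle

    public long[] stats() {
        if (closed.get()) {
            return new long[4]; // zeroed snapshot; never dereference nativePtr
        }
        return snapshotFromNative(nativePtr);
    }

    public void close() {
        if (closed.compareAndSet(false, true)) {
            nativePtr = 0; // real code would free the native resource here
        }
    }

    private static long[] snapshotFromNative(long ptr) {
        // stand-in for the FFM downcall
        return new long[] { 1, 2, 3, ptr };
    }

    public static void main(String[] args) {
        NativeHandleGuard g = new NativeHandleGuard();
        System.out.println(g.stats()[0]); // live snapshot, first field is 1
        g.close();
        System.out.println(g.stats()[0]); // zeroed after close, prints 0
    }
}
```

Using `compareAndSet` in `close()` also makes double-close idempotent, which matters when `close()` can be reached from both node shutdown and test teardown.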
Fix swallowed range-validation error in setting parser

The IllegalArgumentException thrown inside the try block is caught by the outer
catch (Exception e) and re-wrapped with a misleading "must be a percentage" message,
hiding the real validation error about the out-of-range value. The range-check
exception should be re-thrown directly without being caught, or the catch should
only handle OpenSearchParseException.

server/src/main/java/org/opensearch/index/store/remote/filecache/BlockCacheSettings.java [118-128]

+RatioValue ratio;
 try {
-    RatioValue ratio = RatioValue.parseRatioValue(value);
-    if (ratio.getAsRatio() < 0 || ratio.getAsRatio() >= 1.0) {
-        throw new IllegalArgumentException(
-            "[block_cache.size] must be in [0%, 100%); got: " + value);
-    }
-    return value;
+    ratio = RatioValue.parseRatioValue(value);
 } catch (Exception e) {
     throw new IllegalArgumentException(
         "[block_cache.size] must be a percentage (e.g. 25%) or ratio (e.g. 0.25); got: " + value, e);
 }
+if (ratio.getAsRatio() < 0 || ratio.getAsRatio() >= 1.0) {
+    throw new IllegalArgumentException(
+        "[block_cache.size] must be in [0%, 100%); got: " + value);
+}
+return value;
Suggestion importance[1-10]: 7

__

Why: This is a real bug — the IllegalArgumentException for out-of-range values is caught and re-wrapped with a misleading parse error message. The suggested fix correctly separates parse errors from range validation errors, improving debuggability for operators configuring the setting.

Medium
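The swallowed-exception pitfall is easy to reproduce in isolation. The sketch below uses a toy percentage parser (`parse` is hypothetical, not `RatioValue`): keeping the range check outside the `catch` means an out-of-range value surfaces its own message instead of the misleading "must be a percentage" wrapper:

```java
// Minimal sketch of separating parse errors from range validation.
public class RatioParseDemo {
    // Toy parser: "25%" -> 0.25; throws on anything not ending in '%'.
    static double parse(String value) {
        if (!value.endsWith("%")) {
            throw new NumberFormatException("not a percentage: " + value);
        }
        return Double.parseDouble(value.substring(0, value.length() - 1)) / 100.0;
    }

    static String validate(String value) {
        double r;
        try {
            r = parse(value); // only parse errors are caught here
        } catch (Exception e) {
            throw new IllegalArgumentException(
                "[block_cache.size] must be a percentage (e.g. 25%); got: " + value, e);
        }
        if (r < 0 || r >= 1.0) { // range check lives outside the catch
            throw new IllegalArgumentException(
                "[block_cache.size] must be in [0%, 100%); got: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(validate("25%")); // prints 25%
        try {
            validate("150%");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // range message, not parse message
        }
    }
}
```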
Fix race condition in cache clear ordering

key_index.clear() and stats.used_bytes.store(0) run before self.inner.clear().await,
so concurrent put() calls between those two lines and the actual Foyer clear can add
entries and increment used_bytes that are then wiped by the store, leaving
used_bytes at 0 while entries are actually present. Reorder so the Foyer clear
happens first, then reset the derived state.

sandbox/plugins/block-cache-foyer/src/main/rust/src/foyer_cache.rs [324-328]

 async fn clear(&self) {
+    let _ = self.inner.clear().await;
     self.key_index.clear();
     self.stats.used_bytes.store(0, Ordering::Relaxed);
-    let _ = self.inner.clear().await;
 }
Suggestion importance[1-10]: 6

__

Why: The ordering issue is valid — resetting used_bytes before the actual Foyer clear can leave stats inconsistent if concurrent put() calls occur between the reset and the clear. Reordering to clear Foyer first reduces (but doesn't eliminate) the race window, making it a meaningful improvement.

Low
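The ordering invariant behind this suggestion is language-independent: clear the backing store before resetting derived counters, so a racing `put()` can at worst inflate the counter transiently rather than leave it stuck at a stale value. A Java analog (illustrative names, single-threaded demo of the ordering only):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative analog of the fix: drop entries first, then reset the counter
// that is derived from them.
public class ClearOrderingDemo {
    final ConcurrentHashMap<String, byte[]> store = new ConcurrentHashMap<>();
    final AtomicLong usedBytes = new AtomicLong();

    void put(String key, byte[] data) {
        store.put(key, data);
        usedBytes.addAndGet(data.length);
    }

    void clear() {
        store.clear();    // 1) drop the entries
        usedBytes.set(0); // 2) then reset the derived counter
    }

    public static void main(String[] args) {
        ClearOrderingDemo c = new ClearOrderingDemo();
        c.put("a", new byte[8]);
        c.clear();
        System.out.println(c.usedBytes.get()); // prints 0
    }
}
```

As the review notes, this shrinks the race window but does not eliminate it; full correctness would require the counter update and the store mutation to be atomic with respect to each other.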
Remove unreachable length check on fixed-size allocation

The toArray call on a MemorySegment allocated with
allocateArray(ValueLayout.JAVA_LONG, bufferSize) will always return exactly
bufferSize elements, making the length check dead code that can never trigger. The
real risk is a native buffer overrun if the Rust side writes more values than
allocated. Consider removing the misleading check or replacing it with an
assertion/comment explaining why it cannot fail.

sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBridge.java [161-162]

 long[] result = out.toArray(ValueLayout.JAVA_LONG);
-if (result.length != bufferSize) {
+// result.length is always == bufferSize because the segment was allocated
+// with exactly bufferSize longs. A mismatch between Java Field.COUNT and
+// the Rust snapshot() array would instead cause a native buffer overrun.
+assert result.length == bufferSize : "unexpected array length: " + result.length;
Suggestion importance[1-10]: 5

__

Why: The check is indeed dead code since toArray on a fixed-size MemorySegment always returns exactly bufferSize elements. However, the suggestion's improved_code uses assert which is disabled by default in Java, making it a marginal improvement. The real concern (native buffer overrun) is valid but not addressed by the proposed fix.

Low
Suggestions up to commit e91fc13
Category | Suggestion | Impact
Possible issue
Reset stats counter after cache clear completes

used_bytes is reset to zero before self.inner.clear() completes. If concurrent put()
calls are in flight between the store and the actual Foyer clear, those puts will
increment used_bytes and then be silently cleared by Foyer, leaving used_bytes
permanently positive while the cache is actually empty. The store should happen
after inner.clear().await to minimize the race window.

sandbox/plugins/block-cache-foyer/src/main/rust/src/foyer_cache.rs [324-328]

 async fn clear(&self) {
     self.key_index.clear();
+    let _ = self.inner.clear().await;
     self.stats.used_bytes.store(0, Ordering::Relaxed);
-    let _ = self.inner.clear().await;
 }
Suggestion importance[1-10]: 6

__

Why: This is a valid race condition: resetting used_bytes before inner.clear().await completes means concurrent put() calls could increment used_bytes and then be silently cleared by Foyer, leaving the counter permanently inflated. Moving the reset after the await reduces (but doesn't eliminate) this race window.

Low
Guard against out-of-range ratio in block cache byte resolution

CACHE_SIZE_SETTING validates that the ratio is in [0, 1), but
RatioValue.parseRatioValue can also accept absolute byte values (e.g. "200gb"),
which would bypass the percentage validation in the setting's parser and cause
getAsRatio() to return an unexpected value. The resolveBlockCacheBytes method should
guard against a ratio ≥ 1.0 (or ≤ 0) after parsing to prevent allocating more bytes
than the total budget.

server/src/main/java/org/opensearch/index/store/remote/filecache/UnifiedCacheService.java [300-304]

 static long resolveBlockCacheBytes(Settings settings, long totalBudgetBytes) {
     String cacheSizeRaw = BlockCacheSettings.CACHE_SIZE_SETTING.get(settings);
     RatioValue ratio = RatioValue.parseRatioValue(cacheSizeRaw);
-    return Math.round(totalBudgetBytes * ratio.getAsRatio());
+    double r = ratio.getAsRatio();
+    if (r < 0 || r >= 1.0) {
+        throw new IllegalArgumentException(
+            "[block_cache.size] ratio must be in [0, 1); got: " + r);
+    }
+    return Math.round(totalBudgetBytes * r);
 }
Suggestion importance[1-10]: 5

__

Why: The CACHE_SIZE_SETTING validator already enforces the [0, 1) range, so in practice this guard is redundant for normal usage. However, resolveBlockCacheBytes is package-private and could be called directly in tests or other contexts, making the extra validation a reasonable defensive measure. The suggestion is valid but has limited practical impact.

Low
General
Remove unreachable length check after fixed-size allocation

The toArray call on a MemorySegment allocated with
arena.allocateArray(ValueLayout.JAVA_LONG, bufferSize) will always return exactly
bufferSize elements, making the length check dead code that can never trigger. The
real mismatch guard should be a static assertion or a comment clarifying this,
rather than a runtime check that gives false confidence. Remove the unreachable
branch or replace it with a static compile-time constant check to avoid misleading
future maintainers.

sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBridge.java [161-171]

+// toArray always returns exactly bufferSize elements for a fixed-size allocation.
 long[] result = out.toArray(ValueLayout.JAVA_LONG);
-if (result.length != bufferSize) {
-    logger.error(
-        "foyer_snapshot_stats: expected {} values but got {}; "
-        + "BlockCacheStats.Field and Rust snapshot() are out of sync",
-        bufferSize, result.length
-    );
-    return new long[bufferSize];
-}
+assert result.length == bufferSize : "Unexpected buffer size: " + result.length;
+return result;
Suggestion importance[1-10]: 4

__

Why: The suggestion is technically correct — toArray on a fixed-size MemorySegment allocation will always return exactly bufferSize elements, making the length check dead code. However, the check serves as a defensive guard and the improved code uses assert which is disabled by default in Java, making it a weaker replacement. The impact is minor.

Low
Close FileCache resources during service shutdown

FileCache is not closed here, only block caches are. If UnifiedCacheService owns the
FileCache lifecycle (as stated in the class Javadoc), it should also close or at
least clear the FileCache on close(). Leaving it open while block caches are torn
down can cause resource leaks or inconsistent state during node shutdown.

server/src/main/java/org/opensearch/index/store/remote/filecache/UnifiedCacheService.java [260-265]

 @Override
 public synchronized void close() throws IOException {
     if (!blockCaches.isEmpty()) {
         IOUtils.closeWhileHandlingException(blockCaches);
         logger.info("Block caches closed ({})", blockCaches.size());
     }
+    fileCache.clear();
 }
Suggestion importance[1-10]: 3

__

Why: The Javadoc states FileCache is GC'd after close, suggesting it doesn't implement Closeable. Calling fileCache.clear() on shutdown may not be the right approach — FileCache likely manages its own lifecycle. The suggestion may be incorrect about the ownership model, and clear() is not the same as closing/releasing resources.

Low
Suggestions up to commit 796c824
Category | Suggestion | Impact
Possible issue
Check native call return value for errors

The return value of FOYER_SNAPSHOT_STATS (0 on success, <0 on error) is never
checked. If the native call returns an error code, the method silently returns a
zero-filled or partially-written buffer, producing corrupted stats without any
warning. The return value from call.invoke(...) should be captured and validated
before using the output buffer.

sandbox/plugins/block-cache-foyer/src/main/java/org/opensearch/blockcache/foyer/FoyerBridge.java [155-176]

 final int bufferSize = BlockCacheStats.Field.COUNT * 2;
 try (Arena arena = Arena.ofConfined()) {
     var out = arena.allocateArray(ValueLayout.JAVA_LONG, bufferSize);
     try (var call = new NativeCall()) {
-        call.invoke(FOYER_SNAPSHOT_STATS, ptr, out);
+        long rc = call.invoke(FOYER_SNAPSHOT_STATS, ptr, out);
+        if (rc < 0) {
+            logger.warn("foyer_snapshot_stats returned error code: {}", rc);
+            return new long[bufferSize];
+        }
     }
     long[] result = out.toArray(ValueLayout.JAVA_LONG);
     if (result.length != bufferSize) {
         ...
     }
     return result;
 } catch (Exception e) {
     logger.warn("foyer_snapshot_stats failed: {}", e.getMessage());
     return new long[bufferSize];
 }
Suggestion importance[1-10]: 7

__

Why: The return value of FOYER_SNAPSHOT_STATS is never checked, meaning a native error would silently produce corrupted stats. Capturing and validating rc before using the output buffer is a meaningful correctness improvement.

Medium
General
Close FileCache on service shutdown

FileCache is never explicitly closed in UnifiedCacheService.close(). If FileCache
implements Closeable or holds resources (file handles, memory-mapped regions), they
will be leaked when the node shuts down. The close() method should also close the
fileCache instance.

server/src/main/java/org/opensearch/index/store/remote/filecache/UnifiedCacheService.java [259-265]

-/** Closes all registered block caches, then lets FileCache be GC'd. */
 @Override
 public synchronized void close() throws IOException {
-    if (!blockCaches.isEmpty()) {
-        IOUtils.closeWhileHandlingException(blockCaches);
-        logger.info("Block caches closed ({})", blockCaches.size());
+    List<Closeable> toClose = new ArrayList<>(blockCaches);
+    if (fileCache instanceof Closeable) {
+        toClose.add((Closeable) fileCache);
+    }
+    if (!toClose.isEmpty()) {
+        IOUtils.closeWhileHandlingException(toClose);
+        logger.info("UnifiedCacheService closed: {} block cache(s) + fileCache", blockCaches.size());
     }
 }
Suggestion importance[1-10]: 4

__

Why: If FileCache holds closeable resources, not closing it in UnifiedCacheService.close() could cause resource leaks on node shutdown. However, whether FileCache actually implements Closeable is unclear from the diff, reducing the certainty of this suggestion's impact.

Low
Clamp block cache bytes to total budget

resolveBlockCacheBytes can return a value larger than totalBudgetBytes if
CACHE_SIZE_SETTING validation is bypassed or if rounding pushes the result over the
budget. The validate() method catches this, but the result should be clamped
defensively before passing to validate() to avoid confusing error messages. More
critically, CACHE_SIZE_SETTING allows 0% (to disable the block cache), but the
validator rejects values >= 1.0 — a value of exactly 100% would pass parseRatioValue
as 1.0 and be rejected, which is correct, but a value like 99.9% would leave almost
nothing for FileCache and only be caught by validate().

server/src/main/java/org/opensearch/index/store/remote/filecache/UnifiedCacheService.java [300-304]

 static long resolveBlockCacheBytes(Settings settings, long totalBudgetBytes) {
     String cacheSizeRaw = BlockCacheSettings.CACHE_SIZE_SETTING.get(settings);
     RatioValue ratio = RatioValue.parseRatioValue(cacheSizeRaw);
-    return Math.round(totalBudgetBytes * ratio.getAsRatio());
+    long bytes = Math.round(totalBudgetBytes * ratio.getAsRatio());
+    return Math.min(bytes, totalBudgetBytes);
 }
Suggestion importance[1-10]: 3

__

Why: The validate() method already catches over-budget allocations, so this defensive clamp provides marginal additional safety. The suggestion is valid but low impact since the existing validation handles the edge case.

Low
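The clamp-plus-validate idea from the two budget suggestions above can be combined into one defensive helper. This is a sketch under assumed names (`BlockCacheBudget` is not the PR's class; the real code parses the ratio from settings first):

```java
// Hedged sketch: validate the ratio range, round to bytes, then clamp to the
// budget so rounding (or a bypassed setting validator) can never over-allocate.
public class BlockCacheBudget {
    static long resolveBlockCacheBytes(long totalBudgetBytes, double ratio) {
        if (ratio < 0 || ratio >= 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1); got: " + ratio);
        }
        long bytes = Math.round(totalBudgetBytes * ratio);
        return Math.min(bytes, totalBudgetBytes); // defensive clamp
    }

    public static void main(String[] args) {
        System.out.println(resolveBlockCacheBytes(1000, 0.25)); // prints 250
        System.out.println(resolveBlockCacheBytes(7, 0.999));   // rounds up to 7, clamped at budget
    }
}
```

Note the second case: `Math.round(7 * 0.999)` already reaches the full budget of 7, which is exactly the rounding edge the clamp guards against.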
Document stats inflation risk on failed inserts

used_bytes is incremented unconditionally in put(), but self.inner.insert() is
fire-and-forget (async, no await). If the insert fails silently or is dropped by
Foyer (e.g. entry too large for block size), used_bytes will be permanently inflated
with no corresponding on_leave event to subtract it. Consider checking the insert
result or only updating used_bytes upon a confirmed successful insert.

sandbox/plugins/block-cache-foyer/src/main/rust/src/foyer_cache.rs [297-307]

 fn put(&self, key: &CacheKey, data: Bytes) {
     let size = data.len() as i64;
     let raw = key.as_str();
     let k = raw.to_string();
+    // Note: insert() is async/fire-and-forget; used_bytes may be transiently
+    // inflated if Foyer silently drops oversized entries (no on_leave fired).
+    // This is a known approximation — monitor eviction_bytes vs used_bytes drift.
     self.inner.insert(k.clone(), data.to_vec());
-    // Track bytes added. If this entry replaces an existing one, on_leave()
-    // will subtract the old size via Event::Replace.
     self.stats.used_bytes.fetch_add(size, Ordering::Relaxed);
     let idx = Self::index_key(raw).to_string();
     self.key_index.entry(idx).or_default().push(k);
 }
Suggestion importance[1-10]: 2

__

Why: The improved_code only adds a comment without changing logic, making this essentially a documentation suggestion. The underlying issue of potential used_bytes inflation is real but the proposed fix doesn't address it.

Low

@github-actions
Contributor

github-actions Bot commented May 6, 2026

❌ Gradle check result for 217af36: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit b922965

@github-actions
Contributor

github-actions Bot commented May 6, 2026

❌ Gradle check result for b922965: FAILURE


Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit ebaff5d

Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit e1ec26f

@github-actions
Contributor

github-actions Bot commented May 6, 2026

❌ Gradle check result for e1ec26f: FAILURE


Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit 45f27ca

@github-actions
Contributor

github-actions Bot commented May 6, 2026

❌ Gradle check result for 45f27ca: FAILURE


Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit 796c824

Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit e91fc13

Vishwas Garg added 2 commits May 6, 2026 20:05
Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit aeb286b

@github-actions
Contributor

github-actions Bot commented May 6, 2026

❌ Gradle check result for aeb286b: FAILURE


Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 04f45e0.

Path | Line | Severity | Description
sandbox/plugins/block-cache-foyer/src/main/rust/src/ffm.rs | 99 | medium | Both the new top-level src/ffm.rs and the modified src/foyer/ffm.rs define `#[no_mangle] pub unsafe extern "C" fn foyer_snapshot_stats` — a duplicate no_mangle export. If both translation units are linked into the same shared library, the dynamic linker will silently resolve all callers to whichever symbol appears first. An attacker who controls link order could substitute a different implementation; at minimum the non-determinism is a security invariant violation.
sandbox/plugins/block-cache-foyer/src/main/rust/src/ffm.rs | 103 | low | foyer_snapshot_stats writes 14 i64 values to the caller-supplied `out` pointer after only a null check. There is no bounds verification: if the Java caller ever passes a buffer smaller than 14 longs (e.g. if FoyerAggregatedStats.Field.COUNT is changed without updating the Rust constant), this results in an out-of-bounds write into JVM-managed memory. No malicious intent is apparent, but the missing length parameter is a latent memory-safety gap in the FFM contract.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 1 | Low: 1
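The missing-length-parameter gap flagged above suggests an explicit contract: the caller passes the buffer's element count alongside the pointer, and the callee reports how many elements it wanted to write. The Java-side sketch below simulates that contract with a plain method in place of the real FFM downcall (all names are hypothetical):

```java
// Hypothetical sketch of a length-checked snapshot contract: a layout drift
// between the Java field count and the native writer becomes a loud failure
// instead of an out-of-bounds write or a torn snapshot.
public class StatsBufferContract {
    static final int FIELD_COUNT = 14; // must match the native snapshot layout

    // Simulates the native call: fills at most 'len' slots, returns how many
    // it wanted to write so a mismatch is detectable.
    static int nativeSnapshot(long[] out, int len) {
        int wanted = FIELD_COUNT;
        for (int i = 0; i < Math.min(len, wanted); i++) {
            out[i] = i;
        }
        return wanted;
    }

    static long[] snapshot() {
        long[] out = new long[FIELD_COUNT];
        int wanted = nativeSnapshot(out, out.length);
        if (wanted != out.length) {
            // layouts drifted: fail loudly rather than return partial stats
            throw new IllegalStateException(
                "expected " + out.length + " stats, native wants " + wanted);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(snapshot().length); // prints 14
    }
}
```

Unlike the dead `result.length != bufferSize` check discussed earlier, this comparison can actually fail, because the count crosses the boundary as data rather than being derived from the allocation size on one side.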


Pull Request Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

Vishwas Garg added 2 commits May 6, 2026 20:30
Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit 99f22c0

Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

❌ Gradle check result for 99f22c0: FAILURE


Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit ef9b78b

@github-actions
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit 5cacfcf

@github-actions
Contributor

github-actions Bot commented May 6, 2026

❌ Gradle check result for 5cacfcf: FAILURE


Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Persistent review updated to latest commit 3ac8ecb

@github-actions
Contributor

github-actions Bot commented May 6, 2026

❌ Gradle check result for 3ac8ecb: FAILURE


Signed-off-by: Vishwas Garg <glvishwa@amazon.com>
