Skip to content

fix(core): Reduce false positives in truncateBase64 for path-like strings#1307

Merged
yamadashy merged 2 commits into
mainfrom
fix/truncate-base64-false-positives
Mar 26, 2026
Merged

fix(core): Reduce false positives in truncateBase64 for path-like strings#1307
yamadashy merged 2 commits into
mainfrom
fix/truncate-base64-false-positives

Conversation

@yamadashy
Copy link
Copy Markdown
Owner

@yamadashy yamadashy commented Mar 26, 2026

Fixes #1298

truncateBase64 was incorrectly truncating XPath and file-path strings (e.g., postTransactionAmounts/sharesOwnedFollowingTransaction/value) by misidentifying them as standalone base64 data.

Changes:

  • Raise MIN_BASE64_LENGTH_STANDALONE from 60 to 256 — truncating short strings saves negligible tokens, and the feature targets large embedded blobs
  • Require digits in isLikelyBase64() heuristic — real base64-encoded binary data virtually always contains numbers, while path-like strings typically do not
  • Update and expand tests with XPath/path false-positive cases and a 256-char boundary test

Checklist

  • Run npm run test
  • Run npm run lint

Open with Devin

yamadashy and others added 2 commits March 26, 2026 22:52
…ings

Raise MIN_BASE64_LENGTH_STANDALONE from 60 to 256 since truncating short
strings saves negligible tokens. Require digits in isLikelyBase64 heuristic
since real base64-encoded binary data virtually always contains numbers,
while XPath and file path strings typically do not.

Closes #1298

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 13a39b16-eb88-46b1-ab8a-8820faa4e25e

📥 Commits

Reviewing files that changed from the base of the PR and between 1919956 and e2101a5.

📒 Files selected for processing (2)
  • src/core/file/truncateBase64.ts
  • tests/core/file/truncateBase64.test.ts

📝 Walkthrough

Walkthrough

The base64 truncation logic is refined to reduce false positives on path-like and XPath-like strings. The minimum standalone base64 length threshold increases from 60 to 256 characters, and the heuristic now rejects candidates lacking numeric digits before proceeding with character-type validation.

Changes

Cohort / File(s) Summary
Base64 Truncation Logic
src/core/file/truncateBase64.ts
Increased MIN_BASE64_LENGTH_STANDALONE from 60 to 256 to narrow the scope of standalone base64 eligibility. Added a requirement in isLikelyBase64 to reject strings without digits before applying character-type checks, reducing false positives on paths and XPath expressions.
Base64 Truncation Tests
tests/core/file/truncateBase64.test.ts
Introduced a new realistic longBase64 constant and added boundary tests around the 256-character threshold. Added regression test cases to verify truncation does not occur on path-like and XPath-like strings, including those without digits.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • PR #733: Directly introduced the truncateBase64 function and its heuristics, which this PR refines by increasing the threshold and improving the digit-checking logic to address false positives on path-like strings.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: reducing false positives in truncateBase64 for path-like strings, which matches the primary objective of fixing issue #1298.
Description check ✅ Passed The description addresses the linked issue, explains the changes made, and includes a completed checklist with test and lint verification.
Linked Issues check ✅ Passed The PR fulfills both proposed solutions from issue #1298: increasing MIN_BASE64_LENGTH_STANDALONE to 256 and improving isLikelyBase64() by requiring digits to reduce false positives on path-like strings.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the false positive issue: threshold increase, heuristic improvement, and comprehensive test updates including path-like string regression tests.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/truncate-base64-false-positives

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where the truncateBase64 utility was incorrectly identifying and truncating path-like strings (e.g., XPath expressions) as base64 data. The changes refine the base64 detection heuristic and adjust the truncation threshold, ensuring that only genuinely large, embedded base64 blobs are truncated, thereby preventing unintended modifications to valid string data.

Highlights

  • Base64 Truncation Threshold: The minimum length for standalone base64 strings to be considered for truncation (MIN_BASE64_LENGTH_STANDALONE) has been increased from 60 to 256 characters to prevent truncating shorter, non-blob-like strings.
  • Base64 Detection Heuristic: The isLikelyBase64() function now requires the presence of digits in a string to identify it as base64, reducing false positives for path-like strings that typically lack numbers.
  • Test Coverage: New test cases were added to specifically address XPath and file-path false positives, along with a test for the new 256-character truncation boundary.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 26, 2026

⚡ Performance Benchmark

Latest commit:e2101a5 test(core): Add boundary test for exactly 256-char base64 string
Status:✅ Benchmark complete!
Ubuntu:2.40s (±0.01s) → 2.40s (±0.02s) · -0.00s (-0.2%)
macOS:1.98s (±0.10s) → 2.06s (±0.16s) · +0.08s (+4.2%)
Windows:3.65s (±0.06s) → 3.67s (±0.07s) · +0.02s (+0.6%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded)
  • Measurement: 10 runs / 20 on macOS (median ± IQR)
  • Workflow run

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.19%. Comparing base (1919956) to head (e2101a5).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1307   +/-   ##
=======================================
  Coverage   87.18%   87.19%           
=======================================
  Files         115      115           
  Lines        4331     4333    +2     
  Branches     1007     1008    +1     
=======================================
+ Hits         3776     3778    +2     
  Misses        555      555           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 26, 2026

Code Review

Overall: Looks good — clean, well-scoped fix for the false positive issue reported in #1298. The two-pronged approach (threshold increase + digits requirement) is complementary and appropriate.

Noteworthy Observations

The belt-and-suspenders approach works well here. The XPath string from #1298 (postTransactionAmounts/sharesOwnedFollowingTransaction/value, 61 chars) would be caught by either change alone — it's under 256 chars AND lacks digits. Having both guards makes the heuristic more robust against different classes of false positives.

Minor observation on isLikelyBase64 logic flow: After the !hasNumbers early return, hasNumbers is always true in the charTypeCount check. This means MIN_CHAR_TYPE_COUNT (3) effectively requires 2 of the remaining 3 types (uppercase, lowercase, +/). This is fine and maintains the existing contract — just noting the implicit tightening.

Potential edge case to consider (low risk)

Base64-encoded ASCII text (e.g., btoa("Hello World")SGVsbG8gV29ybGQ=) in the 60–255 char range will no longer be truncated. This is an acceptable trade-off since:

  • The token savings for strings under 256 chars is minimal (the stated goal)
  • ASCII-text base64 is less common in real codebases than binary blobs
  • False positives on real content (like the XPath case) are worse than missing a small base64 string
Test coverage notes

Good coverage of:

  • Boundary at exactly 256 chars ✓
  • Above threshold (344 chars) ✓
  • Below threshold / medium strings (60 chars) ✓
  • Original false positive (XPath without digits) ✓
  • Long paths without digits ✓

One optional addition for the future: a test with a base64 string at 255 chars (just below threshold) that does contain digits, verifying it's preserved. The current "medium-length" test uses a 60-char string, so there's a gap in boundary testing from the "not truncated" side.

LGTM 👍


Reviewed by Claude

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refines the base64 truncation logic by increasing the minimum length for standalone base64 strings to be truncated from 60 to 256 characters. It also introduces a new heuristic in the isLikelyBase64 function, requiring strings to contain digits to be considered base64, which addresses false positives for path-like or XPath strings (as reported in #1298). The changes include updated and new tests to validate these adjustments and the new truncation threshold. There are no review comments to provide feedback on.

@yamadashy yamadashy merged commit 2d70fc3 into main Mar 26, 2026
64 checks passed
@yamadashy yamadashy deleted the fix/truncate-base64-false-positives branch March 26, 2026 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: truncateBase64 causes false positives on 60+ char XPath / path-like strings

1 participant