Bug: truncateBase64 causes false positives on 60+ char XPath / path-like strings #1298

@NaustudentX14

Description

Description:
When truncateBase64: true is enabled, the internal regex aggressively matches ordinary application strings (such as file paths or XPaths) that mix letters and forward slashes, and treats them as standalone Base64 data.

For example, I had the following XPath string in a Python file:
".//postTransactionAmounts/sharesOwnedFollowingTransaction/value"

Repomix silently truncated this in the output XML to:
".//postTransactionAmounts/sharesO..."

Root Cause:
Looking at src/core/file/truncateBase64.ts, the issue is a mathematical coincidence with the current constants:

const MIN_BASE64_LENGTH_STANDALONE = 60;
const TRUNCATION_LENGTH = 32;
const MIN_CHAR_TYPE_COUNT = 3;

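The path portion of that string (everything after the leading .//), postTransactionAmounts/sharesOwnedFollowingTransaction/value, is exactly 60 characters long. The truncated output keeps 32 characters starting at //, which suggests the matched run also includes the two leading slashes (62 characters in total). Either way, the match meets the 60-character minimum.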

Because it hits the 60 character minimum, it gets tested by isLikelyBase64(). It passes the validation because it has:

  1. Uppercase letters
  2. Lowercase letters
  3. Special characters (the / slashes)

This satisfies MIN_CHAR_TYPE_COUNT = 3. As a result, the tool mistakenly identifies the XPath as Base64 and truncates it to 32 characters.
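To make the failure concrete, here is a minimal sketch of the check as described above. It is not the actual Repomix source: countCharTypes and passesNaiveCheck are hypothetical names, and the four character classes (uppercase, lowercase, digits, and the + and / symbols) are my assumption about how the character types are counted.

```typescript
// Illustrative re-creation of the described check; NOT the real truncateBase64.ts.
const MIN_BASE64_LENGTH_STANDALONE = 60;
const MIN_CHAR_TYPE_COUNT = 3;

// Count how many character classes appear in the candidate string.
// Assumption: the classes are uppercase, lowercase, digits, and "+" or "/".
function countCharTypes(str: string): number {
  const classes = [/[A-Z]/, /[a-z]/, /[0-9]/, /[+\/]/];
  return classes.filter((re) => re.test(str)).length;
}

// Hypothetical stand-in for isLikelyBase64(): a length gate plus a character-class gate.
function passesNaiveCheck(str: string): boolean {
  return (
    str.length >= MIN_BASE64_LENGTH_STANDALONE &&
    countCharTypes(str) >= MIN_CHAR_TYPE_COUNT
  );
}

const xpathBody = "postTransactionAmounts/sharesOwnedFollowingTransaction/value";
console.log(xpathBody.length);            // 60
console.log(countCharTypes(xpathBody));   // 3 (uppercase, lowercase, "/")
console.log(passesNaiveCheck(xpathBody)); // true, so the XPath is treated as Base64
```

The string contains no digits at all, yet it still clears both gates because the slashes count as the third character class.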

Proposed Solutions:

1. Increase the standalone length threshold (Recommended)
Truncating a 60-character string to 32 characters saves only a handful of tokens, which defeats the purpose of a feature meant to catch large embedded files. Raising the minimum length substantially would stop false positives on ordinary code strings:

// In src/core/file/truncateBase64.ts
const MIN_BASE64_LENGTH_STANDALONE = 256; // or 512
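As a rough back-of-the-envelope check, the sketch below compares how many characters truncation removes at each threshold. It assumes the truncated form is the first 32 characters followed by a three-character "..." marker, which is what the output above shows. charsSaved is a hypothetical helper, not part of Repomix.

```typescript
const TRUNCATION_LENGTH = 32;
const ELLIPSIS = "..."; // assumed marker, based on the truncated output in this report

// Characters removed when a match of the given length is truncated.
function charsSaved(matchLength: number): number {
  return Math.max(0, matchLength - (TRUNCATION_LENGTH + ELLIPSIS.length));
}

for (const len of [60, 256, 512]) {
  console.log(`${len} -> saves ${charsSaved(len)} chars`);
}
// 60 -> saves 25 chars
// 256 -> saves 221 chars
// 512 -> saves 477 chars
```

At the current threshold the best case is a saving of about 25 characters per match, which is small compared with the risk of corrupting a meaningful string.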

2. Improve the isLikelyBase64 heuristic
Real Base64-encoded data almost always contains digits and mixes + and / more or less at random. The validation function could reject strings that look like file paths or XPaths (e.g., they contain / but no digits and no + signs):

function isLikelyBase64(str: string): boolean {
  // ... existing checks ...
  const hasNumbers = /[0-9]/.test(str);
  
  // Reject strings that look like paths/XPaths
  if (str.includes('/') && !str.includes('+') && !hasNumbers) {
    return false;
  }
  // ...
}
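Since the snippet above elides the existing checks, here is the path guard on its own as a self-contained sketch that can be run in isolation. looksLikePath is a hypothetical helper name, and the Base64-looking sample is just an illustrative literal.

```typescript
// Hypothetical standalone version of the proposed guard; not Repomix API.
// Returns true when the string contains "/" but neither "+" nor any digit.
function looksLikePath(str: string): boolean {
  const hasDigits = /[0-9]/.test(str);
  return str.includes("/") && !str.includes("+") && !hasDigits;
}

const xpathCandidate = "postTransactionAmounts/sharesOwnedFollowingTransaction/value";
// Illustrative Base64-looking sample that contains digits and no slashes.
const base64Sample = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk";

console.log(looksLikePath(xpathCandidate)); // true: would be rejected as "not Base64"
console.log(looksLikePath(base64Sample));   // false: still eligible for truncation
console.log(looksLikePath("api/v2/users")); // false: the digit defeats the guard
```

The last case shows a limitation: paths that happen to contain digits (versioned URLs, for instance) would not be rejected by this guard, so it probably works best combined with the higher length threshold from option 1.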

Workaround:
Setting truncateBase64: false in repomix.config.js resolves the issue and leaves the string intact.

Usage Context

Repomix CLI

Repomix Version

1.13.0

Node.js Version

v24.14.0
