Description
Description:
When truncateBase64: true is enabled, the internal regex aggressively matches normal application strings (like file paths or XPaths) that contain a mix of letters and forward slashes, assuming they are standalone Base64 data.
For example, I had the following XPath string in a Python file:
".//postTransactionAmounts/sharesOwnedFollowingTransaction/value"
Repomix silently truncated this in the output XML to:
".//postTransactionAmounts/sharesO..."
Root Cause:
Looking at src/core/file/truncateBase64.ts, the issue is a mathematical coincidence with the current constants:
const MIN_BASE64_LENGTH_STANDALONE = 60;
const TRUNCATION_LENGTH = 32;
const MIN_CHAR_TYPE_COUNT = 3;
The string inside the quotes, postTransactionAmounts/sharesOwnedFollowingTransaction/value, happens to be exactly 60 characters long (62 if the two leading slashes of .// are counted, since / is also part of the Base64 alphabet).
Because it reaches the 60-character minimum, it gets tested by isLikelyBase64(). It passes the validation because it has:
- Uppercase letters
- Lowercase letters
- Special characters (the / slashes)
This satisfies MIN_CHAR_TYPE_COUNT = 3. As a result, the tool mistakenly identifies the XPath as Base64 and truncates it to 32 characters.
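To make the failure easy to reproduce, here is a minimal sketch of the behavior described above. Only the three constants come from truncateBase64.ts; the pattern, the character-class count, and the names standalonePattern, countCharTypes, and truncateSketch are my assumptions about how the check works, not the actual source.

```typescript
// Constants quoted from the issue; everything else is an assumed reconstruction.
const MIN_BASE64_LENGTH_STANDALONE = 60;
const TRUNCATION_LENGTH = 32;
const MIN_CHAR_TYPE_COUNT = 3;

// Assumed pattern: a run of Base64-alphabet characters with optional padding.
const standalonePattern = new RegExp(
  `[A-Za-z0-9+/]{${MIN_BASE64_LENGTH_STANDALONE},}={0,2}`,
  "g",
);

// Assumed heuristic: count how many character classes occur in the candidate.
function countCharTypes(candidate: string): number {
  const checks = [/[A-Z]/, /[a-z]/, /[0-9]/, /[+\/]/];
  return checks.filter((re) => re.test(candidate)).length;
}

// Hypothetical stand-in for the truncation step.
function truncateSketch(text: string): string {
  return text.replace(standalonePattern, (match) =>
    countCharTypes(match) >= MIN_CHAR_TYPE_COUNT
      ? `${match.substring(0, TRUNCATION_LENGTH)}...`
      : match,
  );
}

const xpath = '".//postTransactionAmounts/sharesOwnedFollowingTransaction/value"';
const result = truncateSketch(xpath);
console.log(result); // prints ".//postTransactionAmounts/sharesO..."
```

Under these assumptions the sketch reproduces the exact output I saw, including the fact that the 32 kept characters start at the leading //.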
Proposed Solutions:
1. Increase the standalone length threshold (Recommended)
Truncating a 60-character string to 32 characters only saves a handful of tokens, which defeats the purpose of the feature (which is meant to catch massive embedded files). Bumping the minimum length significantly would stop false positives on standard code strings:
// In src/core/file/truncateBase64.ts
const MIN_BASE64_LENGTH_STANDALONE = 256; // or 512
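A quick sketch of the effect of the higher threshold, assuming the standalone pattern is built directly from the constant. buildStandalonePattern and hasCandidate are hypothetical helpers for illustration, not functions from the repository.

```typescript
// Hypothetical helper: builds the assumed standalone pattern for a given minimum length.
function buildStandalonePattern(minLength: number): RegExp {
  return new RegExp(`[A-Za-z0-9+/]{${minLength},}={0,2}`, "g");
}

// True if the text contains at least one run long enough to be a candidate.
function hasCandidate(text: string, minLength: number): boolean {
  return buildStandalonePattern(minLength).test(text);
}

const xpathLiteral = '".//postTransactionAmounts/sharesOwnedFollowingTransaction/value"';
const embeddedBlob = "A1b2C3d4+/".repeat(40); // 400 Base64-alphabet characters

console.log(hasCandidate(xpathLiteral, 60)); // true: the XPath is a candidate today
console.log(hasCandidate(xpathLiteral, 256)); // false: ignored with the higher threshold
console.log(hasCandidate(embeddedBlob, 256)); // true: large blobs are still caught
```

For scale: cutting a 60-character match down to 32 characters plus the "..." marker removes only 25 characters, so raising the threshold gives up very little.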
2. Improve the isLikelyBase64 heuristic
Real Base64-encoded data of any meaningful length almost always contains digits, and its + and / characters are scattered through the string rather than acting as regular separators. The validation function could reject strings that look like file paths or XPaths (e.g., they contain / but no digits or + signs):
function isLikelyBase64(str: string): boolean {
  // ... existing checks ...
  const hasNumbers = /[0-9]/.test(str);
  // Reject strings that look like paths/XPaths
  if (str.includes('/') && !str.includes('+') && !hasNumbers) {
    return false;
  }
  // ...
}
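Here is a self-contained version of that idea that can be run against a few inputs. The surrounding character-class count is my assumption about what the existing checks do, and isLikelyBase64Sketch is a hypothetical name, so treat this as a sketch rather than a patch.

```typescript
const MIN_CHAR_TYPE_COUNT = 3;

// Hypothetical stand-in for isLikelyBase64 with the proposed path check added.
function isLikelyBase64Sketch(str: string): boolean {
  const hasNumbers = /[0-9]/.test(str);
  const hasUpper = /[A-Z]/.test(str);
  const hasLower = /[a-z]/.test(str);
  const hasSpecial = /[+\/]/.test(str);

  // Proposed rule: slashes with no '+' and no digits look like a path or XPath.
  if (str.includes("/") && !str.includes("+") && !hasNumbers) {
    return false;
  }

  // Assumed existing rule: require at least MIN_CHAR_TYPE_COUNT character classes.
  const typeCount = [hasNumbers, hasUpper, hasLower, hasSpecial].filter(Boolean).length;
  return typeCount >= MIN_CHAR_TYPE_COUNT;
}

const xpathBody = "postTransactionAmounts/sharesOwnedFollowingTransaction/value";
// Sample Base64-looking string used only as test input.
const sampleBase64 =
  "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==";
// Hypothetical path that contains a digit.
const versionedPath = "api/v2/UserAccountSettings/profileImageUploadHandler/index";

console.log(isLikelyBase64Sketch(xpathBody)); // false
console.log(isLikelyBase64Sketch(sampleBase64)); // true
console.log(isLikelyBase64Sketch(versionedPath)); // true
```

The last case shows the limit of this approach: a path that happens to contain a digit still passes, so the heuristic probably works best in combination with the higher length threshold from option 1.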
Workaround:
Setting truncateBase64: false in repomix.config.js resolves the issue and leaves the string intact.
Usage Context
Repomix CLI
Repomix Version
1.13.0
Node.js Version
v24.14.0