While doing some line diffs, I came across this bug. It was very hard to come up with a local fix: the assumption that one index == one character seems baked into the DiffBisect algorithm. Instead I converted part of the diff machinery to []rune.
I actually think all of the diff code should use []rune instead of string. Public functions such as DiffMain can keep their string signatures, but they should convert their arguments to []rune immediately and work on runes throughout. That way indexes and characters can never get out of step.
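A minimal sketch of that boundary conversion, using hypothetical names (`CommonPrefix`, `commonPrefixRunes`) rather than go-diff's actual API: the string-based function converts once, and the helper only ever indexes runes, so the length it returns counts characters.

```go
package main

// commonPrefixRunes is a hypothetical internal helper that works on []rune,
// so every index it uses and returns counts code points, not bytes.
func commonPrefixRunes(a, b []rune) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	return n
}

// CommonPrefix is a hypothetical string-based public wrapper: it converts
// its arguments to []rune once, at the boundary, and returns a length in runes.
func CommonPrefix(text1, text2 string) int {
	return commonPrefixRunes([]rune(text1), []rune(text2))
}
```

For "héllo" and "hélp" this returns 3 (h, é, l); a byte-wise comparison of the same strings would report 4, because é occupies two bytes in UTF-8.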
I'm sure there are many more UTF-8 bugs lurking in the code. For example, look at the loop in DiffCommonOverlap: it tries to extract the last character of the string, but actually extracts the last byte. And look at the way diffHalfMatchI indexes its first argument:
```go
seed := l[i : i+len(l)/4]
```
Because these are byte offsets, nothing guarantees that the slice is valid UTF-8: it can start or end in the middle of a multi-byte character.
These bugs don't show up in the tests because the tests mostly use ASCII. But line diffs involve converting the lines into runes, which produces a lot of non-ASCII UTF-8. That is how I found the bug.
So what do you think? Will you consider a pull request that rewrites the diff internals to use []rune?