Skip to content

Fix an offset bug#807

Merged
davidanthoff merged 1 commit intomasterfrom
fix-offset-bug
Jul 22, 2020
Merged

Fix an offset bug#807
davidanthoff merged 1 commit intomasterfrom
fix-offset-bug

Conversation

@davidanthoff
Copy link
Copy Markdown
Member

Next attempt to fix #768.

This is kind of embarrassing because I completely forgot about the whole offset vs offset2 thing in the code base... Here is the background story: we had a lot of text sync errors in the fall, so I rewrote the whole text sync logic, largely just copying the algorithm from the original MS node implementation. One thing I was never able to square was the original get_offset implementation we had and one that followed the MS logic (that is to say, I didn't really understand our implementation :) ). In the end I added the MS based implementation as get_offset2 and used that for all the text sync stuff. That got rid of all the text sync bugs. I didn't remove the original get_offset implementation because I didn't fully understand the ways it was used in various other parts of the LS, and was worried that my new implementation actually used a slightly different definition than our original implementation. So, I left it in there and some of the request handlers still use it today.

So one simple explanation for our problems here is that the old get_offset implementation might just be buggy. The fact that swapping it out for the text sync stuff fixed all the problems around that suggests so :) I don't really know why I never followed up on this and tried to generally use get_offset2 everywhere in the code base, I think I simply forgot...

So this just swaps things out and uses the get_offset2 implementation here. Maybe that will just fix things :)

@davidanthoff davidanthoff self-assigned this Jul 22, 2020
@davidanthoff
Copy link
Copy Markdown
Member Author

Note to myself: if this works, we should make the same change to hover and definition requests, where we are also still seeing offset errors.

@pfitzseb
Copy link
Copy Markdown
Member

I did spend an hour or two playing around with get_offset and it seemed correct to me, but let's see how this pans out.

@davidanthoff davidanthoff merged commit fa14ca1 into master Jul 22, 2020
@davidanthoff davidanthoff deleted the fix-offset-bug branch July 22, 2020 17:42
@davidanthoff
Copy link
Copy Markdown
Member Author

Yeah, I was never able to pin down or identify a bug in the original get_offset, but not using it fixed the sync issues ;) I think if there is a problem, it must be some weird corner case...

@ZacLN
Copy link
Copy Markdown
Contributor

ZacLN commented Jul 22, 2020

If this is unsuccessful and there are still questions as to whether we the root issue is with our incremental document updating we have the option of moving back to 'full text updating'. This would have performance penalties but should give us certainty over the problem..

@davidanthoff
Copy link
Copy Markdown
Member Author

I'm fairly positive that the text sync is correct now, otherwise we should see tons of

throw(LSSyncMismatch("Mismatch between server and client text for $(doc._uri). _open_in_editor is $(doc._open_in_editor). _workspace_file is $(doc._workspace_file). _version is $(doc._version)."))
, right? And I think we haven't seen that at all for a very long time.

@pfitzseb
Copy link
Copy Markdown
Member

pfitzseb commented Jul 23, 2020

We did just get e.g.

get_offset crashed. More diagnostics:
line=176
character=0
position(io)=0
line_offsets='[0]'
text=''

original_error=ERROR: BoundsError: attempt to access 1-element Array{Int64,1} at index [177]

on 0.17.7. That does kinda smell like a sync error unrelated to the document version stuff. Maybe there's an issue where text edits aren't applied properly in very specific circumstances or something...

@pfitzseb
Copy link
Copy Markdown
Member

We haven't gotten a report about this on 0.17.8 yet...

@aviatesk
Copy link
Copy Markdown
Member

I think we have just one crash report (7 hours ago) ?

@pfitzseb
Copy link
Copy Markdown
Member

No, we got a few more that I didn't realize were related:

error("Invalid arguments.")

isn't as easy to make out as LSOffsetError. Ok, so this didn't fix it either.

@aviatesk
Copy link
Copy Markdown
Member

agh, true... #809

@ZacLN
Copy link
Copy Markdown
Contributor

ZacLN commented Jul 24, 2020

So my ideal way forward from here would be to revert to full document syncing on an insiders build release to confirm that is the issue. Then I would like to add some inspection code that checks the full text between client and server after each update, storing text change events (obfuscated) that then are sent as part of the error when we get failures such as the above. This allows us to recreate the failed update.

Alternatively we could do more reviewing of the incremental document update code or chuck it out and start again from scratch.

@davidanthoff
Copy link
Copy Markdown
Member Author

I do have the impression, though, that this PR here reduced the number of users affected by quite a lot? Which makes all of this even more weird, because if this is caused by a sync error, then this PR here should not have made any difference at all...

Agreed that maybe we should try to the non incremental sync for a while. I don't know whether we have enough crash reports for this particular problem on insider, though, to really get a good sense what is going on. Could we switch the release channel over to full sync for a few days at some point and see what happens?

I also think that we probably just have to postpone further attempts to fix this until after Juliacon? I'm a bit worried that any further diagnostic work we might do could introduce other unintended side effects...

@oppo-source oppo-source removed the request for review from a team April 16, 2021 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

get_offset crashed

4 participants