Fix race condition causing async payment failure #4106
👋 Thanks for assigning @joostjager as a reviewer!
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4106 +/- ##
=========================================
Coverage 88.53% 88.53%
=========================================
Files 175 179 +4
Lines 132702 134407 +1705
Branches 132702 134407 +1705
=========================================
+ Hits 117484 118994 +1510
- Misses 12618 12658 +40
- Partials 2600 2755 +155
1be75bd to 5ed1b9f
5ed1b9f to a041894
lock_in_htlc_for_static_invoice(&static_invoice_om, peer_id, sender, sender_lsp);

// The LSP has not transitioned the HTLC to the intercepts map internally because
// process_pending_htlc_forwards has not been called.
Manual message passing is really an advantage here.
✅ Added second reviewer: @jkczyz
This PR also unlocks more e2e testing in ldk-node.
@jkczyz understandable if it doesn't make sense for you to review here, feel free to reroll the bot's assignment
lightning/src/ln/channelmanager.rs
Outdated
macro_rules! handle_monitor_update_completion {
	($self: ident, $peer_state_lock: expr, $peer_state: expr, $per_peer_state_lock: expr, $chan: expr) => { {
		let channel_id = $chan.context.channel_id();
		let short_channel_id = $chan.funding.get_short_channel_id().unwrap_or($chan.context().outbound_scid_alias());
Let's just change decode_update_add_htlcs to use the outbound alias? That won't change and also it looks like the SCID itself is basically unused, so there's no reason to use the real one.
Oh fuck this is a mess wrt splicing. Please definitely do this #4121
Should be addressed, though we don't move to (PublicKey, ChannelId) here
Since we have added/are adding splicing support, the scid of a channel is liable to change post-splice. Some maps in the ChannelManager are keyed by the scid of a channel, which is an issue now -- if we forward an HTLC from a channel and then splice that channel before the HTLC is resolved, we'll end up with an HTLC source with an scid that doesn't correspond to any open channel. This may result in loss of the HTLC resolution.

The outbound scid alias of a channel is stable even post-splice, so for the short term here we switch to using that instead. In the medium term we should update these maps to use (PublicKey, ChannelId) like everything else.

We don't always use the alias for outbound forwarded HTLCs, since we tend to use whatever outbound scid is in the onion. That's fine because we properly handle the case where the outbound channel cannot be found; the main problem is in inbound HTLCs and forgetting their resolution.
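The stale-key problem described above can be illustrated with a minimal sketch. The `Channel` struct and the `source_key`, `splice`, and `resolves_to` helpers are hypothetical simplifications for illustration only, not LDK's actual types or lookup logic; the sketch assumes a splice replaces the real SCID while the alias stays fixed.

```rust
/// Hypothetical, simplified channel; not LDK's real `Channel` type.
struct Channel {
    /// Real SCID tied to the current funding transaction; assumed to change on splice.
    funding_scid: Option<u64>,
    /// Alias picked at channel open; assumed stable for the channel's lifetime.
    outbound_scid_alias: u64,
}

/// Key stored in an HTLC source when an inbound HTLC is accepted.
/// `use_alias == false` models the old behavior (real SCID, falling back to the alias).
fn source_key(chan: &Channel, use_alias: bool) -> u64 {
    if use_alias {
        chan.outbound_scid_alias
    } else {
        chan.funding_scid.unwrap_or(chan.outbound_scid_alias)
    }
}

/// Model a splice: the channel gets a new real SCID, the alias is untouched.
fn splice(chan: &mut Channel, new_scid: u64) {
    chan.funding_scid = Some(new_scid);
}

/// Whether a previously stored key still identifies this channel.
fn resolves_to(chan: &Channel, key: u64) -> bool {
    chan.funding_scid == Some(key) || chan.outbound_scid_alias == key
}
```

In this model, a key taken from the real SCID before a splice no longer matches the channel afterwards, while a key taken from the alias still does, which is the reason given for switching to the alias in the short term.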
As the LSP of an async sender, when we receive an update_add with the hold_htlc flag set, after its onion is decoded we transition the pending HTLC to the ChannelManager::pending_intercepted_htlcs. However, if we receive the release_held_htlc message from the receiver *before* we've had a chance to make this transition, we'll fail to release the HTLC and it will sit in the pending intercepts map until it is failed backwards. To fix this race condition, if we receive release_held_htlc from the recipient we'll not only check the pending_intercepted_htlcs map for the presence of this HTLC but also check the map where we keep HTLCs prior to their onions being decoded.
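A rough sketch of the two-map check described above, under stated assumptions: `Lsp`, `InterceptId`, `HeldHtlc`, `PendingUpdateAdd`, `ReleaseOutcome`, and `handle_release_held_htlc` are hypothetical names for illustration, and the way the real ChannelManager identifies and releases the HTLC may differ. Only the map names `pending_intercepted_htlcs` and `decode_update_add_htlcs` come from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a held HTLC.
type InterceptId = u64;

/// Hypothetical record of an HTLC whose onion has already been decoded.
struct HeldHtlc {
    amt_msat: u64,
}

/// Hypothetical record of an update_add whose onion has not been decoded yet.
struct PendingUpdateAdd {
    intercept_id: InterceptId,
    hold_htlc: bool,
}

#[derive(Debug, PartialEq)]
enum ReleaseOutcome {
    ReleasedFromIntercepts,
    ReleasedBeforeDecode,
    NotFound,
}

struct Lsp {
    /// HTLCs already moved to the intercepts map after onion decoding.
    pending_intercepted_htlcs: HashMap<InterceptId, HeldHtlc>,
    /// HTLCs still waiting for onion decoding, keyed by the inbound scid alias.
    decode_update_add_htlcs: HashMap<u64, Vec<PendingUpdateAdd>>,
}

impl Lsp {
    /// Handle release_held_htlc by checking both maps, so the message is not
    /// lost if it arrives before the HTLC reaches the intercepts map.
    fn handle_release_held_htlc(&mut self, id: InterceptId) -> ReleaseOutcome {
        if self.pending_intercepted_htlcs.remove(&id).is_some() {
            return ReleaseOutcome::ReleasedFromIntercepts;
        }
        for adds in self.decode_update_add_htlcs.values_mut() {
            if let Some(add) = adds.iter_mut().find(|a| a.intercept_id == id) {
                // Clear the hold flag so the HTLC is forwarded normally once decoded.
                add.hold_htlc = false;
                return ReleaseOutcome::ReleasedBeforeDecode;
            }
        }
        ReleaseOutcome::NotFound
    }
}
```

Without the second lookup, the early-arrival case would fall through to `NotFound` and the HTLC would later be held with nothing left to release it.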
a041894 to ade1f34
  let mut decode_update_add_htlcs = None;
  if !pending_update_adds.is_empty() {
- 	decode_update_add_htlcs = Some((short_channel_id, pending_update_adds));
+ 	decode_update_add_htlcs = Some((outbound_scid_alias, pending_update_adds));
I believe this is the only place we key to-be-decoded update_adds with an scid, which should happen for all inbound HTLCs. So I think the source scid will always be the alias now.
Second commit re-acked. First commit I need some context, which I'll get offline. Will approve optimistically for now, to keep moving.
As the LSP of an async sender, when we receive an update_add with the hold_htlc flag set, after its onion is decoded we transition the pending HTLC to the ChannelManager::pending_intercepted_htlcs. However, if we receive the release_held_htlc message from the receiver before we've had a chance to make this transition, we'll fail to release the HTLC and it will sit in the pending intercepts map until it is failed backwards.

To fix this race condition, if we receive release_held_htlc from the recipient we'll not only check the pending_intercepted_htlcs map for the presence of this HTLC but also check the map where we keep HTLCs prior to their onions being decoded.