Skip to content

Conversation

valentinewallace
Copy link
Contributor

As the LSP of an async sender, when we receive an update_add with the hold_htlc flag set, after its onion is decoded we transition the pending HTLC to the ChannelManager::pending_intercepted_htlcs. However, if we receive the release_held_htlc message from the receiver before we've had a chance to make this transition, we'll fail to release the HTLC and it will sit in the pending intercepts map until it is failed backwards.

To fix this race condition, if we receive release_held_htlc from the recipient we'll not only check the pending_intercepted_htlcs map for the presence of this HTLC but also check the map where we keep HTLCs prior to their onions being decoded.

@ldk-reviews-bot
Copy link

ldk-reviews-bot commented Sep 22, 2025

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Copy link

codecov bot commented Sep 22, 2025

Codecov Report

❌ Patch coverage is 92.85714% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.53%. Comparing base (e82ef2c) to head (ade1f34).
⚠️ Report is 92 commits behind head on main.

Files with missing lines Patch % Lines
lightning/src/ln/channelmanager.rs 89.23% 7 Missing ⚠️
lightning/src/ln/async_payments_tests.rs 97.56% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             main    #4106     +/-   ##
=========================================
  Coverage   88.53%   88.53%             
=========================================
  Files         175      179      +4     
  Lines      132702   134407   +1705     
  Branches   132702   134407   +1705     
=========================================
+ Hits       117484   118994   +1510     
- Misses      12618    12658     +40     
- Partials     2600     2755    +155     
Flag Coverage Δ
fuzzing 21.78% <26.76%> (?)
tests 88.37% <92.85%> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@valentinewallace valentinewallace force-pushed the 2025-08-async-sender-fix-race branch from 1be75bd to 5ed1b9f Compare September 22, 2025 20:03
@valentinewallace valentinewallace added the weekly goal Someone wants to land this this week label Sep 22, 2025
@TheBlueMatt TheBlueMatt added this to the 0.2 milestone Sep 22, 2025
@ldk-reviews-bot
Copy link

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

joostjager
joostjager previously approved these changes Sep 24, 2025
joostjager
joostjager previously approved these changes Sep 24, 2025
lock_in_htlc_for_static_invoice(&static_invoice_om, peer_id, sender, sender_lsp);

// The LSP has not transitioned the HTLC to the intercepts map internally because
// process_pending_htlc_forwards has not been called.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manual message passing is really an advantage here.

@ldk-reviews-bot
Copy link

✅ Added second reviewer: @jkczyz

@joostjager
Copy link
Contributor

This PR also unlocks more e2e testing in ldk-node.

@valentinewallace
Copy link
Contributor Author

@jkczyz understandable if it doesn't make sense for you to review here, feel free to reroll the bot's assignment

@TheBlueMatt TheBlueMatt requested review from TheBlueMatt and removed request for jkczyz September 24, 2025 19:50
macro_rules! handle_monitor_update_completion {
($self: ident, $peer_state_lock: expr, $peer_state: expr, $per_peer_state_lock: expr, $chan: expr) => { {
let channel_id = $chan.context.channel_id();
let short_channel_id = $chan.funding.get_short_channel_id().unwrap_or($chan.context().outbound_scid_alias());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just change decode_update_add_htlcs to use the outbound alias? That won't change and also it looks like the SCID itself is basically unused, so there's no reason to use the real one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh fuck this is a mess wrt splicing. Please definitely do this #4121

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be addressed, though we don't move to (PublicKey, ChannelId) here

Since we have added/are adding splicing support, the scid of a channel is
liable to change post-splice.

Some maps in the ChannelManager are keyed by the scid of a channel, which is an
issue now -- if we forward an HTLC from a channel and then splice that
channel before the HTLC is resolved, we'll end up with an HTLC source with an
scid that doesn't correspond to any open channel. This may result in loss of
the HTLC resolution.

The outbound scid alias of a channel is stable even post-splice, so for the
short term here we switch to using that instead. In the medium term we should
update these maps to use (PublicKey, ChannelId) like everything else.

We don't always use the alias for outbound forwarded HTLCs, since we tend to
use whatever outbound scid is in the onion. That's fine because we properly
handle the case where the outbound channel cannot be found; the main problem is
in inbound HTLCs and forgetting their resolution.
As the LSP of an async sender, when we receive an update_add with the hold_htlc
flag set, after its onion is decoded we transition the pending HTLC to the
ChannelManager::pending_intercepted_htlcs.  However, if we receive the
release_held_htlc message from the receiver *before* we've had a chance to make
this transition, we'll fail to release the HTLC and it will sit in the pending
intercepts map until it is failed backwards.

To fix this race condition, if we receive release_held_htlc from the recipient
we'll not only check the pending_intercepted_htlcs map for the presence of this
HTLC but also check the map where we keep HTLCs prior to their onions being
decoded.
let mut decode_update_add_htlcs = None;
if !pending_update_adds.is_empty() {
decode_update_add_htlcs = Some((short_channel_id, pending_update_adds));
decode_update_add_htlcs = Some((outbound_scid_alias, pending_update_adds));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the only place we key to-be-decoded update_adds with an scid, which should happen for all inbound HTLCs. So I think the source scid will always be the alias now.

Copy link
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second commit re-acked. First commit I need some context, which I'll get offline. Will approve optimistically for now, to keep moving.

@joostjager joostjager merged commit 9514637 into lightningdevkit:main Sep 25, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
weekly goal Someone wants to land this this week
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants