
Conversation

TheBlueMatt (Collaborator):

Mostly some issues in persistence, but also one doc fix.

TheBlueMatt added this to the 0.2 milestone on Sep 24, 2025

ldk-reviews-bot commented Sep 24, 2025

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

TheBlueMatt requested a review from tnull on September 24, 2025 19:16

codecov bot commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 38.38384% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.53%. Comparing base (e929517) to head (a9ddf3f).
⚠️ Report is 13 commits behind head on main.

Files with missing lines | Patch % | Lines
lightning-liquidity/src/lsps2/service.rs | 40.74% | 27 Missing and 5 partials ⚠️
lightning-liquidity/src/lsps5/service.rs | 35.55% | 24 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4118      +/-   ##
==========================================
+ Coverage   88.39%   88.53%   +0.13%     
==========================================
  Files         178      179       +1     
  Lines      134179   134413     +234     
==========================================
+ Hits       118610   119002     +392     
+ Misses      12895    12653     -242     
- Partials     2674     2758      +84     
Flag | Coverage Δ
fuzzing | 21.23% <0.00%> (?)
tests | 88.37% <38.38%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.


.await?;
let mut future_opt = None;
{
let per_peer_state = self.per_peer_state.read().unwrap();

tnull (Contributor):

Hmm, if we want to check again here, why bother with the two need_remove/need_persist lists in the first place? Wouldn't it be easier to then keep the two steps separate, i.e. first just iterate all peers and repersist what needs persisting, then retain and remove any prunable peers in one go? Seems like keeping the two steps separate might be simpler?

TheBlueMatt (Collaborator, Author):

Yea, this did end up a bit messier than I was hoping. One thing I still want to avoid, though, is holding the top-level write lock while persisting. It's not clear to me how to accomplish that without two passes (unless we want to do a loop-until-we-loop-and-don't-find-peers-to-remove, which I guess we could, but it feels a bit wasteful). Do you have any suggestions there?

LSPS5 doesn't have inner peer locks, so we ultimately take the write lock inside the second loop (and into a persist!) as well. There, the three-pass approach is definitely a weaker argument, but I'm still not sure a loop-until-we-loop-and-don't-find-peers-to-remove is better.
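
For illustration, here is a minimal, self-contained sketch of the persist-then-prune shape being discussed, with the prunability re-check done under the write lock right before removal. All names and types here (`PeerState`, `Service`, the synchronous `persist_peer_state`/`remove_peer_state` helpers, `String` peer keys) are simplified stand-ins rather than the actual lightning-liquidity API:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::RwLock;

// Simplified stand-in for the per-peer state tracked by the service.
struct PeerState {
	dirty: bool,
	pending_requests: usize,
}

impl PeerState {
	fn is_prunable(&self) -> bool {
		self.pending_requests == 0
	}
}

struct Service {
	per_peer_state: RwLock<HashMap<String, PeerState>>,
}

impl Service {
	fn persist_peer_state(&self, _peer: &str) {
		// Would serialize the peer's state and write it to the KV store.
	}

	fn remove_peer_state(&self, _peer: &str) {
		// Would delete the peer's entry from the KV store.
	}

	fn persist(&self) {
		// Pass 1: under the read lock, note which peers need persisting or removal.
		let (need_persist, need_remove): (Vec<String>, Vec<String>) = {
			let peers = self.per_peer_state.read().unwrap();
			let persist = peers
				.iter()
				.filter(|(_, s)| s.dirty && !s.is_prunable())
				.map(|(k, _)| k.clone())
				.collect();
			let remove = peers
				.iter()
				.filter(|(_, s)| s.is_prunable())
				.map(|(k, _)| k.clone())
				.collect();
			(persist, remove)
		};

		// Pass 2: persist dirty peers without holding the top-level write lock.
		for peer in need_persist {
			self.persist_peer_state(&peer);
		}

		// Pass 3: re-check prunability under the write lock, since a peer may have
		// been re-added or touched between the passes. If it is no longer prunable,
		// re-persist its current state instead of removing it from disk.
		for peer in need_remove {
			let still_prunable = {
				let mut peers = self.per_peer_state.write().unwrap();
				match peers.entry(peer.clone()) {
					Entry::Occupied(entry) => {
						if entry.get().is_prunable() {
							entry.remove();
							true
						} else {
							false
						}
					},
					// Already gone from memory; just drop the on-disk entry too.
					Entry::Vacant(_) => true,
				}
			};
			if still_prunable {
				self.remove_peer_state(&peer);
			} else {
				self.persist_peer_state(&peer);
			}
		}
	}
}

fn main() {
	let svc = Service { per_peer_state: RwLock::new(HashMap::new()) };
	svc.per_peer_state
		.write()
		.unwrap()
		.insert("peer_a".to_string(), PeerState { dirty: true, pending_requests: 1 });
	svc.persist();
}
```

The point of the extra write-lock pass is that nothing is persisted while the top-level write lock is held, at the cost of re-checking `is_prunable()` once per removal candidate.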

} else {
self.persist_peer_state(counterparty_node_id).await?;

if self.persistence_in_flight.fetch_sub(1, Ordering::AcqRel) != 1 {

tnull (Contributor):

Use fetch_update to avoid the race here?

TheBlueMatt (Collaborator, Author):

Hmm? Not sure which race you're referring to. I copied this logic from PeerManager.

tnull (Contributor):

Isn't there a potential race between fetch_sub and store below? Although granted I don't believe it would have much consequence, doing both in one operation just seemed cleaner.

TheBlueMatt (Collaborator, Author):

Hmmm, I actually think it's correct. If we do a fetch_sub and the number we get back is greater than 1, we still "own" the "mutex" - no other persist should be in the loop. We can thus write a 1 to reset the counter such that we won't loop too many times. fetch_update isn't atomic, it's actually a compare_and_swap loop, which I think doesn't quite work.
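
To make the shape of that pattern concrete, here is a self-contained sketch of the counter-as-mutex idea being discussed. The names (`persistence_in_flight`, `do_persist_pass`) and the synchronous structure are illustrative assumptions, not the actual LDK code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct Service {
	persistence_in_flight: AtomicUsize,
}

impl Service {
	fn do_persist_pass(&self) {
		// Would walk all per-peer state and write anything dirty to the KV store.
	}

	fn persist(&self) {
		if self.persistence_in_flight.fetch_add(1, Ordering::AcqRel) > 0 {
			// Another persist() call is already running. It will see the bumped
			// counter when it finishes its pass and re-run from the top, so our
			// state changes will still get persisted.
			return;
		}
		loop {
			self.do_persist_pass();
			if self.persistence_in_flight.fetch_sub(1, Ordering::AcqRel) != 1 {
				// One or more persist() calls raced with us while we were writing.
				// Only the first caller ever reaches this loop, so we still "own"
				// it; reset the counter to 1 so we do exactly one more pass rather
				// than one per racing caller. If yet another call bumps the counter
				// between the fetch_sub and this store, its state changes are still
				// covered by the pass we are about to run.
				self.persistence_in_flight.store(1, Ordering::Release);
			} else {
				break;
			}
		}
	}
}

fn main() {
	let svc = Service { persistence_in_flight: AtomicUsize::new(0) };
	svc.persist();
	svc.persist();
}
```

Only the first caller does any I/O; later callers just bump the counter and return, trusting the in-flight call to do one more pass from the top.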

ldk-reviews-bot:

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

TheBlueMatt force-pushed the 2025-09-lsps-persistence-races branch from 6793e7b to 1cb7cc8 on September 25, 2025 12:06
While it would be correct to use `(their_network_key,
intercept_scid)` as the idempotency key when handling
`LSPS2ServiceEvent::OpenChannel`, we don't actually expect anyone
to do so as it would require separate storage to track the
`intercept_scid` -> opened channel mappings.

Thus, we update the documentation to note that the correct
idempotency key is `(their_network_key, user_channel_id)`.
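
As a rough illustration of what keying on `(their_network_key, user_channel_id)` looks like, here is a hypothetical handler that skips duplicate deliveries of an open-channel event. The `OpenChannelEvent` struct and `open_channel` callback are made-up stand-ins, not the actual `LSPS2ServiceEvent` API:

```rust
use std::collections::HashSet;

// Hypothetical, simplified stand-in for the fields we care about from an
// LSPS2 open-channel event; not the real LSPS2ServiceEvent::OpenChannel.
struct OpenChannelEvent {
	their_network_key: [u8; 33],
	user_channel_id: u128,
	intercept_scid: u64,
}

struct Handler {
	// Set of already-handled (their_network_key, user_channel_id) pairs
	// (a real handler would need to persist this).
	handled: HashSet<([u8; 33], u128)>,
}

impl Handler {
	fn open_channel(&self, ev: &OpenChannelEvent) {
		// Would actually initiate the channel open with the counterparty,
		// using ev.intercept_scid etc.
		let _ = ev.intercept_scid;
	}

	fn handle_open_channel(&mut self, ev: OpenChannelEvent) {
		// Use (their_network_key, user_channel_id) as the idempotency key: if the
		// event is redelivered (e.g. after a restart), we skip the duplicate open
		// without tracking intercept_scid -> opened-channel mappings separately.
		let key = (ev.their_network_key, ev.user_channel_id);
		if !self.handled.insert(key) {
			return; // Already handled this open; drop the duplicate.
		}
		self.open_channel(&ev);
	}
}

fn main() {
	let mut handler = Handler { handled: HashSet::new() };
	let ev = OpenChannelEvent { their_network_key: [2u8; 33], user_channel_id: 42, intercept_scid: 1234 };
	// Simulate the same event being delivered twice, e.g. across a restart.
	let redelivered =
		OpenChannelEvent { their_network_key: [2u8; 33], user_channel_id: 42, intercept_scid: 1234 };
	handler.handle_open_channel(ev);
	handler.handle_open_channel(redelivered); // ignored: same idempotency key
}
```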
TheBlueMatt force-pushed the 2025-09-lsps-persistence-races branch from 1cb7cc8 to 3b7f95c on September 25, 2025 12:07
TheBlueMatt (Collaborator, Author):

Rebased and addressed the issue @tnull identified.

TheBlueMatt force-pushed the 2025-09-lsps-persistence-races branch 2 times, most recently from 32959fb to b099dd6 on September 25, 2025 13:50
TheBlueMatt (Collaborator, Author):

Squashed without further changes.

tnull (Contributor) left a comment:

One comment, otherwise LGTM

If we note that a peer should be removed in LSPS2/5 handling, we
need to make sure that the peer wasn't re-added between dropping
its state in memory and going to remove its state from disk. If it
is, we need to overwrite the current on-disk state instead.

There are various race conditions between `persist` calls on the
LSPS2 and LSPS5 services. Thus, we simply ensure that no two
`persist` calls can be in-flight at the same time, ensuring that
any new state updates between them will ultimately be persisted by
re-persisting from the top in the originally-running `persist` call
if another is started before it finishes.
TheBlueMatt force-pushed the 2025-09-lsps-persistence-races branch from b099dd6 to a9ddf3f on September 25, 2025 14:02
TheBlueMatt (Collaborator, Author):

Made the removal check in LSPS2 consistent, as well as made the counterparty node id nomenclature consistent:

$ git diff-tree -U2 b099dd6976 a9ddf3f1c9
diff --git a/lightning-liquidity/src/lsps5/service.rs b/lightning-liquidity/src/lsps5/service.rs
index ca04ccf179..1111c682fb 100644
--- a/lightning-liquidity/src/lsps5/service.rs
+++ b/lightning-liquidity/src/lsps5/service.rs
@@ -275,10 +275,10 @@ where
 			}

-			for counterparty_node_id in need_persist.into_iter() {
-				debug_assert!(!need_remove.contains(&counterparty_node_id));
-				self.persist_peer_state(counterparty_node_id).await?;
+			for client_id in need_persist.into_iter() {
+				debug_assert!(!need_remove.contains(&client_id));
+				self.persist_peer_state(client_id).await?;
 			}

-			for counterparty_node_id in need_remove {
+			for client_id in need_remove {
 				let mut future_opt = None;
 				{
@@ -289,9 +289,9 @@ where
 					// itself.
 					let mut per_peer_state = self.per_peer_state.write().unwrap();
-					if let Entry::Occupied(mut entry) = per_peer_state.entry(counterparty_node_id) {
+					if let Entry::Occupied(mut entry) = per_peer_state.entry(client_id) {
 						let state = entry.get_mut();
-						if state.is_prunable() {
+						if state.is_prunable() && !self.client_has_open_channel(&client_id) {
 							entry.remove();
-							let key = counterparty_node_id.to_string();
+							let key = client_id.to_string();
 							future_opt = Some(self.kv_store.remove(
 								LIQUIDITY_MANAGER_PERSISTENCE_PRIMARY_NAMESPACE,
@@ -312,5 +312,5 @@ where
 					future.await?;
 				} else {
-					self.persist_peer_state(counterparty_node_id).await?;
+					self.persist_peer_state(client_id).await?;
 				}
 			}

tnull (Contributor) left a comment:

Also double-checked this builds and tests on LDK Node.

tnull merged commit a0ca100 into lightningdevkit:main on Sep 25, 2025
23 checks passed