Skip to content

[DB-1760] Fix persistent subscription stall if client sends no Acks or Nacks (#5382) (#5383)#5384

Merged
timothycoleman merged 1 commit intorelease/v24.10from
cherry-pick/5383/cherry-pick/5382/timothycoleman/fix-ps-stall-release/v25.1-release/v24.10
Nov 25, 2025
Merged

[DB-1760] Fix persistent subscription stall if client sends no Acks or Nacks (#5382) (#5383)#5384
timothycoleman merged 1 commit intorelease/v24.10from
cherry-pick/5383/cherry-pick/5382/timothycoleman/fix-ps-stall-release/v25.1-release/v24.10

Conversation

@timothycoleman
Copy link
Contributor

@timothycoleman timothycoleman commented Nov 25, 2025

Fixed: Persistent Subscription stall when client sends no Acks or Naks

User description

cherrypick of #5382

If a client is subscribed with a persistent subscription but neither ACKS or NAKS any events, the server will continue to retry them per the settings and eventually park them. However, during the process no additional reads top up the serverside historic buffer of events. Eventually the buffer will become empty and the subscription will stall until the leader changes or the subsystem is restarted.

This fix adds in the reads that were supposed to happen in these circumstances (there were missing parenthsis in the branch condition), so now the subscription will not stall and will continue to park the events.

Note that the stall can only manifest if the client doesn't send any ACKS or NACKS, which is an indicator of a problem in the client code.

(tests in master)


PR Type

Bug fix


Description

  • Fix persistent subscription stall caused by missing parentheses in branch condition

  • Ensures reads continue when client sends no Acks or Nacks

  • Prevents server-side historic buffer from becoming empty

  • Subscription now continues to park events instead of stalling


Diagram Walkthrough

flowchart LR
  A["Branch Condition<br/>Missing Parentheses"] -->|Fixed| B["Correct Bitwise<br/>Operation"]
  B -->|Enables| C["TryReadingNewBatch<br/>Executes"]
  C -->|Maintains| D["Historic Buffer<br/>Populated"]
  D -->|Prevents| E["Subscription<br/>Stall"]
Loading

File Walkthrough

Relevant files
Bug fix
PersistentSubscription.cs
Add missing parentheses to fix state condition logic         

src/EventStore.Core/Services/PersistentSubscription/PersistentSubscription.cs

  • Added missing parentheses around bitwise OR operation in branch
    condition
  • Corrects operator precedence to properly evaluate subscription state
    flags
  • Ensures TryReadingNewBatch() is called when subscription is Behind or
    has OutstandingPageRequest
  • Fixes logic that was preventing reads from topping up server-side
    historic buffer
+2/-2     

…r Nacks (#5382) (#5383)

cherrypick of #5382

If a client is subscribed with a persistent subscription but neither ACKS or NAKS any events, the server will continue to retry them per the settings and eventually park them. However, during the process no additional reads top up the serverside historic buffer of events. Eventually the buffer will become empty and the subscription will stall until the leader changes or the subsystem is restarted.

This fix adds in the reads that were supposed to happen in these circumstances (there were missing parenthsis in the branch condition), so now the subscription will not stall and will continue to park the events.

Note that the stall can only manifest if the client doesn't send any ACKS or NACKS, which is an indicator of a problem in the client code.

(tests in master)
@timothycoleman timothycoleman requested a review from a team as a code owner November 25, 2025 13:39
@linear
Copy link

linear bot commented Nov 25, 2025

@qodo-code-review
Copy link
Contributor

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing Auditing: The new conditional read trigger does not add or reference any audit logging for critical
subscription actions, leaving uncertainty whether these actions are recorded with required
context.

Referred Code
if ((_state & (PersistentSubscriptionState.Behind |
	 PersistentSubscriptionState.OutstandingPageRequest)) ==
	PersistentSubscriptionState.Behind)
	TryReadingNewBatch();

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing Error Path: The added condition invokes TryReadingNewBatch without any visible error handling or
fallback in this diff, so it is unclear how failures in that call are handled.

Referred Code
	if ((_state & (PersistentSubscriptionState.Behind |
		 PersistentSubscriptionState.OutstandingPageRequest)) ==
		PersistentSubscriptionState.Behind)
		TryReadingNewBatch();
}

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Improve readability by using HasFlag

Refactor the bitwise flag check to use the Enum.HasFlag() method for improved
readability and to make the condition's intent more explicit.

src/EventStore.Core/Services/PersistentSubscription/PersistentSubscription.cs [664-667]

-if ((_state & (PersistentSubscriptionState.Behind |
-     PersistentSubscriptionState.OutstandingPageRequest)) ==
-    PersistentSubscriptionState.Behind)
+if (_state.HasFlag(PersistentSubscriptionState.Behind) &&
+    !_state.HasFlag(PersistentSubscriptionState.OutstandingPageRequest))
     TryReadingNewBatch();
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that the bitwise logic can be complex and proposes using Enum.HasFlag(), which is a valid and more readable alternative that makes the code's intent clearer.

Low
  • More

@timothycoleman timothycoleman merged commit 4e20f1b into release/v24.10 Nov 25, 2025
11 of 13 checks passed
@timothycoleman timothycoleman deleted the cherry-pick/5383/cherry-pick/5382/timothycoleman/fix-ps-stall-release/v25.1-release/v24.10 branch November 25, 2025 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants