
Fix: resolve async communication deadlock and daemon stability issues #91

Merged: bfly123 merged 3 commits into bfly123:main from
daniellee2015:fix/async-communication-core on Feb 23, 2026

@daniellee2015 (Contributor)

Summary

This PR fixes critical bugs that cause async requests to get stuck in
the "processing" state and cause the daemon to crash. These issues
affect the OpenCode and Gemini providers, leading to communication
failures and system-wide breakdowns.

Issues Fixed

1. OpenCode SQLite Support (commit 8afd215)

  • Problem: OpenCode 0.29.0+ migrated session storage from JSON files
    to SQLite, which broke session discovery
  • Solution: Add full SQLite database support while keeping the JSON
    path for backward compatibility
  • Impact: Fixes reply detection in cases where OpenCode completes a
    task but CCB never receives the reply
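A minimal sketch of the session-discovery approach, assuming a hypothetical `session` table with `id`, `directory`, and `time_updated` columns and a hypothetical one-JSON-file-per-session legacy layout; neither is taken from OpenCode's real schema, and all function names are illustrative. It scans up to 200 recent sessions (the limit the first commit describes), keeps the newest one whose directory matches the project, and falls back to JSON files when no database file exists.

```python
import json
import sqlite3
from pathlib import Path
from typing import Optional

# Hypothetical schema; OpenCode's actual table and column names may differ.
# LIMIT 200 mirrors the commit (raised from 50) so sessions from other
# projects are less likely to push the target session out of the window.
SESSION_QUERY = (
    "SELECT id, directory, time_updated FROM session "
    "ORDER BY time_updated DESC LIMIT 200"
)


def latest_session_from_db(conn: sqlite3.Connection, project_dir: str) -> Optional[str]:
    """Return the id of the newest session for project_dir, or None."""
    best_id, best_time = None, None
    for sid, directory, updated in conn.execute(SESSION_QUERY):
        # Track the newest match explicitly instead of returning the first hit.
        if directory == project_dir and (best_time is None or updated > best_time):
            best_id, best_time = sid, updated
    return best_id


def latest_session_from_json(storage_dir: Path, project_dir: str) -> Optional[str]:
    """Legacy path: scan one JSON file per session (hypothetical layout)."""
    if not storage_dir.is_dir():
        return None
    best_id, best_time = None, None
    for path in storage_dir.glob("*.json"):
        data = json.loads(path.read_text())
        updated = data.get("time_updated", 0)
        if data.get("directory") == project_dir and (best_time is None or updated > best_time):
            best_id, best_time = data.get("id"), updated
    return best_id


def find_session(db_path: Path, storage_dir: Path, project_dir: str) -> Optional[str]:
    """Prefer the SQLite database; fall back to JSON files if it is absent."""
    if db_path.exists():
        conn = sqlite3.connect(str(db_path))
        try:
            return latest_session_from_db(conn, project_dir)
        finally:
            conn.close()
    return latest_session_from_json(storage_dir, project_dir)
```

Keeping the JSON reader as a separate function means older OpenCode installs keep working without any configuration switch.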

2. Async Communication Deadlock (commit 30efee8)

  • Problem: Requests get stuck in the "processing" state permanently
    • The second OpenCode call always fails (100% reproducible)
    • Gemini fails intermittently
    • Strict req_id matching causes a permanent deadlock
  • Solution:
    • Fix OpenCode session ID pinning so newer sessions are detected
    • Update all state fields in _read_since() when session_updated
      changes
    • Add degraded completion detection (on timeout, accept CCB_DONE
      even when the req_id does not match)
  • Impact: Resolves the permanent "processing" state for both OpenCode
    and Gemini
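A minimal sketch of the degraded completion rule, assuming a hypothetical marker line of the form `CCB_DONE: <req_id>`; `check_completion` and `Completion` are illustrative names, not the adapters' actual API. A matching req_id completes normally; only after a timeout is any other CCB_DONE marker accepted, with a warning.

```python
import logging
import re
from dataclasses import dataclass

log = logging.getLogger("askd.sketch")

# Hypothetical marker format: "CCB_DONE: <req_id>" on its own line.
DONE_RE = re.compile(r"^CCB_DONE:\s*(\S+)\s*$", re.MULTILINE)


@dataclass
class Completion:
    done: bool
    degraded: bool = False


def check_completion(reply: str, req_id: str, timed_out: bool) -> Completion:
    markers = DONE_RE.findall(reply)
    if req_id in markers:
        return Completion(done=True)  # strict match: normal path
    if timed_out and markers:
        # Degraded path: a CCB_DONE is present but carries a different req_id.
        log.warning("req_id mismatch: expected %s, saw %s; accepting as done",
                    req_id, markers)
        return Completion(done=True, degraded=True)
    return Completion(done=False)  # keep waiting
```

Gating the fallback on `timed_out` keeps the strict check as the normal path, so a stale marker cannot end a request early.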

3. Daemon Stability (commit 8c24969)

  • Problem:
    • Daemon becomes a zombie (defunct) process
    • _parent_monitor thread crashes due to an indentation bug
    • Unified askd is not used in background mode
    • Gemini hash overflow causes message detection failures
  • Solution:
    • Fix the indentation of the _parent_monitor thread start
    • Remove the foreground_mode requirement for unified askd
    • Add a None check for the msg_id comparison in Gemini
  • Impact: Prevents daemon crashes and ensures askd is used in all
    modes
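A sketch of the corrected `_parent_monitor` structure, assuming a POSIX host (signal 0 is used as an existence probe) plus a hypothetical `start_parent_monitor` wrapper and polling body; only the placement of the thread start reflects the actual fix. Because the thread is created inside the `if parent_pid` block, the function it targets always exists. The `daemon=True` flag is an added choice so the monitor never blocks interpreter shutdown.

```python
import os
import threading
import time
from typing import Callable, Optional


def start_parent_monitor(parent_pid: Optional[int],
                         on_parent_exit: Callable[[], None],
                         interval: float = 1.0) -> Optional[threading.Thread]:
    if parent_pid:
        def _parent_monitor() -> None:
            while True:
                try:
                    os.kill(parent_pid, 0)  # signal 0: existence check only (POSIX)
                except ProcessLookupError:
                    on_parent_exit()  # parent is gone; let the daemon shut down
                    return
                except PermissionError:
                    pass  # process exists but belongs to another user
                time.sleep(interval)

        # The fix: start the thread *inside* the block that defines
        # _parent_monitor. At the outer indentation level this line raised
        # NameError whenever parent_pid was not set.
        thread = threading.Thread(target=_parent_monitor, daemon=True)
        thread.start()
        return thread
    return None
```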

Testing

  • 18+ concurrent/sequential calls across 3 LLMs (OpenCode, Gemini,
    Codex)
  • All calls succeeded with no deadlocks or crashes
  • Test script: test_minimal_fix.sh

Files Changed

  • lib/opencode_comm.py: SQLite support + session detection fixes
  • lib/askd/adapters/opencode.py: Degraded completion detection
  • lib/askd/adapters/gemini.py: Degraded completion detection
  • bin/ask: Enable unified askd in all modes
  • lib/askd_server.py: Fix _parent_monitor thread crash
  • lib/gemini_comm.py: Add None check for msg_id

Backward Compatibility

All changes maintain backward compatibility:

  • OpenCode: Falls back to JSON file storage if the SQLite database is
    not available
  • Degraded completion: Only triggers on timeout when a CCB_DONE marker
    is present
  • Unified askd: Works in both foreground and background modes

Co-authored

Co-analyzed-by: Gemini, OpenCode, Codex (multi-model collaborative
debugging)

daniellee2015 and others added 3 commits February 20, 2026 10:12
OpenCode 0.29.0+ migrated from JSON file storage to SQLite database.
This commit adds full SQLite support with backward compatibility.

Changes:
- Add SQLite database reading for sessions, messages, and parts
- Implement session discovery from database with improved matching
  - Query LIMIT increased from 50 to 200 sessions
  - Find most recent matching session instead of first match
  - Fixes an issue where other projects' sessions pushed the target
    session out of the results
- Enable reasoning fallback for text extraction
  - Handles OpenCode responses in "reasoning" type parts
- Maintain backward compatibility with JSON file storage
- Add comprehensive test coverage for SQLite operations

Fixes the communication detection issue where OpenCode completes tasks
but CCB does not receive the replies.

Co-authored-by: Codex <codex@ccb>
Co-authored-by: Gemini <gemini@ccb>
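The reasoning fallback for text extraction can be sketched as follows, assuming each part is a dict with `type` and `text` keys (a hypothetical shape; OpenCode's stored part records may be structured differently) and an illustrative `extract_text` helper. Text parts take priority; reasoning parts are used only when no text part has content.

```python
from typing import Any, Dict, Iterable


def extract_text(parts: Iterable[Dict[str, Any]]) -> str:
    """Join "text" parts; fall back to "reasoning" parts if none have content."""
    parts = list(parts)
    text = [p.get("text") or "" for p in parts if p.get("type") == "text"]
    if any(t.strip() for t in text):
        return "".join(text)
    # Some OpenCode replies carry their content only in "reasoning" parts.
    return "".join(p.get("text") or "" for p in parts if p.get("type") == "reasoning")
```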

This commit fixes three critical issues that caused async requests to
Gemini and OpenCode to get stuck in the "processing" state:

1. OpenCode session ID pinning: Modified _get_latest_session_from_db()
   to detect and switch to newer sessions even when session_id_filter
   is set. This fixes the "second call always fails" issue.

2. Incomplete state updates: Enhanced _read_since() to update all state
   fields (assistant_count, last_assistant_id, etc.) when session_updated
   changes, preventing stale state comparisons.

3. Strict completion detection: Added degraded completion detection in
   both the OpenCode and Gemini adapters. When a timeout occurs but the
   reply contains any CCB_DONE marker, the request is accepted as
   completed even if the req_id does not match, and a warning is logged.
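The session-pinning and state-sync fixes can be sketched together using hypothetical names (`pick_session`, `read_since`, `ReaderState`) rather than the real `_get_latest_session_from_db()` and `_read_since()` signatures: a pinned session is abandoned as soon as a newer one appears, and every state field is refreshed together whenever the session changes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Session = Tuple[str, int]  # (session_id, time_updated)


@dataclass
class ReaderState:
    session_id: Optional[str] = None
    session_updated: int = -1
    assistant_count: int = 0
    last_assistant_id: Optional[str] = None


def pick_session(sessions: List[Session], pinned: Optional[str]) -> Optional[Session]:
    """Prefer the pinned session unless a newer session exists."""
    if not sessions:
        return None
    newest = max(sessions, key=lambda s: s[1])
    pinned_row = next((s for s in sessions if s[0] == pinned), None)
    # Staying on the pinned session when a newer one exists is what made
    # the second call wait forever.
    if pinned_row is None or newest[1] > pinned_row[1]:
        return newest
    return pinned_row


def read_since(state: ReaderState, session_id: str, session_updated: int,
               assistant_ids: List[str]) -> Tuple[List[str], ReaderState]:
    """Return (new assistant message ids, refreshed state)."""
    if session_id == state.session_id and session_updated == state.session_updated:
        return [], state  # nothing changed since the last read
    start = state.assistant_count if session_id == state.session_id else 0
    new_ids = assistant_ids[start:]
    # Refresh every field together; updating only session_updated left
    # assistant_count and last_assistant_id stale for the next comparison.
    fresh = ReaderState(
        session_id=session_id,
        session_updated=session_updated,
        assistant_count=len(assistant_ids),
        last_assistant_id=assistant_ids[-1] if assistant_ids else None,
    )
    return new_ids, fresh
```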

These minimal changes resolve:
- OpenCode second call failure (100% reproducible)
- Gemini intermittent failures
- Permanent "processing" state when the req_id does not match

Files changed:
- lib/opencode_comm.py: Session detection and state sync fixes
- lib/askd/adapters/opencode.py: Degraded completion detection
- lib/askd/adapters/gemini.py: Degraded completion detection

Test: ./test_minimal_fix.sh
Documentation: ISSUE_ANALYSIS.md, PR_MINIMAL_FIX.md

Co-analyzed-by: Gemini, OpenCode, Codex

This commit fixes three critical bugs that cause daemon crashes and
communication failures:

1. Unified askd not used in background mode: Removed the foreground_mode
   requirement from the _use_unified_daemon() check. This ensures askd is
   used in all modes (foreground and background), fixing the core issue
   where CCB_CALLER triggers background mode but askd was not used.

2. _parent_monitor thread crash: Fixed indentation bug where
   threading.Thread(target=_parent_monitor).start() was outside the
   if block where _parent_monitor was defined. This caused a NameError
   when parent_pid was not set, leading to daemon crashes and zombie
   processes.

3. Gemini hash overflow: Added a None check for msg_id before comparison
   in GeminiLogReader. When msg_id is None, the comparison is skipped to
   prevent the hash overflow issues that cause message detection failures.
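A sketch of the msg_id guard, assuming hypothetical Gemini message dicts with an optional `id` key and an illustrative `new_messages_since` helper; the real `GeminiLogReader` logic is not shown in this PR. A message whose id is None is never used as an anchor, so it cannot accidentally match a None baseline.

```python
from typing import Any, Dict, List, Optional


def new_messages_since(messages: List[Dict[str, Any]],
                       last_msg_id: Optional[str]) -> List[Dict[str, Any]]:
    """Return messages after the one whose id equals last_msg_id."""
    for i, msg in enumerate(messages):
        msg_id = msg.get("id")
        if msg_id is None:
            continue  # the guard: skip the comparison when msg_id is None
        if msg_id == last_msg_id:
            return messages[i + 1:]
    # Anchor not found: treat everything as new.
    return list(messages)
```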

These fixes resolve:
- Requests not using askd in background mode (root cause)
- Daemon becoming a zombie (defunct) process
- Gemini intermittent message detection failures
- System-wide communication breakdowns

Tested: 18 concurrent/sequential calls across 3 LLMs, all successful.

Files changed:
- bin/ask: Enable unified askd in all modes
- lib/askd_server.py: Fix _parent_monitor thread start indentation
- lib/gemini_comm.py: Add None check for msg_id comparison

Related to commit aad38e3 (async communication fixes)

bfly123 merged commit 6ec8303 into bfly123:main on Feb 23, 2026
3 of 13 checks passed
bfly123 added a commit that referenced this pull request Feb 24, 2026
Remove temporary analysis docs (ISSUE_ANALYSIS.md, PR_MINIMAL_FIX.md)
that should not live in the repo, move test_minimal_fix.sh to test/,
and update _REQ_ID_RE in opencode_comm.py to match both old hex and
new timestamp-based req_id formats.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bfly123 added a commit that referenced this pull request Feb 24, 2026
Covers merged PRs #87, #91, #92, #96, #97: Gemini CLI 0.29.0
dual-hash support, OpenCode async deadlock fix, mail setup v3
compat, lpend stale-registry fallback, and autostart routing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bfly123 added a commit that referenced this pull request Feb 26, 2026
Move the notify_mode check before the unified daemon path in the ask
command. PR #91 moved the unified daemon check so that it catches all
modes, but this caused --notify (fire-and-forget) to go through the full
request-response cycle, triggering a notify_completion -> ask -> daemon
-> notify loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
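The resulting dispatch order in `bin/ask` can be sketched as follows, with a hypothetical `dispatch_ask` function and callables standing in for the real send paths: `--notify` returns early, and every other request reaches the unified askd regardless of foreground or background mode.

```python
from typing import Callable


def dispatch_ask(notify_mode: bool,
                 send_notify: Callable[[], int],
                 send_via_askd: Callable[[], int]) -> int:
    """Route an ask request; returns the chosen path's exit code."""
    # 1. --notify is fire-and-forget. Checking it first keeps it out of the
    #    request/response cycle, which would otherwise re-enter
    #    notify_completion -> ask -> daemon -> notify.
    if notify_mode:
        return send_notify()
    # 2. All other requests use the unified askd, in foreground and
    #    background mode alike (no foreground_mode requirement).
    return send_via_askd()
```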