Fix streaming error handling and frontend reconnection by Mng-dev-ai · Pull Request #382 · Mng-dev-ai/agentrove

Mng-dev-ai · 2026-03-14T01:57:50Z

Summary

Backend: Setup failures in execute_chat() (session creation, model config, transport errors) now emit a proper error event instead of silently crashing and leaving the SSE client hanging forever. New _emit_bootstrap_error() persists the error event to DB, publishes to Redis pub/sub, and updates the message snapshot with consistent last_seq / FAILED status. Preserves any partial response already flushed.
Backend: Scheduler's direct execute_chat() call now has the same bootstrap error handling, preventing scheduled-task chats from getting stuck in streaming state.
Frontend: Remove lastConnectedStreamRef guard that blocked callback refresh when callbacks changed but the stream ID stayed the same.
Frontend: Add direct store check in useStreamReconnect to prevent double EventSource connections, making the 100ms setTimeout a performance optimization rather than a correctness requirement.
Frontend: Replace findLast, an ES2023 addition that is unavailable under an ES2020 target, with a reverse for loop.
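
The backend bullets above describe a three-step failure path: persist the error event, publish it to subscribers, then mark the message snapshot FAILED with a matching last_seq while leaving any flushed partial content alone. The following is a minimal Python sketch of that flow, not the code from this PR. Only the names emit_bootstrap_error and _bootstrap_and_execute come from the PR text; MessageSnapshot, FakeEventSink, the event dictionary shape, and all function signatures are assumptions made for illustration, and the sink is an in-memory stand-in for the DB and Redis pub/sub.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum


class MessageStatus(str, Enum):
    STREAMING = "streaming"
    FAILED = "failed"


@dataclass
class MessageSnapshot:
    """Hypothetical snapshot of the assistant message (assumed shape)."""
    content: str = ""
    last_seq: int = 0
    status: MessageStatus = MessageStatus.STREAMING


@dataclass
class FakeEventSink:
    """In-memory stand-in for the DB event table and the pub/sub channel."""
    persisted: list = field(default_factory=list)
    published: list = field(default_factory=list)

    async def persist(self, event: dict) -> None:
        self.persisted.append(event)

    async def publish(self, event: dict) -> None:
        self.published.append(event)


async def emit_bootstrap_error(sink, snapshot, error: Exception) -> None:
    """Record a setup failure so the streaming client is never left waiting."""
    seq = snapshot.last_seq + 1
    event = {"seq": seq, "type": "error", "message": str(error)}
    await sink.persist(event)   # 1. durable record, visible on reload
    await sink.publish(event)   # 2. wake any connected stream client
    snapshot.last_seq = seq     # 3. keep the snapshot consistent with the event log
    snapshot.status = MessageStatus.FAILED
    # snapshot.content is deliberately untouched: partial output is preserved.


async def bootstrap_and_execute(setup, run, sink, snapshot) -> None:
    """Run setup; on failure emit an error event instead of raising silently."""
    try:
        session = await setup()
    except Exception as exc:  # broad on purpose: any setup failure must surface
        await emit_bootstrap_error(sink, snapshot, exc)
        return
    await run(session)


if __name__ == "__main__":
    async def failing_setup():
        raise RuntimeError("invalid model")

    async def never_runs(session):
        raise AssertionError("run() must not be reached after a setup failure")

    demo_sink = FakeEventSink()
    demo_snapshot = MessageSnapshot(content="partial", last_seq=3)
    asyncio.run(bootstrap_and_execute(failing_setup, never_runs, demo_sink, demo_snapshot))
    print(demo_snapshot, demo_sink.published)
```

Persisting before publishing is a choice made in this sketch so that a client reconnecting after the publish can always find the event in storage; the PR text does not state the order used in the real implementation.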

Test plan

Simulate a setup failure (e.g., invalid model that causes set_model to throw) and verify the SSE client receives an error event, the message is marked FAILED, and the frontend shows an error state
Verify partial responses are preserved when a mid-stream failure occurs after content has been flushed
Toggle notification settings while streaming and verify permission request callbacks still work
Navigate rapidly between a chat with an active stream and other chats — verify no duplicate EventSource connections in Network tab
Verify scheduled task failures properly mark assistant messages as FAILED

Backend:
- Setup failures in execute_chat (session creation, model config) now emit a proper error event via emit_bootstrap_error: persists the error to DB, publishes to Redis pub/sub, and updates the message snapshot with consistent last_seq and FAILED status. Preserves any partial response already flushed and appends a renderable assistant_text error event to content_render so the error is visible on reload.
- Scheduler's direct execute_chat call gets the same error handling.
- Extract duplicated QUEUE_PROCESSING sniffing in chat.py into _extract_queue_processing_message_id helper.
- Remove orphaned outer try block from execute_chat after moving failure handling to _bootstrap_and_execute.

Frontend:
- Remove lastConnectedStreamRef guard that blocked callback refresh when callbacks changed but the stream ID stayed the same. Add activeStreams identity guard so the subscription only fires when the stream set actually changes.
- Add direct store check in useStreamReconnect to prevent double EventSource connections. Use getStreamByChat instead of Array.from(...).some(...).
- onError now marks the assistant message as failed instead of removing it from cache/UI, and flushes buffered accumulator content before clearing the stream session.
- Seed off-screen stream accumulators from the query cache so the first off-screen chunk doesn't overwrite cached content.
- Replace ES2020-incompatible findLast with a reverse for loop.

Fill in the harness content layer that PR 1 routed to. Each artifact doc captures the rules an agent must know before writing that artifact type; each domain map gives entry points, vocabulary, gotchas, and verified prior-art PRs.

Backend artifacts (docs/artifacts/backend/):
- models.md: SQLAlchemy 2.x async, Mapped[], custom column types, Alembic workflow, anti-patterns
- endpoints.md: FastAPI route handlers, deps.py wiring, domain exceptions, auth dependencies, SSE/WS conventions
- services.md: class-based service shape, BaseDbService, exception discipline, integration boundaries
- tests.md: endpoint-tests-only, conftest fixtures, fake providers, stub-vs-real boundaries
- integrations.md: Docker, GitHub, ACP, SMTP, Redis ownership matrix

Frontend artifacts (docs/artifacts/frontend/):
- components.md: React 19 patterns, primitives, contexts/providers, hooks discipline, anti-patterns
- state.md: Zustand vs context vs useState decision tree
- data-fetching.md: TanStack Query setup, query-key factory, prefix keys for cwd-scoped invalidation, mutation patterns
- styling.md: Tailwind tokens, monochrome palette, primitives, typography/icon/animation rules

Domain maps (docs/domains/):
- chat.md: entities, message state machine, queue/send-now, cross-domain edges; PRs Mng-dev-ai#592, Mng-dev-ai#593, Mng-dev-ai#594, Mng-dev-ai#560, Mng-dev-ai#419, Mng-dev-ai#251, Mng-dev-ai#454
- sandbox.md: Docker vs Host providers, lifecycle; PRs Mng-dev-ai#590, Mng-dev-ai#588, Mng-dev-ai#531, Mng-dev-ai#505, Mng-dev-ai#551, Mng-dev-ai#594
- providers.md: ACP adapter registry, per-agent quirks, persona gating; PRs Mng-dev-ai#591, Mng-dev-ai#589, Mng-dev-ai#528, Mng-dev-ai#499, Mng-dev-ai#542, Mng-dev-ai#538, Mng-dev-ai#541, Mng-dev-ai#465, Mng-dev-ai#537
- streaming.md: StreamEnvelope, seq-based reconnection, snapshot vs control events; PRs Mng-dev-ai#370, #173, #190, Mng-dev-ai#382, Mng-dev-ai#432, Mng-dev-ai#346, Mng-dev-ai#214, Mng-dev-ai#524, Mng-dev-ai#471, #192
- auth.md: fastapi-users, refresh tokens, encrypted-at-rest, WS auth handshake; PRs Mng-dev-ai#586, Mng-dev-ai#587, Mng-dev-ai#589, Mng-dev-ai#449, Mng-dev-ai#550, Mng-dev-ai#469
- git.md: GitService surface, worktrees, ChatCheckpoint; PRs Mng-dev-ai#592, Mng-dev-ai#594, Mng-dev-ai#593, Mng-dev-ai#596, Mng-dev-ai#527, Mng-dev-ai#398, Mng-dev-ai#402
- workspace.md: workspaces, skills, personas, slash commands; PRs Mng-dev-ai#598, Mng-dev-ai#597, Mng-dev-ai#596, Mng-dev-ai#506, Mng-dev-ai#510, Mng-dev-ai#537, Mng-dev-ai#542, Mng-dev-ai#561, Mng-dev-ai#414, Mng-dev-ai#464, Mng-dev-ai#476, Mng-dev-ai#563

Each prior-art PR was verified via gh pr view (state=MERGED, files within the cited domain) before citation.

Cleanup:
- AGENTS.md: removed PR-1's "Doc status" footer; routing tables now point at real files. The github.md route was rolled into git.md and workspace.md (no separate domain map needed at current surface area).
- docs/legacy.md deleted; its content now lives in the per-artifact docs it sourced.

Mng-dev-ai force-pushed the fix/streaming-error-handling-and-reconnect branch from d2f7bf5 to 3f8672e, March 14, 2026 02:51

Mng-dev-ai force-pushed the fix/streaming-error-handling-and-reconnect branch from 3f8672e to 210af81, March 14, 2026 03:06

Mng-dev-ai added 2 commits March 14, 2026 05:15

Add callback closure analysis guidelines to code review section

c6e5f48

Document the requirement to trace callback lifecycles before flagging closure bugs — prevents false positives where a helper appears to use the current chatId but is actually frozen from an earlier render.

Add failure-path control flow guidelines to code review section

f35713e

Mng-dev-ai merged commit da30486 into main Mar 14, 2026
3 checks passed

This was referenced May 8, 2026

docs: write artifact docs and domain maps (PR 2/2) #604

Merged

docs: write artifact docs and domain maps (PR 2/2) #606

Merged

Mng-dev-ai merged 3 commits into main from fix/streaming-error-handling-and-reconnect
