test(cloudflare): Reduce flakiness for cloudflare with sub workers#20632
test(cloudflare): Reduce flakiness for cloudflare with sub workers#20632
Conversation
size-limit report 📦
|
|
quite interesting - a delay could mitigate it, but still won't guarantee a successful start. Maybe it would be better to have another |
|
@mydea I initially changed the logic to match the main workers functionality which was technically almost the same as before. I then checked how the I added it, should work now a little better 🤞 |
| stdout.on('data', (chunk: Buffer) => { | ||
| const text = chunk.toString(); | ||
| if (process.env.DEBUG) process.stdout.write(text); | ||
| output += text; | ||
|
|
||
| const match = output.match(/Ready on (https?:\/\/[^\s]+)/); | ||
| if (match?.[1]) { | ||
| resolve(parseInt(new URL(match[1]).port, 10)); | ||
| } | ||
| }); | ||
| }); |
There was a problem hiding this comment.
Bug: The waitForReady function can hang indefinitely because it lacks an exit event handler for the child process. If the process exits before it's ready, tests will time out.
Severity: MEDIUM
Suggested Fix
Add an on('exit', ...) event handler to the child process within the waitForReady function. This handler should reject the promise if the process exits before the 'Ready on' message is received, ensuring that tests fail fast with a clear error instead of timing out. A similar implementation exists for the childSubWorker which can be used as a reference.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: dev-packages/cloudflare-integration-tests/runner.ts#L218-L237
Potential issue: The `waitForReady` function lacks a handler for the child process's
'exit' event. If the `wrangler` process exits prematurely (e.g., due to a configuration
error or port conflict) before emitting its "Ready on" message, the promise returned by
`waitForReady` will never resolve or reject. This causes any test calls to `makeRequest`
to also hang, leading to a test timeout instead of a fast failure with a clear error
message. The existing 'error' event handler is insufficient to catch all premature exit
scenarios.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 952baff. Configure here.
| childSubWorker?.on('error', onChildError); | ||
| child.on('error', onChildError); | ||
| child.on('message', (msg: string) => onChildMessage(msg, setWorkerPort)); | ||
| waitForReady(child).then(setWorkerPort).catch(reject); |
There was a problem hiding this comment.
Duplicate error handler registered on sub-worker process
Low Severity
The onChildError handler is registered on childSubWorker's 'error' event twice — once at line 259 inside the if (existsSync(...)) block, and again at line 293 outside it via optional chaining. When a sub-worker exists and emits an error, onChildError fires twice, logging the error and calling reject redundantly.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 952baff. Configure here.
There was a problem hiding this comment.
Whoops, forgot to change that part of the code
| resolve(parseInt(new URL(match[1]).port, 10)); | ||
| } | ||
| }); | ||
| }); |
There was a problem hiding this comment.
waitForReady hangs forever if process exits early
Medium Severity
waitForReady only resolves when "Ready on" appears in stdout but never rejects if the process exits or errors before that. At line 264, await waitForReady(childSubWorker) blocks indefinitely if the sub-worker crashes during startup. The outer onChildError handler rejects the isComplete promise, but waitForReady's own promise (which shadows reject in its constructor) never settles, so the async function remains stuck and the main worker is never spawned. The old code properly rejected the awaited startup promise on process exit/error. This is a regression that can cause test hangs instead of fast, clear failures.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 952baff. Configure here.
There was a problem hiding this comment.
It would only "hang" for as long as the test timeout. Also if the subworker process would exit, then it is flaky anyways.
|
Sounds good to me! |


Fixes #20626
It seems that cloudflare integration tests can be flaky.
After some digging into this with some clanker help, the idea came up that this is due to sub workers not being ready when DEV_SERVER_READY is emitted. To accomodate this, in the case of a sub worker existing this just adds a little delay (100ms, random number) to wait for before we continue, hopefully reducing flakes. AI wanted to look at more complex things like retrying requests etc. but this felt a bit overkill, I think this is likely good enough if that is indeed the problem...?