It all started with this strange failure in a clkhash test:
It turns out that the entity service is naughty.
The problem lies in how we detect the race condition that starts a run twice.
The idea is that a run's state is set to `created` after creation, and only once all data providers have uploaded the necessary clks does the run state get promoted to `queued` and, consequently, the run execution get started.

However, some naughty code in `views/run/list.py:post` did some premature clk checking and run state changing. After all that, it would still call the `check_for_executable_runs` task, which then does exactly the same again.

Now, we previously "fixed" `get_created_runs_and_queue` by allowing it to return runs that are either `created` OR `queued`. That, together with the fact that `check_for_executable_runs` can be called twice almost in parallel, led to double queuing of the run and double stage progressing.

To stop all this nonsense, I propose not to include run logic in the views behind nginx. If we always call the same task as the entrypoint (`check_for_executable_runs`), it will, most likely, keep us saner for longer. Hopefully.
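
To illustrate why a single entrypoint plus a strict `created` to `queued` transition avoids double queuing, here is a minimal sketch. It is not the entity service's actual code: it uses an in-memory SQLite table instead of the real database, and the table layout, `make_db`, and `promote_created_to_queued` are hypothetical names chosen for the example. Only `check_for_executable_runs` borrows its name from the task discussed above, and its body here is a stand-in.

```python
import sqlite3


def make_db():
    # Hypothetical schema: one row per run, holding its current state.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs (run_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO runs VALUES ('run-1', 'created')")
    conn.commit()
    return conn


def promote_created_to_queued(conn, run_id):
    """Move a run from 'created' to 'queued' in a single UPDATE.

    The WHERE clause only matches runs that are still 'created', so a
    second caller updates zero rows and gets False back.
    """
    cur = conn.execute(
        "UPDATE runs SET state = 'queued' "
        "WHERE run_id = ? AND state = 'created'",
        (run_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def check_for_executable_runs(conn, started):
    # Stand-in for the real task: the only place that promotes runs.
    rows = conn.execute(
        "SELECT run_id FROM runs WHERE state = 'created'"
    ).fetchall()
    for (run_id,) in rows:
        # Start the run only if this caller won the state transition.
        if promote_created_to_queued(conn, run_id):
            started.append(run_id)


conn = make_db()
started = []
check_for_executable_runs(conn, started)
check_for_executable_runs(conn, started)  # second call finds nothing to do
print(started)  # ['run-1']
```

The point of the sketch is the guard in the `UPDATE`: selecting runs that are `created` OR `queued` removes exactly this guard, which is how two near-simultaneous calls could both queue the same run.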