Fix starting runs #459

wilko77 · 2019-11-01T06:28:31Z

It all started with this strange failure in a clkhash test:

State: queued
Stage (1/3): waiting for CLKs
Progress: 100.00%
State: queued
Stage (1/3): waiting for CLKs
Progress: 100.00%
State: queued
Stage (2/3): compute similarity scores
State: running
Stage (3/3): compute output
State: running
Stage (4/3): there is no description for this stage

It turns out that the entity service is naughty.

The problem lies in the code that guards against the race condition of starting a run twice.
The idea is that a run's state is set to `created` after creation. Only once all data providers have uploaded the necessary CLKs is the run state promoted to `queued` and, consequently, the run execution started.

However, some naughty code in views/run/list.py:post did some premature CLK checking and run state changing. After all that, it would still call the `check_for_executable_runs` task, which then did exactly the same again.
We had previously "fixed" `get_created_runs_and_queue` by letting it return runs that are either `created` OR `queued`. Combined with the fact that `check_for_executable_runs` can be called twice almost in parallel, this led to double queuing of the run and double stage progression.
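
To illustrate why the accepted source states matter, here is a minimal sketch. It uses an in-memory SQLite database instead of the service's real database, and the `runs` table layout and the helper `queue_runs` are assumptions made for this example only, not the actual entity service code. It shows that an update accepting both `created` and `queued` reports the same run as "to be started" on every call, while an update restricted to `created` reports it only once.

```python
import sqlite3


def make_db():
    """Create a throwaway database holding one freshly created run (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY, project TEXT, state TEXT)")
    conn.execute("INSERT INTO runs VALUES ('run-1', 'proj-a', 'created')")
    conn.commit()
    return conn


def queue_runs(conn, project_id, from_states):
    """Set matching runs to 'queued' and return how many rows the UPDATE touched.

    The caller treats every touched row as a run that it should start.
    """
    placeholders = ", ".join("?" for _ in from_states)
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            f"UPDATE runs SET state = 'queued' "
            f"WHERE state IN ({placeholders}) AND project = ?",
            (*from_states, project_id),
        )
    return cur.rowcount


# Old behaviour: 'queued' runs still match, so a second call picks the run up again.
old = make_db()
old_counts = [queue_runs(old, "proj-a", ("created", "queued")) for _ in range(2)]

# New behaviour: only 'created' runs match, so the second call finds nothing.
new = make_db()
new_counts = [queue_runs(new, "proj-a", ("created",)) for _ in range(2)]

print(old_counts, new_counts)  # [1, 1] [1, 0]
```

With the restricted condition, the state transition itself acts as the guard: whichever caller performs the `created` to `queued` update is the only one that sees the run.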

To stop all this nonsense, I propose not to include run logic in the views of nginx. If we always call the same task as the entrypoint (`check_for_executable_runs`), it will, most likely, keep us saner for longer. Hopefully.
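
The single-entrypoint idea could look roughly like the following sketch. It is plain in-memory Python, not the service's code: the `Project` class, the `started` list (a stand-in for the task queue), and the handlers `post_run` and `upload_clks` are hypothetical names. Only `check_for_executable_runs` is taken from this PR, and its body here is an assumption about what it is meant to decide.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    expected_parties: int                     # number of data providers
    uploaded_clks: int = 0                    # how many providers have uploaded so far
    runs: dict = field(default_factory=dict)  # run_id -> state


started = []  # stand-in for the real task queue: ids of runs handed to execution


def check_for_executable_runs(project):
    """The only place that decides whether a run may start."""
    if project.uploaded_clks < project.expected_parties:
        return  # still waiting for CLKs
    for run_id, state in project.runs.items():
        if state == "created":
            project.runs[run_id] = "queued"
            started.append(run_id)


def post_run(project, run_id):
    """View handler: record the run, then delegate. No CLK checks or state promotion here."""
    project.runs[run_id] = "created"
    check_for_executable_runs(project)


def upload_clks(project):
    """View handler: record the upload, then delegate to the same entrypoint."""
    project.uploaded_clks += 1
    check_for_executable_runs(project)


project = Project(expected_parties=2)
post_run(project, "run-1")          # not all CLKs are there yet, nothing starts
upload_clks(project)
upload_clks(project)                # last upload arrives, run-1 gets queued
check_for_executable_runs(project)  # a repeated call does not start it again
```

Because the views never touch the run state themselves, there is only one code path whose behaviour under concurrent calls needs to be reasoned about.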

gusmith

Nicely spotted.
The only thing left before closing this PR is to update the changelog.

gusmith · 2019-11-04T00:41:17Z

backend/entityservice/database/insertions.py

              state = 'queued'
            WHERE
-              state IN ('created', 'queued') AND project = %s
+              state = 'created' AND project = %s


gusmith · 2019-11-04T00:47:30Z

backend/requirements.txt

 PyYAML==5.1
 redis==3.2.1
-requests==2.21.0
+requests==2.22.0


This dependency update needs to be added to the changelog.

wilko added 2 commits November 1, 2019 17:02

requests was complaining when building a docker image that dependencies are too new.

d6a1429

removed code which led to trouble.

636afd1

did half of the work of `check_for_executable_runs`. And as a side effect, run stages were increased too many times, broke the race condition disabler code.

wilko77 requested a review from gusmith November 1, 2019 06:28

gusmith approved these changes Nov 4, 2019

wilko and others added 2 commits November 4, 2019 18:18

updated changelog

8fb0a5d

Merge branch 'develop' into fix_starting_runs

53efd73

wilko77 merged commit 7f2dad0 into develop Nov 4, 2019

wilko77 deleted the fix_starting_runs branch November 4, 2019 10:48

hardbyte mentioned this pull request Feb 10, 2020

Release: app version 1.13.0-beta, frontend version 1.4.6-beta #498

Merged
