Skip to content

Conversation

@shuangkun
Copy link
Member

I found that when the database was locked, it blocked the execution of a large number of our workflows.

Add timeout control for GetWorkflowForEstimator database query to prevent workflow execution from being blocked when database is slow or locked.

  • Add default 5 second timeout for database queries
  • Configurable via WORKFLOW_ESTIMATION_DB_QUERY_TIMEOUT environment variable
  • Return default estimator on timeout/error to ensure workflow continues
  • Add warning logs for timeout and error cases
  • Add documentation for new environment variable

Motivation

Modifications

Verification

Documentation

@shuangkun shuangkun force-pushed the fix/estimation-db-query-timeout branch 4 times, most recently from c1938dc to bd2c203 Compare December 7, 2025 04:51
@Joibel Joibel added the cherry-pick/3.7 Cherry-pick this to release-3.7 label Dec 8, 2025
shuangkun and others added 11 commits December 11, 2025 23:27
… blocking

Add timeout control for GetWorkflowForEstimator database query to prevent
workflow execution from being blocked when database is slow or locked.

- Add default 5 second timeout for database queries
- Configurable via WORKFLOW_ESTIMATION_DB_QUERY_TIMEOUT environment variable
- Return default estimator on timeout/error to ensure workflow continues
- Add warning logs for timeout and error cases
- Add documentation for new environment variable

Signed-off-by: shuangkun <[email protected]>
Use WithContext(ctx) when building SQL selector to ensure the timeout
context from estimator_factory.go is properly propagated to the database
query execution. This allows the query timeout to be respected and
prevents blocking workflow execution when database is slow or locked.

Signed-off-by: shuangkun <[email protected]>
Move database query timeout handling from estimator_factory.go into
GetWorkflowForEstimator method in workflow_archive.go. This improves
code organization by:

- Encapsulating timeout logic at the database layer where it belongs
- Simplifying caller code (estimator_factory.go) by removing timeout
  setup and error handling complexity
- Making timeout protection available to all callers automatically
- Maintaining backward compatibility: respects existing context deadlines
  if set by caller, otherwise applies default timeout from environment
  variable or 5 second default

The timeout is configurable via WORKFLOW_ESTIMATION_DB_QUERY_TIMEOUT
environment variable and defaults to 5 seconds if not set.

Signed-off-by: shuangkun <[email protected]>
Signed-off-by: shuangkun <[email protected]>
Signed-off-by: shuangkun <[email protected]>
Signed-off-by: shuangkun <[email protected]>
Co-authored-by: Mason Malone <[email protected]>
Signed-off-by: shuangkun tian <[email protected]>
Signed-off-by: shuangkun <[email protected]>
When ctx is nil, explicitly create context.Background() first, then add
logger to it. This makes the code more readable and clear about what
happens in each step, while avoiding panic in env.LookupEnvDurationOr
which requires a logger in the context.

Signed-off-by: shuangkun <[email protected]>
Add nolint comment for contextcheck linter. When ctx is nil, we need to
create a new context with logger, which is a valid use case. The context
is immediately used to derive a new context with timeout, maintaining
proper context inheritance.

Signed-off-by: shuangkun <[email protected]>
… DB query timeout

- Change from time.Duration to int (seconds) for timeout configuration
- Rename env var from WORKFLOW_ESTIMATION_DB_QUERY_TIMEOUT to WORKFLOW_ESTIMATION_DB_QUERY_TIMEOUT_SECONDS
- Remove nil context handling, require valid ctx parameter
- Use ctx.WithTimeout directly instead of creating background context

Signed-off-by: shuangkun <[email protected]>
@shuangkun shuangkun force-pushed the fix/estimation-db-query-timeout branch from dd28cf6 to a9fec7d Compare December 11, 2025 15:31
@shuangkun
Copy link
Member Author

/retest

@MasonM MasonM requested a review from Joibel December 15, 2025 00:01
@Joibel Joibel merged commit 0c77432 into argoproj:main Dec 16, 2025
63 of 69 checks passed
@argo-cd-cherry-pick-bot
Copy link

❌ Cherry-pick failed for 3.7. Please check the workflow logs for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-pick/3.7 Cherry-pick this to release-3.7

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants