[SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted #25743
Conversation
@juliuszsompolski I have opened a new PR for the problem we discussed.
juliuszsompolski
left a comment
LGTM. Thanks @AngersZhuuuu!
ok to test

Test build #110513 has finished for PR 25743 at commit
// Actually do need to catch Throwable as some failures don't inherit from Exception and
// HiveServer will silently swallow them.
case e: Throwable =>
  if (statementId != null) {
Could we add a comment explaining why we need this change?
Test build #110519 has finished for PR 25743 at commit

Test build #110644 has finished for PR 25743 at commit
juliuszsompolski
left a comment
LGTM from me. cc @wangyum
retest this please

Test build #111216 has finished for PR 25743 at commit

Merged to master.
What changes were proposed in this pull request?
Discussed in #25611.
If cancel() and close() are called very quickly after the query is started, they may both call cleanup() before any Spark jobs have started. In that case, sqlContext.sparkContext.cancelJobGroup(statementId) does nothing. The execute() thread can then start the jobs and only afterwards get interrupted, exiting through the catch block without anyone cancelling those jobs, so they keep running even though this execution has exited.
So when execute() is interrupted by cancel() and we reach the catch block, we should call cancelJobGroup() again to make sure the jobs are cancelled.
Why are the changes needed?
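The race can be simulated with plain JVM threads, without Spark. This is a minimal sketch, not the actual patch: `cancelJobGroup` below is a hypothetical stand-in for `sqlContext.sparkContext.cancelJobGroup(statementId)` that, like the real call, only takes effect once jobs have actually been started.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean

object CancelRaceSketch {
  def runRace(): Boolean = {
    val jobsStarted   = new AtomicBoolean(false)
    val jobsCancelled = new AtomicBoolean(false)
    val started       = new CountDownLatch(1)

    // Stand-in for sqlContext.sparkContext.cancelJobGroup(statementId):
    // a no-op unless jobs have already been started.
    def cancelJobGroup(): Unit =
      if (jobsStarted.get()) jobsCancelled.set(true)

    val execute = new Thread(() => {
      try {
        jobsStarted.set(true) // execute() submits its Spark jobs here
        started.countDown()
        Thread.sleep(10000)   // long-running query
      } catch {
        case _: Throwable =>
          // The fix: cancel again in the catch block, because the first
          // cancelJobGroup() may have run before any jobs existed.
          cancelJobGroup()
      }
    })

    cancelJobGroup()    // cancel() races ahead of the jobs: a no-op
    execute.start()
    started.await()
    execute.interrupt() // cancel() interrupts the execute() thread
    execute.join()
    jobsCancelled.get() // true only because the catch block cancelled again
  }
}
```

Without the second `cancelJobGroup()` call in the catch block, `runRace()` returns false: the early cancellation missed the jobs, and the interrupt alone does not stop them.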
Does this PR introduce any user-facing change?
No
How was this patch tested?
MT