[SPARK-46308] Forbid recursive error handling by adding recursion guards #44210
Conversation
…it wouldn't fall into infinite recursion". This reverts commit c9df53f.
FYI @heyihong. @grundprinzip proposed to take back the change we discussed and go with my original version.

@cdkrot I don't have a strong opinion on either approach, but we probably need to implement similar logic for the Scala client as well: https://github.com/apache/spark/blob/master/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala#L339-L347 (not necessarily in this PR).
client = SparkConnectClient(chan)
self.assertEqual(client._session_id, chan.session_id)
...
def test_forbid_recursion(self):
This test does not directly test the scenario we're talking about. Ideally you could just use the mock tests we have to fail any query and verify that the recursion guard works.
I actually tried, but it seems hard to make a mock test for this because it needs to pass through this piece of code:

spark/python/pyspark/sql/connect/client/core.py, line 1545 in 348d881:

status = rpc_status.from_call(cast(grpc.Call, rpc_error))

It seems hard to create a mock exception that would pass this without poking grpc's internals significantly. Alternatively, we could introduce some testing crutches here, i.e. check whether the exception comes from testing code, but that's not great either.
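For illustration, a hypothetical sketch of what a mock would have to reproduce for `rpc_status.from_call` to return a status: the `grpc-status-details-bin` metadata key and the code/message consistency checks are `grpcio-status` implementation details, which is exactly the "poking grpc's internals" concern above.

```python
# Hypothetical sketch only, not proposed test code. It fakes the gRPC
# internals that rpc_status.from_call() inspects.
from unittest import mock

import grpc
from google.rpc import code_pb2, status_pb2
from grpc_status import rpc_status

# The rich status proto a real server would attach to the failed call.
rich_status = status_pb2.Status(code=code_pb2.INTERNAL, message="boom")

fake_call = mock.create_autospec(grpc.Call, instance=True)
# from_call() cross-checks code() and details() against the proto...
fake_call.code.return_value = grpc.StatusCode.INTERNAL
fake_call.details.return_value = "boom"
# ...and looks for the serialized proto under this internal metadata key.
fake_call.trailing_metadata.return_value = (
    ("grpc-status-details-bin", rich_status.SerializeToString()),
)

assert rpc_status.from_call(fake_call).message == "boom"
```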
grundprinzip left a comment:
I requested changes to the code that I believe make it more readable and require fewer changes overall. Specifically, I think we should use a simpler flag mechanism to track whether we're processing an exception, instead of adding a new class.
Changed based on @grundprinzip's request.
Removed.
cc @HyukjinKwon
Please update the PR title.
raise error
...
try:
    self._inside_error_handling = True
I know Python has the Global Interpreter Lock, but is this thread-safe?
Dang, I wanted to leave a comment on that. It's not thread-safe.
Ah, I'm a simple person and was simply following your review. The original proposal was thread-safe.
Updated the title; added thread-locals.
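A minimal sketch of the resulting pattern, assuming illustrative names (`_handle_error`, `_convert_exception`) rather than the exact PR code: a `threading.local` flag marks that error conversion is in progress, so a failure inside the handler re-raises unchanged instead of recursing, and each thread tracks its own flag.

```python
import threading


class ClientSketch:
    """Illustrative recursion guard only; not the actual SparkConnectClient."""

    def __init__(self):
        # threading.local() gives each thread its own attribute namespace,
        # so concurrent error handling in two threads cannot interfere.
        self._error_handling = threading.local()

    def _handle_error(self, error: Exception) -> None:
        if getattr(self._error_handling, "active", False):
            # The error handler itself failed; re-raise unchanged
            # instead of recursing into the handler again.
            raise error
        try:
            self._error_handling.active = True
            raise self._convert_exception(error)
        finally:
            self._error_handling.active = False

    def _convert_exception(self, error: Exception) -> Exception:
        # Stand-in for the real gRPC-error-to-Spark-exception conversion.
        return RuntimeError(f"converted: {error}")
```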
It seems we forgot to merge this because there were some unrelated test failures again. I retriggered those and the tests pass.
Merged to master. |
…threadlocal

### What changes were proposed in this pull request?

This PR changes the `thread.local` usage in `SparkConnectClient` to be used properly, fixing a bug introduced by #44210, which misused `thread.local` by inheriting from it and setting class-level variables, which always exist.

### Why are the changes needed?

So users can properly use thread-based `interruptTag`. Without the fix, the code below cancels both queries:

```python
import concurrent.futures
import time
import threading

from pyspark.sql.functions import udf


def run_query_with_tag(query, tag):
    try:
        spark.addTag(tag)
        print(f"starting query {tag}")
        df = spark.sql(query).select(udf(lambda: time.sleep(10))())
        print(f"collecting query {tag}")
        res = df.collect()
        print(f"done with query {tag}")
    finally:
        spark.removeTag(tag)


queries_with_tags = [
    ("SELECT * FROM range(100)", "tag1"),
    ("SELECT * FROM range(100)", "tag2"),
]

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = {
        executor.submit(run_query_with_tag, query, tag): (query, tag)
        for query, tag in queries_with_tags
    }
    time.sleep(5)
    print("Interrupting tag1")
    print(spark.interruptTag("tag1"))
    for f in futures:
        try:
            f.result()
            print(f"done with {f.result()}")
        except:
            print(f"failed with {f.exception()}")
```

### Does this PR introduce _any_ user-facing change?

No, this was caused by #44210, but the change has not been released yet.

### How was this patch tested?

A unit test was added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47005 from HyukjinKwon/thread-local.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
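The underlying pitfall, in a standalone sketch (class names are illustrative): attributes set at class level on a `threading.local` subclass live on the class and are shared by every thread, whereas attributes initialized in `__init__` are created fresh for each thread.

```python
import threading


class BuggyLocal(threading.local):
    tags = set()  # class attribute: one set object shared by all threads


class FixedLocal(threading.local):
    def __init__(self):
        self.tags = set()  # instance attribute: re-created per thread


buggy, fixed = BuggyLocal(), FixedLocal()


def worker():
    buggy.tags.add(threading.get_ident())
    fixed.tags.add(threading.get_ident())


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(buggy.tags))  # 2 -- both workers mutated the same shared set
print(len(fixed.tags))  # 0 -- the main thread's set was never touched
```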
What changes were proposed in this pull request?
Revert #44144 and introduce a recursion guard as previously proposed. This way, infinite error-handling recursion is still prevented, but the client-side knob is still present.
Why are the changes needed?
This was previously proposed as part of #44144 but was set aside in favour of another approach. However, it now seems (per a proposal by @grundprinzip) that the original approach was more correct, since it appears the driver stacktrace is decided on the client, not the server (see #43667).
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manual testing.
Was this patch authored or co-authored using generative AI tooling?
No