[SPARK-48648][PYTHON][CONNECT] Make SparkConnectClient.tags properly threadlocal #47005

HyukjinKwon · 2024-06-18T03:41:46Z

What changes were proposed in this pull request?

This PR fixes how SparkConnectClient uses threading.local, which resolves a bug introduced by #44210. That change subclassed threading.local and declared the values as class-level variables. Class-level variables always exist and are shared by every thread, so the tags were never actually thread-local.
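The difference can be reproduced without Spark. The sketch below is only an illustration of the pitfall described above, not the actual SparkConnectClient code: the class names `SharedTags` and `PerThreadTags` and the helper `collect_tags` are hypothetical. It assumes the problematic pattern was a mutable class-level attribute on a `threading.local` subclass, and contrasts it with per-thread initialization in `__init__`.

```python
import threading


class SharedTags(threading.local):
    # Hypothetical reproduction of the pitfall: a class-level attribute is
    # looked up on the class, so every thread sees the same set object.
    tags = set()


class PerThreadTags(threading.local):
    def __init__(self):
        # threading.local runs __init__ once for each thread that first
        # touches the object, so each thread gets its own set.
        self.tags = set()


def collect_tags(holder, names):
    """Add one tag per thread and record what that thread then sees."""
    seen = {}

    def worker(name):
        holder.tags.add(name)
        seen[name] = set(holder.tags)

    # Run the threads one after another to keep the result deterministic.
    for name in names:
        t = threading.Thread(target=worker, args=(name,))
        t.start()
        t.join()
    return seen


shared = collect_tags(SharedTags(), ["tag1", "tag2"])
isolated = collect_tags(PerThreadTags(), ["tag1", "tag2"])
print("class-level attribute:", shared)
print("per-thread attribute:", isolated)
```

With the class-level attribute, the second thread also sees the first thread's tag, which is why interrupting one tag can cancel unrelated queries. With per-thread initialization, each thread sees only the tag it added itself.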

Why are the changes needed?

So that users can properly use the thread-based interruptTag. Without this fix, the code below cancels both queries, even though only tag1 is interrupted:

import concurrent.futures
import time
import threading
from pyspark.sql.functions import udf

def run_query_with_tag(query, tag):
    try:
        spark.addTag(tag)
        print(f"starting query {tag}")
        df = spark.sql(query).select(udf(lambda: time.sleep(10))())
        print(f"collecting query {tag}")
        res = df.collect()
        print(f"done with query {tag}")
    finally:
        spark.removeTag(tag)

queries_with_tags = [
    ("SELECT * FROM range(100)", "tag1"),
    ("SELECT * FROM range(100)", "tag2"),
]

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = {executor.submit(run_query_with_tag, query, tag): (query, tag) for query, tag in queries_with_tags}
    time.sleep(5)
    print("Interrupting tag1")
    print(spark.interruptTag("tag1"))
    for f in futures:
        query, tag = futures[f]
        try:
            f.result()
            # run_query_with_tag returns nothing, so report the tag instead
            print(f"done with {tag}")
        except Exception:
            print(f"failed with {f.exception()}")

Does this PR introduce any user-facing change?

No. The bug was introduced by #44210, which has not been released yet.

How was this patch tested?

A unit test was added.

Was this patch authored or co-authored using generative AI tooling?

No.

HyukjinKwon · 2024-06-18T05:11:57Z

Merged to master.

Make tags properly threadlocal

05bcf4c

github-actions bot added SQL PYTHON CONNECT labels Jun 18, 2024

grundprinzip approved these changes Jun 18, 2024

HyukjinKwon closed this in 738acd1 Jun 18, 2024
