@yaooqinn
Member

What changes were proposed in this pull request?

In SPARK-47911, we introduced a universal BinaryFormatter to make binary output consistent
across all clients, such as beeline, spark-sql, and spark-shell, for both primitive and nested binaries.

Unfortunately, to_csv and the CSV writer have interceptors for binary output that are hard-coded to use SparkStringUtils.getHexString. This PR makes that output configurable as well.
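
To illustrate what a configurable binary formatter means here, below is a minimal standalone sketch in plain Python. It is not Spark's implementation: `format_binary` is a hypothetical helper, and the style names and their output shapes (`UTF-8`, `BASIC`, `BASE64`, `HEX`, `HEX_DISCRETE`) are assumptions based on my reading of spark.sql.binaryOutputStyle, with the bracketed, space-separated hex form assumed to correspond to the previous hard-coded output.

```python
import base64


def format_binary(data: bytes, style: str = "HEX_DISCRETE") -> str:
    """Render bytes as text in one of several styles (hypothetical helper)."""
    if style == "UTF-8":
        # Interpret the bytes as a UTF-8 string; bad sequences become U+FFFD.
        return data.decode("utf-8", errors="replace")
    if style == "BASIC":
        # Signed byte values, mirroring how the JVM represents a byte.
        return "[" + ", ".join(str(b - 256 if b > 127 else b) for b in data) + "]"
    if style == "BASE64":
        return base64.b64encode(data).decode("ascii")
    if style == "HEX":
        # Contiguous upper-case hex digits.
        return data.hex().upper()
    if style == "HEX_DISCRETE":
        # Bracketed, space-separated hex pairs (assumed legacy CSV form).
        return "[" + " ".join(f"{b:02X}" for b in data) + "]"
    raise ValueError(f"unknown binary output style: {style}")


if __name__ == "__main__":
    for s in ("UTF-8", "BASIC", "BASE64", "HEX", "HEX_DISCRETE"):
        print(s, "->", format_binary(b"abc", s))
```

The point of the sketch is only that the CSV path selects the rendering from a setting instead of always calling one fixed hex routine.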

Why are the changes needed?

Feature parity: to_csv and the CSV writer should honor the same binary output setting as the other output paths.

Does this PR introduce any user-facing change?

Yes, spark.sql.binaryOutputStyle now takes effect for CSV output, but the existing behavior is kept as is.

How was this patch tested?

new tests

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Jun 12, 2024
@yaooqinn yaooqinn closed this in ea2bca7 Jun 13, 2024
@yaooqinn yaooqinn deleted the SPARK-48602 branch June 13, 2024 03:51
@yaooqinn
Member Author

Merged to master, thank you @ulysses-you

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM2
