@dtenedor dtenedor commented Jun 13, 2024

What changes were proposed in this pull request?

This PR creates two new SQL functions "to_avro" and "from_avro" to match existing DataFrame equivalents.

For example:

sql(
  """
    |create table t as
    |  select named_struct('u', named_struct('member0', member0, 'member1', member1)) as s
    |  from values (1, null), (null,  'a') tab(member0, member1)
    |""".stripMargin)

val jsonFormatSchema =
  """
    |{
    |  "type": "record",
    |  "name": "struct",
    |  "fields": [{
    |    "name": "u",
    |    "type": ["int","string"]
    |  }]
    |}
    |""".stripMargin

spark.sql(
  s"""
    |select from_avro(result, '$jsonFormatSchema', map()).u from (
    |  select to_avro(s, '$jsonFormatSchema') as result from t
    |)
    |""".stripMargin)
  .collect()

> {1, NULL}
  {NULL, "a"}
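For comparison, the DataFrame equivalents that these SQL functions mirror can be sketched as below. This is a minimal illustration, not code from this PR: it assumes the external spark-avro module is on the classpath, and the object name, the sample rows, and the simplified two-field schema are hypothetical choices made only for the example.

```scala
// Sketch only: assumes the external spark-avro module is on the classpath.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro.functions.{from_avro, to_avro}
import org.apache.spark.sql.functions.{col, struct}

object AvroDataFrameRoundTrip {
  // Illustrative schema (not the one from the PR description):
  // a record with a required int and a nullable string.
  val jsonFormatSchema: String =
    """
      |{
      |  "type": "record",
      |  "name": "struct",
      |  "fields": [
      |    {"name": "id", "type": "int"},
      |    {"name": "name", "type": ["null", "string"]}
      |  ]
      |}
      |""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("avro-dataframe-round-trip")
      .getOrCreate()
    import spark.implicits._

    // Pack the two columns into a single struct column named "s".
    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
      .select(struct(col("id"), col("name")).as("s"))

    // Encode the struct to Avro binary, then decode it with the same schema.
    val encoded = df.select(to_avro(col("s"), jsonFormatSchema).as("result"))
    val decoded = encoded.select(from_avro(col("result"), jsonFormatSchema).as("s"))

    decoded.select("s.id", "s.name").show()
    spark.stop()
  }
}
```

The SQL form in the description above passes the same JSON schema string as a literal argument, so a query can perform the round trip without going through the Scala `functions` object.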

Why are the changes needed?

This brings parity between SQL and DataFrame APIs in Apache Spark.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

This PR adds extra unit tests, and I also checked that the functions work with spark-shell.

Was this patch authored or co-authored using generative AI tooling?

No, GitHub Copilot was not used this time.

@dtenedor dtenedor requested a review from allisonwang-db June 15, 2024 00:10
@dtenedor

Thanks @allisonwang-db for your review; I have addressed your comments.


@allisonwang-db allisonwang-db left a comment

Looks good! cc @cloud-fan


@cloud-fan cloud-fan left a comment

LGTM if CI passes

@dtenedor

cc @cloud-fan the CI is passing now :)

@gengliangwang

Thanks, merging to master

HyukjinKwon pushed a commit that referenced this pull request Jun 23, 2024
…nd from_avro functions but Avro is not loaded by default

### What changes were proposed in this pull request?

This PR updates the new `to_avro` and `from_avro` SQL functions added in #46977 to return reasonable errors when Avro is not loaded by default.

### Why are the changes needed?

According to the [Apache Spark Avro Data Source Guide](https://spark.apache.org/docs/latest/sql-data-sources-avro.html), Avro is not loaded into Spark by default. With this change, users get reasonable error messages if they try to call the `to_avro` or `from_avro` SQL functions in this case with instructions telling them what to do, rather than obscure Java `ClassNotFoundException`s.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds golden file based test coverage.

### Was this patch authored or co-authored using generative AI tooling?

No, GitHub Copilot was not used this time.

Closes #47063 from dtenedor/to-from-avro-error-not-loaded.

Authored-by: Daniel Tenedorio
Signed-off-by: Hyukjin Kwon
HyukjinKwon pushed a commit that referenced this pull request Aug 22, 2024
…functions

### What changes were proposed in this pull request?

This PR adds `from_protobuf` and `to_protobuf` as SQL functions, similar to what #46977 did for `from_avro` and `to_avro`.

### Why are the changes needed?

To improve feature parity with the DataFrame API.

### Does this PR introduce _any_ user-facing change?

Yes, `from_protobuf` and `to_protobuf` can now be called from SQL.

### How was this patch tested?

Added unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47716 from itholic/from_to_protobuf.

Authored-by: Haejoon Lee
Signed-off-by: Hyukjin Kwon