Skip to content

Conversation

@itholic
Copy link
Contributor

@itholic itholic commented Aug 12, 2024

What changes were proposed in this pull request?

This PR proposes to support from_protobuf and to_protobuf for SQL functions

Similar to #46977

Why are the changes needed?

For improving feature parity with DataFrame API

Does this PR introduce any user-facing change?

This enables from_protobuf and to_protobuf from SQL functions

How was this patch tested?

Added UTs.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Aug 12, 2024
@itholic itholic marked this pull request as ready for review August 20, 2024 00:15
@itholic itholic changed the title [WIP][SPARK-49121][SQL] Support from_protobuf and to_protobuf for SQL functions [SPARK-49121][SQL] Support from_protobuf and to_protobuf for SQL functions Aug 20, 2024
@itholic
Copy link
Contributor Author

itholic commented Aug 20, 2024

cc @HyukjinKwon @hvanhovell I think now it is ready for review. Please take a look when you find some time :-)

Also cc @cloud-fan FYI

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
"The .. argument of .. SQL function must be a (constant string).."

Those 6 error messages might be able to reuse a template string and/or a centralized mapping for clarity. Just a thought for minor improvement no need to be in this pr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Let me address it from follow-up PR since we have more similar cases from from_avro and to_avro, and I plan to remove those duplications from follow-ups

@HyukjinKwon
Copy link
Member

Let's also fix the test failure:

[info] - SPARK-49121: from_protobuf and to_protobuf SQL functions *** FAILED *** (622 milliseconds)
[info]   152 did not equal 153 Invalid stopIndex of a query context. Actual:SQLQueryContext(Some(2),Some(7),Some(8),Some(152),Some(
[info]   SELECT from_protobuf(protobuf_data, 42, '/home/runner/work/spark/spark/connector/protobuf/target/generated-test-sources/descriptor-set-sbt.desc', map())
[info]   FROM (
[info]   SELECT
[info]     to_protobuf(
[info]       complex_struct, 'SimpleMessageJavaTypes', '/home/runner/work/spark/spark/connector/protobuf/target/generated-test-sources/descriptor-set-sbt.desc', map()
[info]     ) AS protobuf_data
[info]   FROM protobuf_test_table
[info]   )
[info]   ),None,None) (SparkFunSuite.scala:376)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.SparkFunSuite.$anonfun$checkError$4(SparkFunSuite.scala:376)
[info]   at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323)
[info]   at org.apache.spark.SparkFunSuite.checkError(SparkFunSuite.scala:366)
[info]   at org.apache.spark.sql.protobuf.ProtobufFunctionsSuite.$anonfun$new$151(ProtobufFunctionsSuite.scala:2159)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[info]   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info]   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withTable(SQLTestUtils.scala:308)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withTable$(SQLTestUtils.scala:306)
[info]   at org.apache.spark.sql.protobuf.ProtobufFunctionsSuite.withTable(ProtobufFunctionsSuite.scala:37)
[info]   at org.apache.spark.sql.protobuf.ProtobufFunctionsSuite.$anonfun$new$150(ProtobufFunctionsSuite.scala:2046)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)

@HyukjinKwon
Copy link
Member

[info] - SPARK-49121: from_protobuf and to_protobuf SQL functions *** FAILED *** (607 milliseconds)
[info]   173 did not equal 174 Invalid stopIndex of a query context. Actual:SQLQueryContext(Some(3),Some(2),Some(10),Some(173),Some(
[info]   SELECT
[info]     from_protobuf(protobuf_data, 'SimpleMessageJavaTypes', '/home/runner/work/spark/spark/connector/protobuf/target/generated-test-sources/descriptor-set-sbt.desc', 42)
[info]   FROM (
[info]   SELECT
[info]     to_protobuf(
[info]       complex_struct, 'SimpleMessageJavaTypes', '/home/runner/work/spark/spark/connector/protobuf/target/generated-test-sources/descriptor-set-sbt.desc', map()
[info]     ) AS protobuf_data
[info]   FROM protobuf_test_table
[info]   )
[info]   ),None,None) (SparkFunSuite.scala:376)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.SparkFunSuite.$anonfun$checkError$4(SparkFunSuite.scala:376)
[info]   at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323)
[info]   at org.apache.spark.SparkFunSuite.checkError(SparkFunSuite.scala:366)
[info]   at org.apache.spark.sql.protobuf.ProtobufFunctionsSuite.$anonfun$new$151(ProtobufFunctionsSuite.scala:2196)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[info]   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info]   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withTable(SQLTestUtils.scala:308)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withTable$(SQLTestUtils.scala:306)

@HyukjinKwon
Copy link
Member

[info] - SPARK-49121: from_protobuf and to_protobuf SQL functions *** FAILED *** (616 milliseconds)
[info]   "...r-set-sbt.desc', 42)[]" did not equal "...r-set-sbt.desc', 42)[)]" Invalid fragment of a query context. Actual:SQLQueryContext(Some(3),Some(2),Some(10),Some(173),Some(
[info]   SELECT
[info]     from_protobuf(protobuf_data, 'SimpleMessageJavaTypes', '/home/runner/work/spark/spark/connector/protobuf/target/generated-test-sources/descriptor-set-sbt.desc', 42)
[info]   FROM (
[info]   SELECT
[info]     to_protobuf(
[info]       complex_struct, 'SimpleMessageJavaTypes', '/home/runner/work/spark/spark/connector/protobuf/target/generated-test-sources/descriptor-set-sbt.desc', map()
[info]     ) AS protobuf_data
[info]   FROM protobuf_test_table
[info]   )
[info]   ),None,None) (SparkFunSuite.scala:378)
[info]   Analysis:
[info]   "...r-set-sbt.desc', 42)[]" -> "...r-set-sbt.desc', 42)[)]"
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.SparkFunSuite.$anonfun$checkError$4(SparkFunSuite.scala:378)
[info]   at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323)
[info]   at org.apache.spark.SparkFunSuite.checkError(SparkFunSuite.scala:366)
[info]   at org.apache.spark.sql.protobuf.ProtobufFunctionsSuite.$anonfun$new$151(ProtobufFunctionsSuite.scala:2196)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[info]   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info]   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)

@HyukjinKwon
Copy link
Member

Merged to master.

}
}

test("SPARK-49121: from_protobuf and to_protobuf SQL functions") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itholic It seems that this test cannot be successfully tested in the Maven daily test, both Java 17 and Java 21:

image

also cc @HyukjinKwon

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itholic mind taking a look please?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itholic @HyukjinKwon Attempting to fix: #47855

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants