[SPARK-53524][CONNECT][SQL] Fix temporal value conversion in LiteralValueProtoConverter #52270
Conversation
I need a bit more context to understand this better. So on the client side, fn.typedLit takes Java classes like LocalDate and converts them to an int before putting them into the protobuf message. On the server side, v.getDate returns an int and we need to convert it back to a LocalDate.
This looks a bit messy to me. If the client side already knows how to convert external Java objects to internal Catalyst values, why does the server need this conversion?
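For concreteness, here is a minimal sketch of the round trip being discussed, using plain java.time rather than the actual converter code (the variable names are only illustrative):
```scala
import java.time.LocalDate

// Client side: a java.time.LocalDate is encoded as days since the Unix epoch
// before it is placed into the literal protobuf message.
val date = LocalDate.of(2020, 10, 10)
val daysSinceEpoch: Int = date.toEpochDay.toInt // 18545 for 2020-10-10

// Server side: the raw int carried by the proto has to be turned back into a
// LocalDate (or the corresponding Catalyst internal value) before it can be
// used as a DATE.
val decoded: LocalDate = LocalDate.ofEpochDay(daysSinceEpoch.toLong)
assert(decoded == date)
```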
@cloud-fan I am not sure the client side knows how to convert external Java objects into internal Catalyst values. The conversion happens here in LiteralValueProtoConverter, and the server just converts the proto value back to a Java object.
This code path is also needed in the Scala client, since the client may need to convert temporal values in the observed-metrics protos received from the server back into Java objects.
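For example, the observed-metrics path on the Scala client could look roughly like this (a hedged sketch; the query and metric name are only illustrative, and whether the client surfaces a java.time.LocalDate or a java.sql.Date depends on configuration):
```scala
import org.apache.spark.sql.Observation
import org.apache.spark.sql.functions.max

// The server evaluates the metric and ships it back as a proto literal; the
// client then has to decode the wire value (days since the epoch for a DATE
// column) back into a Java temporal object for the user.
val observation = Observation("latest_day")
val df = spark.range(3)
  .selectExpr("date_add(date'2020-10-10', cast(id as int)) as day")
  .observe(observation, max("day").as("max_day"))
df.collect()
val metrics: Map[String, Any] = observation.get // contains the decoded date value
```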
Let's rename `def getConverter` to `getScalaConverter`.
thanks, merging to master!
@heyihong can you open a backport PR for 4.0?
### What changes were proposed in this pull request?
This PR fixes temporal value conversion issues in the `LiteralValueProtoConverter` for Spark Connect. The main changes include:
1. **Fixed temporal value conversion in the `getConverter` method**: Updated the conversion logic for temporal data types (DATE, TIMESTAMP, TIMESTAMP_NTZ, DAY_TIME_INTERVAL, YEAR_MONTH_INTERVAL, TIME) to use the proper utility methods from `SparkDateTimeUtils` and `SparkIntervalUtils` instead of directly returning raw protobuf values (see the sketch after this list).
2. **Added comprehensive test coverage**: Extended the `PlanGenerationTestSuite` with a new test case that includes a tuple containing all temporal types to ensure proper conversion and serialization.
3. **Updated test expectations**: Modified the expected explain output and query test files to reflect the corrected temporal value handling.
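As a rough illustration of the mappings the converter has to perform, here is a minimal sketch using plain java.time in place of the actual `SparkDateTimeUtils` / `SparkIntervalUtils` helpers (the literal wire values below are illustrative, and the nanoseconds-since-midnight encoding assumed for TIME is an assumption):
```scala
import java.time._
import java.time.temporal.ChronoUnit

// DATE: days since the Unix epoch -> java.time.LocalDate
val date: LocalDate = LocalDate.ofEpochDay(18545L) // 2020-10-10

// TIMESTAMP: microseconds since the epoch (UTC) -> java.time.Instant
val ts: Instant = Instant.EPOCH.plus(1602288000000000L, ChronoUnit.MICROS)

// TIMESTAMP_NTZ: microseconds since the epoch, no time zone -> LocalDateTime
val ntz: LocalDateTime = LocalDateTime.ofEpochSecond(1602288000L, 0, ZoneOffset.UTC)

// DAY_TIME_INTERVAL: microseconds -> java.time.Duration
val dayTime: Duration = Duration.of(86400000000L, ChronoUnit.MICROS) // 1 day

// YEAR_MONTH_INTERVAL: months -> java.time.Period
val yearMonth: Period = Period.ofMonths(14) // 1 year 2 months

// TIME: assumed nanoseconds since midnight -> java.time.LocalTime
val time: LocalTime = LocalTime.ofNanoOfDay(3600L * 1000000000L) // 01:00:00
```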
### Why are the changes needed?
The struct type in `typedlit` doesn't work well with temporal values due to bugs in the type conversions. For example, the code below fails:
```scala
import org.apache.spark.sql.functions.typedlit
spark.sql("select 1").select(typedlit((1, java.time.LocalDate.of(2020, 10, 10)))).collect()
"""
org.apache.spark.SparkIllegalArgumentException: The value (18545) of the type (java.lang.Integer) cannot be converted to the DATE type.
org.apache.spark.sql.catalyst.CatalystTypeConverters$DateConverter$.toCatalystImpl(CatalystTypeConverters.scala:356)
org.apache.spark.sql.catalyst.CatalystTypeConverters$DateConverter$.toCatalystImpl(CatalystTypeConverters.scala:347)
org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:110)
org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:271)
org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:251)
org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:110)
org.apache.spark.sql.catalyst.CatalystTypeConverters$.$anonfun$createToCatalystConverter$2(CatalystTypeConverters.scala:532)
org.apache.spark.sql.connect.planner.LiteralExpressionProtoConverter$.toCatalystExpression(LiteralExpressionProtoConverter.scala:116)
"""
```
### Does this PR introduce _any_ user-facing change?
**Yes.** This PR fixes temporal value conversion in LiteralValueProtoConverter, allowing the struct type in typedlit to work with temporal values.
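For illustration, the failing snippet above is expected to complete after this change (a hedged sketch; the exact representation of the collected struct depends on client settings):
```scala
import org.apache.spark.sql.functions.typedlit

// Previously threw SparkIllegalArgumentException on the server side; with the
// fixed converter the struct literal carrying a DATE field round-trips cleanly.
val rows = spark.sql("select 1")
  .select(typedlit((1, java.time.LocalDate.of(2020, 10, 10))))
  .collect()
```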
### How was this patch tested?
`build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite"`
`build/sbt "connect/testOnly org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite"`
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.5.11
Closes apache#52270 from heyihong/SPARK-53524.
Authored-by: Yihong He <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>