@heyihong commented Sep 8, 2025

What changes were proposed in this pull request?

This PR fixes temporal value conversion issues in the `LiteralValueProtoConverter` for Spark Connect. The main changes include:

1. **Fixed temporal value conversion in the `getConverter` method**: Updated the conversion logic for temporal data types (DATE, TIMESTAMP, TIMESTAMP_NTZ, DAY_TIME_INTERVAL, YEAR_MONTH_INTERVAL, TIME) to use the proper utility methods from `SparkDateTimeUtils` and `SparkIntervalUtils` instead of directly returning raw protobuf values.

2. **Added comprehensive test coverage**: Extended the `PlanGenerationTestSuite` with a new test case that includes a tuple containing all temporal types to ensure proper conversion and serialization.

3. **Updated test expectations**: Modified the expected explain output and query test files to reflect the corrected temporal value handling.
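The decode direction fixed by item 1 can be sketched with plain `java.time`. This is a hypothetical illustration of the wire encodings involved, not the actual `SparkDateTimeUtils`/`SparkIntervalUtils` API; the method names here are invented for the sketch:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.Period;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

public class TemporalDecodeSketch {
    // DATE travels as days since 1970-01-01.
    static LocalDate decodeDate(int days) {
        return LocalDate.ofEpochDay(days);
    }

    // TIMESTAMP travels as microseconds since the epoch.
    static Instant decodeTimestamp(long micros) {
        return Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    // TIMESTAMP_NTZ travels as microseconds since the epoch,
    // read back as wall-clock time (UTC used here for the sketch).
    static LocalDateTime decodeTimestampNtz(long micros) {
        return LocalDateTime.ofEpochSecond(
            Math.floorDiv(micros, 1_000_000L),
            (int) Math.floorMod(micros, 1_000_000L) * 1_000,
            ZoneOffset.UTC);
    }

    // DAY_TIME_INTERVAL travels as a microsecond count.
    static Duration decodeDayTimeInterval(long micros) {
        return Duration.of(micros, ChronoUnit.MICROS);
    }

    // YEAR_MONTH_INTERVAL travels as a month count.
    static Period decodeYearMonthInterval(int months) {
        return Period.ofMonths(months).normalized();
    }
}
```

Before the fix, the converter returned the raw int/long from the protobuf message instead of decoding it, which is what Catalyst then rejected.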

Why are the changes needed?

The struct type in typedlit doesn't work well with temporal values due to bugs in type conversions. For example, the code below fails:

```scala
import org.apache.spark.sql.functions.typedlit

spark.sql("select 1").select(typedlit((1, java.time.LocalDate.of(2020, 10, 10)))).collect()
"""
org.apache.spark.SparkIllegalArgumentException: The value (18545) of the type (java.lang.Integer) cannot be converted to the DATE type.
  org.apache.spark.sql.catalyst.CatalystTypeConverters$DateConverter$.toCatalystImpl(CatalystTypeConverters.scala:356)
  org.apache.spark.sql.catalyst.CatalystTypeConverters$DateConverter$.toCatalystImpl(CatalystTypeConverters.scala:347)
  org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:110)
  org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:271)
  org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:251)
  org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:110)
  org.apache.spark.sql.catalyst.CatalystTypeConverters$.$anonfun$createToCatalystConverter$2(CatalystTypeConverters.scala:532)
  org.apache.spark.sql.connect.planner.LiteralExpressionProtoConverter$.toCatalystExpression(LiteralExpressionProtoConverter.scala:116)
"""
```
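The integer 18545 in the error above is the epoch-day encoding of the date: the literal travels over Spark Connect as days since 1970-01-01, and the server handed this raw integer to Catalyst instead of decoding it first. The encoding can be reproduced with plain `java.time` (illustrative only, not Spark code):

```java
import java.time.LocalDate;

public class EpochDayDemo {
    public static void main(String[] args) {
        // A DATE literal is transmitted as days since 1970-01-01 (epoch days).
        LocalDate date = LocalDate.of(2020, 10, 10);
        long epochDays = date.toEpochDay();
        System.out.println(epochDays);                       // 18545, the value in the error message
        // Decoding the raw integer recovers the original date.
        System.out.println(LocalDate.ofEpochDay(epochDays)); // 2020-10-10
    }
}
```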

Does this PR introduce any user-facing change?

Yes. This PR fixes temporal value conversion in LiteralValueProtoConverter, allowing the struct type in typedlit to work with temporal values.

How was this patch tested?

`build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite"`
`build/sbt "connect/testOnly org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite"`

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.5.11

@heyihong heyihong changed the title [SPARK-53524] Fix temporal value conversion in LiteralValueProtoConverter [SPARK-53524][CONNECT][SQL] Fix temporal value conversion in LiteralValueProtoConverter Sep 8, 2025
@cloud-fan Sep 9, 2025
I need a bit more context to understand it better. So at the client side, fn.typedLit takes Java classes like LocalDate and converts them to int before putting them into the protobuf message. At the server side, v.getDate returns an int and we need to convert it back to a LocalDate.

This looks a bit messy to me. If the client side already knows how to convert external java objects to internal catalyst values, why does server need this conversion?

@heyihong Sep 10, 2025
@cloud-fan I am not sure if the client side knows how to convert external Java objects into internal Catalyst values. The conversion occurs here in LiteralValueProtoConverter and the server just converts the Proto value back to the Java object.

This code path is also needed in the Scala client, since the client may need to convert temporal values from proto observed metrics received from the server into Java objects.

let's rename `def getConverter` to `getScalaConverter`

@heyihong heyihong requested a review from cloud-fan September 10, 2025 16:05

cloud-fan commented Sep 11, 2025

thanks, merging to master!

@cloud-fan cloud-fan closed this in 1817e67 Sep 11, 2025
@cloud-fan
@heyihong can you open a backport PR for 4.0?

huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…alueProtoConverter

Closes apache#52270 from heyihong/SPARK-53524.

Authored-by: Yihong He <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>