Conversation

@AngersZhuuuu (Contributor)

What changes were proposed in this pull request?

For the following case:

```
withTempDir { dir =>
  withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "false") {
    withTable("test_precision") {
      val df = sql("SELECT 'dummy' AS name, 1000000000000000000010.7000000000000010 AS value")
      df.write.mode("Overwrite").parquet(dir.getAbsolutePath)
      sql(
        s"""
           |CREATE EXTERNAL TABLE test_precision(name STRING, value DECIMAL(18,6))
           |STORED AS PARQUET LOCATION '${dir.getAbsolutePath}'
           |""".stripMargin)
      checkAnswer(sql("SELECT * FROM test_precision"), Row("dummy", null))
    }
  }
}
```

We write the data with one schema but declare the table with a narrower one. The DataFrame is created with:

```
root
 |-- name: string (nullable = false)
 |-- value: decimal(38,16) (nullable = false)
```

but the table is created with:

```
root
 |-- name: string (nullable = false)
 |-- value: decimal(18,6) (nullable = false)
```

This causes `enforcePrecisionScale` to return `null`:

```
  public HiveDecimal getPrimitiveJavaObject(Object o) {
    return o == null ? null : this.enforcePrecisionScale(((HiveDecimalWritable)o).getHiveDecimal());
  }
```
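To see why the `null` shows up here, a minimal worked example, assuming Hive's static `HiveDecimal.enforcePrecisionScale(dec, precision, scale)` helper behaves like the instance method quoted above:

```
import org.apache.hadoop.hive.common.type.HiveDecimal

// The written value has a 22-digit integer part, but DECIMAL(18,6) leaves only
// 18 - 6 = 12 integer digits, so the coercion gives up and returns null.
val written = HiveDecimal.create("1000000000000000000010.7000000000000010")
assert(HiveDecimal.enforcePrecisionScale(written, 18, 6) == null)
```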

An NPE is then thrown when the `null` reaches `toCatalystDecimal`.

We should check whether the returned value is `null` to avoid throwing the NPE.
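A minimal sketch of such a guard, modeled loosely on `toCatalystDecimal` in Spark's `HiveInspectors` (the shape is illustrative, not the exact patch):

```
import org.apache.hadoop.hive.serde2.objectinspector.primitive.HiveDecimalObjectInspector
import org.apache.spark.sql.types.Decimal

// Illustrative guard: getPrimitiveJavaObject can hand back null when
// enforcePrecisionScale cannot fit the stored value into the declared
// precision/scale, so surface that as a SQL NULL instead of dereferencing it.
def toCatalystDecimal(hdoi: HiveDecimalObjectInspector, data: Any): Decimal = {
  val hiveDecimal = hdoi.getPrimitiveJavaObject(data)
  if (hiveDecimal == null) null
  else Decimal(hiveDecimal.bigDecimalValue(), hdoi.precision(), hdoi.scale())
}
```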

Why are the changes needed?

Fixes a bug: an NPE is thrown instead of returning NULL when the declared decimal type cannot hold the stored value.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added UT

github-actions bot added the SQL label Nov 8, 2021
@AngersZhuuuu (Contributor, author)

ping @dongjoon-hyun @cloud-fan

@SparkQA commented Nov 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49470/

@SparkQA commented Nov 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49469/

@SparkQA commented Nov 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49470/

@SparkQA commented Nov 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49469/

```
           |CREATE EXTERNAL TABLE test_precision(name STRING, value DECIMAL(18,6))
           |STORED AS PARQUET LOCATION '${dir.getAbsolutePath}'
           |""".stripMargin)
      checkAnswer(sql("SELECT * FROM test_precision"), Row("dummy", null))
```

Contributor (review comment on the lines above):
what's the behavior of builtin file source tables? do we also return null?

@cloud-fan (Contributor) Nov 8, 2021:

and what's the behavior if we do it purely in Hive?

@AngersZhuuuu (Contributor, author):

> what's the behavior of builtin file source tables? do we also return null?

Hmm, the non-vectorized Parquet reader returns null as well; the vectorized reader throws the exception below:

```
[info]   Cause: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file file:///Users/yi.zhu/Documents/project/Angerszhuuuu/spark/sql/hive/target/tmp/hive_execution_test_group/spark-628e3c21-15ee-4473-b207-60a530ced804/part-00000-3f384838-d11f-4c3b-83a1-396efad6df79-c000.snappy.parquet. Column: [value], Expected: decimal(18,6), Found: FIXED_LEN_BYTE_ARRAY
[info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:635)
[info]   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:195)
[info]   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:104)
[info]   at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:531)
[info]   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(generated.java:29)
[info]   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:42)
[info]   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info]   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
[info]   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
[info]   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
[info]   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1895)
[info]   at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1274)
[info]   at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1274)
[info]   at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2267)
[info]   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info]   at org.apache.spark.scheduler.Task.run(Task.scala:136)
[info]   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507)
[info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)
[info]   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:748)
[info]   Cause: org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException:
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.constructConvertNotSupportedException(ParquetVectorUpdaterFactory.java:1079)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.getUpdater(ParquetVectorUpdaterFactory.java:174)
[info]   at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:154)
[info]   at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:296)
[info]   at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:194)
[info]   at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
[info]   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:104)
[info]   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:191)
[info]   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:104)
[info]   at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:531)
[info]   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(generated.java:29)
[info]   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:42)
[info]   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info]   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
[info]   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
[info]   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
[info]   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1895)
[info]   at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1274)
[info]   at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1274)
[info]   at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2267)
[info]   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info]   at org.apache.spark.scheduler.Task.run(Task.scala:136)
[info]   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507)
[info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)
[info]   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:748)
```
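(For context, a reconstructed sketch of how the two builtin paths can be compared; the config keys are real Spark options, but the session itself is illustrative, not from the PR:)

```
// Reconstructed, assuming spark.sql.hive.convertMetastoreParquet = true so the
// scan goes through Spark's builtin Parquet source rather than the Hive SerDe.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
spark.sql("SELECT * FROM test_precision").show()  // dummy, NULL

spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")
spark.sql("SELECT * FROM test_precision").show()  // fails: Parquet column cannot be converted
```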

@AngersZhuuuu (Contributor, author):

Hive returns NULL too:

[screenshot of the Hive session returning NULL]

@SparkQA commented Nov 8, 2021

Test build #144998 has finished for PR 34519 at commit 2da2b49.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```
withTempDir { dir =>
  withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "false") {
    withTable("test_precision") {
      val df = sql("SELECT 'dummy' AS name, 1000000000000000000010.7000000000000010 AS value")
```

Contributor (review comment on the lines above):
can we use `CAST(1.2 AS DECIMAL(38, 16))` instead of writing a long decimal literal?

@AngersZhuuuu (Contributor, author) Nov 8, 2021:

> CAST(1.2 AS DECIMAL(38, 16))

Can't; the repro needs `enforcePrecisionScale` to reject the value, and 1.2 still fits after the cast.
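A hedged illustration of that point, reusing the assumed static `HiveDecimal.enforcePrecisionScale` helper from above:

```
import org.apache.hadoop.hive.common.type.HiveDecimal

// 1.2 needs one integer digit, well within 18 - 6 = 12, so the coercion merely
// rounds to scale 6 instead of returning null; triggering the bug needs a
// value whose integer part overflows the target precision.
assert(HiveDecimal.enforcePrecisionScale(HiveDecimal.create("1.2"), 18, 6) != null)
```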

@SparkQA commented Nov 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49478/

@SparkQA commented Nov 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49478/

@SparkQA commented Nov 8, 2021

Test build #145005 has finished for PR 34519 at commit 20048fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a review:

+1, LGTM.

dongjoon-hyun pushed a commit that referenced this pull request Nov 8, 2021
dongjoon-hyun pushed a commit that referenced this pull request Nov 8, 2021
@dongjoon-hyun (Member):

Merged to master/3.2/3.1/3.0. Thank you, @AngersZhuuuu and @cloud-fan .

dongjoon-hyun pushed a commit that referenced this pull request Nov 8, 2021
sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
wangyum pushed a commit that referenced this pull request May 26, 2023