
Conversation

@yaooqinn
Member

@yaooqinn yaooqinn commented Nov 24, 2021

What changes were proposed in this pull request?

We store the table schema in table properties so that the read side can restore it. Spark 3.1 added native char/varchar support. In commands such as `create table` and `alter table` that use these types, the `char(x)` or `varchar(x)` type is stored directly in those properties. If a user then reads such a table with Spark 2, it fails to parse the schema.

FYI, https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala#L136

The affected table can be one newly created with Spark 3.1 or later, or an existing one modified by Spark 3.1 or later.
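For illustration, here is a small standalone sketch of the failure mode (plain Scala with no Spark dependency; the type-name set and the helper names are simplified stand-ins, not Spark's actual API):

```scala
// Illustrative stand-in for the fixed set of primitive type names that
// Spark 2's DataType.fromJson recognizes (the real list lives in
// DataType.scala on branch-2.4, linked above).
val spark2TypeNames =
  Set("string", "integer", "long", "double", "float", "boolean",
      "date", "timestamp", "binary")

// Spark 2 also accepts parameterized decimals, but nothing shaped like
// char(10) or varchar(20), so schemas containing them fail to parse.
def spark2CanParse(typeName: String): Boolean =
  spark2TypeNames.contains(typeName) ||
    typeName.matches("""decimal\(\d+,\s*\d+\)""")

// The idea behind the fix: before writing the schema to table properties,
// replace char(x)/varchar(x) with string (the raw type is kept separately
// in column metadata), so Spark 2 can still read the schema.
def toSpark2CompatibleName(typeName: String): String =
  if (typeName.matches("""(char|varchar)\(\d+\)""")) "string" else typeName
```

Spark 2 simply has no case for `char(x)`/`varchar(x)`, so replacing the raw types with `string` before writing keeps older readers working while the metadata preserves the original type for Spark 3.1+.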

Why are the changes needed?

backward compatibility

Does this PR introduce any user-facing change?

No. This is a bug fix that only touches internal table properties, so it is not necessarily user-facing.

How was this patch tested?

Manually.

@github-actions github-actions bot added the SQL label Nov 24, 2021
@SparkQA

SparkQA commented Nov 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50049/

@SparkQA

SparkQA commented Nov 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50049/

@SparkQA

SparkQA commented Nov 24, 2021

Test build #145576 has finished for PR 34697 at commit 1934285.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

cc @cloud-fan thanks

@SparkQA

SparkQA commented Nov 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50052/

val schemaJson = CatalogTable.readLargeTableProp(table.properties, DATASOURCE_SCHEMA)
if (schemaJson.isDefined) {
  val schemaFromTableProps = DataType.fromJson(schemaJson.get).asInstanceOf[StructType]
  val rawSchema = CharVarcharUtils.getRawSchema(schemaFromTableProps)
Contributor


Maybe we should create a method `getSchemaFromTableProperties` and call it in these places.
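As a standalone sketch of that suggestion (`Column` is a simplified stand-in for Spark's `StructField`, and `parse`/`schemaKey` stand in for `DataType.fromJson`/`DATASOURCE_SCHEMA`; treat the details as assumptions, not Spark's actual signatures):

```scala
// Simplified stand-in for a StructField: name, type name, and metadata.
final case class Column(name: String, dataType: String, metadata: Map[String, String])

// Key under which the raw char/varchar type is recorded in column metadata
// (mirrors the "__CHAR_VARCHAR_TYPE_STRING" key used by CharVarcharUtils).
val CharVarcharKey = "__CHAR_VARCHAR_TYPE_STRING"

// Restore one column's raw char/varchar type, if one was recorded.
def restoreRawType(col: Column): Column =
  col.metadata.get(CharVarcharKey).fold(col)(raw => col.copy(dataType = raw))

// The suggested helper: resolve the schema from table properties in one
// place instead of repeating the read-and-restore steps at each call site.
def getSchemaFromTableProperties(
    properties: Map[String, String],
    schemaKey: String,
    parse: String => Seq[Column]): Option[Seq[Column]] =
  properties.get(schemaKey).map(json => parse(json).map(restoreRawType))
```

Centralizing the lookup keeps the char/varchar restoration logic in a single place, which is the point of the review suggestion.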

Member Author


done

@SparkQA

SparkQA commented Nov 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50052/

@SparkQA

SparkQA commented Nov 24, 2021

Test build #145580 has finished for PR 34697 at commit 799a761.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn yaooqinn changed the title [SPARK-37452][SQL] Char and Varchar breaks backward compatibility between v3.1 and v2 [SPARK-37452][SQL] Char and Varchar break backward compatibility between v3.1 and v2 Nov 25, 2021
@SparkQA

SparkQA commented Nov 25, 2021

Test build #145603 has finished for PR 34697 at commit 0334309.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 25, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50075/

@SparkQA

SparkQA commented Nov 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50079/

@SparkQA

SparkQA commented Nov 25, 2021

Test build #145605 has finished for PR 34697 at commit d3a284f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50079/

@mridulm
Contributor

mridulm commented Nov 25, 2021

FYI @xkrogen, @shardulm94

@yaooqinn
Member Author

any more concerns from the CCers?

@yaooqinn yaooqinn closed this in 0c3c4e2 Nov 29, 2021
yaooqinn added a commit that referenced this pull request Nov 29, 2021
…een v3.1 and v2

Closes #34697 from yaooqinn/SPARK-37452.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
(cherry picked from commit 0c3c4e2)
Signed-off-by: Kent Yao <[email protected]>
@yaooqinn
Member Author

yaooqinn commented Nov 29, 2021

Merged to master and branch-3.2; will create a backport PR to 3.1 and link it here later.

dongjoon-hyun pushed a commit that referenced this pull request Nov 30, 2021
… between v3.1 and v2

This backports #34697 to 3.1

Closes #34736 from yaooqinn/PR_TOOL_PICK_PR_34697_BRANCH-3.1.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
…een v3.1 and v2

Closes apache#34697 from yaooqinn/SPARK-37452.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
(cherry picked from commit 0c3c4e2)
Signed-off-by: Kent Yao <[email protected]>
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
… between v3.1 and v2

This backports apache#34697 to 3.1

Closes apache#34736 from yaooqinn/PR_TOOL_PICK_PR_34697_BRANCH-3.1.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
…een v3.1 and v2

Closes apache#34697 from yaooqinn/SPARK-37452.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
(cherry picked from commit 0c3c4e2)
Signed-off-by: Kent Yao <[email protected]>
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
…een v3.1 and v2

Closes apache#34697 from yaooqinn/SPARK-37452.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
(cherry picked from commit 0c3c4e2)
Signed-off-by: Kent Yao <[email protected]>