enable 'alter table replace columns' command in sql parser
manu-olx committed Jun 4, 2019
commit a99e3624031f9a7230e0eac1906809e555ad6bbe
@@ -115,6 +115,8 @@ statement
(identifier | FOR COLUMNS identifierSeq | FOR ALL COLUMNS)? #analyze
| ALTER TABLE tableIdentifier
ADD COLUMNS '(' columns=colTypeList ')' #addTableColumns
| ALTER TABLE tableIdentifier
REPLACE COLUMNS '(' columns=colTypeList ')' #replaceTableColumns
Member:

I am fine with adding this syntax, but this only supports changing column comments, right?

Member:

Dropping a column is still risky for us. When you add the column back, we might see unexpected values.

Author:

@gatorsmile the idea is actually to implement alter table replace columns fully:

  • drop a column if it is not present in the new schema
  • add a column if it is not present in the table's schema
  • keep a column if its name and type match between the schemas
  • raise an exception when trying to replace an existing column with a different data type
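
As an illustration only (this plain-Python helper is not part of the PR, and the names are hypothetical), the four rules above could be sketched as:

```python
def replace_columns(old_schema, new_schema):
    """Compute the resulting schema for a hypothetical REPLACE COLUMNS.

    Schemas are ordered lists of (column_name, data_type) pairs.
    Columns absent from new_schema are dropped, new ones are added,
    and reusing a name with a different type raises an error.
    """
    old_types = dict(old_schema)
    for name, dtype in new_schema:
        if name in old_types and old_types[name] != dtype:
            raise ValueError(
                f"cannot replace column '{name}': type changes "
                f"from {old_types[name]} to {dtype}")
    # The new schema fully replaces the old one.
    return list(new_schema)
```

Note that the type-mismatch check runs before anything is returned, so a failed replacement leaves the original schema untouched.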

I believe this is a much-needed feature for managing metastores fully from Spark; however, I understand the complexity and risks of this operation, so please let's take the time to ensure it's the right thing to do and to consider all possible ramifications.

I tried to make the commit history as clean and descriptive as possible, and I added some more details in the known-issues section of the PR description.

I've also added an extra validation step to prevent columns from being replaced when the data type doesn't match (8b6da23).

Let me know if anything isn't clear and I'll be happy to add more details.

Thanks!

Member:

Let me give a typical scenario. For an existing table, if we drop the column col1, its data is no longer visible to end users. This is expected. However, if we add col1 back, the previous values of col1 should not become visible to end users either. This is how other database systems work. What are your thoughts?

Author:

I understand your point; it makes total sense, and I would completely agree if Hive were like the other DBMSs. According to your scenario, the example below should not be possible either, but it is:

>>> spark.sql("CREATE EXTERNAL TABLE test_table (c1 string, c2 int ) STORED AS PARQUET LOCATION '/tmp/test_table'")
DataFrame[]

>>> spark.sql("INSERT INTO test_table VALUES ('c1', 2)")
DataFrame[]

>>> spark.sql("SELECT * FROM test_table").show()
+---+---+
| c1| c2|
+---+---+
| c1|  2|
+---+---+

>>> spark.sql("DROP TABLE test_table")
DataFrame[]

>>> spark.catalog.listTables()
[]

>>> spark.sql("CREATE EXTERNAL TABLE test_table (c1 string, c2 int, c3 boolean) STORED AS PARQUET LOCATION '/tmp/test_table'")
DataFrame[]

>>> spark.sql("SELECT * FROM test_table").show()
+---+---+----+
| c1| c2|  c3|
+---+---+----+
| c1|  2|null|
+---+---+----+

Do you agree it's the same behaviour you described in the "drop / re-create column" example?
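
The schema-on-read behaviour in the session above can be mimicked with a small stand-alone sketch (plain Python, no Spark involved; the helper name is made up for illustration):

```python
def read_with_schema(file_cols, rows, table_cols):
    """Mimic Parquet schema-on-read: project each stored row onto the
    table's current schema, filling missing columns with None (null)."""
    out = []
    for row in rows:
        rec = dict(zip(file_cols, row))
        out.append(tuple(rec.get(col) for col in table_cols))
    return out

# Data was written with schema (c1, c2); the table was then re-created
# with an extra column c3, which reads back as null.
print(read_with_schema(["c1", "c2"], [("c1", 2)], ["c1", "c2", "c3"]))
# → [('c1', 2, None)]
```

This is exactly why re-created (or replaced) columns over existing external data can surface old values: the files, not the metastore, hold the data.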

Author:

@gatorsmile, any thoughts on my previous comment?

I was wondering if this could just be a miscommunication: you are referring to the behaviour of alter table replace columns with managed ("normal") tables, while I'm mainly talking about external tables.

Maybe replace columns could be enabled only for external tables? Would that be acceptable, or less risky, in your opinion?

| ALTER (TABLE | VIEW) from=tableIdentifier
RENAME TO to=tableIdentifier #renameTable
| ALTER (TABLE | VIEW) tableIdentifier
@@ -232,7 +234,6 @@ unsupportedHiveNativeCommands
| kw1=ALTER kw2=TABLE tableIdentifier partitionSpec? kw3=COMPACT
| kw1=ALTER kw2=TABLE tableIdentifier partitionSpec? kw3=CONCATENATE
| kw1=ALTER kw2=TABLE tableIdentifier partitionSpec? kw3=SET kw4=FILEFORMAT
| kw1=ALTER kw2=TABLE tableIdentifier partitionSpec? kw3=REPLACE kw4=COLUMNS
| kw1=START kw2=TRANSACTION
| kw1=COMMIT
| kw1=ROLLBACK
@@ -677,6 +677,23 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
)
}

/**
* Create an [[AlterTableReplaceColumnsCommand]] command.
*
* For example:
* {{{
* ALTER TABLE table1
* REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...);
* }}}
*/
override def visitReplaceTableColumns(
    ctx: ReplaceTableColumnsContext): LogicalPlan = withOrigin(ctx) {
AlterTableReplaceColumnsCommand(
visitTableIdentifier(ctx.tableIdentifier),
visitColTypeList(ctx.columns)
)
}

/**
* Create an [[AlterTableSetPropertiesCommand]] command.
*