[SPARK-24327][SQL] Verify and normalize a partition column name based on the JDBC resolved schema #21379
Conversation
Test build #90874 has finished for PR 21379 at commit
retest this please
Test build #90882 has finished for PR 21379 at commit
If the input partition.column is already quoted, can we avoid adding the quotes again?
Are you pointing out the case where users explicitly add quotes in partitionColumn? e.g.,
```scala
val df = spark.read.format("jdbc")
  ...
  .option("partitionColumn", """"THEID"""")
  ...
  .option("quotePartitionColumnName", quotePartitionColumnName)
  .load()
```
Yeah
ok, I will
The latest fix changes an existing behaviour (when quoting non-partition column names), so I'm not sure this fix is acceptable. Any suggestions?
Test build #91213 has finished for PR 21379 at commit
Sorry, I just realized this is the wrong direction. Instead of trusting the user input, we should verify and normalize the user-specified partition columns using the already-fetched table schema info.
ok. If the schema does not have the column, throw an exception?
Yeah, we should do it.
ok, I will re-check.
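To make the agreed direction concrete, here is a hedged sketch of the intended behavior (the `url` value and the `TEST.PEOPLE` table with its upper-case `THEID` column are assumptions modeled on `JDBCSuite`, not taken from this thread):

```scala
// The user-specified partitionColumn is matched against the resolved schema
// with the session resolver and rewritten to the schema's spelling (THEID).
val df = spark.read.format("jdbc")
  .option("url", url)                   // assumed JDBC URL, defined elsewhere
  .option("dbtable", "TEST.PEOPLE")     // assumed table with column THEID
  .option("partitionColumn", "theid")   // normalized to THEID under the default resolver
  .option("lowerBound", "1")
  .option("upperBound", "4")
  .option("numPartitions", "3")
  .load()

// A column absent from the resolved schema should fail analysis instead of
// silently producing broken partition predicates:
//   .option("partitionColumn", "NoExistingColumn")  // => AnalysisException
```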
Force-pushed a9b0306 to d76bc7f.
```scala
  }
  testIncorrectJdbcPartitionColumn("NoExistingColumn")
  withSQLConf("spark.sql.caseSensitive" -> "true") {
```
SQLConf.CASE_SENSITIVE.key
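That is, presumably something like the following, using the predefined config constant instead of the raw string:

```scala
import org.apache.spark.sql.internal.SQLConf

withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
  // test body unchanged
}
```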
```scala
    ans.toArray
  }

  def getSchema(jdbcOptions: JDBCOptions, resolver: Resolver): StructType = {
```
Add the function description?
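For instance, a possible description (my wording, not necessarily the merged text):

```scala
/**
 * Resolves the schema of the table referenced by the given JDBC options so
 * that user-specified column names (e.g. partitionColumn) can be verified
 * and normalized against it; `resolver` controls how column names are
 * compared with respect to case sensitivity.
 */
def getSchema(jdbcOptions: JDBCOptions, resolver: Resolver): StructType = {
  ??? // existing implementation unchanged
}
```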
```scala
 * Null value predicate is added to the first partition where clause to include
 * the rows with null value for the partitions column.
 *
 * @param partitioning partition information to generate the where clause for each partition
```
Add the other two @param for the new parameters?
ok
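Assuming the two new parameters are the resolved schema and the resolver (consistent with the other hunks in this review), the added tags might read:

```scala
/**
 * ...
 * @param schema resolved schema of the JDBC table (assumed parameter name)
 * @param resolver function used to compare column names with respect to
 *                 case sensitivity (assumed parameter name)
 * @param partitioning partition information to generate the where clause for each partition
 */
```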
```scala
}.map(dialect.quoteIdentifier).getOrElse {
  throw new AnalysisException(s"User-defined partition column ${partitioning.column} not " +
    s"found in the JDBC relation: ${schema.simpleString(Utils.maxNumToStringFields)}")
}
```
Create a new private function for the above resolution and checking logic?
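For illustration, the extracted helper could look roughly like this; the name `resolvePartitionColumn` and the exact signature are assumptions, not necessarily what was merged:

```scala
// Assumed context: a private helper inside object JDBCRelation, reusing the
// resolution and error-reporting logic from the hunk above.
private def resolvePartitionColumn(
    schema: StructType,
    columnName: String,
    resolver: Resolver,
    dialect: JdbcDialect): String = {
  schema.fields.find(f => resolver(f.name, columnName))
    .map(f => dialect.quoteIdentifier(f.name))
    .getOrElse {
      throw new AnalysisException(s"User-defined partition column $columnName not " +
        s"found in the JDBC relation: ${schema.simpleString(Utils.maxNumToStringFields)}")
    }
}
```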
@maropu The fix looks good to me. Thanks for changing the solution. Could you update the PR title and description?
oh, I forgot to update the title... yeah, I'll do it soon.
Test build #91813 has finished for PR 21379 at commit
Test build #91812 has finished for PR 21379 at commit
Test build #91814 has finished for PR 21379 at commit
Test build #91835 has finished for PR 21379 at commit
Test build #91838 has finished for PR 21379 at commit
```diff
   }
-  val relation = JDBCRelation(parts, options)(sparkSession)
+  val schema = JDBCRelation.getSchema(sparkSession.sessionState.conf.resolver, options)
+  val relation = JDBCRelation(schema, parts, options)(sparkSession)
```
We do not need to change this. Add an apply function to object JDBCRelation (in JDBCRelation.scala) instead:

```scala
def apply(parts: Array[Partition], jdbcOptions: JDBCOptions)(
    sparkSession: SparkSession): JDBCRelation = {
  val schema = getSchema(jdbcOptions, sparkSession.sessionState.conf.resolver)
  JDBCRelation(schema, parts, jdbcOptions)(sparkSession)
}
```
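Keeping such an overload means existing call sites of the form `JDBCRelation(parts, options)(sparkSession)` keep compiling unchanged, while the resolver-based schema lookup stays encapsulated in one place.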
ok
LGTM except one minor comment.
Test build #91867 has finished for PR 21379 at commit
ping
retest this please
The build passed. The tests passed in the previous run; the current tests will be killed at midnight. LGTM. Thanks! Merged to master.
Test build #92285 has finished for PR 21379 at commit
What changes were proposed in this pull request?
This PR modifies the JDBC datasource code to verify and normalize a partition column based on the JDBC resolved schema before building JDBCRelation.

Closes #20370
How was this patch tested?
Added tests in JDBCSuite.