Skip to content

Conversation

@maropu
Copy link
Member

@maropu maropu commented May 21, 2018

What changes were proposed in this pull request?

This pr modified JDBC datasource code to verify and normalize a partition column based on the JDBC resolved schema before building JDBCRelation.

Closes #20370

How was this patch tested?

Added tests in JDBCSuite.

@maropu
Copy link
Member Author

maropu commented May 21, 2018

@gatorsmile @conorbmurphy

@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90874 has finished for PR 21379 at commit 8d97b0d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented May 21, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90882 has finished for PR 21379 at commit 8d97b0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@gatorsmile gatorsmile May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the input partition.column is already quoted, can we avoid adding the quotes again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You point out the case users explicitly add quotes in partitionColumn? e.g.,

+      val df = spark.read.format("jdbc")
+        ...
+        .option("partitionColumn", """"THEID"""")
+        ...
+        .option("quotePartitionColumnName", quotePartitionColumnName)
+        .load()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest fix changes an existing behaviour (when quoting non-partition column names), so I'm not sure this fix is acceptable. Any suggestion?

@SparkQA
Copy link

SparkQA commented May 28, 2018

Test build #91213 has finished for PR 21379 at commit 8d5fa9a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

gatorsmile commented May 30, 2018

Sorry, I just realized this is a wrong direction. Instead of trusting the user inputs, we should verify and normalize the user-specified partition columns by using the already fetched table schema info val tableSchema = JDBCRDD.resolveTable(jdbcOptions) when building JDBCRelation

@maropu
Copy link
Member Author

maropu commented May 30, 2018

ok. If the schema does not have the column, throws AnalysisException or something?

@gatorsmile
Copy link
Member

Yeah, we should do it.

@maropu
Copy link
Member Author

maropu commented May 30, 2018

ok, I will re-check.

maropu added 4 commits June 14, 2018 09:55
This reverts commit 9335b7c.
This reverts commit 858d4dc.
@maropu maropu force-pushed the SPARK-24327 branch 2 times, most recently from a9b0306 to d76bc7f Compare June 14, 2018 04:50
}

testIncorrectJdbcPartitionColumn("NoExistingColumn")
withSQLConf("spark.sql.caseSensitive" -> "true") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLConf.CASE_SENSITIVE.key

ans.toArray
}

def getSchema(jdbcOptions: JDBCOptions, resolver: Resolver): StructType = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the function description?

* Null value predicate is added to the first partition where clause to include
* the rows with null value for the partitions column.
*
* @param partitioning partition information to generate the where clause for each partition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the other two @param for the new parameters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}.map(dialect.quoteIdentifier).getOrElse {
throw new AnalysisException(s"User-defined partition column ${partitioning.column} not " +
s"found in the JDBC relation: ${schema.simpleString(Utils.maxNumToStringFields)}")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a new private function for the above resolution and checking logic?

@gatorsmile
Copy link
Member

@maropu The fix looks good to me. Thanks for changing the solution. Could you update the PR title and description?

@maropu
Copy link
Member Author

maropu commented Jun 14, 2018

oh, I forgot to update the title..yea, I'll do soon.

@SparkQA
Copy link

SparkQA commented Jun 14, 2018

Test build #91813 has finished for PR 21379 at commit d76bc7f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 14, 2018

Test build #91812 has finished for PR 21379 at commit a9b0306.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 14, 2018

Test build #91814 has finished for PR 21379 at commit a3be215.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu changed the title [SPARK-24327][SQL] Add an option to quote a partition column name in JDBCRelation [SPARK-24327][SQL] Verify and normalize a partition column name based on the JDBC resolved schema Jun 14, 2018
@SparkQA
Copy link

SparkQA commented Jun 14, 2018

Test build #91835 has finished for PR 21379 at commit 28e811e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 14, 2018

Test build #91838 has finished for PR 21379 at commit d2ef95c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
val relation = JDBCRelation(parts, options)(sparkSession)
val schema = JDBCRelation.getSchema(sparkSession.sessionState.conf.resolver, options)
val relation = JDBCRelation(schema, parts, options)(sparkSession)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need to change this. Add an apply function to object JDBCRelation.scala

  def apply(parts: Array[Partition], jdbcOptions: JDBCOptions)(
      sparkSession: SparkSession): JDBCRelation = {
    val schema = getSchema(jdbcOptions, sparkSession.sessionState.conf.resolver)
    JDBCRelation(schema, parts, jdbcOptions)(sparkSession)
  }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@gatorsmile
Copy link
Member

LGTM except one minor comment.

@SparkQA
Copy link

SparkQA commented Jun 15, 2018

Test build #91867 has finished for PR 21379 at commit 614c87a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Jun 24, 2018

ping

@gatorsmile
Copy link
Member

retest this please

@gatorsmile
Copy link
Member

The build passed. The tests have passed in the previous run. The current tests will be killed at the midnight.

LGTM

Thanks! Merged to master.

@asfgit asfgit closed this in f596ebe Jun 25, 2018
@SparkQA
Copy link

SparkQA commented Jun 25, 2018

Test build #92285 has finished for PR 21379 at commit 614c87a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants