[SPARK-19059] [SQL] Unable to retrieve data from parquet table whose name startswith underscore #16635

jayadevanmurali · 2017-01-18T19:02:31Z

What changes were proposed in this pull request?

The initial shouldFilterOut() invocation filters the root path name (the table name in the initial call) and removes it if it starts with _. I moved the check one level down, so the code first lists the files and directories under the given root path and then applies the filter.
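The change described above can be illustrated with a small, self-contained sketch. This is not the code in PartitioningAwareFileIndex: `ListingSketch`, `listLeafFilesBefore` and `listLeafFilesAfter` are hypothetical names, the sketch walks the local filesystem with `java.io.File` rather than a Hadoop `FileSystem`, and `shouldFilterOut` here is a simplified assumption (only the `_` and `.` prefixes).

```scala
import java.io.File

object ListingSketch {
  // Simplified hidden-name rule (assumption): a "_" or "." prefix means hidden.
  def shouldFilterOut(name: String): Boolean =
    name.startsWith("_") || name.startsWith(".")

  // Children of a directory; empty for plain files or unreadable paths.
  private def children(dir: File): List[File] =
    Option(dir.listFiles()).map(_.toList).getOrElse(Nil)

  // Old behaviour as described: the path itself is filtered first, so a root
  // directory named "_tbl" is dropped before anything inside it is listed.
  def listLeafFilesBefore(path: File): List[File] =
    if (shouldFilterOut(path.getName)) Nil
    else if (path.isDirectory) children(path).flatMap(listLeafFilesBefore)
    else List(path)

  // New behaviour as described: list the root's children first, then filter
  // each child by name. The root's own name is never checked.
  def listLeafFilesAfter(root: File): List[File] =
    children(root)
      .filterNot(f => shouldFilterOut(f.getName))
      .flatMap(f => if (f.isDirectory) listLeafFilesAfter(f) else List(f))
}
```

For a directory `_tbl` holding `part-00000.parquet` and `_SUCCESS`, `listLeafFilesBefore` finds nothing, while `listLeafFilesAfter` finds only the part file: hidden entries inside the table directory are still skipped.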

How was this patch tested?

Added new test case for this scenario

jayadevanmurali · 2017-01-18T19:10:26Z

@uncleGen @eric Liang @cloud-fan

Could you please review this PR?

gatorsmile

See the related JIRA: https://issues.apache.org/jira/browse/HIVE-6431

By default, FileInputFormat (the superclass of various formats) in Hadoop ignores files whose names start with "_" or ".", and this is hard to work around in the Hive codebase.

We are also using FileInputFormat, so we are facing the same issue.

cloud-fan · 2017-01-19T03:06:28Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

        newSession.sessionState.conf.setConf(SQLConf.RUN_SQL_ON_FILES, originalValue)
      }
-    }
-  }


does it still compile?

cloud-fan · 2017-01-19T03:06:41Z

ok to test

SparkQA · 2017-01-19T03:10:29Z

Test build #71628 has finished for PR 16635 at commit a09523e.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

jayadevanmurali · 2017-01-19T03:37:32Z

retest this please

SparkQA · 2017-01-19T03:40:26Z

Test build #71629 has finished for PR 16635 at commit 499711d.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

jayadevanmurali · 2017-01-19T03:44:09Z

retest this please

cloud-fan · 2017-01-19T04:14:44Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

  }
+
+  test(
+    "SPARK-19059: Unable to retrieve data from parquet table whose name startswith underscore") {


this bug is not parquet-only, right? How about "SPARK-19059: read file based table whose name starts with underscore"?

Yes, it is not parquet only. I think SPARK-19059: read file based table whose name starts with underscore is fine.

cloud-fan · 2017-01-19T04:15:29Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

+
+  test(
+    "SPARK-19059: Unable to retrieve data from parquet table whose name startswith underscore") {
+    sql("CREATE TABLE `_tbl`(i INT) USING parquet")


use withTable("tbl", "_tbl") {...}

I will do this change
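For readers unfamiliar with the helper suggested above, the idea is a loan pattern: run the test body, then always drop the named tables, even if the body fails. The sketch below is a self-contained illustration under assumptions, not Spark's test utility: `WithTableSketch` and the `drop` callback are hypothetical stand-ins for issuing a real DROP TABLE statement.

```scala
object WithTableSketch {
  // Hypothetical loan-pattern helper: run `body`, then call `drop` for every
  // table name, whether or not `body` threw an exception.
  def withTable(drop: String => Unit)(tableNames: String*)(body: => Unit): Unit =
    try body
    finally tableNames.foreach(drop)
}
```

In a test this keeps tables such as `tbl` and `_tbl` from leaking into later test cases when an assertion fails halfway through.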

cloud-fan · 2017-01-19T04:18:35Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

+    "SPARK-19059: Unable to retrieve data from parquet table whose name startswith underscore") {
+    sql("CREATE TABLE `_tbl`(i INT) USING parquet")
+    sql("INSERT INTO `_tbl` VALUES (1), (2), (3)")
+    checkAnswer( sql("SELECT * FROM `_tbl`"), Row(1) :: Row(2) :: Row(3) :: Nil)


I think we can stop here: create a table with an underscore, insert into it, then read it. That's enough to prove we support tables whose names start with an underscore.

It should be fine.

SparkQA · 2017-01-19T06:04:51Z

Test build #71630 has finished for PR 16635 at commit 71be60f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jayadevanmurali · 2017-01-19T06:19:09Z

@cloud-fan
Incorporated code review comments

jayadevanmurali · 2017-01-19T06:19:19Z

retest this please

SparkQA · 2017-01-19T06:22:44Z

Test build #71635 has started for PR 16635 at commit ea6bd7d.

uncleGen · 2017-01-19T08:15:38Z

process was terminated by signal 9

uncleGen · 2017-01-19T08:15:51Z

retest this please.

SparkQA · 2017-01-19T10:36:52Z

Test build #71644 has finished for PR 16635 at commit ea6bd7d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2017-01-19T12:16:26Z

thanks, merging to master!

…ame startswith underscore

## What changes were proposed in this pull request?

The initial shouldFilterOut() invocation filters the root path name (the table name in the initial call) and removes it if it starts with _. I moved the check one level down, so the code first lists the files and directories under the given root path and then applies the filter.

## How was this patch tested?

Added new test case for this scenario

Author: jayadevanmurali
Author: jayadevan

Closes apache#16635 from jayadevanmurali/branch-0.1-SPARK-19059.

springcoil · 2017-05-31T12:18:40Z

Came across this error today in Spark 2.1.0. Just adding this as a remark. I had code like nz_exp_test = spark.read.parquet("dbfs:/experience_test/_NZ_2017-05-31-11-59-05.parquet"). The error reported wasn't very useful; it was: AnalysisException: unable to infer schema for Parquet. It must be specified manually.;

jayadevanmurali added 10 commits January 31, 2016 07:58

Merge pull request #1 from apache/master

290b86b

Update from original

Merge remote-tracking branch 'upstream/master'

9a7f87f

Merge remote-tracking branch 'upstream/master'

d8ed586

Merge remote-tracking branch 'upstream/master'

cffbec2

Merge remote-tracking branch 'upstream/master'

e672750

Added test case to handle SPARK-19059

f5999b9

Updated listLeafFiles() handile SPARK-19059

74e4a1a

Update PartitioningAwareFileIndex.scala

dec96d8

Update PartitioningAwareFileIndex.scala

4334b2b

Merge branch 'master' into branch-0.1-SPARK-19059

a09523e

gatorsmile reviewed Jan 18, 2017

cloud-fan reviewed Jan 19, 2017

corrected style error

499711d

corrected style error

71be60f

cloud-fan reviewed Jan 19, 2017

incorporated code review comments

ea6bd7d

asfgit closed this in 064fadd Jan 19, 2017
