@jayadevanmurali
Contributor

What changes were proposed in this pull request?

The initial shouldFilterOut() invocation was applied to the root path name (the table name in the initial call) and dropped the path if the name started with _. I moved the check one level down, so the code first lists the files and directories under the given root path and then applies the filter to those entries.
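To make the change concrete, here is a minimal sketch of the two listing strategies in plain Scala on the local file system. It is not the Spark implementation: `ListingSketch`, `listFilteringRoot`, `listFilteringChildren`, and `demo` are hypothetical names, and `shouldFilterOut` here is a simplified stand-in that only checks for a leading `_` or `.`.

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

object ListingSketch {
  // Simplified stand-in (assumption): hide names that start with "_" or ".".
  def shouldFilterOut(name: String): Boolean =
    name.startsWith("_") || name.startsWith(".")

  // Behaviour before the patch, as described above: the root itself is
  // filtered, so a table directory named "_tbl" yields no files at all.
  def listFilteringRoot(root: Path): Seq[Path] =
    if (shouldFilterOut(root.getFileName.toString)) Seq.empty
    else listFilteringChildren(root)

  // Behaviour after the patch: always list the root, then filter its entries.
  def listFilteringChildren(root: Path): Seq[Path] = {
    val stream = Files.list(root)
    try {
      stream.iterator().asScala
        .filterNot(p => shouldFilterOut(p.getFileName.toString))
        .flatMap(p => if (Files.isDirectory(p)) listFilteringChildren(p) else Seq(p))
        .toList // force the lazy iterator before the stream is closed
    } finally stream.close()
  }

  // Builds a throwaway "_tbl" directory and lists it both ways.
  def demo(): (Seq[Path], Seq[Path]) = {
    val root = Files.createTempDirectory("warehouse").resolve("_tbl")
    Files.createDirectories(root)
    Files.createFile(root.resolve("part-00000.parquet"))
    Files.createFile(root.resolve("_SUCCESS"))
    (listFilteringRoot(root), listFilteringChildren(root))
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = demo()
    println(s"root filtered first: ${before.map(_.getFileName)}")
    println(s"children filtered:   ${after.map(_.getFileName)}")
  }
}
```

With the root filtered first, the `_tbl` directory produces an empty listing; with the check moved down, only the hidden `_SUCCESS` marker is dropped and the data file is returned.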

How was this patch tested?

Added a new test case for this scenario.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@jayadevanmurali
Contributor Author

jayadevanmurali commented Jan 18, 2017

@uncleGen @eric Liang @cloud-fan

Could you please review this PR?

Member

@gatorsmile gatorsmile left a comment

See the related JIRA: https://issues.apache.org/jira/browse/HIVE-6431

By default, FileInputFormat (the superclass of various formats) in Hadoop ignores files whose names start with "_" or ".", and this is hard to work around in the Hive codebase.

We are also using FileInputFormat, so we are facing the same issue.
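For reference, the hidden-file rule quoted above boils down to a two-clause name check. The sketch below restates it in plain Scala; `HiddenFileSketch`, `isVisible`, and `visible` are hypothetical names used only for illustration, not Hadoop or Spark APIs.

```scala
object HiddenFileSketch {
  // A name is kept only if it starts with neither "_" nor ".".
  def isVisible(name: String): Boolean =
    !name.startsWith("_") && !name.startsWith(".")

  // Apply the rule to a batch of file or directory names.
  def visible(names: Seq[String]): Seq[String] = names.filter(isVisible)

  def main(args: Array[String]): Unit = {
    val names = Seq("part-00000", "_SUCCESS", ".part-00000.crc", "_tbl")
    println(visible(names)) // only "part-00000" survives
  }
}
```

The rule exists to skip markers such as `_SUCCESS` and checksum files, but it cannot tell those apart from a user table directory like `_tbl`, which is why applying it to the table's own path hides the whole table.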

newSession.sessionState.conf.setConf(SQLConf.RUN_SQL_ON_FILES, originalValue)
}
}
}
Contributor

does it still compile?

@cloud-fan
Contributor

ok to test

@SparkQA

SparkQA commented Jan 19, 2017

Test build #71628 has finished for PR 16635 at commit a09523e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jayadevanmurali
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 19, 2017

Test build #71629 has finished for PR 16635 at commit 499711d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jayadevanmurali
Contributor Author

retest this please

}

test(
"SPARK-19059: Unable to retrieve data from parquet table whose name startswith underscore") {
Contributor

This bug is not Parquet-only, right? How about "SPARK-19059: read file based table whose name starts with underscore"?

Contributor Author

Yes, it is not parquet only. I think SPARK-19059: read file based table whose name starts with underscore is fine.


test(
"SPARK-19059: Unable to retrieve data from parquet table whose name startswith underscore") {
sql("CREATE TABLE `_tbl`(i INT) USING parquet")
Contributor

use withTable("tbl", "_tbl") {...}

Contributor Author

I will make this change.
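The point of wrapping the test body in a `withTable`-style helper is cleanup: the tables are dropped even when an assertion in the body fails, so later tests do not see leftovers. A minimal sketch of that loan pattern in plain Scala follows. It is not Spark's test utility: `WithTableSketch`, `withTableSketch`, and the in-memory `catalog` set are hypothetical stand-ins.

```scala
import scala.collection.mutable

object WithTableSketch {
  // Hypothetical in-memory stand-in for a table catalog (assumption).
  val catalog: mutable.Set[String] = mutable.Set.empty

  // Loan pattern: run the body, then drop the named tables in a finally
  // block so cleanup happens whether or not the body throws.
  def withTableSketch(names: String*)(body: => Unit): Unit =
    try body
    finally names.foreach(n => catalog -= n)

  def main(args: Array[String]): Unit = {
    withTableSketch("_tbl") {
      catalog += "_tbl" // stands in for CREATE TABLE `_tbl`
      println(s"inside body: $catalog")
    }
    println(s"after body:  $catalog") // the table is gone again
  }
}
```

Without such a wrapper, a failing check between the create and an explicit drop would leave `_tbl` behind for whichever test runs next.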

"SPARK-19059: Unable to retrieve data from parquet table whose name startswith underscore") {
sql("CREATE TABLE `_tbl`(i INT) USING parquet")
sql("INSERT INTO `_tbl` VALUES (1), (2), (3)")
checkAnswer( sql("SELECT * FROM `_tbl`"), Row(1) :: Row(2) :: Row(3) :: Nil)
Contributor

I think we can stop here: creating a table whose name starts with an underscore, inserting into it, and then reading it back is enough to prove we support such tables.

Contributor Author

It should be fine.

@SparkQA

SparkQA commented Jan 19, 2017

Test build #71630 has finished for PR 16635 at commit 71be60f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jayadevanmurali
Contributor Author

@cloud-fan
Incorporated code review comments

@jayadevanmurali
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 19, 2017

Test build #71635 has started for PR 16635 at commit ea6bd7d.

@uncleGen
Contributor

process was terminated by signal 9

@uncleGen
Contributor

retest this please.

@SparkQA

SparkQA commented Jan 19, 2017

Test build #71644 has finished for PR 16635 at commit ea6bd7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 064fadd Jan 19, 2017
@cloud-fan
Contributor

thanks, merging to master!

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…ame startswith underscore


Author: jayadevanmurali <[email protected]>
Author: jayadevan <[email protected]>

Closes apache#16635 from jayadevanmurali/branch-0.1-SPARK-19059.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
@springcoil

Came across this error today in Spark 2.1.0; just adding this as a remark. I had code like nz_exp_test = spark.read.parquet("dbfs:/experience_test/_NZ_2017-05-31-11-59-05.parquet"). The error reported wasn't very useful: AnalysisException: unable to infer schema for Parquet. It must be specified manually.;
