[SPARK-19059] [SQL] Unable to retrieve data from parquet table whose name startswith underscore #16635
Conversation
Update from original
@uncleGen @eric Liang @cloud-fan Could you please review this PR?
gatorsmile left a comment:
See the related JIRA: https://issues.apache.org/jira/browse/HIVE-6431
By default, FileInputFormat (the superclass of the various input formats) in Hadoop ignores file names that start with "_" or ".", and this is hard to work around in the Hive codebase.
We also use FileInputFormat, so we face the same issue.
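For context, here is a rough sketch of that default hidden-file rule (an approximation, not Hadoop's actual FileInputFormat source): any path whose name starts with "_" or "." is treated as hidden and excluded, so if the same rule is applied to the table's root directory, a table named `_tbl` is never listed at all.

```scala
import org.apache.hadoop.fs.{Path, PathFilter}

// Approximation of the default hidden-file rule used by Hadoop's FileInputFormat:
// names starting with "_" or "." are treated as hidden and excluded from listings.
object HiddenPathFilterSketch extends PathFilter {
  override def accept(path: Path): Boolean = {
    val name = path.getName
    !name.startsWith("_") && !name.startsWith(".")
  }
}
```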
newSession.sessionState.conf.setConf(SQLConf.RUN_SQL_ON_FILES, originalValue)
}
}
}
does it still compile?
ok to test

Test build #71628 has finished for PR 16635 at commit

retest this please

Test build #71629 has finished for PR 16635 at commit

retest this please
}

test(
  "SPARK-19059: Unable to retrieve data from parquet table whose name startswith underscore") {
This bug is not Parquet-only, right? How about "SPARK-19059: read file based table whose name starts with underscore"?
Yes, it is not Parquet-only. I think "SPARK-19059: read file based table whose name starts with underscore" is fine.
test(
  "SPARK-19059: Unable to retrieve data from parquet table whose name startswith underscore") {
  sql("CREATE TABLE `_tbl`(i INT) USING parquet")
use withTable("tbl", "_tbl") {...}
I will do this change
| "SPARK-19059: Unable to retrieve data from parquet table whose name startswith underscore") { | ||
| sql("CREATE TABLE `_tbl`(i INT) USING parquet") | ||
| sql("INSERT INTO `_tbl` VALUES (1), (2), (3)") | ||
| checkAnswer( sql("SELECT * FROM `_tbl`"), Row(1) :: Row(2) :: Row(3) :: Nil) |
I think we can stop here: create a table with an underscore, insert into it, then read from it. That is enough to prove we can support tables with underscores.
It should be fine.
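Putting the two suggestions above together (wrapping the table in withTable and trimming the body to create/insert/read), a minimal sketch of how the test could look. This assumes the standard Spark SQL test helpers (test, withTable, sql, checkAnswer, Row) are in scope, shows only the `_tbl` case, and is not necessarily the exact code that was merged.

```scala
// Sketch only: withTable drops the table at the end even if an assertion fails,
// and create -> insert -> read is enough to show that a table whose name starts
// with "_" can be read back.
test("SPARK-19059: read file based table whose name starts with underscore") {
  withTable("_tbl") {
    sql("CREATE TABLE `_tbl`(i INT) USING parquet")
    sql("INSERT INTO `_tbl` VALUES (1), (2), (3)")
    checkAnswer(sql("SELECT * FROM `_tbl`"), Row(1) :: Row(2) :: Row(3) :: Nil)
  }
}
```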
Test build #71630 has finished for PR 16635 at commit

@cloud-fan

retest this please

Test build #71635 has started for PR 16635 at commit

process was terminated by signal 9

retest this please.

Test build #71644 has finished for PR 16635 at commit

thanks, merging to master!
Came across this error today in Spark 2.1.0. Just adding this as a remark. I had code like
What changes were proposed in this pull request?
The initial shouldFilterOut() invocation was applied to the root path name (the table directory name in the initial call) and filtered it out when the name starts with an underscore. I moved the check one level below, so we first list the files/directories under the given root path and then apply the filter to those entries.
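A hedged sketch of the idea, with hypothetical helper names rather than the actual InMemoryFileIndex/PartitioningAwareFileIndex code: the hidden-name check is no longer applied to the root path itself, only to the entries listed one level below it.

```scala
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

// Simplified stand-in for Spark's filter: hidden/metadata names such as "_temporary",
// "_SUCCESS" or ".foo" are skipped, but names containing "=" (partition directories
// like "_col=1") are kept.
def shouldFilterOut(name: String): Boolean =
  (name.startsWith("_") && !name.contains("=")) || name.startsWith(".")

// The fix in spirit: never test the root path name (e.g. the table directory "_tbl"),
// only the children discovered one level below it.
def listLeafFiles(fs: FileSystem, rootPath: Path): Seq[FileStatus] = {
  fs.listStatus(rootPath).toSeq
    .filterNot(status => shouldFilterOut(status.getPath.getName))
    .flatMap { status =>
      if (status.isDirectory) listLeafFiles(fs, status.getPath) else Seq(status)
    }
}
```

This keeps a table directory such as `_tbl` visible while still skipping metadata entries like `_temporary` inside it.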
How was this patch tested?
Added a new test case for this scenario.