Conversation

@HyukjinKwon (Member) commented on Jan 7, 2019

## What changes were proposed in this pull request?

PR #23446 inadvertently introduced a behaviour change: an empty DataFrame can no longer be read from paths that match only underscore files, which now throw an exception. Since it is debatable whether this case should be allowed or disallowed, this PR takes the more conservative route of issuing a warning instead of throwing an exception.

**Before**

```scala
scala> spark.read.schema("a int").parquet("_tmp*").show()
org.apache.spark.sql.AnalysisException: All paths were ignored:
  file:/.../_tmp
  file:/.../_tmp1;
  at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:570)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:651)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:635)
  ... 49 elided

scala> spark.read.text("_tmp*").show()
org.apache.spark.sql.AnalysisException: All paths were ignored:
  file:/.../_tmp
  file:/.../_tmp1;
  at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:570)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)
  at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:723)
  at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:695)
  ... 49 elided
```

**After**

```scala
scala> spark.read.schema("a int").parquet("_tmp*").show()
19/01/07 15:14:43 WARN DataSource: All paths were ignored:
  file:/.../_tmp
  file:/.../_tmp1
+---+
|  a|
+---+
+---+

scala> spark.read.text("_tmp*").show()
19/01/07 15:14:51 WARN DataSource: All paths were ignored:
  file:/.../_tmp
  file:/.../_tmp1
+-----+
|value|
+-----+
+-----+
```
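To make the new behaviour concrete, here is a minimal, standalone Scala sketch of the pattern this change adopts. It is not Spark's actual code; the names `PathResolverSketch`, `resolvePaths`, and `isHiddenFile` are illustrative only:

```scala
// Minimal standalone sketch (not Spark's actual internals) of the behaviour
// change: when every input path is filtered out (names starting with "_" or
// "."), warn and return an empty list instead of throwing, so callers end up
// with an empty DataFrame rather than an AnalysisException.
object PathResolverSketch {
  private def isHiddenFile(path: String): Boolean = {
    val name = path.split('/').last
    name.startsWith("_") || name.startsWith(".")
  }

  def resolvePaths(paths: Seq[String]): Seq[String] = {
    val usable = paths.filterNot(isHiddenFile)
    if (usable.isEmpty && paths.nonEmpty) {
      // Pre-patch behaviour: throw an exception listing the ignored paths.
      // Post-patch behaviour: log a warning and fall through.
      Console.err.println(
        paths.mkString("WARN: All paths were ignored:\n  ", "\n  ", ""))
    }
    usable
  }

  def main(args: Array[String]): Unit = {
    // Both paths are underscore files, so this warns and prints List().
    println(resolvePaths(Seq("file:/tmp/_tmp", "file:/tmp/_tmp1")))
  }
}
```

The design choice is that an all-ignored glob now degrades to an empty result plus a diagnostic, matching what reading an empty directory would produce.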

## How was this patch tested?

Manually tested as above.
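For reference, the manual test can presumably be reproduced with something like the following in `spark-shell` (assuming the working directory is writable and `spark` is the session the shell provides; the file contents are arbitrary):

```scala
// Hypothetical spark-shell repro of the manual test above: create two
// underscore files, then read them with a glob. With this patch the read
// logs a warning and shows an empty DataFrame instead of throwing.
import java.nio.file.{Files, Paths}

Files.write(Paths.get("_tmp"), "hello".getBytes("UTF-8"))
Files.write(Paths.get("_tmp1"), "world".getBytes("UTF-8"))

spark.read.schema("a int").parquet("_tmp*").show()
spark.read.text("_tmp*").show()
```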

@HyukjinKwon (Member, Author)

cc @gatorsmile and @srowen

@SparkQA commented on Jan 7, 2019

Test build #100865 has finished for PR 23481 at commit daa89a8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member, Author)

retest this please

@SparkQA commented on Jan 7, 2019

Test build #100868 has finished for PR 23481 at commit daa89a8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member, Author)

retest this please

@SparkQA commented on Jan 7, 2019

Test build #100877 has finished for PR 23481 at commit daa89a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) left a comment

LGTM

Thanks! Merged to master

@asfgit closed this in 5102ccc on Jan 7, 2019
@HyukjinKwon (Member, Author)

Thank you @srowen and @gatorsmile.

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ception for underscore files

Closes apache#23481 from HyukjinKwon/SPARK-26339.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
@HyukjinKwon deleted the SPARK-26339 branch on March 3, 2020