Revert "Clarify the message further with a different exception for fi…
…le which is ignored"

This reverts commit 08850ae.
Hirobe Keiichi committed Dec 25, 2018
commit 777b4db54748aa417d412f66767f1076ec512c40
@@ -554,13 +554,9 @@ case class DataSource(

   // Sufficient to check head of the globPath seq for non-glob scenario
   // Don't need to check once again if files exist in streaming mode
-  if (checkFilesExist) {
-    val firstPath = globPath.head
-    if (!fs.exists(firstPath)) {
-      throw new AnalysisException(s"Path does not exist: ${firstPath}")
-    } else if (InMemoryFileIndex.shouldFilterOut(firstPath.getName)) {
-      throw new AnalysisException(s"Path exists but is ignored: ${firstPath}")
-    }
-  }
+  if (checkFilesExist &&
+      (!fs.exists(globPath.head) || InMemoryFileIndex.shouldFilterOut(globPath.head.getName))) {

srowen (Member) commented:

I'm probably misunderstanding, but doesn't this still cause it to throw a 'Path does not exist' exception?

Hirobe Keiichi (Author) replied:

InMemoryFileIndex.shouldFilterOut returns true when the name starts with an underscore, so a 'Path does not exist' exception is still thrown. I've checked, and the exception below was thrown:

org.apache.spark.sql.AnalysisException: Path does not exist: file:_test.csv;
  at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:558)
  at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
  at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:355)
  at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:545)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:231)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:625)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:478)
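
A minimal way to reproduce that trace (a sketch, not part of the patch: it assumes an empty file named _test.csv in the current working directory, and the session setup is illustrative):

import org.apache.spark.sql.SparkSession

object IgnoredPathRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ignored-path-repro")
      .getOrCreate()
    // _test.csv starts with an underscore, so InMemoryFileIndex.shouldFilterOut
    // returns true for it, and DataSource reports "Path does not exist"
    // even though the file is actually on disk.
    spark.read.csv("_test.csv")
  }
}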

srowen (Member) replied:

I see, I didn't read carefully. This is the new desired behavior. I agree it would be better to not end up with an odd CSV parsing message. I wonder if we can clarify the message further with a different exception for the new case. The path does exist; it's just ignored.

if (checkFilesExist) {
  val firstPath = globPath.head
  if (!fs.exists(firstPath)) {
    // ... Path does not exist
  } else if (shouldFilterOut...) {
    // ... Path exists but is ignored
  }
}
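
For reference, the sketch filled in with the identifiers from the hunk above — this is the shape of commit 08850ae, which this commit reverts:

if (checkFilesExist) {
  val firstPath = globPath.head
  if (!fs.exists(firstPath)) {
    throw new AnalysisException(s"Path does not exist: ${firstPath}")
  } else if (InMemoryFileIndex.shouldFilterOut(firstPath.getName)) {
    throw new AnalysisException(s"Path exists but is ignored: ${firstPath}")
  }
}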

Hirobe Keiichi (Author) replied:

Thank you for understanding my proposal.
Your suggestion looks better; I'll push it later.

Hirobe Keiichi (Author) replied:

@srowen I've pushed it now. Could you please check my commit?

+    throw new AnalysisException(s"Path does not exist: ${globPath.head}")
   }
   globPath
 }.toSeq