[SPARK-23434][SQL] Spark should not warn metadata directory for a HDFS file path

## What changes were proposed in this pull request?

In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), it logs a misleading warning while looking up `people.json/_spark_metadata`. The root cause is a difference between `LocalFileSystem` and `DistributedFileSystem`: `LocalFileSystem.exists()` returns `false`, but `DistributedFileSystem.exists()` raises `org.apache.hadoop.security.AccessControlException`.

```scala
scala> spark.version
res0: String = 2.4.0-SNAPSHOT

scala> spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

scala> spark.read.json("hdfs:///tmp/people.json")
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
```

After this PR,
```scala
scala> spark.read.json("hdfs:///tmp/people.json").show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
```

## How was this patch tested?

Manual.

Author: Dongjoon Hyun <[email protected]>

Closes #20616 from dongjoon-hyun/SPARK-23434.
dongjoon-hyun committed Mar 2, 2018
commit fd538ca84936549af623d3678d43ae935a6549e3
@@ -42,9 +42,11 @@ object FileStreamSink extends Logging {
         try {
           val hdfsPath = new Path(singlePath)
           val fs = hdfsPath.getFileSystem(hadoopConf)
-          val metadataPath = new Path(hdfsPath, metadataDir)
-          val res = fs.exists(metadataPath)
-          res
+          if (fs.isDirectory(hdfsPath)) {
+            fs.exists(new Path(hdfsPath, metadataDir))
+          } else {
+            false
+          }
         } catch {
           case NonFatal(e) =>
             logWarning(s"Error while looking for metadata directory.")
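
The effect of the change can be illustrated without a Hadoop cluster. The following is a minimal sketch, not the Spark implementation: `MiniFs`, `LocalLike`, `HdfsLike`, and `MetadataCheck` are hypothetical stand-ins for the two `FileSystem` behaviors described in the commit message, a plain `IOException` stands in for `AccessControlException`, and the `false` returned from the catch branch is an assumption because the diff is cut off at that point.

```scala
import java.io.IOException
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical stand-in for the two FileSystem calls that appear in the diff.
trait MiniFs {
  def isDirectory(path: String): Boolean
  def exists(path: String): Boolean
}

// Models the LocalFileSystem behavior from the description:
// a path beneath a regular file simply does not exist.
object LocalLike extends MiniFs {
  def isDirectory(path: String): Boolean = false
  def exists(path: String): Boolean = false
}

// Models the DistributedFileSystem behavior from the description:
// probing beneath a regular file raises instead of returning false.
object HdfsLike extends MiniFs {
  def isDirectory(path: String): Boolean = false
  def exists(path: String): Boolean =
    throw new IOException(s"Permission denied: $path")
}

object MetadataCheck {
  // Collects warnings so they can be inspected instead of being logged.
  val warnings: ArrayBuffer[String] = ArrayBuffer.empty[String]

  private def warn(): Boolean = {
    warnings += "Error while looking for metadata directory."
    false // assumed fallback; the diff is truncated before this point
  }

  // Pre-patch shape: always probe <path>/_spark_metadata.
  def hasMetadataOld(fs: MiniFs, path: String): Boolean =
    try fs.exists(s"$path/_spark_metadata")
    catch { case NonFatal(_) => warn() }

  // Post-patch shape: probe only when the path is a directory.
  def hasMetadataNew(fs: MiniFs, path: String): Boolean =
    try {
      if (fs.isDirectory(path)) fs.exists(s"$path/_spark_metadata") else false
    } catch { case NonFatal(_) => warn() }
}

object Demo {
  def main(args: Array[String]): Unit = {
    import MetadataCheck._
    hasMetadataOld(LocalLike, "people.json") // returns false quietly
    hasMetadataOld(HdfsLike, "people.json")  // exception caught, warning recorded
    println(s"warnings after old check: ${warnings.size}")
    warnings.clear()
    hasMetadataNew(HdfsLike, "people.json")  // exists() is never called
    println(s"warnings after new check: ${warnings.size}")
  }
}
```

In this model the guarded version never calls `exists` on a path beneath a regular file, so the exception path, and with it the warning, is not reached for plain file inputs. Genuine failures while inspecting a directory would still be caught and reported.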