[SPARK-5068] [SQL] Fix bug query data when path doesn't exist for HiveContext #4356
Changes from 1 commit
Diff (class HadoopTableReader, package org.apache.spark.sql.hive):

@@ -18,7 +18,7 @@
 package org.apache.spark.sql.hive

 import org.apache.hadoop.conf.Configuration
-import org.apache.hadoop.fs.{Path, PathFilter}
+import org.apache.hadoop.fs.{FileSystem, Path, PathFilter}
 import org.apache.hadoop.hive.conf.HiveConf
 import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants._
 import org.apache.hadoop.hive.ql.exec.Utilities

@@ -68,6 +68,8 @@ class HadoopTableReader(
     math.max(sc.hiveconf.getInt("mapred.map.tasks", 1), sc.sparkContext.defaultMinPartitions)
   }

+  @transient private lazy val fs = FileSystem.get(sc.hiveconf)
+
   // TODO: set aws s3 credentials.

   private val _broadcastedHiveConf =

@@ -218,11 +220,10 @@ class HadoopTableReader(
    * returned in a single, comma-separated string.
    */
   private def applyFilterIfNeeded(path: Path, filterOpt: Option[PathFilter]): Option[String] = {
-    if (path.getFileSystem(sc.hiveconf).exists(path)) {
+    if (fs.exists(path)) {
|
Contributor (inline review comment on the line above):
My concern is similar to what @marmbrus mentioned in #3981: it's pretty expensive to check each path serially for tables with lots of partitions, especially when the data reside on S3. Can we use …
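To make the cost concern concrete: each exists() call is a blocking round trip to the NameNode (or an HTTP request against S3), paid once per partition when issued in a serial loop. Below is a minimal sketch, not from the PR or this review thread, of issuing those checks concurrently; the helper name existingPaths, the pool size, and the timeout are illustrative assumptions.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper, not part of the PR: check many partition paths
// concurrently instead of one blocking fs.exists() call at a time.
def existingPaths(paths: Seq[Path], conf: Configuration): Seq[Path] = {
  // Bounded pool: each exists() call blocks on a NameNode / S3 round trip.
  val pool = Executors.newFixedThreadPool(8)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try {
    val checks = paths.map { p =>
      Future(if (p.getFileSystem(conf).exists(p)) Some(p) else None)
    }
    Await.result(Future.sequence(checks), 10.minutes).flatten
  } finally {
    pool.shutdown()
  }
}
```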
       // if the file exists
       filterOpt match {
         case Some(filter) =>
-          val fs = path.getFileSystem(sc.hiveconf)
           val filteredFiles = fs.listStatus(path, filter).map(_.getPath.toString)
           if (filteredFiles.length > 0) {
             Some(filteredFiles.mkString(","))
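The hunk ends at the unified-diff context boundary, three lines after the last change, so the tail of applyFilterIfNeeded is not shown. For readability, here is a sketch of how the whole method plausibly reads after this commit; the else and case None branches are inferred from the Option[String] return type, not taken from the diff.

```scala
// Sketch of the full method after this commit. It relies on the class-level
// `fs` field added in the earlier hunk; the branches past the diff's cut-off
// are assumptions, not lines from the PR.
private def applyFilterIfNeeded(path: Path, filterOpt: Option[PathFilter]): Option[String] = {
  if (fs.exists(path)) {
    // if the file exists
    filterOpt match {
      case Some(filter) =>
        val filteredFiles = fs.listStatus(path, filter).map(_.getPath.toString)
        if (filteredFiles.length > 0) {
          Some(filteredFiles.mkString(","))   // matching files, comma-separated
        } else {
          None                                // assumed: filter matched nothing
        }
      case None => Some(path.toString)        // assumed: no filter, keep the path
    }
  } else {
    None                                      // path is missing, so skip it
  }
}
```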
Review comment:
I think we'd better get `fs` from the `path`, because with Hadoop NameNode federation we may get problems like a `Wrong FS` exception if we use `FileSystem.get(sc.hiveconf)` to get the filesystem.
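A small sketch of the failure mode described above; the namespace hosts and the warehouse path are made up, but the behaviour is standard Hadoop: a FileSystem instance rejects a Path whose scheme or authority differs from its own with a "Wrong FS" IllegalArgumentException, which is what FileSystem.get(sc.hiveconf) risks under NameNode federation.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical federated setup: fs.defaultFS points at namespace nn1, while the
// partition location lives on namespace nn2. Host names are made up.
val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://nn1:8020")

val partitionPath = new Path("hdfs://nn2:8020/warehouse/t/ds=2015-01-01")

val defaultFs = FileSystem.get(conf)            // bound to hdfs://nn1:8020
// defaultFs.exists(partitionPath)              // would throw IllegalArgumentException: Wrong FS

val pathFs = partitionPath.getFileSystem(conf)  // resolved from the path's own URI
pathFs.exists(partitionPath)                    // queries the namespace the path belongs to
```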