Skip to content
Closed
Prev Previous commit
Next Next commit
UT failed #120087. Hive-0.13 doesn't have `Partition.getPartitionPath…
…()`, using Partition.getPath() instead.
  • Loading branch information
Deegue committed Mar 20, 2020
commit 5a5ff7432d28c64b25c28ea74ba2d30d80ea6c2c
Original file line number Diff line number Diff line change
Expand Up @@ -658,7 +658,7 @@ private[hive] class HiveClientImpl(
// Check whether the partition we are going to drop is empty.
// We make a dummy one for the empty partition. See [SPARK-29786] for more details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this how hive resolve the problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this how hive resolve the problem?

Yes, It's the same method as Hive uses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it bad for performance? i.e. you call fs.exists and fs.listStatus for each partition.

Copy link
Contributor Author

@Deegue Deegue Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it bad for performance? i.e. you call fs.exists and fs.listStatus for each partition.

Yes, but only affect drop partitions. I think it's necessary and won't take much time to do the check while dropping.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point to the Hive source code that does the same thing? i.e. create a dummy directory before dropping the partition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point to the Hive source code that does the same thing? i.e. create a dummy directory before dropping the partition.

In Hive 1.x, it's like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it for DROP PARTITION?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it for DROP PARTITION?

No, it will check every query before executing. Maybe it's better to do the check before all queries?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Spark have a problem to do table scan when partition directory not exist?

Copy link
Contributor Author

@Deegue Deegue Mar 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Spark have a problem to do table scan when partition directory not exist?

It's related to #24668, and controlled by spark.sql.files.ignoreMissingFiles.
Spark will check it when listing leaf files.

parts.foreach { partition =>
val partPath = partition.getPartitionPath
val partPath = partition.getPath.head
if (isEmptyPath(partPath)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check non-existing path, not empty path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check non-existing path, not empty path?

Yes, you're right. We only need to check the existence of path instead of those under the path.

val fs = partPath.getFileSystem(conf)
fs.mkdirs(partPath)
Expand Down