Closed
add note to explain
josephsu committed Aug 15, 2014
commit 8fe2398f0e460c65ae57def2198308c4fe2de0f1
@@ -373,6 +373,8 @@ private[parquet] object ParquetTypesConverter extends Logging {
}
ParquetRelation.enableLogForwarding()

// NOTE: Explicitly list "_temporary" because hadoop 0.23 removed the variable TEMP_DIR_NAME
// from FileOutputCommitter. Check MAPREDUCE-5229 for the detail.
val children = fs.listStatus(path).filterNot { status =>
val name = status.getPath.getName
name(0) == '.' || name == FileOutputCommitter.SUCCEEDED_FILE_NAME || name == "_temporary"
Member

How about ignoring any file whose name starts with _? Hadoop also uses this convention, for example for the _SUCCESS file.

Author

Unfortunately, that would ignore the metadata file "_metadata" as well.

Contributor

Maybe we should rethink why we use filterNot here? A simple filter works fine, something like:

val children = fs.listStatus(path).filter { status =>
  val name = status.getPath.getName
  name == ParquetFileWriter.PARQUET_METADATA_FILE || (name(0) != '.' && name(0) != '_')
}

That way we ignore all hidden and temporary files (names starting with . or _) except _metadata.

Contributor

I agree with this. Just remove .* and _* except _metadata.
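
For reference, here is a minimal sketch of the rule discussed in this thread, written against plain strings so it runs without Hadoop or Parquet on the classpath. The names `VisibleChildren`, `MetadataFileName`, and `isVisible` are hypothetical and only for illustration; `MetadataFileName` stands in for `ParquetFileWriter.PARQUET_METADATA_FILE`, whose value is given above as "_metadata". It uses `startsWith` rather than `name(0)` so that an empty name would not throw; that choice is an assumption of the sketch, not something proposed in the thread.

```scala
object VisibleChildren {
  // Stand-in for ParquetFileWriter.PARQUET_METADATA_FILE (assumed to be "_metadata",
  // as stated in the discussion above).
  val MetadataFileName = "_metadata"

  // Hypothetical helper: keep the metadata file, and otherwise keep only names
  // that start with neither '.' nor '_'.
  def isVisible(name: String): Boolean =
    name == MetadataFileName || !(name.startsWith(".") || name.startsWith("_"))

  def main(args: Array[String]): Unit = {
    // Sample names chosen for illustration only.
    val names = Seq(
      "part-r-1.parquet",
      "_metadata",
      "_SUCCESS",
      "_temporary",
      ".part-r-1.parquet.crc"
    )
    // Only the data file and the metadata file pass the filter.
    println(names.filter(isVisible))
  }
}
```

With the actual listing, the same predicate would be applied to `status.getPath.getName` inside `fs.listStatus(path).filter { ... }`.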
