Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Fixed more line length issues.
  • Loading branch information
tdas committed Oct 28, 2014
commit 20aa7c67078e72061fe79dd2d50916c398e46c5f
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
override def getPreferredLocations(split: Partition): Seq[String] = {
val partition = split.asInstanceOf[WriteAheadLogBackedBlockRDDPartition]
val blockLocations = getBlockIdLocations().get(partition.blockId)
lazy val segmentLocations = HdfsUtils.getBlockLocations(
lazy val segmentLocations = HdfsUtils.getFileSegmentLocations(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice and creative use of lazy val, but really I think we should make it dead obvious what's happening here, rather than relying on the lazy semantics to not run getFileSegmentLocations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using def instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I agree about the lazy semantics. But then @JoshRosen suggestion is more intuitive to me. Something like

def blockLocations = getBlockIdLocations().get(partition.blockId)
def segmentLocations = HdfsUtils.getBlockLocations(...)
blockLocations.orElse(segmentLocations).getOrElse(Array.empty)

All the logical alternatives are clearly in one line. And it does not have verbose, almost redundant code as case Some(loc) => loc

partition.segment.path, partition.segment.offset, partition.segment.length, hadoopConfig)
blockLocations.orElse(segmentLocations).getOrElse(Seq.empty)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not walking over my dead body type of thing, but I think declaring two inline functions for this, coupled with orElse / getOrElse is less intuitive to most people.

Maybe others can chime in here. @shivaram @pwendell

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah its not very ideal as I think the most easy to understand is something like

if ( ) {
  blockLocations
} else if ( ) {
  segmentLocations
} else {
  Seq.empty
}

but this isnt too bad if the above isn't possible

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this discussion is moot because we should just let getFileSegmentLocations return Seq[String] rather than Option[Seq[String]], and then this should only consist of two branches, accomplishable with a single getOrElse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the final version that I am doing then.

    val blockLocations = getBlockIdLocations().get(partition.blockId)
    def segmentLocations = HdfsUtils.getFileSegmentLocations(...)
    blockLocations.getOrElse(segmentLocations)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Once we make that change, I think both the getOrElse and the if..else solutions are equivalent - one is a scala way of doing things, and the other is the "traditional" way. The ones using def/lazy val is really a more scala way of doing it.

I have no preference for any one method, but would generally consider the overhead and performance incurred by each and I am not that much of an expert in scala to know.

}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,9 @@ private[streaming] object HdfsUtils {
}
}

def getBlockLocations(path: String, offset: Long, length: Long, conf: Configuration): Option[Seq[String]] = {
/** Get the locations of the HDFS blocks containing the given file segment. */
def getFileSegmentLocations(
path: String, offset: Long, length: Long, conf: Configuration): Option[Seq[String]] = {
val dfsPath = new Path(path)
val dfs = getFileSystemForPath(dfsPath, conf)
val fileStatus = dfs.getFileStatus(dfsPath)
Expand Down