Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Code style and readability improvements.
  • Loading branch information
JoshRosen committed Jun 4, 2016
commit b933fe08e61b529e77c0923d5b559102463a1ff4
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@

package org.apache.spark.sql.catalyst

import com.google.common.collect.Maps

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.{StructField, StructType}

Expand Down Expand Up @@ -94,12 +96,14 @@ package object expressions {

private lazy val inputArr = attrs.toArray

private lazy val inputToOrdinal = {
val map = new java.util.HashMap[ExprId, Int](inputArr.length * 2)
private lazy val exprIdToOrdinal = {
val arr = inputArr
val map = Maps.newHashMapWithExpectedSize[ExprId, Int](arr.length)
var index = 0
attrs.foreach { attr =>
if (!map.containsKey(attr.exprId)) {
map.put(attr.exprId, index)
while (index < arr.length) {
val exprId = arr(index).exprId
if (!map.containsKey(exprId)) {
map.put(exprId, index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to check containsKey before the put?

Copy link
Contributor Author

@JoshRosen JoshRosen Jun 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was being conservative here in order to match the behavior of the old linear scan, which stopped upon finding the first entry with a matching expression id. However, we can remove the need for this check if we iterate over arr in reverse-order.

}
index += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which style is better, this style or a style to use zipWithIndex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a minor perf. optimization since this method was in a hot path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, accessing a variable that's outside of the inner closure is likely to be expensive, too. It's probably fastest to just iterate over the elements of inputArr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I did not have no preference. Good to hear a reason for this decision.

}
Expand All @@ -109,7 +113,7 @@ package object expressions {
def apply(ordinal: Int): Attribute = inputArr(ordinal)

def getOrdinalWithExprId(exprId: ExprId): Int = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would indexOf be more clear?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this originally and then moved to this name in anticipation of a future change which would add more "get index with property" methods, but a lot of those methods aren't cachable (e.g. semanticEquals), so I'll revert this back to my first name choice.

Option(inputToOrdinal.get(exprId)).getOrElse(-1)
Option(exprIdToOrdinal.get(exprId)).getOrElse(-1)
}
}

Expand Down