Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Keep a deterministic output order in Attribute.toSeq
  • Loading branch information
maropu committed Aug 16, 2017
commit 3201f0a4a6ee9421dbe50e6e3195a73d35d3ef20
Original file line number Diff line number Diff line change
Expand Up @@ -121,7 +121,12 @@ class AttributeSet private (val baseSet: Set[AttributeEquals])

// We must force toSeq to not be strict otherwise we end up with a [[Stream]] that captures all
// sorts of things in its closure.
override def toSeq: Seq[Attribute] = baseSet.map(_.a).toArray.toSeq
override def toSeq: Seq[Attribute] = {
// We need to keep a deterministic output order for `baseSet` because this affects a variable
// order in generated code (e.g., `GenerateColumnAccessor`).
// See SPARK-18394 for details.
baseSet.map(_.a).toArray.sortBy { a => (a.name, a.exprId.id) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it needs to be a Seq, then should the toArray be toSeq? maybe I missed way it has to be an array first

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, as you suggested, I initially did so. But, I just kept the original code cuz I was afraid this change wrongly affected the others. cc: @marmbrus

Copy link
Contributor

@hvanhovell hvanhovell Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought map should always return a strict collection. I think it is safe to sort immediately after that.

Copy link
Member Author

@maropu maropu Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll fix in that way. Thanks!

}

override def toString: String = "{" + baseSet.map(_.a).mkString(", ") + "}"

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -78,4 +78,34 @@ class AttributeSetSuite extends SparkFunSuite {
assert(aSet == aSet)
assert(aSet == AttributeSet(aUpper :: Nil))
}

test("SPARK-18394 keep a deterministic output order along with attribute names") {
Copy link
Member

@gatorsmile gatorsmile Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modify this test case. Add a scenario in which the attribute set has two columns with the same name but different ids?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

val attrSeqA = {
val attr1 = AttributeReference("c1", IntegerType)(exprId = ExprId(1098))
val attr2 = AttributeReference("c2", IntegerType)(exprId = ExprId(107))
val attr3 = AttributeReference("c3", IntegerType)(exprId = ExprId(838))
val attrSetA = AttributeSet(attr1 :: attr2 :: attr3 :: Nil)

val attr4 = AttributeReference("c4", IntegerType)(exprId = ExprId(389))
val attr5 = AttributeReference("c5", IntegerType)(exprId = ExprId(89329))

val attrSetB = AttributeSet(attr4 :: attr5 :: Nil)
(attrSetA ++ attrSetB).toSeq.map(_.name)
}

val attrSeqB = {
val attr1 = AttributeReference("c1", IntegerType)(exprId = ExprId(392))
val attr2 = AttributeReference("c2", IntegerType)(exprId = ExprId(92))
val attr3 = AttributeReference("c3", IntegerType)(exprId = ExprId(87))
val attrSetA = AttributeSet(attr1 :: attr2 :: attr3 :: Nil)

val attr4 = AttributeReference("c4", IntegerType)(exprId = ExprId(9023920))
val attr5 = AttributeReference("c5", IntegerType)(exprId = ExprId(522))
val attrSetB = AttributeSet(attr4 :: attr5 :: Nil)

(attrSetA ++ attrSetB).toSeq.map(_.name)
}

assert(attrSeqA === attrSeqB)
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -162,7 +162,8 @@ class PruningSuite extends HiveComparisonTest with BeforeAndAfter {
}.head

assert(actualOutputColumns === expectedOutputColumns, "Output columns mismatch")
assert(actualScannedColumns === expectedScannedColumns, "Scanned columns mismatch")
assert(actualScannedColumns.sorted === expectedScannedColumns.sorted,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment to explain where we call AttributeSet.toSeq?

"Scanned columns mismatch")

val actualPartitions = actualPartValues.map(_.asScala.mkString(",")).sorted
val expectedPartitions = expectedPartValues.map(_.mkString(",")).sorted
Expand Down