Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Apr 5, 2018

What changes were proposed in this pull request?

We don't have a good way to sequentially access UnsafeArrayData with a common interface such as Seq. An example is MapObject where we need to access several sequence collection types together. But UnsafeArrayData doesn't implement ArrayData.array. Calling toArray will copy the entire array. We can provide an IndexedSeq wrapper for ArrayData, so we can avoid copying the entire array.

How was this patch tested?

Added test.

@SparkQA
Copy link

SparkQA commented Apr 5, 2018

Test build #88942 has finished for PR 20984 at commit 27eddd7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T]

case _ => (idx: Int) => arrayData.get(idx, dataType)
}

override def apply(idx: Int): T = if (idx < arrayData.numElements()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a check 0 <= idx, too? If so, it would be good to update a message in the exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Added the check now.

@SparkQA
Copy link

SparkQA commented Apr 9, 2018

Test build #89059 has finished for PR 20984 at commit 962d048.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Apr 9, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Apr 9, 2018

Test build #89057 has finished for PR 20984 at commit 4f9752f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 9, 2018

Test build #89068 has finished for PR 20984 at commit 962d048.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Apr 10, 2018

cc @hvanhovell

@kiszk
Copy link
Member

kiszk commented Apr 10, 2018

LGTM


override def apply(idx: Int): T = if (0 <= idx && idx < arrayData.numElements()) {
if (arrayData.isNullAt(idx)) {
null.asInstanceOf[T]
Copy link
Member Author

@viirya viirya Apr 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For primitive ArrayData, this seems to be an issue for null elements, if the caller asks for a primitive type sequence like Seq[Int].

val unsafeArrayData = ExpressionEncoder[Array[String]].resolveAndBind().
toRow(stringArray).getArray(0)
assert(unsafeArrayData.isInstanceOf[UnsafeArrayData])
testArrayData(unsafeArrayData)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test may need polish to test again all possible types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for all type test case. :)

@viirya
Copy link
Member Author

viirya commented Apr 10, 2018

Thanks @kiszk


private val accessor: (Int) => Any = getAccessor(dataType)

override def apply(idx: Int): T = if (0 <= idx && idx < arrayData.numElements()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NITish: Can you put the if statement on a separate line? This is kinda hard to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

*/
class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] {

private def getAccessor(dataType: DataType): (Int) => Any = dataType match {
Copy link
Contributor

@hvanhovell hvanhovell Apr 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for a small follow-up: We could also use the accessor you create here to improve the foreach construct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I will also want to reuse the accessor getter in #20981 too.

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - pending jenkins

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89105 has finished for PR 20984 at commit a77128f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89103 has finished for PR 20984 at commit ac8d5b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Apr 10, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89122 has finished for PR 20984 at commit a77128f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89139 has finished for PR 20984 at commit a77128f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89152 has finished for PR 20984 at commit a77128f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Apr 10, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 10, 2018

Test build #89151 has finished for PR 20984 at commit a77128f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 11, 2018

Test build #89158 has finished for PR 20984 at commit a77128f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Apr 12, 2018

ping @hvanhovell

case BooleanType => (idx: Int) => arrayData.getBoolean(idx)
case ByteType => (idx: Int) => arrayData.getByte(idx)
case ShortType => (idx: Int) => arrayData.getShort(idx)
case IntegerType => (idx: Int) => arrayData.getInt(idx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DateType and TimestampType?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to reuse the access getter in #20981 which covers DateType and TimestampType.

@SparkQA
Copy link

SparkQA commented Apr 17, 2018

Test build #89425 has finished for PR 20984 at commit 77640c4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 17, 2018

Test build #89428 has finished for PR 20984 at commit 6e8a697.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 17, 2018

Test build #89429 has finished for PR 20984 at commit 8b5de0f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Apr 17, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 17, 2018

Test build #89434 has finished for PR 20984 at commit 8b5de0f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Apr 17, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 17, 2018

Test build #89443 has finished for PR 20984 at commit 8b5de0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

Merging to master. Thanks!

@asfgit asfgit closed this in 30ffb53 Apr 17, 2018
@viirya
Copy link
Member Author

viirya commented Apr 17, 2018

Thanks! @hvanhovell. I will create a small follow-up based on the comment at #20984 (comment).

@viirya viirya deleted the SPARK-23875 branch December 27, 2023 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants