[SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData #20984
Conversation
Test build #88942 has finished for PR 20984 at commit

      case _ => (idx: Int) => arrayData.get(idx, dataType)
    }

    override def apply(idx: Int): T = if (idx < arrayData.numElements()) {
Do we need a check for 0 <= idx, too? If so, it would be good to update the message in the exception.
Thanks. Added the check now.
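The two-sided check discussed above can be sketched in isolation. This is a minimal illustration, not the code merged in the PR: `BoundsCheck`, `checkedIndex`, and the wording of the exception message are all hypothetical.

```scala
// Hypothetical helper (not part of Spark): validates an index against a length.
object BoundsCheck {
  def checkedIndex(idx: Int, numElements: Int): Int = {
    // Reject negative indices as well as indices at or past the end.
    if (0 <= idx && idx < numElements) {
      idx
    } else {
      // The message wording is an assumption; it names both bounds so the
      // caller can see which side of the range was violated.
      throw new IndexOutOfBoundsException(
        s"Index $idx must be in the range [0, $numElements).")
    }
  }
}
```

Without the lower bound, a negative index would pass the check and fail later inside the underlying accessor with a less helpful error.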
Test build #89059 has finished for PR 20984 at commit
retest this please
Test build #89057 has finished for PR 20984 at commit
Test build #89068 has finished for PR 20984 at commit
cc @hvanhovell
LGTM

    override def apply(idx: Int): T = if (0 <= idx && idx < arrayData.numElements()) {
      if (arrayData.isNullAt(idx)) {
        null.asInstanceOf[T]
For primitive ArrayData, this seems to be an issue for null elements, if the caller asks for a primitive type sequence like Seq[Int].
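The concern can be reproduced outside Spark. The sketch below assumes standard Scala-on-JVM behavior, where a generic `null.asInstanceOf[T]` is unboxed to the primitive default when `T` is `Int`; `NullPrimitiveDemo` and `nullAs` are hypothetical names used only for illustration.

```scala
object NullPrimitiveDemo {
  // Mirrors the shape of the reviewed line: a generic method that returns
  // null cast to its type parameter.
  def nullAs[T]: T = null.asInstanceOf[T]

  def main(args: Array[String]): Unit = {
    // For a primitive type the null is unboxed to the default value,
    // so the "missing" element is indistinguishable from a real 0.
    val asInt: Int = nullAs[Int]
    // For a reference type the null survives as-is.
    val asString: String = nullAs[String]
    println(s"Int: $asInt, String: $asString")
  }
}
```

A caller that requests a `Seq[Int]` therefore cannot tell a null element from a zero unless it consults `isNullAt` itself.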
    val unsafeArrayData = ExpressionEncoder[Array[String]].resolveAndBind().
      toRow(stringArray).getArray(0)
    assert(unsafeArrayData.isInstanceOf[UnsafeArrayData])
    testArrayData(unsafeArrayData)
This test may need polish to test against all possible types.
+1 for all type test case. :)
Thanks @kiszk

    private val accessor: (Int) => Any = getAccessor(dataType)

    override def apply(idx: Int): T = if (0 <= idx && idx < arrayData.numElements()) {
NITish: Can you put the if statement on a separate line? This is kinda hard to read.
Ok.
     */
    class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] {

      private def getAccessor(dataType: DataType): (Int) => Any = dataType match {
Suggestion for a small follow-up: We could also use the accessor you create here to improve the foreach construct.
Ok. I also want to reuse the accessor getter in #20981.
hvanhovell
left a comment
LGTM - pending jenkins
Test build #89105 has finished for PR 20984 at commit
Test build #89103 has finished for PR 20984 at commit
retest this please.
Test build #89122 has finished for PR 20984 at commit
retest this please
Test build #89139 has finished for PR 20984 at commit
retest this please
Test build #89152 has finished for PR 20984 at commit
retest this please.
Test build #89151 has finished for PR 20984 at commit
Test build #89158 has finished for PR 20984 at commit
ping @hvanhovell

      case BooleanType => (idx: Int) => arrayData.getBoolean(idx)
      case ByteType => (idx: Int) => arrayData.getByte(idx)
      case ShortType => (idx: Int) => arrayData.getShort(idx)
      case IntegerType => (idx: Int) => arrayData.getInt(idx)
DateType and TimestampType?
I'd like to reuse the accessor getter in #20981, which covers DateType and TimestampType.
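As a rough illustration of what covering the two extra types could look like: Spark stores DateType values internally as an Int (days since the epoch) and TimestampType values as a Long (microseconds since the epoch), so they can share the int and long accessors. The sketch below is not the code from #20981; `ElemType`, its case objects, `PrimitiveReader`, and `Accessors` are hypothetical stand-ins for Spark's `DataType` hierarchy and `ArrayData`.

```scala
// Hypothetical, simplified stand-ins for Spark's DataType hierarchy.
sealed trait ElemType
case object IntElem extends ElemType
case object LongElem extends ElemType
case object DateElem extends ElemType       // assumed stored as Int days
case object TimestampElem extends ElemType  // assumed stored as Long microseconds
case object StringElem extends ElemType

// Hypothetical reader exposing only the getters this sketch needs.
trait PrimitiveReader {
  def getInt(idx: Int): Int
  def getLong(idx: Int): Long
  def get(idx: Int): Any
}

object Accessors {
  // Resolve the getter once per element type, so per-element access
  // does not repeat the pattern match.
  def getAccessor(reader: PrimitiveReader, elemType: ElemType): Int => Any =
    elemType match {
      case IntElem | DateElem => (idx: Int) => reader.getInt(idx)
      case LongElem | TimestampElem => (idx: Int) => reader.getLong(idx)
      case _ => (idx: Int) => reader.get(idx)
    }
}
```

Grouping the date and timestamp cases with their physical representation keeps them off the generic fallback path.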
Test build #89425 has finished for PR 20984 at commit
Test build #89428 has finished for PR 20984 at commit
Test build #89429 has finished for PR 20984 at commit
retest this please.
Test build #89434 has finished for PR 20984 at commit
retest this please.
Test build #89443 has finished for PR 20984 at commit
Merging to master. Thanks!
Thanks! @hvanhovell. I will create a small follow-up based on the comment at #20984 (comment).
What changes were proposed in this pull request?
We don't have a good way to sequentially access UnsafeArrayData with a common interface such as Seq. An example is MapObject where we need to access several sequence collection types together. But UnsafeArrayData doesn't implement ArrayData.array. Calling toArray will copy the entire array. We can provide an IndexedSeq wrapper for ArrayData, so we can avoid copying the entire array.
How was this patch tested?
Added test.
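The idea in the description can be sketched without Spark on the classpath. This is a simplified illustration under stated assumptions, not the merged `ArrayDataIndexedSeq`: `SimpleArrayData`, `BoxedArrayData`, and `ArrayDataView` are hypothetical names, and the real `ArrayData` API has many more methods.

```scala
// Hypothetical stand-in for Spark's ArrayData; only what this sketch needs.
trait SimpleArrayData {
  def numElements(): Int
  def isNullAt(idx: Int): Boolean
  def get(idx: Int): Any
}

// Backed by a plain array purely for demonstration.
final class BoxedArrayData(values: Array[Any]) extends SimpleArrayData {
  override def numElements(): Int = values.length
  override def isNullAt(idx: Int): Boolean = values(idx) == null
  override def get(idx: Int): Any = values(idx)
}

// Read-only view: each element is fetched on demand from the underlying
// data, so constructing the view copies nothing.
final class ArrayDataView[T](data: SimpleArrayData)
    extends scala.collection.IndexedSeq[T] {

  override def length: Int = data.numElements()

  override def apply(idx: Int): T = {
    if (0 <= idx && idx < data.numElements()) {
      if (data.isNullAt(idx)) {
        null.asInstanceOf[T]
      } else {
        data.get(idx).asInstanceOf[T]
      }
    } else {
      throw new IndexOutOfBoundsException(
        s"Index $idx must be in the range [0, ${data.numElements()}).")
    }
  }
}
```

With this shape, `new ArrayDataView[String](new BoxedArrayData(Array[Any]("a", null, "c")))` can be passed to any code that expects a `Seq[String]`, and standard collection methods such as `map` or `foreach` read through to the underlying data one element at a time.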