Commits (17)
4062cda
Preparing Spark release v1.6.0-rc4
pwendell Dec 22, 2015
5b19e7c
Preparing development version 1.6.0-SNAPSHOT
pwendell Dec 22, 2015
309ef35
[MINOR] Fix typos in JavaStreamingContext
zsxwing Dec 22, 2015
0f905d7
[SPARK-11823][SQL] Fix flaky JDBC cancellation test in HiveThriftBina…
JoshRosen Dec 22, 2015
94fb5e8
[SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler
zsxwing Dec 22, 2015
942c057
[SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example f…
zsxwing Dec 23, 2015
c6c9bf9
[SPARK-12477][SQL] - Tungsten projection fails for null values in array fields
pierre-borckmans Dec 23, 2015
5987b16
[SPARK-12499][BUILD] don't force MAVEN_OPTS
abridgett Dec 24, 2015
b49856a
[SPARK-12411][CORE] Decrease executor heartbeat timeout to match hear…
nongli Dec 19, 2015
4dd8712
[SPARK-12502][BUILD][PYTHON] Script /dev/run-tests fails when IBM Jav…
kiszk Dec 24, 2015
865dd8b
[SPARK-12010][SQL] Spark JDBC requires support for column-name-free I…
CK50 Dec 24, 2015
b8da77e
[SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equ…
gatorsmile Dec 28, 2015
1fbcb6e
[SPARK-12517] add default RDD name for one created via sc.textFile
wyaron Dec 28, 2015
7c7d76f
[SPARK-12424][ML] The implementation of ParamMap#filter is wrong.
sarutak Dec 28, 2015
a9c52d4
[SPARK-12222][CORE] Deserialize RoaringBitmap using Kryo serializer t…
adrian-wang Dec 28, 2015
fd20248
[SPARK-12489][CORE][SQL][MLIB] Fix minor issues found by FindBugs
zsxwing Dec 28, 2015
d545dfe
Merge branch 'branch-1.6' of github.com:apache/spark into csd-1.6
markhamstra Dec 29, 2015
[SPARK-12477][SQL] - Tungsten projection fails for null values in array fields

Accessing a null element in an array field fails when Tungsten is enabled: the generated unsafe projection passes the null element straight to UTF8StringWriter.getSize, which throws a NullPointerException (see the stack trace below). It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled.

This PR fixes the issue by making the generated code check whether the accessed array element is null before reading it.

Example:
```
// Array of String
case class AS(as: Seq[String])
val dfAS = sc.parallelize(Seq(AS(Seq("a", null, "b")))).toDF
dfAS.registerTempTable("T_AS")
for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(",")) }
```

With Tungsten disabled:
```
0 = [a]
1 = [null]
2 = [b]
```

With Tungsten enabled:
```
0 = [a]
15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
```

Author: pierre-borckmans <pierre.borckmans@realimpactanalytics.com>

Closes apache#10429 from pierre-borckmans/SPARK-12477_Tungsten-Projection-Null-Element-In-Array.

(cherry picked from commit 43b2a63)
Signed-off-by: Reynold Xin <rxin@databricks.com>
pierre-borckmans authored and rxin committed Dec 23, 2015
commit c6c9bf99af0ee0559248ad772460e9b2efde5861
```
@@ -222,7 +222,7 @@ case class GetArrayItem(child: Expression, ordinal: Expression)
     nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
       s"""
         final int index = (int) $eval2;
-        if (index >= $eval1.numElements() || index < 0) {
+        if (index >= $eval1.numElements() || index < 0 || $eval1.isNullAt(index)) {
          ${ev.isNull} = true;
        } else {
          ${ev.value} = ${ctx.getValue(eval1, dataType, "index")};
```
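To make the one-line change concrete, here is a minimal Scala sketch of the null-safe lookup the patched template now generates; a plain Seq stands in for Spark's internal array representation, and getArrayItem is a hypothetical helper name for illustration, not Spark's API:

```
// Sketch: null-safe array element access, mirroring the patched codegen.
// Before the fix only the bounds check existed, so a null element reached
// UTF8StringWriter.getSize and triggered the NullPointerException above.
def getArrayItem[T >: Null](array: Seq[T], index: Int): T = {
  if (index >= array.length || index < 0 || array(index) == null) {
    null // corresponds to setting ${ev.isNull} = true
  } else {
    array(index) // corresponds to ${ev.value} = ${ctx.getValue(...)}
  }
}

getArrayItem(Seq("a", null, "b"), 1) // returns null instead of throwing
```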
```
@@ -43,4 +43,13 @@ class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext {
     val df = sparkContext.parallelize(Seq((1, 1))).toDF("a", "b")
     df.select(array($"a").as("s")).select(f(expr("s[0]"))).collect()
   }
+
+  test("SPARK-12477 accessing null element in array field") {
+    val df = sparkContext.parallelize(Seq((Seq("val1", null, "val2"),
+      Seq(Some(1), None, Some(2))))).toDF("s", "i")
+    val nullStringRow = df.selectExpr("s[1]").collect()(0)
+    assert(nullStringRow == org.apache.spark.sql.Row(null))
+    val nullIntRow = df.selectExpr("i[1]").collect()(0)
+    assert(nullIntRow == org.apache.spark.sql.Row(null))
+  }
 }
```
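With the patch applied, the repro from the description should behave the same with Tungsten enabled as it already did with Tungsten disabled: the null element projects as a SQL NULL instead of throwing. Expected output, inferred from the Tungsten-disabled run above and the Row(null) assertions in the new test:

```
0 = [a]
1 = [null]
2 = [b]
```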