Closed
Changes from 4 commits
@@ -361,10 +361,18 @@ case class JsonTuple(children: Seq[Expression])
// the fields to query are the remaining children
@transient private lazy val fieldExpressions: Seq[Expression] = children.tail

// a field name given with constant null will be replaced with this pseudo field name
private val nullFieldName = "__NullFieldName"
Member

A field name given with constant null will be replaced with this pseudo field name.

Member

@jmchung, could we maybe compute this ahead of time as a foldable-related optimization (https://github.com/jmchung/spark/blob/ffa575a6731fef3e0731b73e0f7311cb024e831b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L425-L439) and remove this fake field name?

I think we can first extract the code above into a function and then use it for the per-row computation. Did I understand correctly?

I tried a rough version of what I had in mind (https://github.com/jmchung/spark/compare/SPARK-21677...HyukjinKwon:tmp-18930?expand=1). @viirya, what do you think about this?

Member

@viirya, Aug 14, 2017

Yeah, I also considered using Option here, but I didn't want to propose the Option version myself first, so that we could go through the review process. It looks good to me.

Contributor Author

@HyukjinKwon @viirya Yep, we've discarded the fake field name and now use Option here. We made a slight revision to handle None in foldableFieldNames instead of creating a new function.
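
For readers following the thread, here is a rough, self-contained sketch of the Option idea discussed above. It is not the code from this PR: `FieldExpr`, `Constant`, `ColumnRef`, `FieldNames`, and the `Map`-based row are hypothetical stand-ins for Catalyst's `Expression` and `InternalRow`, so that the snippet runs without Spark. It only assumes what the comments say: a constant null field name becomes `None` instead of a fake field name, and non-foldable names are still resolved per row.

```scala
// Simplified stand-ins for Catalyst expressions; NOT Spark's real classes.
sealed trait FieldExpr {
  def foldable: Boolean
  def eval(row: Map[String, String]): Any
}

// A constant field name; `value` may be null, as with cast(NULL AS STRING).
final case class Constant(value: String) extends FieldExpr {
  val foldable: Boolean = true
  def eval(row: Map[String, String]): Any = value
}

// A field name read from a column of the current row (hypothetical row model).
final case class ColumnRef(column: String) extends FieldExpr {
  val foldable: Boolean = false
  def eval(row: Map[String, String]): Any = row.getOrElse(column, null)
}

object FieldNames {
  // Evaluate foldable names once. None marks a constant null field name;
  // a null entry (kept from the original diff) marks a name that must be
  // resolved per row.
  def foldableFieldNames(exprs: Seq[FieldExpr]): IndexedSeq[Option[String]] =
    exprs.map {
      case e if e.foldable => Option(e.eval(Map.empty)).map(_.toString)
      case _ => null
    }.toIndexedSeq

  // Per-row resolution: reuse the precomputed names, evaluate the rest.
  def resolve(
      exprs: Seq[FieldExpr],
      precomputed: IndexedSeq[Option[String]],
      row: Map[String, String]): IndexedSeq[Option[String]] =
    precomputed.zip(exprs).map {
      case (null, e) => Option(e.eval(row)).map(_.toString)
      case (name, _) => name
    }
}
```

With this shape, a caller can treat `None` as "emit null for this output column" without ever comparing against a sentinel string, which is the main reason the fake field name becomes unnecessary.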


   // eagerly evaluate any foldable the field names
   @transient private lazy val foldableFieldNames: IndexedSeq[String] = {
     fieldExpressions.map {
-      case expr if expr.foldable => expr.eval().asInstanceOf[UTF8String].toString
+      case expr if expr.foldable =>
+        if (expr.eval() == null) {
+          nullFieldName
+        } else {
+          expr.eval().asInstanceOf[UTF8String].toString
+        }
       case _ => null
     }.toIndexedSeq
   }
@@ -2034,4 +2034,13 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
}
}
}

test("SPARK-21677: json_tuple throws NullPointException when column is null as string type") {
Member

Could we move this to spark/sql/core/src/test/resources/sql-tests/inputs/json-functions.sql and/or spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala?

Member

This is just an end-to-end test case. We also need to add unit test cases to JsonExpressionsSuite.

Member

@viirya, Aug 16, 2017

The end-to-end test at L2047 probably cannot be moved to JsonExpressionsSuite. We can add unit test cases similar to the one at L2039 to JsonExpressionsSuite, as @gatorsmile suggested.

It would also be good to have these end-to-end tests in json-functions.sql.

Contributor Author

@gatorsmile I've added a unit test case in JsonExpressionsSuite.
@viirya I've also added an end-to-end test in json-functions.sql.

checkAnswer(sql(
"""
|SELECT json_tuple('{"a" : 1, "b" : 2}'
|, cast(NULL AS STRING), 'b'
|, cast(NULL AS STRING), 'a')
""".stripMargin), Row(null, "2", null, "1"))
Member

@jmchung Can we also add the test we discussed on Slack, which mixes constant and non-constant field names?

Contributor Author

@viirya Done. The added test case covers a column name, a constant field name, and a null field name.

Member

Nit: move Row(null, "2", null, "1")) to the next line.

Contributor Author

ok, thanks

}
}