Commit f6ff7d0
[SPARK-30127][SQL] Support case class parameter for typed Scala UDF
### What changes were proposed in this pull request?
This PR adds support for case class parameters in typed Scala UDFs, e.g.
```scala
case class TestData(key: Int, value: String)
val f = (d: TestData) => d.key * d.value.toInt
val myUdf = udf(f)
val df = Seq(("data", TestData(50, "2"))).toDF("col1", "col2")
checkAnswer(df.select(myUdf(Column("col2"))), Row(100) :: Nil)
```
### Why are the changes needed?
Currently, Spark UDFs can only work on data types like java.lang.String, o.a.s.sql.Row, Seq[_], etc. This is inconvenient if a user wants to apply an operation to a single column of struct type: the data must be accessed through a Row object rather than a domain object, as Dataset operations allow. It would be great if UDFs could work on the types that Dataset supports, e.g. case classes.
Here are benchmark results comparing a case class parameter with a Row parameter:
```scala
// case class: 58ms 65ms 59ms 64ms 61ms
// row: 59ms 64ms 73ms 84ms 69ms
val f1 = (d: TestData) => s"${d.key}, ${d.value}"
val f2 = (r: Row) => s"${r.getInt(0)}, ${r.getString(1)}"
val udf1 = udf(f1)
// set spark.sql.legacy.allowUntypedScalaUDF=true
val udf2 = udf(f2, StringType)
val df = spark.range(100000).selectExpr("cast (id as int) as id")
.select(struct('id, lit("str")).as("col"))
df.cache().collect()
// warmup to exclude some extra influence
df.select(udf1('col)).write.mode(SaveMode.Overwrite).format("noop").save()
df.select(udf2('col)).write.mode(SaveMode.Overwrite).format("noop").save()
var start = System.currentTimeMillis()
df.select(udf1('col)).write.mode(SaveMode.Overwrite).format("noop").save()
println(System.currentTimeMillis() - start)
start = System.currentTimeMillis()
df.select(udf2('col)).write.mode(SaveMode.Overwrite).format("noop").save()
println(System.currentTimeMillis() - start)
```
### Does this PR introduce any user-facing change?
Yes. Users can now use typed Scala UDFs with a case class as the input parameter.
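For readers who want to try the new behavior outside the test suite, here is a minimal standalone sketch. It assumes a Spark build that includes this change and a local master; the names `CaseClassUdfExample`, `run`, and `keyTimesValue` are hypothetical and chosen only for illustration. The data mirrors the example in the PR description.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Domain type taken from the PR description. It is declared at top level
// (not inside a method) so that a TypeTag can be resolved for it.
case class TestData(key: Int, value: String)

// Hypothetical wrapper object, for illustration only.
object CaseClassUdfExample {
  def run(): Int = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("case-class-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    try {
      // Typed UDF whose parameter is a case class rather than a Row.
      val keyTimesValue = udf((d: TestData) => d.key * d.value.toInt)

      // col2 is a struct<key: int, value: string> column.
      val df = Seq(("data", TestData(50, "2"))).toDF("col1", "col2")

      // Apply the UDF to the struct column and read back the single result.
      df.select(keyTimesValue(col("col2"))).head().getInt(0)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

With the input above, the UDF computes `50 * "2".toInt`, which matches the `Row(100)` expected by the `checkAnswer` call in the PR description.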
### How was this patch tested?
Added unit tests.
Closes #27937 from Ngone51/udf_caseclass_support.
Authored-by: yi.wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent 1fd4607 commit f6ff7d0
File tree
12 files changed (+519, -410 lines)
- sql
  - catalyst/src
    - main/scala/org/apache/spark/sql/catalyst
      - analysis
      - expressions
    - test/scala/org/apache/spark/sql/catalyst
      - analysis
      - expressions
      - optimizer
      - trees
  - core/src
    - main/scala/org/apache/spark/sql
      - execution/datasources
      - expressions
    - test/scala/org/apache/spark/sql
Lines changed per file:
- 4 additions & 4 deletions
- 327 additions & 268 deletions (large diff, not rendered by default)
- 12 additions & 6 deletions
- 12 additions & 6 deletions
- 3 additions & 1 deletion
- 3 additions & 1 deletion