Commit e651900
[SPARK-16344][SQL] Decoding Parquet array of struct with a single field named "element"
## What changes were proposed in this pull request?
Due to backward-compatibility reasons, the following Parquet schema is ambiguous:
```
optional group f (LIST) {
repeated group list {
optional group element {
optional int32 element;
}
}
}
```
According to the parquet-format spec, when interpreted as a standard 3-level layout, this type is equivalent to the following SQL type:
```
ARRAY<STRUCT<element: INT>>
```
However, when interpreted as a legacy 2-level layout, it's equivalent to
```
ARRAY<STRUCT<element: STRUCT<element: INT>>>
```
Historically, to disambiguate these cases, we employed two methods:
- `ParquetSchemaConverter.isElementType()`
Used to disambiguate the above cases while converting Parquet types to Spark types.
- `ParquetRowConverter.isElementType()`
Used to disambiguate the above cases while instantiating row converters that convert Parquet records to Spark rows.
Unfortunately, these two methods make different decision about the above problematic Parquet type, and caused SPARK-16344.
`ParquetRowConverter.isElementType()` is necessary for Spark 1.4 and earlier versions because Parquet requested schemata are directly converted from Spark schemata in these versions. The converted Parquet schemata may be incompatible with actual schemata of the underlying physical files when the files are written by a system/library that uses a schema conversion scheme that is different from Spark when writing Parquet LIST and MAP fields.
In Spark 1.5, Parquet requested schemata are always properly tailored from schemata of physical files to be read. Thus `ParquetRowConverter.isElementType()` is no longer necessary. This PR replaces this method with a simply yet accurate scheme: whenever an ambiguous Parquet type is hit, convert the type in question back to a Spark type using `ParquetSchemaConverter` and check whether it matches the corresponding Spark type.
## How was this patch tested?
New test cases added in `ParquetHiveCompatibilitySuite` and `ParquetQuerySuite`.
Author: Cheng Lian <lian@databricks.com>
Closes apache#14014 from liancheng/spark-16344-for-master-and-2.0.1 parent e3cd5b3 commit e651900
File tree
6 files changed
+73
-47
lines changed- sql
- core/src
- main/scala/org/apache/spark/sql/execution/datasources/parquet
- test/scala/org/apache/spark/sql/execution/datasources/parquet
- hive/src/test/scala/org/apache/spark/sql/hive
6 files changed
+73
-47
lines changedLines changed: 2 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
96 | 96 | | |
97 | 97 | | |
98 | 98 | | |
99 | | - | |
| 99 | + | |
| 100 | + | |
100 | 101 | | |
101 | 102 | | |
102 | 103 | | |
| |||
Lines changed: 4 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| 31 | + | |
31 | 32 | | |
32 | 33 | | |
33 | | - | |
| 34 | + | |
34 | 35 | | |
35 | 36 | | |
36 | | - | |
| 37 | + | |
| 38 | + | |
37 | 39 | | |
38 | 40 | | |
39 | 41 | | |
| |||
Lines changed: 42 additions & 37 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
28 | | - | |
| 28 | + | |
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
| |||
113 | 113 | | |
114 | 114 | | |
115 | 115 | | |
| 116 | + | |
116 | 117 | | |
117 | 118 | | |
118 | 119 | | |
119 | 120 | | |
120 | 121 | | |
121 | 122 | | |
| 123 | + | |
122 | 124 | | |
123 | 125 | | |
124 | 126 | | |
| |||
292 | 294 | | |
293 | 295 | | |
294 | 296 | | |
295 | | - | |
296 | | - | |
297 | | - | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
298 | 301 | | |
299 | 302 | | |
300 | 303 | | |
| |||
442 | 445 | | |
443 | 446 | | |
444 | 447 | | |
445 | | - | |
446 | 448 | | |
447 | | - | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
| 453 | + | |
| 454 | + | |
| 455 | + | |
| 456 | + | |
| 457 | + | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
| 462 | + | |
| 463 | + | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
448 | 481 | | |
449 | 482 | | |
450 | 483 | | |
451 | 484 | | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
452 | 488 | | |
453 | 489 | | |
454 | 490 | | |
| |||
462 | 498 | | |
463 | 499 | | |
464 | 500 | | |
465 | | - | |
466 | | - | |
467 | | - | |
468 | | - | |
469 | | - | |
470 | | - | |
471 | | - | |
472 | | - | |
473 | | - | |
474 | | - | |
475 | | - | |
476 | | - | |
477 | | - | |
478 | | - | |
479 | | - | |
480 | | - | |
481 | | - | |
482 | | - | |
483 | | - | |
484 | | - | |
485 | | - | |
486 | | - | |
487 | | - | |
488 | | - | |
489 | | - | |
490 | | - | |
491 | | - | |
492 | | - | |
493 | | - | |
494 | | - | |
495 | | - | |
496 | 501 | | |
497 | 502 | | |
498 | 503 | | |
| |||
Lines changed: 4 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
260 | 260 | | |
261 | 261 | | |
262 | 262 | | |
263 | | - | |
| 263 | + | |
264 | 264 | | |
265 | 265 | | |
266 | 266 | | |
| |||
270 | 270 | | |
271 | 271 | | |
272 | 272 | | |
273 | | - | |
| 273 | + | |
274 | 274 | | |
275 | 275 | | |
276 | 276 | | |
| |||
282 | 282 | | |
283 | 283 | | |
284 | 284 | | |
285 | | - | |
| 285 | + | |
286 | 286 | | |
287 | 287 | | |
288 | 288 | | |
| |||
293 | 293 | | |
294 | 294 | | |
295 | 295 | | |
296 | | - | |
| 296 | + | |
297 | 297 | | |
298 | 298 | | |
299 | 299 | | |
| |||
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
Lines changed: 15 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
29 | | - | |
| 29 | + | |
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
| |||
668 | 668 | | |
669 | 669 | | |
670 | 670 | | |
| 671 | + | |
| 672 | + | |
| 673 | + | |
| 674 | + | |
| 675 | + | |
| 676 | + | |
| 677 | + | |
| 678 | + | |
| 679 | + | |
| 680 | + | |
| 681 | + | |
| 682 | + | |
671 | 683 | | |
672 | 684 | | |
673 | 685 | | |
| 686 | + | |
| 687 | + | |
674 | 688 | | |
675 | 689 | | |
676 | 690 | | |
| |||
Lines changed: 6 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
22 | | - | |
23 | | - | |
24 | 22 | | |
25 | 23 | | |
26 | 24 | | |
| |||
137 | 135 | | |
138 | 136 | | |
139 | 137 | | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
140 | 144 | | |
0 commit comments