Commit 5a70fa8
[SPARK-31361][SQL] Rebase datetime in parquet/avro according to file metadata
### What changes were proposed in this pull request?
This PR adds a new parquet/avro file metadata: `org.apache.spark.legacyDatetime`. It indicates that the file was written with the "rebaseInWrite" config enabled, and spark need to do rebase when reading it.
This makes Spark be able to do rebase more smartly:
1. If we don't know which Spark version writes the file, do rebase if the "rebaseInRead" config is true.
2. If the file was written by Spark 2.4 and earlier, then do rebase.
3. If the file was written by Spark 3.0 and later, do rebase if the `org.apache.spark.legacyDatetime` exists in file metadata.
### Why are the changes needed?
It's very easy to have mixed-calendar parquet/avro files: e.g. A user upgrades to Spark 3.0 and writes some parquet files to an existing directory. Then he realizes that the directory contains legacy datetime values before 1582. However, it's too late and he has to find out all the legacy files manually and read them separately.
To support mixed-calendar parquet/avro files, we need to decide to rebase or not based on the file metadata.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Updated test
Closes #28137 from cloud-fan/datetime.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit a5ebbac)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>1 parent 756e85e commit 5a70fa8
File tree
21 files changed
+299
-105
lines changed- external/avro/src
- main/scala/org/apache/spark/sql
- avro
- v2/avro
- test/scala/org/apache/spark/sql/avro
- sql/core/src
- main
- java/org/apache/spark/sql/execution/datasources/parquet
- scala/org/apache/spark/sql
- execution/datasources
- parquet
- v2/parquet
- test/scala/org/apache/spark/sql/execution
- benchmark
- datasources/parquet
21 files changed
+299
-105
lines changedLines changed: 7 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
44 | | - | |
45 | | - | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
46 | 50 | | |
47 | | - | |
48 | | - | |
49 | | - | |
| 51 | + | |
50 | 52 | | |
51 | 53 | | |
52 | 54 | | |
| |||
Lines changed: 8 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
37 | | - | |
| 37 | + | |
| 38 | + | |
38 | 39 | | |
39 | 40 | | |
40 | 41 | | |
| |||
123 | 124 | | |
124 | 125 | | |
125 | 126 | | |
126 | | - | |
127 | | - | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
128 | 133 | | |
129 | 134 | | |
130 | 135 | | |
| |||
Lines changed: 13 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
32 | | - | |
| 32 | + | |
33 | 33 | | |
34 | 34 | | |
| 35 | + | |
35 | 36 | | |
36 | 37 | | |
37 | 38 | | |
| |||
41 | 42 | | |
42 | 43 | | |
43 | 44 | | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
44 | 49 | | |
45 | | - | |
| 50 | + | |
| 51 | + | |
46 | 52 | | |
47 | 53 | | |
48 | 54 | | |
49 | 55 | | |
50 | 56 | | |
51 | 57 | | |
52 | | - | |
53 | | - | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
54 | 63 | | |
55 | 64 | | |
56 | 65 | | |
| |||
Lines changed: 9 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
45 | | - | |
46 | | - | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
47 | 50 | | |
48 | | - | |
49 | | - | |
50 | | - | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
51 | 55 | | |
52 | 56 | | |
53 | 57 | | |
| |||
Lines changed: 7 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
32 | 32 | | |
33 | 33 | | |
34 | 34 | | |
35 | | - | |
| 35 | + | |
36 | 36 | | |
37 | 37 | | |
38 | 38 | | |
| |||
88 | 88 | | |
89 | 89 | | |
90 | 90 | | |
91 | | - | |
92 | | - | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
93 | 97 | | |
94 | 98 | | |
95 | 99 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
288 | 288 | | |
289 | 289 | | |
290 | 290 | | |
291 | | - | |
| 291 | + | |
292 | 292 | | |
293 | 293 | | |
294 | 294 | | |
| |||
Lines changed: 80 additions & 23 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
84 | 84 | | |
85 | 85 | | |
86 | 86 | | |
87 | | - | |
88 | | - | |
89 | | - | |
| 87 | + | |
| 88 | + | |
90 | 89 | | |
91 | 90 | | |
92 | 91 | | |
| |||
1530 | 1529 | | |
1531 | 1530 | | |
1532 | 1531 | | |
| 1532 | + | |
| 1533 | + | |
| 1534 | + | |
| 1535 | + | |
| 1536 | + | |
| 1537 | + | |
| 1538 | + | |
| 1539 | + | |
| 1540 | + | |
| 1541 | + | |
| 1542 | + | |
| 1543 | + | |
| 1544 | + | |
| 1545 | + | |
| 1546 | + | |
| 1547 | + | |
| 1548 | + | |
| 1549 | + | |
| 1550 | + | |
| 1551 | + | |
| 1552 | + | |
| 1553 | + | |
| 1554 | + | |
| 1555 | + | |
| 1556 | + | |
| 1557 | + | |
| 1558 | + | |
| 1559 | + | |
| 1560 | + | |
| 1561 | + | |
| 1562 | + | |
| 1563 | + | |
| 1564 | + | |
| 1565 | + | |
| 1566 | + | |
| 1567 | + | |
| 1568 | + | |
| 1569 | + | |
1533 | 1570 | | |
1534 | | - | |
1535 | | - | |
1536 | | - | |
1537 | | - | |
1538 | | - | |
1539 | | - | |
1540 | | - | |
1541 | | - | |
1542 | | - | |
| 1571 | + | |
| 1572 | + | |
| 1573 | + | |
| 1574 | + | |
| 1575 | + | |
1543 | 1576 | | |
1544 | 1577 | | |
1545 | 1578 | | |
| |||
1554 | 1587 | | |
1555 | 1588 | | |
1556 | 1589 | | |
1557 | | - | |
1558 | | - | |
| 1590 | + | |
| 1591 | + | |
| 1592 | + | |
| 1593 | + | |
| 1594 | + | |
| 1595 | + | |
| 1596 | + | |
1559 | 1597 | | |
1560 | | - | |
| 1598 | + | |
| 1599 | + | |
| 1600 | + | |
| 1601 | + | |
1561 | 1602 | | |
1562 | 1603 | | |
1563 | 1604 | | |
| |||
1589 | 1630 | | |
1590 | 1631 | | |
1591 | 1632 | | |
1592 | | - | |
1593 | | - | |
1594 | | - | |
1595 | | - | |
| 1633 | + | |
| 1634 | + | |
| 1635 | + | |
| 1636 | + | |
| 1637 | + | |
| 1638 | + | |
| 1639 | + | |
| 1640 | + | |
| 1641 | + | |
1596 | 1642 | | |
1597 | | - | |
| 1643 | + | |
| 1644 | + | |
| 1645 | + | |
| 1646 | + | |
1598 | 1647 | | |
1599 | 1648 | | |
1600 | 1649 | | |
| |||
1612 | 1661 | | |
1613 | 1662 | | |
1614 | 1663 | | |
1615 | | - | |
1616 | | - | |
| 1664 | + | |
| 1665 | + | |
| 1666 | + | |
| 1667 | + | |
| 1668 | + | |
| 1669 | + | |
| 1670 | + | |
1617 | 1671 | | |
1618 | | - | |
| 1672 | + | |
| 1673 | + | |
| 1674 | + | |
| 1675 | + | |
1619 | 1676 | | |
1620 | 1677 | | |
1621 | 1678 | | |
| |||
Lines changed: 3 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
38 | 38 | | |
39 | 39 | | |
40 | 40 | | |
41 | | - | |
42 | 41 | | |
43 | 42 | | |
44 | 43 | | |
| |||
109 | 108 | | |
110 | 109 | | |
111 | 110 | | |
112 | | - | |
| 111 | + | |
| 112 | + | |
113 | 113 | | |
114 | 114 | | |
115 | 115 | | |
| |||
132 | 132 | | |
133 | 133 | | |
134 | 134 | | |
135 | | - | |
| 135 | + | |
136 | 136 | | |
137 | 137 | | |
138 | 138 | | |
| |||
Lines changed: 15 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
86 | 86 | | |
87 | 87 | | |
88 | 88 | | |
89 | | - | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
90 | 95 | | |
91 | 96 | | |
92 | 97 | | |
| |||
116 | 121 | | |
117 | 122 | | |
118 | 123 | | |
119 | | - | |
| 124 | + | |
| 125 | + | |
120 | 126 | | |
| 127 | + | |
121 | 128 | | |
122 | 129 | | |
123 | 130 | | |
124 | 131 | | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
125 | 137 | | |
126 | 138 | | |
127 | 139 | | |
| |||
309 | 321 | | |
310 | 322 | | |
311 | 323 | | |
312 | | - | |
| 324 | + | |
313 | 325 | | |
314 | 326 | | |
315 | 327 | | |
| |||
Lines changed: 17 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
| 24 | + | |
24 | 25 | | |
| 26 | + | |
25 | 27 | | |
| 28 | + | |
26 | 29 | | |
27 | 30 | | |
28 | 31 | | |
| |||
64 | 67 | | |
65 | 68 | | |
66 | 69 | | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
67 | 84 | | |
0 commit comments