Commit 59993a8
[SPARK-26122][SQL] Support encoding for multiLine in CSV datasource
## What changes were proposed in this pull request?
In this PR, I propose to pass the CSV option `encoding`/`charset` to the `uniVocity` parser so that CSV files in different encodings can be parsed when `multiLine` is enabled. The value of the option is passed to the `beginParsing` method of `CsvParser`.
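
To illustrate the behavior described above, here is a minimal sketch of reading a non-UTF-8 CSV file with `multiLine` enabled. It assumes a Spark build that includes this change. The object name, the `readBack` helper, the temporary file, and the sample data are hypothetical and not taken from the patch.

```scala
import java.nio.charset.Charset
import java.nio.file.Files

import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical example object; not part of the patch.
object MultiLineEncodingExample {

  // Writes a small CSV file in the given charset, then reads it back
  // with both `multiLine` and `encoding` set on the reader.
  def readBack(spark: SparkSession, encoding: String): Array[Row] = {
    // The quoted second field contains a line break, so the record
    // spans two physical lines and needs multiLine parsing.
    val content = "id,note\n1,\"first line\nsecond line\"\n"
    val file = Files.createTempFile("multiline-", ".csv")
    Files.write(file, content.getBytes(Charset.forName(encoding)))

    spark.read
      .option("header", true)
      .option("multiLine", true)
      .option("encoding", encoding) // `charset` is the alias named in the description
      .csv(file.toString)
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("multiline-csv-encoding")
      .getOrCreate()

    // Any charset supported by the JVM should work here (assumption).
    readBack(spark, "UTF-16LE").foreach(println)

    spark.stop()
  }
}
```

Running the same helper over several charset names, with `header` switched on and off, would roughly mirror the kind of coverage the description mentions for `CSVSuite`.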
## How was this patch tested?
Added a new test to `CSVSuite` covering different encodings, with the header both enabled and disabled.
Closes apache#23091 from MaxGekk/csv-miltiline-encoding.
Authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
1 parent 407c30d commit 59993a8
3 files changed, +32 -7 lines changed
- sql
  - catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv
  - core/src
    - main/scala/org/apache/spark/sql/execution/datasources/csv
    - test/scala/org/apache/spark/sql/execution/datasources/csv
Lines changed: 7 additions & 5 deletions
[Diff content not captured; only line numbers survive.]
- Hunk 1: original lines 271-281, new lines 271-282. Removed original lines 274 and 278; added new lines 274-275 and 279.
- Hunk 2: original lines 297-313, new lines 298-315. Removed original lines 300, 308 and 310; added new lines 301, 309-310 and 312.
Lines changed: 4 additions & 2 deletions
[Diff content not captured; only line numbers survive.]
- Hunk 1: original lines 192-198, new lines 192-199. Removed original line 195; added new lines 195-196.
- Hunk 2: original lines 203-209, new lines 204-211. Removed original line 206; added new lines 207-208.
Lines changed: 21 additions & 0 deletions
[Diff content not captured; only line numbers survive.]
- Hunk 1: original lines 1859-1862, new lines 1859-1883. Added new lines 1862-1882; nothing removed.