@MaxGekk
Member

@MaxGekk MaxGekk commented Nov 19, 2018

What changes were proposed in this pull request?

This PR passes the CSV option encoding/charset to the uniVocity parser so that CSV files in different encodings can be parsed when multiLine is enabled. The option's value is passed to the beginParsing method of CSVParser.
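To illustrate why the charset has to reach the parser in multiLine mode, here is a minimal sketch using only the Python standard library. It is not the Spark or uniVocity code; the helper name `parse_csv_bytes` and the sample data are hypothetical. The point is that when a record can span several lines, the parser consumes the raw byte stream itself, so it must be told how to decode that stream.

```python
import csv
import io


def parse_csv_bytes(data: bytes, encoding: str = "utf-8"):
    """Decode a byte stream with the given charset, then parse it as CSV.

    Hypothetical helper for illustration only; it mirrors the idea of
    handing the charset to the parser together with the input stream.
    """
    # newline="" leaves line breaks untouched so the csv module can
    # keep the ones that appear inside quoted fields.
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.reader(text))


# One record whose second field spans two lines, encoded as UTF-16LE.
raw = 'id,comment\r\n1,"first line\nsecond line"\r\n'.encode("utf-16-le")
rows = parse_csv_bytes(raw, encoding="utf-16-le")
```

With the matching encoding, the quoted field comes back as a single value with its embedded line break preserved.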

How was this patch tested?

Added a new test to CSVSuite that covers different encodings with the header both enabled and disabled.
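The shape of such a test can be sketched as a round trip over every combination of encoding and header setting. This is a standalone Python illustration, not the test added to CSVSuite; the helper names, the encoding list, and the sample records are all assumptions chosen for the example.

```python
import csv
import io
import itertools


def encode_csv(rows, encoding):
    """Serialize rows to CSV text, then encode it with the given charset."""
    out = io.StringIO(newline="")
    csv.writer(out).writerows(rows)
    return out.getvalue().encode(encoding)


def decode_csv(data, encoding):
    """Decode bytes with the given charset and parse them as CSV."""
    return list(csv.reader(io.StringIO(data.decode(encoding), newline="")))


# Hypothetical sample data: one record contains a multi-line field.
HEADER = ["name", "note"]
RECORDS = [["café", "line one\nline two"], ["naïve", "plain"]]
ENCODINGS = ["utf-8", "utf-16-le", "utf-32-be", "iso-8859-1"]

# Maps (encoding, with_header) to whether the round trip preserved the rows.
results = {}
for encoding, with_header in itertools.product(ENCODINGS, [True, False]):
    rows = ([HEADER] if with_header else []) + RECORDS
    results[(encoding, with_header)] = (
        decode_csv(encode_csv(rows, encoding), encoding) == rows
    )
```

Each entry in `results` records whether writing and re-reading the rows in that encoding, with or without a header row, returned the original data.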

@MaxGekk
Member Author

MaxGekk commented Nov 19, 2018

@HyukjinKwon Please take a look at the PR.

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM if the tests pass. I already tested this locally a while ago.

@HyukjinKwon
Member

HyukjinKwon commented Nov 20, 2018

FYI @priancho, IIRC you proposed a similar change on the mailing list before. I wasn't positive about it at the time because I thought we should deprecate the encoding option. After a long discussion, we're going to support this.

@SparkQA

SparkQA commented Nov 20, 2018

Test build #99021 has finished for PR 23091 at commit 16eb14c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

@asfgit asfgit closed this in 2df34db Nov 21, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
Closes apache#23091 from MaxGekk/csv-miltiline-encoding.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
@MaxGekk MaxGekk deleted the csv-miltiline-encoding branch August 17, 2019 13:33