rxin commented Jun 21, 2016

What changes were proposed in this pull request?

This is a follow-up to #13795 to properly set CSV options in Python API. As part of this, I also make the Python option setting for both CSV and JSON more robust against positional errors.

How was this patch tested?

N/A

rxin commented Jun 21, 2016

cc @felixcheung

allowComments, allowUnquotedFieldNames, allowSingleQuotes,
allowNumericLeadingZero, allowBackslashEscapingAnyCharacter,
mode, columnNameOfCorruptRecord)
self._set_json_opts(

cc @tdas

Previously, these options were too susceptible to positional changes in the argument list.
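
To illustrate the kind of positional fragility meant here, a minimal sketch follows. The `set_opts` helper and its parameters are hypothetical stand-ins, not the actual PySpark code; the helper only records what it receives.

```python
# Hypothetical helper (not the real PySpark function): returns the bound values.
def set_opts(schema=None, sep=None, quote=None, escape=None):
    return {"schema": schema, "sep": sep, "quote": quote, "escape": escape}

# Positional call written as if the signature were (schema, sep, escape):
# the escape character silently lands in the `quote` slot.
positional = set_opts(None, ",", "\\")

# Keyword call: each value is bound by name, so parameter order is irrelevant.
keyword = set_opts(schema=None, sep=",", escape="\\")

print(positional)
print(keyword)
```

Passing the options by keyword means that adding or reordering parameters in the helper cannot silently shift a value into the wrong option.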

SparkQA commented Jun 21, 2016

Test build #60907 has finished for PR 13800 at commit 1499753.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 21, 2016

Test build #3120 has finished for PR 13800 at commit 1499753.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin commented Jun 21, 2016

cc @davies

Any idea why this would fail the Python tests?

mengxr commented Jun 21, 2016

Maybe #13793 broke master. It was sent to branch-1.6 but merged into master and branch-2.0.

mengxr commented Jun 21, 2016

test this please

SparkQA commented Jun 21, 2016

Test build #60914 has finished for PR 13800 at commit 1499753.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if columnNameOfCorruptRecord is not None:
    self.option("columnNameOfCorruptRecord", columnNameOfCorruptRecord)

def _set_csv_opts(self, schema, sep, encoding, quote, escape,

This function could be:

def _set_csv_opts(self, schema, **options):
    if schema is not None:
        self.schema(schema)
    for k in options:
        if options[k] is not None:
            self.option(k, options[k])
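
A self-contained sketch of how this suggestion might behave is shown below. `FakeReader` is a hypothetical stand-in that only records calls; it is not the real `DataFrameReader`, and the option names passed in are illustrative.

```python
class FakeReader:
    """Hypothetical stand-in for a reader object; it only records what is set."""

    def __init__(self):
        self.schema_value = None
        self.options = {}

    def schema(self, schema):
        self.schema_value = schema
        return self

    def option(self, key, value):
        self.options[key] = value
        return self

    def _set_csv_opts(self, schema, **options):
        # Apply the schema only if one was given.
        if schema is not None:
            self.schema(schema)
        # Forward every option that was explicitly set; skip None values.
        for key, value in options.items():
            if value is not None:
                self.option(key, value)


reader = FakeReader()
reader._set_csv_opts(None, sep=";", quote=None, header=True)
print(reader.options)
```

Because the options arrive by name, the helper no longer needs to list every option in its signature, and unset (`None`) options are simply skipped.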

That's a good idea. There are a bunch of things I want to do to readwrite.py (mainly break it apart). I will make this change as part of that work, and merge this PR now to unblock the RC.

asfgit pushed a commit that referenced this pull request Jun 21, 2016
Author: Reynold Xin

Closes #13800 from rxin/SPARK-13792-2.

(cherry picked from commit 9333880)
Signed-off-by: Reynold Xin
@asfgit asfgit closed this in 9333880 Jun 21, 2016