@Cazen
Contributor

@Cazen Cazen commented Dec 28, 2015

This adds an option that controls whether the JSON parser accepts a backslash escape before any character.

@AmplabJenkins

Can one of the admins verify this patch?

@Cazen
Contributor Author

Cazen commented Dec 28, 2015

For example, if a JSON file contains a backslash escape that is not listed in the JSON backslash-quoting specification, the affected record is returned in _corrupt_record.
JSON File
{"name": "Cazen Lee", "price": "$10"}
{"name": "John Doe", "price": "\$20"}
{"name": "Tracy", "price": "$10"}
Result: the second record goes to _corrupt_record and its fields are null
scala> df.show
+--------------------+---------+-----+
|     _corrupt_record|     name|price|
+--------------------+---------+-----+
|                null|Cazen Lee|  $10|
|{"name": "John Do...|     null| null|
|                null|    Tracy|  $10|
+--------------------+---------+-----+
After applying this patch, we can enable the allowBackslashEscapingAnyCharacter option as follows:
scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json("/user/Cazen/test/test2.txt")
df: org.apache.spark.sql.DataFrame = [name: string, price: string]

scala> df.show
+---------+-----+
|     name|price|
+---------+-----+
|Cazen Lee|  $10|
| John Doe|  $20|
|    Tracy|  $10|
+---------+-----+
This issue is similar to HIVE-11825 and HIVE-12717.
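
The option name mirrors Jackson's `JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER`. The sketch below shows what that parser feature does on its own, outside Spark. It assumes jackson-core 2.x is on the classpath and that the input record contains a `\$` escape; `BackslashEscapeDemo` and `parseStrings` are hypothetical names used only for illustration, and this is not the code in the patch.

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonParseException, JsonParser, JsonToken}
import scala.collection.mutable.ListBuffer

object BackslashEscapeDemo {
  // Collects every string value in the document, or returns None if the
  // parser rejects the input (for example, on an unrecognized escape).
  def parseStrings(json: String, allowAnyEscape: Boolean): Option[List[String]] = {
    val factory = new JsonFactory()
    factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, allowAnyEscape)
    val parser = factory.createParser(json)
    try {
      val values = ListBuffer.empty[String]
      var token = parser.nextToken()
      while (token != null) {
        if (token == JsonToken.VALUE_STRING) values += parser.getText
        token = parser.nextToken()
      }
      Some(values.toList)
    } catch {
      case _: JsonParseException => None
    } finally {
      parser.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed input: "\$" is not one of the escapes defined by the JSON spec.
    val record = """{"name": "John Doe", "price": "\$20"}"""
    println(parseStrings(record, allowAnyEscape = false)) // None
    println(parseStrings(record, allowAnyEscape = true))  // Some(List(John Doe, $20))
  }
}
```

With the feature off, the strict parser fails on the record, which would correspond to the _corrupt_record row above. With it on, the backslash is dropped and the character after it is kept.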

@Cazen Cazen closed this Dec 28, 2015
Contributor

Should we also test the price field?

Contributor Author

You're right, it is needed.
I'll update the test code soon.
Thanks.
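
A check covering both fields might look like the sketch below. It assumes a spark-shell session with this patch applied (so `sc` and `sqlContext` are predefined) and an input record containing a `\$` escape; it is an illustration, not the test that was committed.

```scala
// Assumed input: the price value carries a non-standard "\$" escape.
val str = """{"name": "Cazen Lee", "price": "\$10"}"""
val rdd = sc.parallelize(Seq(str))

// Read the single record with the new option enabled.
val df = sqlContext.read
  .option("allowBackslashEscapingAnyCharacter", "true")
  .json(rdd)

// Both fields should parse, and the backslash should be dropped from price.
val row = df.first()
assert(row.getAs[String]("name") == "Cazen Lee")
assert(row.getAs[String]("price") == "$10")
```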

@rxin
Contributor

rxin commented Dec 30, 2015

@Cazen how come you closed the pull request?

@Cazen
Contributor Author

Cazen commented Dec 30, 2015

Hi Xin, thank you for the review.

I created this PR (#10496), but it was not linked to the JIRA issue (SPARK-12537), so I closed it.

I then opened a new PR (#10497), but JIRA automatically linked #10496 instead of #10497.

Should I reopen this PR and close the new one (#10497)?

Sorry for the confusion.

@rxin
Contributor

rxin commented Dec 30, 2015

It's fine to use #10497

Just update it there.
