Branch 2.1 #16181
…ma is larger than parsed tokens

## What changes were proposed in this pull request?

Currently, there are three cases when reading CSV through the datasource in `PERMISSIVE` parse mode.

- schema == parsed tokens (from each line)
  There is no problem casting the values in the tokens to the fields in the schema, as the counts are equal.

- schema < parsed tokens (from each line)
  It slices the tokens down to the number of fields in the schema.

- schema > parsed tokens (from each line)
  It appends `null` to the parsed tokens so that the values can be safely cast with the schema.

However, when `null` is appended in the third case, we should take `null` into account when casting the values.

In the case of `StringType`, it is fine, as `UTF8String.fromString(datum)` produces `null` when the input is `null`. Therefore, the problem occurs only when the schema is explicitly given and includes data types other than `StringType`.

The code below:

```scala
import org.apache.spark.sql.types._
import spark.implicits._  // needed for toDF()

val path = "/tmp/a"
// Write a single line with one token; `path` is a plain String here.
Seq("1").toDF().write.text(path)
val schema = StructType(
  StructField("a", IntegerType, true) ::
  StructField("b", IntegerType, true) :: Nil)
spark.read.schema(schema).option("header", "false").csv(path).show()
```

prints

**Before**

```
java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:542)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:24)
```

**After**

```
+---+----+
|  a|   b|
+---+----+
|  1|null|
+---+----+
```

## How was this patch tested?

Unit tests in `CSVSuite.scala` and `CSVTypeCastSuite.scala`.

Author: hyukjinkwon <[email protected]>

Closes #15767 from HyukjinKwon/SPARK-18269.

(cherry picked from commit 556a3b7)
Signed-off-by: Reynold Xin <[email protected]>
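
The three `PERMISSIVE` cases and the null-aware cast described above can be illustrated with a minimal, Spark-free sketch. This is not the code from the patch: `PermissiveCsvSketch`, `FieldType`, `alignTokens`, `castTo`, and `parseLine` are all hypothetical names chosen for illustration, and the naive `split(",")` tokenizer is an assumption that ignores quoting and escaping.

```scala
// Hypothetical sketch only; it does not reproduce Spark's CSVTypeCast implementation.
object PermissiveCsvSketch {
  // Simplified stand-ins for Spark's data types (assumed for illustration).
  sealed trait FieldType
  case object IntField extends FieldType
  case object StringField extends FieldType

  // Cases 1-3: keep, slice, or null-pad the tokens so they match the schema length.
  def alignTokens(tokens: Array[String], schemaLength: Int): Array[String] =
    if (tokens.length >= schemaLength) tokens.take(schemaLength)
    else tokens ++ Array.fill[String](schemaLength - tokens.length)(null)

  // Check for null before casting, so a padded token never reaches toInt.
  def castTo(datum: String, fieldType: FieldType): Any =
    if (datum == null) null
    else fieldType match {
      case IntField    => datum.toInt
      case StringField => datum
    }

  // Naive comma split (no quoting support), then align and cast each token.
  def parseLine(line: String, schema: Seq[FieldType]): Seq[Any] = {
    val tokens = alignTokens(line.split(",", -1), schema.length)
    tokens.zip(schema).map { case (token, fieldType) => castTo(token, fieldType) }.toSeq
  }
}
```

With a two-column integer schema, `PermissiveCsvSketch.parseLine("1", Seq(IntField, IntField))` yields a row whose second value is `null` rather than throwing `NumberFormatException`, which mirrors the Before/After behavior reported in the description.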