Changes shown are from a single commit. Commits in this pull request (58):
82e2f09  Fix part of undocumented/duplicated arguments warnings by CRAN-check (junyangq, Aug 9, 2016)
41d9dca  [SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.c… (Aug 9, 2016)
44115e9  [SPARK-16956] Make ApplicationState.MAX_NUM_RETRY configurable (JoshRosen, Aug 9, 2016)
2d136db  [SPARK-16905] SQL DDL: MSCK REPAIR TABLE (Aug 9, 2016)
901edbb  More fixes of the docs. (junyangq, Aug 10, 2016)
475ee38  Fixed typo (jupblb, Aug 10, 2016)
2285de7  [SPARK-16522][MESOS] Spark application throws exception on exit. (sun-rui, Aug 10, 2016)
20efb79  [SPARK-16324][SQL] regexp_extract should doc that it returns empty st… (srowen, Aug 10, 2016)
719ac5f  [SPARK-15899][SQL] Fix the construction of the file path with hadoop … (avulanov, Aug 10, 2016)
15637f7  Revert "[SPARK-15899][SQL] Fix the construction of the file path with… (srowen, Aug 10, 2016)
977fbbf  [SPARK-15639] [SPARK-16321] [SQL] Push down filter at RowGroups level… (viirya, Aug 10, 2016)
d3a30d2  [SPARK-16579][SPARKR] add install.spark function (junyangq, Aug 10, 2016)
1e40135  [SPARK-17010][MINOR][DOC] Wrong description in memory management docu… (WangTaoTheTonic, Aug 11, 2016)
8611bc2  [SPARK-16866][SQL] Infrastructure for file-based SQL end-to-end tests (petermaxlee, Aug 10, 2016)
51b1016  [SPARK-17008][SPARK-17009][SQL] Normalization and isolation in SQLQue… (petermaxlee, Aug 11, 2016)
ea8a198  [SPARK-17007][SQL] Move test data files into a test-data folder (petermaxlee, Aug 11, 2016)
4b434e7  [SPARK-17011][SQL] Support testing exceptions in SQLQueryTestSuite (petermaxlee, Aug 11, 2016)
0ed6236  Correct example value for spark.ssl.YYY.XXX settings (ash211, Aug 11, 2016)
33a213f  [SPARK-15899][SQL] Fix the construction of the file path with hadoop … (avulanov, Aug 11, 2016)
b87ba8f  Fix remaining undocumented/duplicated warnings (junyangq, Aug 11, 2016)
6bf20cd  [SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests (petermaxlee, Aug 11, 2016)
bc683f0  [SPARK-17018][SQL] literals.sql for testing literal parsing (petermaxlee, Aug 11, 2016)
0fb0149  [SPARK-17022][YARN] Handle potential deadlock in driver handling mess… (WangTaoTheTonic, Aug 11, 2016)
d2c1d64  Keep to the convention where we have docs for generic and the function. (junyangq, Aug 12, 2016)
b4047fc  [SPARK-16975][SQL] Column-partition path starting '_' should be handl… (dongjoon-hyun, Aug 12, 2016)
bde94cd  [SPARK-17013][SQL] Parse negative numeric literals (petermaxlee, Aug 12, 2016)
38378f5  [SPARK-12370][DOCUMENTATION] Documentation should link to examples … (jagadeesanas2, Aug 13, 2016)
a21ecc9  [SPARK-17023][BUILD] Upgrade to Kafka 0.10.0.1 release (lresende, Aug 13, 2016)
750f880  [SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.ap… (srowen, Aug 13, 2016)
e02d0d0  [SPARK-17027][ML] Avoid integer overflow in PolynomialExpansion.getPo… (zero323, Aug 14, 2016)
8f4cacd  [SPARK-16508][SPARKR] Split docs for arrange and orderBy methods (junyangq, Aug 15, 2016)
4503632  [SPARK-17065][SQL] Improve the error message when encountering an inc… (zsxwing, Aug 15, 2016)
e5771a1  Fix docs for window functions (junyangq, Aug 16, 2016)
2e2c787  [SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package (hvanhovell, Aug 16, 2016)
237ae54  Revert "[SPARK-16964][SQL] Remove private[hive] from sql.hive.executi… (rxin, Aug 16, 2016)
1c56971  [SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.ex… (hvanhovell, Aug 16, 2016)
022230c  [SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings… (felixcheung, Aug 16, 2016)
6cb3eab  [SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator (phalodi, Aug 16, 2016)
3e0163b  [SPARK-17084][SQL] Rename ParserUtils.assert to validate (hvanhovell, Aug 17, 2016)
68a24d3  [MINOR][DOC] Fix the descriptions for `properties` argument in the do… (Aug 17, 2016)
22c7660  [SPARK-15285][SQL] Generated SpecificSafeProjection.apply method grow… (kiszk, Aug 17, 2016)
394d598  [SPARK-17102][SQL] bypass UserDefinedGenerator for json format check (cloud-fan, Aug 17, 2016)
9406f82  [SPARK-17096][SQL][STREAMING] Improve exception string reported throu… (tdas, Aug 17, 2016)
585d1d9  [SPARK-17038][STREAMING] fix metrics retrieval source of 'lastReceive… (keypointt, Aug 17, 2016)
91aa532  [SPARK-16995][SQL] TreeNodeException when flat mapping RelationalGrou… (viirya, Aug 18, 2016)
5735b8b  [SPARK-16391][SQL] Support partial aggregation for reduceGroups (rxin, Aug 18, 2016)
ec5f157  [SPARK-17117][SQL] 1 / NULL should not fail analysis (petermaxlee, Aug 18, 2016)
0bc3753  Fix part of undocumented/duplicated arguments warnings by CRAN-check (junyangq, Aug 9, 2016)
6d5233e  More fixes of the docs. (junyangq, Aug 10, 2016)
0edfd7d  Fix remaining undocumented/duplicated warnings (junyangq, Aug 11, 2016)
e72a6aa  Keep to the convention where we have docs for generic and the function. (junyangq, Aug 12, 2016)
afa69ed  Fix docs for window functions (junyangq, Aug 16, 2016)
c9cfe43  some fixes of R doc (junyangq, Aug 18, 2016)
3aafaa7  Move param docs from generic function to method definition. (junyangq, Aug 18, 2016)
315a0dd  some fixes of R doc (junyangq, Aug 18, 2016)
aa3d233  Move param docs from generic function to method definition. (junyangq, Aug 18, 2016)
71170e9  Solve conflicts. (junyangq, Aug 18, 2016)
2682719  Revert "Fix docs for window functions" (junyangq, Aug 18, 2016)
[SPARK-16975][SQL] Column-partition path starting '_' should be handled correctly

Currently, Spark ignores path names starting with underscore `_` or dot `.`. This causes read failures for column-partitioned file data sources whose partition column names start with `_`, e.g. `_col`.

**Before**
```scala
scala> spark.range(10).withColumn("_locality_code", $"id").write.partitionBy("_locality_code").save("/tmp/parquet")
scala> spark.read.parquet("/tmp/parquet")
org.apache.spark.sql.AnalysisException: Unable to infer schema for ParquetFormat at /tmp/parquet20. It must be specified manually;
```

**After**
```scala
scala> spark.range(10).withColumn("_locality_code", $"id").write.partitionBy("_locality_code").save("/tmp/parquet")
scala> spark.read.parquet("/tmp/parquet")
res2: org.apache.spark.sql.DataFrame = [id: bigint, _locality_code: int]
```

Passed Jenkins with a new test case.

Author: Dongjoon Hyun <[email protected]>

Closes #14585 from dongjoon-hyun/SPARK-16975-PARQUET.

(cherry picked from commit abff92b)
Signed-off-by: Cheng Lian <[email protected]>
dongjoon-hyun authored and liancheng committed Aug 12, 2016
commit b4047fc21cefcf6a43c1ee88af330a042f02bebc
In `PartitioningAwareFileCatalog`:

```diff
@@ -204,6 +204,6 @@ abstract class PartitioningAwareFileCatalog(
 
   private def isDataPath(path: Path): Boolean = {
     val name = path.getName
-    !(name.startsWith("_") || name.startsWith("."))
+    !((name.startsWith("_") && !name.contains("=")) || name.startsWith("."))
   }
 }
```
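The one-line change above carries the whole fix: a directory created by `partitionBy` is named `column=value`, so an underscore-prefixed name that contains `=` must be partition data rather than a hidden or metadata file. A minimal standalone sketch of the revised predicate, with illustrative sample inputs that are not from the patch:

```scala
// Sketch of the revised data-path check, restated outside the class.
// Sample inputs below are illustrative assumptions, not patch code.
def isDataPath(name: String): Boolean =
  !((name.startsWith("_") && !name.contains("=")) || name.startsWith("."))

isDataPath("_locality_code=7")  // true:  partition directory, now treated as data
isDataPath("_SUCCESS")          // false: underscore-prefixed marker, still skipped
isDataPath(".part-00000.crc")   // false: hidden file, still skipped
```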
In `HadoopFsRelation`:

```diff
@@ -363,7 +363,7 @@ private[sql] object HadoopFsRelation extends Logging {
     // We filter everything that starts with _ and ., except _common_metadata and _metadata
     // because Parquet needs to find those metadata files from leaf files returned by this method.
     // We should refactor this logic to not mix metadata files with data files.
-    (pathName.startsWith("_") || pathName.startsWith(".")) &&
+    ((pathName.startsWith("_") && !pathName.contains("=")) || pathName.startsWith(".")) &&
       !pathName.startsWith("_common_metadata") && !pathName.startsWith("_metadata")
   }
 
```
In `JsonFileFormat`:

```diff
@@ -55,7 +55,7 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
       .getOrElse(sparkSession.sessionState.conf.columnNameOfCorruptRecord)
     val jsonFiles = files.filterNot { status =>
       val name = status.getPath.getName
-      name.startsWith("_") || name.startsWith(".")
+      (name.startsWith("_") && !name.contains("=")) || name.startsWith(".")
     }.toArray
 
     val jsonSchema = InferSchema.infer(
```
In `ParquetFileFormat`:

```diff
@@ -234,7 +234,8 @@ private[sql] class ParquetFileFormat
     // Lists `FileStatus`es of all leaf nodes (files) under all base directories.
     val leaves = allFiles.filter { f =>
       isSummaryFile(f.getPath) ||
-        !(f.getPath.getName.startsWith("_") || f.getPath.getName.startsWith("."))
+        !((f.getPath.getName.startsWith("_") && !f.getPath.getName.contains("=")) ||
+          f.getPath.getName.startsWith("."))
     }.toArray.sortBy(_.getPath.toString)
 
     FileTypes(
```
In `SQLQuerySuite` (new import and regression test):

```diff
@@ -17,6 +17,7 @@
 
 package org.apache.spark.sql
 
+import java.io.File
 import java.math.MathContext
 import java.sql.{Date, Timestamp}
 
@@ -2591,6 +2592,14 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       Row(s"$expected") :: Nil)
   }
 
+  test("SPARK-16975: Column-partition path starting '_' should be handled correctly") {
+    withTempDir { dir =>
+      val parquetDir = new File(dir, "parquet").getCanonicalPath
+      spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(parquetDir)
+      spark.read.parquet(parquetDir)
+    }
+  }
+
   test("SPARK-16644: Aggregate should not put aggregate expressions to constraints") {
     withTable("tbl") {
       sql("CREATE TABLE tbl(a INT, b INT) USING parquet")
```
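The regression test above only verifies that the read no longer throws. A hypothetical, slightly stronger variant (not part of this patch) would also assert that the partition column survives the round trip:

```scala
// Hypothetical extension of the new test, assuming the same SQLQuerySuite
// helpers (withTempDir, spark, $"...") are in scope. Not part of the patch.
test("SPARK-16975: partition column starting with '_' round-trips") {
  withTempDir { dir =>
    val parquetDir = new File(dir, "parquet").getCanonicalPath
    spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(parquetDir)
    val df = spark.read.parquet(parquetDir)
    assert(df.schema.fieldNames.contains("_col"))  // partition column is back
    assert(df.count() == 10)                       // all rows are readable
  }
}
```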