58 commits
82e2f09
Fix part of undocumented/duplicated arguments warnings by CRAN-check
junyangq Aug 9, 2016
41d9dca
[SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.c…
Aug 9, 2016
44115e9
[SPARK-16956] Make ApplicationState.MAX_NUM_RETRY configurable
JoshRosen Aug 9, 2016
2d136db
[SPARK-16905] SQL DDL: MSCK REPAIR TABLE
Aug 9, 2016
901edbb
More fixes of the docs.
junyangq Aug 10, 2016
475ee38
Fixed typo
jupblb Aug 10, 2016
2285de7
[SPARK-16522][MESOS] Spark application throws exception on exit.
sun-rui Aug 10, 2016
20efb79
[SPARK-16324][SQL] regexp_extract should doc that it returns empty st…
srowen Aug 10, 2016
719ac5f
[SPARK-15899][SQL] Fix the construction of the file path with hadoop …
avulanov Aug 10, 2016
15637f7
Revert "[SPARK-15899][SQL] Fix the construction of the file path with…
srowen Aug 10, 2016
977fbbf
[SPARK-15639] [SPARK-16321] [SQL] Push down filter at RowGroups level…
viirya Aug 10, 2016
d3a30d2
[SPARK-16579][SPARKR] add install.spark function
junyangq Aug 10, 2016
1e40135
[SPARK-17010][MINOR][DOC] Wrong description in memory management docu…
WangTaoTheTonic Aug 11, 2016
8611bc2
[SPARK-16866][SQL] Infrastructure for file-based SQL end-to-end tests
petermaxlee Aug 10, 2016
51b1016
[SPARK-17008][SPARK-17009][SQL] Normalization and isolation in SQLQue…
petermaxlee Aug 11, 2016
ea8a198
[SPARK-17007][SQL] Move test data files into a test-data folder
petermaxlee Aug 11, 2016
4b434e7
[SPARK-17011][SQL] Support testing exceptions in SQLQueryTestSuite
petermaxlee Aug 11, 2016
0ed6236
Correct example value for spark.ssl.YYY.XXX settings
ash211 Aug 11, 2016
33a213f
[SPARK-15899][SQL] Fix the construction of the file path with hadoop …
avulanov Aug 11, 2016
b87ba8f
Fix remaining undocumented/duplicated warnings
junyangq Aug 11, 2016
6bf20cd
[SPARK-17015][SQL] group-by/order-by ordinal and arithmetic tests
petermaxlee Aug 11, 2016
bc683f0
[SPARK-17018][SQL] literals.sql for testing literal parsing
petermaxlee Aug 11, 2016
0fb0149
[SPARK-17022][YARN] Handle potential deadlock in driver handling mess…
WangTaoTheTonic Aug 11, 2016
d2c1d64
Keep to the convention where we have docs for generic and the function.
junyangq Aug 12, 2016
b4047fc
[SPARK-16975][SQL] Column-partition path starting '_' should be handl…
dongjoon-hyun Aug 12, 2016
bde94cd
[SPARK-17013][SQL] Parse negative numeric literals
petermaxlee Aug 12, 2016
38378f5
[SPARK-12370][DOCUMENTATION] Documentation should link to examples …
jagadeesanas2 Aug 13, 2016
a21ecc9
[SPARK-17023][BUILD] Upgrade to Kafka 0.10.0.1 release
lresende Aug 13, 2016
750f880
[SPARK-16966][SQL][CORE] App Name is a randomUUID even when "spark.ap…
srowen Aug 13, 2016
e02d0d0
[SPARK-17027][ML] Avoid integer overflow in PolynomialExpansion.getPo…
zero323 Aug 14, 2016
8f4cacd
[SPARK-16508][SPARKR] Split docs for arrange and orderBy methods
junyangq Aug 15, 2016
4503632
[SPARK-17065][SQL] Improve the error message when encountering an inc…
zsxwing Aug 15, 2016
e5771a1
Fix docs for window functions
junyangq Aug 16, 2016
2e2c787
[SPARK-16964][SQL] Remove private[hive] from sql.hive.execution package
hvanhovell Aug 16, 2016
237ae54
Revert "[SPARK-16964][SQL] Remove private[hive] from sql.hive.executi…
rxin Aug 16, 2016
1c56971
[SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.ex…
hvanhovell Aug 16, 2016
022230c
[SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings…
felixcheung Aug 16, 2016
6cb3eab
[SPARK-17089][DOCS] Remove api doc link for mapReduceTriplets operator
phalodi Aug 16, 2016
3e0163b
[SPARK-17084][SQL] Rename ParserUtils.assert to validate
hvanhovell Aug 17, 2016
68a24d3
[MINOR][DOC] Fix the descriptions for `properties` argument in the do…
Aug 17, 2016
22c7660
[SPARK-15285][SQL] Generated SpecificSafeProjection.apply method grow…
kiszk Aug 17, 2016
394d598
[SPARK-17102][SQL] bypass UserDefinedGenerator for json format check
cloud-fan Aug 17, 2016
9406f82
[SPARK-17096][SQL][STREAMING] Improve exception string reported throu…
tdas Aug 17, 2016
585d1d9
[SPARK-17038][STREAMING] fix metrics retrieval source of 'lastReceive…
keypointt Aug 17, 2016
91aa532
[SPARK-16995][SQL] TreeNodeException when flat mapping RelationalGrou…
viirya Aug 18, 2016
5735b8b
[SPARK-16391][SQL] Support partial aggregation for reduceGroups
rxin Aug 18, 2016
ec5f157
[SPARK-17117][SQL] 1 / NULL should not fail analysis
petermaxlee Aug 18, 2016
0bc3753
Fix part of undocumented/duplicated arguments warnings by CRAN-check
junyangq Aug 9, 2016
6d5233e
More fixes of the docs.
junyangq Aug 10, 2016
0edfd7d
Fix remaining undocumented/duplicated warnings
junyangq Aug 11, 2016
e72a6aa
Keep to the convention where we have docs for generic and the function.
junyangq Aug 12, 2016
afa69ed
Fix docs for window functions
junyangq Aug 16, 2016
c9cfe43
some fixes of R doc
junyangq Aug 18, 2016
3aafaa7
Move param docs from generic function to method definition.
junyangq Aug 18, 2016
315a0dd
some fixes of R doc
junyangq Aug 18, 2016
aa3d233
Move param docs from generic function to method definition.
junyangq Aug 18, 2016
71170e9
Solve conflicts.
junyangq Aug 18, 2016
2682719
Revert "Fix docs for window functions"
junyangq Aug 18, 2016
[SPARK-17008][SPARK-17009][SQL] Normalization and isolation in SQLQueryTestSuite.

## What changes were proposed in this pull request?
This patch enhances SQLQueryTestSuite in two ways:

1. SPARK-17009: Use a new SparkSession for each test case to provide stronger isolation (e.g. config changes in one test case do not impact another). That said, we do not currently isolate catalog changes.
2. SPARK-17008: Normalize query output using sorting, inspired by HiveComparisonTest.

I also ported a few new test cases over from SQLQuerySuite.

## How was this patch tested?
This is a test harness update.
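The suite compares each query's output against a stored golden file, regenerating it on demand (see the `regenerateGoldenFiles` branch in the diff below). A minimal Spark-free sketch of that pattern — the function name here is illustrative, not Spark's actual API:

```scala
import java.nio.file.{Files, Path}

// Sketch of the golden-file pattern: in regenerate mode, write the actual
// output as the new expected file; otherwise compare against the stored file.
// `regenerate` plays the role of the suite's regenerateGoldenFiles flag.
def checkGolden(goldenFile: Path, actual: String, regenerate: Boolean): Boolean = {
  if (regenerate) {
    Files.write(goldenFile, actual.getBytes("UTF-8"))
    true
  } else {
    new String(Files.readAllBytes(goldenFile), "UTF-8") == actual
  }
}

val tmp = Files.createTempFile("golden", ".out")
checkGolden(tmp, "true\ttrue", regenerate = true)               // record expected output
val matches = checkGolden(tmp, "true\ttrue", regenerate = false) // same output: passes
val differs = checkGolden(tmp, "true\tfalse", regenerate = false) // changed output: fails
```

The real suite additionally stores the query text and schema alongside the output, so a schema change also shows up as a golden-file diff.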

Author: petermaxlee <[email protected]>

Closes #14590 from petermaxlee/SPARK-17008.

(cherry picked from commit 425c7c2)
Signed-off-by: Wenchen Fan <[email protected]>
petermaxlee authored and cloud-fan committed Aug 11, 2016
commit 51b1016682a805e06b857a6b1f160a877839dbd5
4 changes: 4 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/datetime.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
-- date time functions

-- [SPARK-16836] current_date and current_timestamp literals
select current_date = current_date(), current_timestamp = current_timestamp();
15 changes: 15 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/having.sql
@@ -0,0 +1,15 @@
create temporary view hav as select * from values
("one", 1),
("two", 2),
("three", 3),
("one", 5)
as hav(k, v);

-- having clause
SELECT k, sum(v) FROM hav GROUP BY k HAVING sum(v) > 2;

-- having condition contains grouping column
SELECT count(k) FROM hav GROUP BY v + 1 HAVING v + 1 = 2;

-- SPARK-11032: resolve having correctly
SELECT MIN(t.v) FROM (SELECT * FROM hav WHERE v > 0) t HAVING(COUNT(1) > 0);
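The first HAVING query above groups by `k` and keeps only groups whose sum exceeds 2. The same logic over plain Scala collections — a Spark-free sketch of the expected semantics:

```scala
// Rows of the temporary view `hav` as plain (k, v) pairs.
val hav = Seq(("one", 1), ("two", 2), ("three", 3), ("one", 5))

// SELECT k, sum(v) FROM hav GROUP BY k HAVING sum(v) > 2
val result = hav
  .groupBy { case (k, _) => k }                      // GROUP BY k
  .map { case (k, rows) => k -> rows.map(_._2).sum } // sum(v) per group
  .filter { case (_, sum) => sum > 2 }               // HAVING sum(v) > 2

// result: Map("one" -> 6, "three" -> 3), matching the golden file below
```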
20 changes: 20 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/natural-join.sql
@@ -0,0 +1,20 @@
create temporary view nt1 as select * from values
("one", 1),
("two", 2),
("three", 3)
as nt1(k, v1);

create temporary view nt2 as select * from values
("one", 1),
("two", 22),
("one", 5)
as nt2(k, v2);


SELECT * FROM nt1 natural join nt2 where k = "one";

SELECT * FROM nt1 natural left join nt2 order by v1, v2;

SELECT * FROM nt1 natural right join nt2 order by v1, v2;

SELECT count(*) FROM nt1 natural full outer join nt2;
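A natural join equates all same-named columns of the two relations — here just `k` — and emits each shared column once. A minimal Spark-free sketch of the inner-join case from the first query:

```scala
val nt1 = Seq(("one", 1), ("two", 2), ("three", 3)) // (k, v1)
val nt2 = Seq(("one", 1), ("two", 22), ("one", 5))  // (k, v2)

// SELECT * FROM nt1 natural join nt2 where k = "one"
val joined = for {
  (k1, v1) <- nt1
  (k2, v2) <- nt2
  if k1 == k2 && k1 == "one" // join on the shared column k, then filter
} yield (k1, v1, v2)

// joined: Seq(("one", 1, 1), ("one", 1, 5)), matching the golden file below
```

The outer variants in the file additionally pad non-matching rows with NULL on the missing side, which is why the left join's golden output contains `three 3 NULL`.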
10 changes: 10 additions & 0 deletions sql/core/src/test/resources/sql-tests/results/datetime.sql.out
@@ -0,0 +1,10 @@
-- Automatically generated by org.apache.spark.sql.SQLQueryTestSuite
-- Number of queries: 1


-- !query 0
select current_date = current_date(), current_timestamp = current_timestamp()
-- !query 0 schema
struct<(current_date() = current_date()):boolean,(current_timestamp() = current_timestamp()):boolean>
-- !query 0 output
true true
40 changes: 40 additions & 0 deletions sql/core/src/test/resources/sql-tests/results/having.sql.out
@@ -0,0 +1,40 @@
-- Automatically generated by org.apache.spark.sql.SQLQueryTestSuite
-- Number of queries: 4


-- !query 0
create temporary view hav as select * from values
("one", 1),
("two", 2),
("three", 3),
("one", 5)
as hav(k, v)
-- !query 0 schema
struct<>
-- !query 0 output



-- !query 1
SELECT k, sum(v) FROM hav GROUP BY k HAVING sum(v) > 2
-- !query 1 schema
struct<k:string,sum(v):bigint>
-- !query 1 output
one 6
three 3


-- !query 2
SELECT count(k) FROM hav GROUP BY v + 1 HAVING v + 1 = 2
-- !query 2 schema
struct<count(k):bigint>
-- !query 2 output
1


-- !query 3
SELECT MIN(t.v) FROM (SELECT * FROM hav WHERE v > 0) t HAVING(COUNT(1) > 0)
-- !query 3 schema
struct<min(v):int>
-- !query 3 output
1
64 changes: 64 additions & 0 deletions sql/core/src/test/resources/sql-tests/results/natural-join.sql.out
@@ -0,0 +1,64 @@
-- Automatically generated by org.apache.spark.sql.SQLQueryTestSuite
-- Number of queries: 6


-- !query 0
create temporary view nt1 as select * from values
("one", 1),
("two", 2),
("three", 3)
as nt1(k, v1)
-- !query 0 schema
struct<>
-- !query 0 output



-- !query 1
create temporary view nt2 as select * from values
("one", 1),
("two", 22),
("one", 5)
as nt2(k, v2)
-- !query 1 schema
struct<>
-- !query 1 output



-- !query 2
SELECT * FROM nt1 natural join nt2 where k = "one"
-- !query 2 schema
struct<k:string,v1:int,v2:int>
-- !query 2 output
one 1 1
one 1 5


-- !query 3
SELECT * FROM nt1 natural left join nt2 order by v1, v2
-- !query 3 schema
struct<k:string,v1:int,v2:int>
-- !query 3 output
one 1 1
one 1 5
two 2 22
three 3 NULL


-- !query 4
SELECT * FROM nt1 natural right join nt2 order by v1, v2
-- !query 4 schema
struct<k:string,v1:int,v2:int>
-- !query 4 output
one 1 1
one 1 5
two 2 22


-- !query 5
SELECT count(*) FROM nt1 natural full outer join nt2
-- !query 5 schema
struct<count(1):bigint>
-- !query 5 output
4
62 changes: 0 additions & 62 deletions sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -38,26 +38,6 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {

setupTestData()

test("having clause") {
withTempView("hav") {
Seq(("one", 1), ("two", 2), ("three", 3), ("one", 5)).toDF("k", "v")
.createOrReplaceTempView("hav")
checkAnswer(
sql("SELECT k, sum(v) FROM hav GROUP BY k HAVING sum(v) > 2"),
Row("one", 6) :: Row("three", 3) :: Nil)
}
}

test("having condition contains grouping column") {
withTempView("hav") {
Seq(("one", 1), ("two", 2), ("three", 3), ("one", 5)).toDF("k", "v")
.createOrReplaceTempView("hav")
checkAnswer(
sql("SELECT count(k) FROM hav GROUP BY v + 1 HAVING v + 1 = 2"),
Row(1) :: Nil)
}
}

test("SPARK-8010: promote numeric to string") {
val df = Seq((1, 1)).toDF("key", "value")
df.createOrReplaceTempView("src")
@@ -1959,15 +1939,6 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
}
}

test("SPARK-11032: resolve having correctly") {
withTempView("src") {
Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("src")
checkAnswer(
sql("SELECT MIN(t.i) FROM (SELECT * FROM src WHERE i > 0) t HAVING(COUNT(1) > 0)"),
Row(1))
}
}

test("SPARK-11303: filter should not be pushed down into sample") {
val df = spark.range(100)
List(true, false).foreach { withReplacement =>
@@ -2507,30 +2478,6 @@ }
}
}

test("natural join") {
val df1 = Seq(("one", 1), ("two", 2), ("three", 3)).toDF("k", "v1")
val df2 = Seq(("one", 1), ("two", 22), ("one", 5)).toDF("k", "v2")
withTempView("nt1", "nt2") {
df1.createOrReplaceTempView("nt1")
df2.createOrReplaceTempView("nt2")
checkAnswer(
sql("SELECT * FROM nt1 natural join nt2 where k = \"one\""),
Row("one", 1, 1) :: Row("one", 1, 5) :: Nil)

checkAnswer(
sql("SELECT * FROM nt1 natural left join nt2 order by v1, v2"),
Row("one", 1, 1) :: Row("one", 1, 5) :: Row("two", 2, 22) :: Row("three", 3, null) :: Nil)

checkAnswer(
sql("SELECT * FROM nt1 natural right join nt2 order by v1, v2"),
Row("one", 1, 1) :: Row("one", 1, 5) :: Row("two", 2, 22) :: Nil)

checkAnswer(
sql("SELECT count(*) FROM nt1 natural full outer join nt2"),
Row(4) :: Nil)
}
}

test("join with using clause") {
val df1 = Seq(("r1c1", "r1c2", "t1r1c3"),
("r2c1", "r2c2", "t1r2c3"), ("r3c1x", "r3c2", "t1r3c3")).toDF("c1", "c2", "c3")
@@ -2945,13 +2892,4 @@
data.selectExpr("`part.col1`", "`col.1`"))
}
}

test("current_date and current_timestamp literals") {
// NOTE that I am comparing the result of the literal with the result of the function call.
// This is done to prevent the test from failing because we are comparing a result to an
// outdated timestamp (quite likely) or date (very unlikely - but equally annoying).
checkAnswer(
sql("select current_date = current_date(), current_timestamp = current_timestamp()"),
Seq(Row(true, true)))
}
}
@@ -20,9 +20,12 @@ package org.apache.spark.sql
import java.io.File
import java.util.{Locale, TimeZone}

import org.apache.spark.sql.catalyst.planning.PhysicalOperation
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.rules.RuleExecutor
import org.apache.spark.sql.catalyst.util.{fileToString, stringToFile}
import org.apache.spark.sql.test.SharedSQLContext
import org.apache.spark.sql.types.StructType

/**
* End-to-end test cases for SQL queries.
@@ -126,14 +129,18 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
cleaned.split("(?<=[^\\\\]);").map(_.trim).filter(_ != "").toSeq
}

// Create a local SparkSession to have stronger isolation between different test cases.
// This does not isolate catalog changes.
val localSparkSession = spark.newSession()

// Run the SQL queries preparing them for comparison.
val outputs: Seq[QueryOutput] = queries.map { sql =>
val df = spark.sql(sql)
val (schema, output) = getNormalizedResult(localSparkSession, sql)
// We might need to do some query canonicalization in the future.
QueryOutput(
sql = sql,
schema = df.schema.catalogString,
output = df.queryExecution.hiveResultString().mkString("\n"))
schema = schema.catalogString,
output = output.mkString("\n"))
}

if (regenerateGoldenFiles) {
@@ -176,6 +183,23 @@ }
}
}

/** Executes a query and returns the result as (schema of the output, normalized output). */
private def getNormalizedResult(session: SparkSession, sql: String): (StructType, Seq[String]) = {
// Returns true if the plan is supposed to be sorted.
def isSorted(plan: LogicalPlan): Boolean = plan match {
case _: Join | _: Aggregate | _: Generate | _: Sample | _: Distinct => false
case PhysicalOperation(_, _, Sort(_, true, _)) => true
case _ => plan.children.iterator.exists(isSorted)
}

val df = session.sql(sql)
val schema = df.schema
val answer = df.queryExecution.hiveResultString()

// If the output is not pre-sorted, sort it.
if (isSorted(df.queryExecution.analyzed)) (schema, answer) else (schema, answer.sorted)
}

private def listTestCases(): Seq[TestCase] = {
listFilesRecursively(new File(inputFilePath)).map { file =>
val resultFile = file.getAbsolutePath.replace(inputFilePath, goldenFilePath) + ".out"
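The SPARK-17008 normalization in `getNormalizedResult` reduces to one rule: leave the rows alone when the plan guarantees an order (a top-level `Sort`), otherwise sort them so nondeterministic result order cannot produce spurious golden-file diffs. A Spark-free sketch of that rule:

```scala
// Normalize query output for golden-file comparison: keep the order when the
// query plan guarantees one (top-level ORDER BY), otherwise sort the rows.
def normalize(rows: Seq[String], planIsSorted: Boolean): Seq[String] =
  if (planIsSorted) rows else rows.sorted

val unordered = normalize(Seq("two\t2", "one\t1", "three\t3"), planIsSorted = false)
val ordered   = normalize(Seq("two\t2", "one\t1", "three\t3"), planIsSorted = true)

// unordered: Seq("one\t1", "three\t3", "two\t2") -- sorted for stable comparison
// ordered:   input order preserved, since ORDER BY already fixed it
```

This is why `isSorted` in the diff above returns false for joins, aggregates, generates, samples, and distincts: their output order is an implementation detail, so only an explicit `Sort` exempts a query from the sorting step.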