Skip to content

Conversation

@andrewor14
Copy link
Contributor

A fundamental limitation of the existing SQL tests is that there is simply no way to create your own SparkContext. This is a serious limitation because the user may wish to use a different master or config. As a case in point, BroadcastJoinSuite is entirely commented out because there is no way to make it pass with the existing infrastructure.

This patch removes the singletons TestSQLContext and TestData, and instead introduces a SharedSQLContext that starts a context per suite. Unfortunately the singletons were so ingrained in the SQL tests that this patch necessarily needed to touch all the SQL test files.

Review on Reviewable

Andrew Or added 17 commits August 4, 2015 16:14
This allows us to use custom SparkContexts in hive tests.
This is a clean up to refactor helper test traits and abstract
classes in such a way that is accessible to hive tests.
MyTestSQLContext -> SharedSQLContext
MyTestHiveContext -> SharedHiveContext
LocalSQLContext -> TestSQLContext
TestData -> TestSQLData
The test data is currently loaded as a bunch of lazy vals.
If the data is accessed by name, however, they won't be loaded
automatically. This patch adds an explicit method call that loads
the data if necessary.
This commit allows us to call `import testImplicits._` in the test
constructor and use implicit methods properly. This was previously
not possible without also starting a SQLContext in the constructor.
Instead, now we can properly use implicits *while* starting the
SQLContext in `beforeAll`.

However, there is currently an issue with tests using the test
data prepared in advance. This will be fixed in the subsequent
commit.
Tests that use test data used to fail before this commit. This is
because the underlying case classes would bring in the entire
`SQLTestData` trait into the scope. This no longer happens after
we move the case classes outside of the trait.
Test suites that extend DataSourceTest used to have this weird
implicit SQLContext that was created in the constructor. This was
failing tests because the base SQLContext is not ready until after
the first test is run. A minor refactor was required to fix the
resulting NPEs.

This commit also fixes test suites that need to materialize the
test data. These suites were materializing them in the constructor
before the SQLContext was ready.
This makes hive tests use the same pattern as SQL tests, i.e.
everything inherits HiveTestUtils, and those that want to use
implicits can do `import testImplicits._`.
Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/RowFormatConvertersSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlSerializer2Suite.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/UnsafeKVExternalSorterSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetAvroCompatibilitySuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompatibilityTest.scala
	sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/sources/SaveLoadSuite.scala
	sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
	sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHiveContext.scala
	sql/hive/src/test/java/test/org/apache/spark/sql/hive/JavaDataFrameSuite.java
	sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
	sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala
	sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala
	sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
	sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala
	sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala
	sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
	sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala
	sql/hive/src/test/scala/org/apache/spark/sql/sources/ParquetHadoopFsRelationSuite.scala
Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/joins/OuterJoinSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/execution/joins/SemiJoinSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala
Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
@andrewor14 andrewor14 changed the title [SPARK-9580] [SQL] [WIP] Replace singletons TestSQLContext and TestData [SPARK-9580] [SQL] [WIP] Replace singletons in SQL tests Aug 11, 2015
@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #40506 has finished for PR 8111 at commit 9395cfa.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SQLTransformer (override val uid: String) extends Transformer
    • implicit class StringToColumn(val sc: StringContext)

Andrew Or added 2 commits August 11, 2015 14:07
This suite was using the SQLContext extensively in the
constructor of the test suite. The fact that we don't have the
singleton anymore means this is no longer possible. This commit
refactors the suite to never reference a SQLContext outside of
a test body.
@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #40512 has finished for PR 8111 at commit 997715e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • implicit class StringToColumn(val sc: StringContext)
    • case class EqualNullSafe(attribute: String, value: Any) extends Filter

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #40513 has finished for PR 8111 at commit 1400770.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SQLTransformer (override val uid: String) extends Transformer
    • implicit class StringToColumn(val sc: StringContext)
    • case class EqualNullSafe(attribute: String, value: Any) extends Filter

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #1458 has finished for PR 8111 at commit 1400770.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SQLTransformer (override val uid: String) extends Transformer
    • implicit class StringToColumn(val sc: StringContext)
    • case class EqualNullSafe(attribute: String, value: Any) extends Filter

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #1459 has finished for PR 8111 at commit 1400770.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SQLTransformer (override val uid: String) extends Transformer
    • implicit class StringToColumn(val sc: StringContext)
    • case class EqualNullSafe(attribute: String, value: Any) extends Filter

@andrewor14
Copy link
Contributor Author

retest this please

@JoshRosen
Copy link
Contributor

Reviewed 28 of 131 files at r1, 1 of 2 files at r2.
Review status: 29 of 132 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


project/SparkBuild.scala, line 344 [r4] (raw file):
This is probably safe to remove given that we should inherit this from the global TestSettings.


project/SparkBuild.scala, line 363 [r4] (raw file):
Should still have something here to create the context and import its implicits. This will require manual testing before signoff.


sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java, line 46 [r4] (raw file):
Note that this is called on every test, not once per suite, so I think that we'll wind up re-loading the test data many times. If this turns out to be a perf. issue then we may want to use JUnit BeforeAll or BeforeSuite or whatever it's called.


sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala, line 49 [r4] (raw file):
As discussed offline, I think that SQLTestUtils should remain a trait that is mixed into SharedSQLContext and SharedHiveContext. This will allow its helper methods to be used in non-shared-context settings as well.


Comments from the review on Reviewable.io

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #1543 has finished for PR 8111 at commit c4d44c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait Identifiable
    • class VectorUDT extends UserDefinedType[Vector]
    • implicit class StringToColumn(val sc: StringContext)

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #1545 has finished for PR 8111 at commit c4d44c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait Identifiable
    • case class QRDecomposition[QType, RType](Q: QType, R: RType)
    • class VectorUDT extends UserDefinedType[Vector]
    • implicit class StringToColumn(val sc: StringContext)

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #1541 timed out for PR 8111 at commit 828144f after a configured wait of 175m.

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #40739 timed out for PR 8111 at commit c4d44c9 after a configured wait of 175m.

Andrew Or added 2 commits August 13, 2015 09:33
Conflicts:
	sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
	sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #1554 has finished for PR 8111 at commit 0606c82.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait Identifiable
    • class VectorUDT extends UserDefinedType[Vector]
    • class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol, HasSeed):
    • implicit class StringToColumn(val sc: StringContext)

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #1555 has finished for PR 8111 at commit 0606c82.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait Identifiable
    • class VectorUDT extends UserDefinedType[Vector]
    • class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol, HasSeed):
    • implicit class StringToColumn(val sc: StringContext)

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #40777 timed out for PR 8111 at commit 0606c82 after a configured wait of 175m.

Andrew Or added 3 commits August 13, 2015 12:44
This is because setting up multiple HiveContext's makes tests
really unstable. We keep running out of PermGen space and run
into JVM seg faults once in a while.

This reduces the size of the patch significantly and only deals
with the singleton SQLContext.
@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #40791 has finished for PR 8111 at commit aed7fc7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • implicit class StringToColumn(val sc: StringContext)

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #1564 has finished for PR 8111 at commit d85a6d8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • implicit class StringToColumn(val sc: StringContext)

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #40793 has finished for PR 8111 at commit b8780a9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • implicit class StringToColumn(val sc: StringContext)

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #1566 has finished for PR 8111 at commit d85a6d8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait Identifiable
    • case class QRDecomposition[QType, RType](Q: QType, R: RType)
    • class VectorUDT extends UserDefinedType[Vector]
    • class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol, HasSeed):
    • implicit class StringToColumn(val sc: StringContext)

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #1565 has finished for PR 8111 at commit d85a6d8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • implicit class StringToColumn(val sc: StringContext)

@SparkQA
Copy link

SparkQA commented Aug 14, 2015

Test build #1569 timed out for PR 8111 at commit d85a6d8 after a configured wait of 175m.

asfgit pushed a commit that referenced this pull request Aug 14, 2015
A fundamental limitation of the existing SQL tests is that *there is simply no way to create your own `SparkContext`*. This is a serious limitation because the user may wish to use a different master or config. As a case in point, `BroadcastJoinSuite` is entirely commented out because there is no way to make it pass with the existing infrastructure.

This patch removes the singletons `TestSQLContext` and `TestData`, and instead introduces a `SharedSQLContext` that starts a context per suite. Unfortunately the singletons were so ingrained in the SQL tests that this patch necessarily needed to touch *all* the SQL test files.

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/8111)
<!-- Reviewable:end -->

Author: Andrew Or <[email protected]>

Closes #8111 from andrewor14/sql-tests-refactor.

(cherry picked from commit 8187b3a)
Signed-off-by: Reynold Xin <[email protected]>
@rxin
Copy link
Contributor

rxin commented Aug 14, 2015

I've merged this.

@asfgit asfgit closed this in 8187b3a Aug 14, 2015
@andrewor14 andrewor14 deleted the sql-tests-refactor branch August 14, 2015 00:44
@SparkQA
Copy link

SparkQA commented Aug 14, 2015

Test build #1574 has finished for PR 8111 at commit 821ea67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • implicit class StringToColumn(val sc: StringContext)

@SparkQA
Copy link

SparkQA commented Aug 14, 2015

Test build #40820 has finished for PR 8111 at commit 821ea67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • implicit class StringToColumn(val sc: StringContext)

@SparkQA
Copy link

SparkQA commented Aug 14, 2015

Test build #1573 timed out for PR 8111 at commit 821ea67 after a configured wait of 175m.

@SparkQA
Copy link

SparkQA commented Aug 14, 2015

Test build #1575 timed out for PR 8111 at commit 821ea67 after a configured wait of 175m.

CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
A fundamental limitation of the existing SQL tests is that *there is simply no way to create your own `SparkContext`*. This is a serious limitation because the user may wish to use a different master or config. As a case in point, `BroadcastJoinSuite` is entirely commented out because there is no way to make it pass with the existing infrastructure.

This patch removes the singletons `TestSQLContext` and `TestData`, and instead introduces a `SharedSQLContext` that starts a context per suite. Unfortunately the singletons were so ingrained in the SQL tests that this patch necessarily needed to touch *all* the SQL test files.

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/8111)
<!-- Reviewable:end -->

Author: Andrew Or <[email protected]>

Closes apache#8111 from andrewor14/sql-tests-refactor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants