[SPARK-9580] [SQL] Replace singletons in SQL tests #8111
Conversation
This allows us to use custom SparkContexts in hive tests.
This is a cleanup that refactors the helper test traits and abstract classes so that they are also accessible to hive tests. The renames are:
- `MyTestSQLContext` -> `SharedSQLContext`
- `MyTestHiveContext` -> `SharedHiveContext`
- `LocalSQLContext` -> `TestSQLContext`
- `TestData` -> `TestSQLData`
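A minimal sketch of what a per-suite shared context along these lines might look like. The trait mirrors the `SharedSQLContext` the patch introduces, but the name here carries a `Sketch` suffix and the body is illustrative, assuming ScalaTest's `BeforeAndAfterAll`:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.scalatest.{BeforeAndAfterAll, Suite}

// Illustrative sketch: each suite gets its own SQLContext instead of a global singleton.
trait SharedSQLContextSketch extends BeforeAndAfterAll { self: Suite =>

  // Suites that need a different master or config can simply override this.
  protected def sparkConf: SparkConf =
    new SparkConf().setMaster("local[2]").setAppName(getClass.getSimpleName)

  private var _ctx: SQLContext = _
  protected def sqlContext: SQLContext = _ctx

  override protected def beforeAll(): Unit = {
    super.beforeAll()
    _ctx = new SQLContext(new SparkContext(sparkConf))
  }

  override protected def afterAll(): Unit = {
    try {
      if (_ctx != null) {
        _ctx.sparkContext.stop()
        _ctx = null
      }
    } finally {
      super.afterAll()
    }
  }
}
```

A suite that wants, say, a different master or extra configuration would override `sparkConf` rather than fight a global singleton.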
The test data is currently loaded as a bunch of lazy vals. If the data is accessed only by name, however, it won't be loaded automatically. This patch adds an explicit method call that loads the data if necessary.
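A rough sketch of that pattern; the trait and the `loadTestData` name are illustrative stand-ins for what the commit describes:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Illustrative sketch: each fixture is a lazy val that registers a temp table on
// first access. SQL strings that refer to "testData" by name never touch the lazy
// val, so a suite calls loadTestData() up front to force (and register) everything.
trait SQLTestDataSketch {
  protected def sqlContext: SQLContext

  protected lazy val testData: DataFrame = {
    val df = sqlContext
      .createDataFrame((1 to 100).map(i => (i, i.toString)))
      .toDF("key", "value")
    df.registerTempTable("testData")
    df
  }

  // Explicitly materializes all fixtures that a suite may reference by name.
  def loadTestData(): Unit = {
    testData
  }
}
```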
This commit allows us to call `import testImplicits._` in the test constructor and use implicit methods properly. This was previously not possible without also starting a SQLContext in the constructor. Instead, now we can properly use implicits *while* starting the SQLContext in `beforeAll`. However, there is currently an issue with tests using the test data prepared in advance. This will be fixed in the subsequent commit.
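A hedged sketch of the mechanism that makes this possible: the implicits object resolves the SQLContext by name rather than capturing an instance, so it can be imported before the context exists. The package line and the `_sqlContext` override assume the `SQLImplicits` hook available around this version of Spark; the rest is illustrative.

```scala
package org.apache.spark.sql.test // assumption: lives alongside the test utilities

import org.apache.spark.sql.{SQLContext, SQLImplicits}
import org.scalatest.Suite

// Illustrative sketch: testImplicits looks up the SQLContext lazily, so a suite can
// write `import testImplicits._` in its constructor even though the context itself
// is only created later, in beforeAll().
trait SQLTestImplicitsSketch { self: Suite =>
  protected def sqlContext: SQLContext

  protected object testImplicits extends SQLImplicits {
    override protected def _sqlContext: SQLContext = self.sqlContext
  }
}
```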
Tests that use test data used to fail before this commit. This is because the underlying case classes would pull the entire `SQLTestData` trait into scope. This no longer happens after we move the case classes outside of the trait.
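A small sketch of the difference; the names are illustrative:

```scala
// Before (sketch): a case class nested inside the trait is an inner class, so every
// instance carries a reference to the enclosing trait. Using it in an RDD or
// DataFrame drags the whole test-data instance (and everything it holds) along.
trait NestedTestDataSketch {
  case class Person(key: Int, value: String)
}

// After (sketch): a top-level case class has no outer reference, so test rows can be
// created and serialized without pulling the test trait into scope.
case class Person(key: Int, value: String)
```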
Test suites that extend DataSourceTest used to have this weird implicit SQLContext that was created in the constructor. This was failing tests because the base SQLContext is not ready until after the first test is run. A minor refactor was required to fix the resulting NPEs. This commit also fixes test suites that need to materialize the test data. These suites were materializing them in the constructor before the SQLContext was ready.
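A hedged sketch of the shape of that fix; the `caseInsensitiveContext` name mirrors the suite's helper, but the details here are illustrative:

```scala
import org.apache.spark.sql.SQLContext
import org.scalatest.Suite

// Illustrative sketch: the context is resolved lazily rather than captured in the
// suite's constructor, so nothing dereferences it before beforeAll() has created it.
trait DataSourceTestSketch { self: Suite =>
  protected def sqlContext: SQLContext // supplied by the shared-context trait

  protected implicit lazy val caseInsensitiveContext: SQLContext = {
    val ctx = sqlContext
    ctx.setConf("spark.sql.caseSensitive", "false")
    ctx
  }
}
```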
This makes hive tests use the same pattern as SQL tests, i.e. everything inherits HiveTestUtils, and those that want to use implicits can do `import testImplicits._`.
Conflicts:
    sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
    sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/execution/RowFormatConvertersSuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlSerializer2Suite.scala
    sql/core/src/test/scala/org/apache/spark/sql/execution/UnsafeKVExternalSorterSuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala
    sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetAvroCompatibilitySuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompatibilityTest.scala
    sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/sources/SaveLoadSuite.scala
    sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
    sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHiveContext.scala
    sql/hive/src/test/java/test/org/apache/spark/sql/hive/JavaDataFrameSuite.java
    sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
    sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala
    sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala
    sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
    sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala
    sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala
    sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
    sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala
    sql/hive/src/test/scala/org/apache/spark/sql/sources/ParquetHadoopFsRelationSuite.scala
Conflicts:
    sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/execution/joins/OuterJoinSuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/execution/joins/SemiJoinSuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala
Conflicts:
    sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
Test build #40506 has finished for PR 8111 at commit
This suite was using the SQLContext extensively in the constructor of the test suite. The fact that we don't have the singleton anymore means this is no longer possible. This commit refactors the suite to never reference a SQLContext outside of a test body.
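A sketch of the general shape of that refactor, reusing the `SharedSQLContextSketch` trait sketched earlier; the fixture itself is illustrative:

```scala
import org.apache.spark.sql.DataFrame
import org.scalatest.FunSuite

// Illustrative sketch: what used to be a `val` built in the constructor (and thus
// evaluated before any context existed) becomes a lazy val, so the shared
// SQLContext is only touched from inside beforeAll() or a test body.
class RefactoredSuiteSketch extends FunSuite with SharedSQLContextSketch {

  // Before: `val people = sqlContext.createDataFrame(...)` at construction time.
  // After: evaluated lazily, the first time a test body uses it.
  private lazy val people: DataFrame =
    sqlContext.createDataFrame(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")

  test("fixture is only materialized inside the test") {
    assert(people.count() === 2)
  }
}
```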
Test build #40512 has finished for PR 8111 at commit
Test build #40513 has finished for PR 8111 at commit
Test build #1458 has finished for PR 8111 at commit
Test build #1459 has finished for PR 8111 at commit
retest this please
Reviewed 28 of 131 files at r1, 1 of 2 files at r2. Review comments (on Reviewable.io):
- project/SparkBuild.scala, line 344 [r4]
- project/SparkBuild.scala, line 363 [r4]
- sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java, line 46 [r4]
- sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala, line 49 [r4]
Test build #1543 has finished for PR 8111 at commit
Test build #1545 has finished for PR 8111 at commit
Test build #1541 timed out for PR 8111 at commit
Test build #40739 timed out for PR 8111 at commit
Conflicts:
    sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
    sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
Test build #1554 has finished for PR 8111 at commit
Test build #1555 has finished for PR 8111 at commit
Test build #40777 timed out for PR 8111 at commit
This is because setting up multiple HiveContexts makes tests really unstable: we keep running out of PermGen space and occasionally run into JVM segfaults. This reduces the size of the patch significantly and only deals with the singleton SQLContext.
Per @marmbrus' request.
Force-pushed from b8780a9 to d85a6d8.
Test build #40791 has finished for PR 8111 at commit
Test build #1564 has finished for PR 8111 at commit
Test build #40793 has finished for PR 8111 at commit
Test build #1566 has finished for PR 8111 at commit
Test build #1565 has finished for PR 8111 at commit
Test build #1569 timed out for PR 8111 at commit
A fundamental limitation of the existing SQL tests is that *there is simply no way to create your own `SparkContext`*. This is a serious limitation because the user may wish to use a different master or config. As a case in point, `BroadcastJoinSuite` is entirely commented out because there is no way to make it pass with the existing infrastructure.

This patch removes the singletons `TestSQLContext` and `TestData`, and instead introduces a `SharedSQLContext` that starts a context per suite. Unfortunately the singletons were so ingrained in the SQL tests that this patch necessarily needed to touch *all* the SQL test files.

Author: Andrew Or <[email protected]>

Closes #8111 from andrewor14/sql-tests-refactor.

(cherry picked from commit 8187b3a)
Signed-off-by: Reynold Xin <[email protected]>
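As a rough illustration of what the per-suite model buys (all names and conf values below are examples, not the patch's code): a suite that needs its own master or configuration can now simply build its own context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Illustrative sketch: a suite that wants a multi-executor master and a custom SQL
// conf, something a single global TestSQLContext could never accommodate.
class CustomContextSuiteSketch extends FunSuite with BeforeAndAfterAll {
  private var sc: SparkContext = _
  private var sqlContext: SQLContext = _

  override protected def beforeAll(): Unit = {
    super.beforeAll()
    val conf = new SparkConf()
      .setMaster("local-cluster[2,1,1024]") // e.g. two executors instead of local[*]
      .setAppName("CustomContextSuiteSketch")
      .set("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString)
    sc = new SparkContext(conf)
    sqlContext = new SQLContext(sc)
  }

  override protected def afterAll(): Unit = {
    try {
      if (sc != null) sc.stop()
    } finally {
      super.afterAll()
    }
  }

  test("query runs against the suite's own context") {
    val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("key", "value")
    assert(df.join(df, "key").count() === 2)
  }
}
```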
I've merged this.
Test build #1574 has finished for PR 8111 at commit
Test build #40820 has finished for PR 8111 at commit
Test build #1573 timed out for PR 8111 at commit
Test build #1575 timed out for PR 8111 at commit