
Conversation

@baishuo
Contributor

@baishuo baishuo commented Jan 5, 2015

No description provided.

@marmbrus
Contributor

marmbrus commented Jan 5, 2015

ok to test

@baishuo
Contributor Author

baishuo commented Jan 5, 2015

Some explanation:
If we want to use MySQL instead of Derby to store the metadata for Spark SQL, we add parameters such as "javax.jdo.option.ConnectionURL" to hive-site.xml, but they are always overridden by the hard-coded values in TestHive.scala or HiveContext.scala.
Also, if a database called "default" already exists in the metastore, the call context.runSqlHive("CREATE DATABASE default") throws an exception: Database default already exists.
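
For illustration, a minimal sketch of the guard this patch is after (the exact call site in Shim13.scala/TestHive.scala may differ; `runSqlHive` is the existing HiveContext helper quoted above):

```scala
// Sketch only: make the test bootstrap tolerate a pre-existing "default"
// database (e.g. when the metastore lives in an external MySQL instance).
context.runSqlHive("CREATE DATABASE IF NOT EXISTS default")
```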

@SparkQA

SparkQA commented Jan 5, 2015

Test build #25050 has finished for PR 3895 at commit 1757fde.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@baishuo
Contributor Author

baishuo commented Jan 6, 2015

Hi @marmbrus, I have modified some code and tested locally, would you please trigger the tests again? :)
I ran the tests with "sbt/sbt catalyst/test sql/test hive/test"
PS:
I hit an error, but I don't think it is related to my change, since the same error also happens when I run the same tests on a clean master branch.

The details of the error:
[info] - Converting Hive to Parquet Table via saveAsParquetFile (378 milliseconds)
01:44:21.200 WARN parquet.hadoop.ParquetOutputCommitter: could not write summary file for /tmp/parquetTest8472205136348547909
parquet.io.ParquetEncodingException: file:/tmp/parquetTest8472205136348547909/part-r-00002.parquet invalid: all the files must be contained in the root /tmp/parquetTest8472205136348547909
at parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:422)
at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:398)
at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:51)
at org.apache.spark.sql.parquet.InsertIntoParquetTable.saveAsHadoopFile(ParquetTableOperations.scala:327)
at org.apache.spark.sql.parquet.InsertIntoParquetTable.execute(ParquetTableOperations.scala:251)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:96)
at org.apache.spark.sql.parquet.HiveParquetSuite$$anonfun$5$$anonfun$apply$mcV$sp$4$$anonfun$apply$mcV$sp$9$$anonfun$apply$2.apply$mcV$sp(HiveParquetSuite.scala:74)
at org.apache.spark.sql.parquet.ParquetTest$class.withTempTable(ParquetTest.scala:111)
at org.apache.spark.sql.parquet.HiveParquetSuite.withTempTable(HiveParquetSuite.scala:26)
at org.apache.spark.sql.parquet.HiveParquetSuite$$anonfun$5$$anonfun$apply$mcV$sp$4$$anonfun$apply$mcV$sp$9.apply(HiveParquetSuite.scala:72)
at org.apache.spark.sql.parquet.HiveParquetSuite$$anonfun$5$$anonfun$apply$mcV$sp$4$$anonfun$apply$mcV$sp$9.apply(HiveParquetSuite.scala:69)
at org.apache.spark.sql.parquet.ParquetTest$class.withTempPath(ParquetTest.scala:70)
at org.apache.spark.sql.parquet.HiveParquetSuite.withTempPath(HiveParquetSuite.scala:26)
at org.apache.spark.sql.parquet.HiveParquetSuite$$anonfun$5$$anonfun$apply$mcV$sp$4.apply$mcV$sp(HiveParquetSuite.scala:69)
at org.apache.spark.sql.parquet.ParquetTest$class.withTempTable(ParquetTest.scala:111)
at org.apache.spark.sql.parquet.HiveParquetSuite.withTempTable(HiveParquetSuite.scala:26)

It occurs when running "hive/test".

@SparkQA

SparkQA commented Jan 6, 2015

Test build #25095 has finished for PR 3895 at commit 1f42b8e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@baishuo
Contributor Author

baishuo commented Jan 15, 2015

Hi @marmbrus, can this PR be merged? :)

@scwf
Contributor

scwf commented Jan 15, 2015

hey @baishuo, you should rebase this PR

Contributor

Maybe add a comment to explain what we are doing?

Contributor Author

After the rebase this code block has been removed; the purpose is to avoid hard-coding.

@liancheng
Contributor

@baishuo Why do you want to use a MySQL-based metastore when running the test suites?

@baishuo baishuo force-pushed the SPARK-5084-20150105-2 branch from 1f42b8e to 3b86819 Compare March 13, 2015 03:54
@baishuo
Contributor Author

baishuo commented Mar 13, 2015

Hi guys, sorry for responding so late. I was traveling for work before Chinese New Year. I have rebased the code and left just one modification. Thanks.

@SparkQA

SparkQA commented Mar 13, 2015

Test build #28548 has finished for PR 3895 at commit 3b86819.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 13, 2015

Test build #28550 has finished for PR 3895 at commit 58bbdb3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

This LGTM, merging to master. Thanks for working on this!

(Update: the merge actually failed due to a network issue...)

@marmbrus
Contributor

I actually question why we need this at all. As far as I understand, we only need this function because of a bug in the way we are initializing TestHive (the only place this function is used). It used to be that we called configure() before creating the session state, but when we refactored the creation of the config and session state, we changed it so that the session state ends up getting initialized first, and only then do we call configure() to set up the location of the metastore and warehouse. This late change during initialization breaks a bunch of things (e.g., we no longer create the metastore in a temp directory), and needing to create the database manually is just a hack to work around that (I think). Instead, we should do the following:

  • Change it such that configure() is passed the configuration explicitly and is called during setup, instead of relying on the ordering of the constructor.
  • Have a default no-op implementation in HiveContext.
  • Override configure() in TestHive to do the right thing, creating a fresh metastore each time it is invoked.
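
A rough Scala sketch of the shape being proposed, with assumed names and signatures (the real HiveContext/TestHive classes have different constructors and many more members; this is illustrative only, not the actual Spark API):

```scala
import java.io.File
import java.nio.file.Files

import org.apache.spark.SparkContext

// Hypothetical sketch of the proposed hook, not the real classes.
class HiveContext(sc: SparkContext) {
  // configure() returns Hive config overrides and would be applied while the
  // configuration is built, before the session state is created.
  // Default implementation: override nothing.
  protected def configure(): Map[String, String] = Map.empty
}

class TestHiveContext(sc: SparkContext) extends HiveContext(sc) {
  // Point the metastore and warehouse at fresh, not-yet-existing temp paths
  // every time, so tests never depend on a pre-existing metastore.
  override protected def configure(): Map[String, String] = {
    val tempRoot = Files.createTempDirectory("spark-test").toFile
    val metastorePath = new File(tempRoot, "metastore").getCanonicalPath
    val warehousePath = new File(tempRoot, "warehouse").getCanonicalPath
    Map(
      "javax.jdo.option.ConnectionURL" ->
        s"jdbc:derby:;databaseName=$metastorePath;create=true",
      "hive.metastore.warehouse.dir" -> warehousePath)
  }
}
```

With a hook like this, HiveContext could merge the returned overrides into its HiveConf during setup rather than depending on constructor ordering.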

@liancheng
Contributor

@marmbrus @baishuo My last merge attempt happened to fail because of a network issue, and then I saw Michael's comment. I created baishuo#2 per that comment. Please have a look at it. Thanks!

@baishuo
Contributor Author

baishuo commented Mar 17, 2015

Thank you @liancheng, I have studied baishuo#2, and I think it is good :) @marmbrus

@baishuo baishuo changed the title [SPARK-5084][SQL]add if not exists after create database-in Shim13.scala [SPARK-5084][SQL]Replaces TestHiveContext.configure() with HiveContext.overrideHiveConf() Mar 17, 2015
@baishuo
Contributor Author

baishuo commented Mar 17, 2015

I have modified the title of this PR @marmbrus @liancheng

@marmbrus
Contributor

Can you resolve the conflicts here?

@marmbrus
Contributor

Thanks for working on this BTW, I'm glad we finally found what was messing up the initialization of TestHive! It has been bothering me for a while.

@baishuo
Contributor Author

baishuo commented Mar 17, 2015

@marmbrus no problem, let me resolve the conflicts :)

@SparkQA

SparkQA commented Mar 17, 2015

Test build #28690 has finished for PR 3895 at commit 5241751.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@baishuo baishuo force-pushed the SPARK-5084-20150105-2 branch from 5241751 to d6c29c1 Compare March 17, 2015 04:19
@SparkQA

SparkQA commented Mar 17, 2015

Test build #28697 has finished for PR 3895 at commit d6c29c1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

Haven't had time to investigate why so many tests (hundreds) failed. Will come back to this one later.

@marmbrus
Contributor

I'd love to fix this, but can we close this issue until there is progress on the test failures?

@andrewor14
Contributor

@baishuo would you mind closing the issue for now since it's mostly gone stale? Feel free to reopen an updated version if you prefer.

@andrewor14
Contributor

Let's close this PR

@asfgit asfgit closed this in 804a012 Sep 4, 2015