
Conversation

@baishuo
Contributor

@baishuo baishuo commented Jan 5, 2015

No description provided.

@marmbrus
Contributor

marmbrus commented Jan 5, 2015

ok to test

@baishuo
Contributor Author

baishuo commented Jan 5, 2015

Some explanation:
If we want to use MySQL instead of Derby to store the metadata for Spark SQL, we add parameters such as "javax.jdo.option.ConnectionURL" to hive-site.xml, but they are always overridden by the hard-coded values in TestHive.scala or HiveContext.scala.
Also, if a database called "default" already exists in the metastore, the call context.runSqlHive("CREATE DATABASE default") throws an exception: Database default already exists.
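
For illustration, a minimal sketch of the guard this patch is after (the exact call site in Shim13.scala/TestHive.scala may differ; `runSqlHive` is the existing HiveContext helper quoted above):

```scala
// Sketch only: make the test bootstrap tolerate a pre-existing "default"
// database (e.g. when the metastore lives in an external MySQL instance).
context.runSqlHive("CREATE DATABASE IF NOT EXISTS default")
```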

@SparkQA

SparkQA commented Jan 5, 2015

Test build #25050 has finished for PR 3895 at commit 1757fde.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@baishuo
Contributor Author

baishuo commented Jan 6, 2015

Hi @marmbrus, I have modified some code and tested locally, would you please trigger the tests again? :)
I ran the tests with "sbt/sbt catalyst/test sql/test hive/test"
PS:
I hit an error, but I don't think it is related to my change, since the same error also happens when I run the same tests on a clean master branch.

The details of the error:
[info] - Converting Hive to Parquet Table via saveAsParquetFile (378 milliseconds)
01:44:21.200 WARN parquet.hadoop.ParquetOutputCommitter: could not write summary file for /tmp/parquetTest8472205136348547909
parquet.io.ParquetEncodingException: file:/tmp/parquetTest8472205136348547909/part-r-00002.parquet invalid: all the files must be contained in the root /tmp/parquetTest8472205136348547909
at parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:422)
at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:398)
at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:51)
at org.apache.spark.sql.parquet.InsertIntoParquetTable.saveAsHadoopFile(ParquetTableOperations.scala:327)
at org.apache.spark.sql.parquet.InsertIntoParquetTable.execute(ParquetTableOperations.scala:251)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:96)
at org.apache.spark.sql.parquet.HiveParquetSuite$$anonfun$5$$anonfun$apply$mcV$sp$4$$anonfun$apply$mcV$sp$9$$anonfun$apply$2.apply$mcV$sp(HiveParquetSuite.scala:74)
at org.apache.spark.sql.parquet.ParquetTest$class.withTempTable(ParquetTest.scala:111)
at org.apache.spark.sql.parquet.HiveParquetSuite.withTempTable(HiveParquetSuite.scala:26)
at org.apache.spark.sql.parquet.HiveParquetSuite$$anonfun$5$$anonfun$apply$mcV$sp$4$$anonfun$apply$mcV$sp$9.apply(HiveParquetSuite.scala:72)
at org.apache.spark.sql.parquet.HiveParquetSuite$$anonfun$5$$anonfun$apply$mcV$sp$4$$anonfun$apply$mcV$sp$9.apply(HiveParquetSuite.scala:69)
at org.apache.spark.sql.parquet.ParquetTest$class.withTempPath(ParquetTest.scala:70)
at org.apache.spark.sql.parquet.HiveParquetSuite.withTempPath(HiveParquetSuite.scala:26)
at org.apache.spark.sql.parquet.HiveParquetSuite$$anonfun$5$$anonfun$apply$mcV$sp$4.apply$mcV$sp(HiveParquetSuite.scala:69)
at org.apache.spark.sql.parquet.ParquetTest$class.withTempTable(ParquetTest.scala:111)
at org.apache.spark.sql.parquet.HiveParquetSuite.withTempTable(HiveParquetSuite.scala:26)

It occurs when running "hive/test".

@SparkQA

SparkQA commented Jan 6, 2015

Test build #25095 has finished for PR 3895 at commit 1f42b8e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@baishuo
Contributor Author

baishuo commented Jan 15, 2015

Hi @marmbrus, can this PR be merged? :)

@scwf
Contributor

scwf commented Jan 15, 2015

hey @baishuo, you should rebase this PR

Contributor

Maybe add a comment to explain what we are doing?

Contributor Author

After the rebase this code block has been removed; the purpose is to avoid hard-coding.

@liancheng
Contributor

@baishuo Why do you want to use a MySQL-based metastore when running the test suites?

@baishuo baishuo force-pushed the SPARK-5084-20150105-2 branch from 1f42b8e to 3b86819 Compare March 13, 2015 03:54
@baishuo
Contributor Author

baishuo commented Mar 13, 2015

Hi guys, sorry for responding so late. I was traveling for work before Chinese New Year. I have rebased the code and left just one modification. Thanks.

@SparkQA

SparkQA commented Mar 13, 2015

Test build #28548 has finished for PR 3895 at commit 3b86819.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 13, 2015

Test build #28550 has finished for PR 3895 at commit 58bbdb3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

This LGTM, merging to master. Thanks for working on this!

(Update: the merge actually failed due to a network issue...)

@marmbrus
Contributor

I actually question why we need this at all. As far as I understand, we only need this function because of a bug in the way we are initializing TestHive (the only place this function is used). It used to be that we called configure() before creating the session state, but when we refactored the creation of the config and session state, we changed it so that the session state ends up getting initialized first, and only then do we call configure() to set up the location of the metastore and warehouse. This late change during initialization breaks a bunch of things (e.g., we no longer create the metastore in a temp directory), and needing to create the database manually is just a hack to work around that (I think). Instead, we should do the following:

  • Change it such that configure() is passed the configuration explicitly and is called during setup, instead of relying on the ordering of the constructor.
  • Have a default no-op implementation in HiveContext.
  • Override configure() in TestHive to do the right thing, creating a fresh metastore each time it is invoked.
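
A rough Scala sketch of the shape being proposed, with assumed names and signatures (the real HiveContext/TestHive classes have different constructors and many more members; this is illustrative only, not the actual Spark API):

```scala
import java.io.File
import java.nio.file.Files

import org.apache.spark.SparkContext

// Hypothetical sketch of the proposed hook, not the real classes.
class HiveContext(sc: SparkContext) {
  // configure() returns Hive config overrides and would be applied while the
  // configuration is built, before the session state is created.
  // Default implementation: override nothing.
  protected def configure(): Map[String, String] = Map.empty
}

class TestHiveContext(sc: SparkContext) extends HiveContext(sc) {
  // Point the metastore and warehouse at fresh, not-yet-existing temp paths
  // every time, so tests never depend on a pre-existing metastore.
  override protected def configure(): Map[String, String] = {
    val tempRoot = Files.createTempDirectory("spark-test").toFile
    val metastorePath = new File(tempRoot, "metastore").getCanonicalPath
    val warehousePath = new File(tempRoot, "warehouse").getCanonicalPath
    Map(
      "javax.jdo.option.ConnectionURL" ->
        s"jdbc:derby:;databaseName=$metastorePath;create=true",
      "hive.metastore.warehouse.dir" -> warehousePath)
  }
}
```

With a hook like this, HiveContext could merge the returned overrides into its HiveConf during setup rather than depending on constructor ordering.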

@liancheng
Contributor

@marmbrus @baishuo My last merge attempt happened to fail because of a network issue, and then I saw Michael's comment. I created baishuo#2 per that comment. Please have a look at it. Thanks!

@baishuo
Contributor Author

baishuo commented Mar 17, 2015

Thank you @liancheng, I have studied baishuo#2, and I think it is good :) @marmbrus

@baishuo baishuo changed the title [SPARK-5084][SQL]add if not exists after create database-in Shim13.scala [SPARK-5084][SQL]Replaces TestHiveContext.configure() with HiveContext.overrideHiveConf() Mar 17, 2015
@baishuo
Contributor Author

baishuo commented Mar 17, 2015

I have modified the title of this PR @marmbrus @liancheng

@marmbrus
Contributor

Can you resolve the conflicts here?

@marmbrus
Contributor

Thanks for working on this BTW, I'm glad we finally found what was messing up the initialization of TestHive! It has been bothering me for a while.

@baishuo
Contributor Author

baishuo commented Mar 17, 2015

@marmbrus no problem, let me resolve the conflicts :)

@SparkQA

SparkQA commented Mar 17, 2015

Test build #28690 has finished for PR 3895 at commit 5241751.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@baishuo baishuo force-pushed the SPARK-5084-20150105-2 branch from 5241751 to d6c29c1 Compare March 17, 2015 04:19
@SparkQA

SparkQA commented Mar 17, 2015

Test build #28697 has finished for PR 3895 at commit d6c29c1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

Haven't had time to investigate why so many tests (hundreds) failed. Will come back to this one later.

@marmbrus
Contributor

I'd love to fix this, but can we close this issue until there is progress on the test failures?

@andrewor14
Contributor

@baishuo would you mind closing the issue for now since it's mostly gone stale? Feel free to reopen an updated version if you prefer.

@andrewor14
Contributor

Let's close this PR

@asfgit asfgit closed this in 804a012 Sep 4, 2015