[SPARK-14056] Appends s3 specific configurations and spark.hadoop con… #11876

sitalkedia · 2016-03-22T02:59:00Z

What changes were proposed in this pull request?

Appends S3-specific configurations and spark.hadoop.* configurations to the Hive configuration.

How was this patch tested?

Tested by running a job on a cluster.

srowen · 2016-03-22T14:02:56Z

core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala

Nits: "S3-specific", "a Hadoop". It doesn't need to return a Configuration.

holdenk · 2016-03-22T21:31:50Z

core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala

Maybe also mention the buffer keys that are also added that don't fit into either of those.

Good point! Changed it accordingly.

srowen · 2016-03-23T12:13:50Z

core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala

Extra "spark." here in "spark.spark.buffer.size".
This might be a dumb question, but it feels kind of funny that we set these things when a Configuration is created everywhere except one place. Is it because the HiveConf necessarily comes from somewhere else that Spark isn't initializing? Just trying to rationalize treating it specially in this one case.

Thanks, removed the extra "spark.".

You are right, HiveConf is being initialized in a separate code path, which Spark isn't initializing properly. I am not familiar enough with the Hive side of things to comment on why it was done that way. But the TODO in TableReader.scala suggests that it is the right place to initialize the HiveConf.

This is looking reasonable. If we're able to get a comment from @marmbrus about this particular aspect of the change (he put in the TODO in 9aadcff), maybe that would confirm that this does need special treatment and so this change makes sense.

That is a very old TODO, but if I remember correctly, it was a result of me omitting code as I copied logic from Shark. It would be good to understand what the job that was tested on a cluster is doing, and why it needs info in the hive conf. Just because long term we are moving further and further away from using hive code.

No objections though if this is fixing something.

@marmbrus - the job runs a simple Hive query to create a table. From what I understand from the code, the hiveConf that gets initialized does not include the spark.hadoop.* configurations, and that hiveConf is then used to initialize the HadoopRDD. So the HadoopRDD does not contain any spark.hadoop.* configurations. This fix is meant to resolve that issue.
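To make the issue concrete, here is a minimal sketch of the idea and not the actual patch. It uses plain Scala maps as stand-ins for SparkConf and HiveConf so it runs without Spark or Hadoop on the classpath. The object and method names are hypothetical. The "spark.hadoop." prefix and the "spark.buffer.size" key come from the discussion above; the target key "io.file.buffer.size" and the default of 65536 are assumptions about how the buffer setting is mapped.

```scala
import scala.collection.mutable

object HadoopConfSketch {
  // Hypothetical helper: copies every "spark.hadoop.*" entry from sparkConf
  // into hadoopConf with the prefix stripped, then sets the buffer size key.
  def appendSparkHadoopConfigs(
      sparkConf: Map[String, String],
      hadoopConf: mutable.Map[String, String]): Unit = {
    val prefix = "spark.hadoop."
    sparkConf.foreach { case (key, value) =>
      if (key.startsWith(prefix)) {
        hadoopConf(key.substring(prefix.length)) = value
      }
    }
    // The buffer key fits neither the S3 nor the spark.hadoop.* category.
    // Target key name and default value are assumptions for this sketch.
    hadoopConf("io.file.buffer.size") =
      sparkConf.getOrElse("spark.buffer.size", "65536")
  }

  def main(args: Array[String]): Unit = {
    val sparkConf = Map(
      "spark.hadoop.fs.s3a.endpoint" -> "s3.example.com", // made-up value
      "spark.app.name" -> "demo")
    // Stand-in for a HiveConf created on a code path that skipped the copy.
    val hiveConf = mutable.Map("hive.exec.scratchdir" -> "/tmp/hive")
    appendSparkHadoopConfigs(sparkConf, hiveConf)
    hiveConf.toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Without the call to the helper, the stand-in hiveConf would only carry its own Hive keys, which matches the symptom described here: a HadoopRDD built from it never sees the spark.hadoop.* settings.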

srowen · 2016-03-29T16:17:21Z

Jenkins test this please

SparkQA · 2016-03-29T18:38:09Z

Test build #54447 has finished for PR 11876 at commit 018eea6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-03-30T11:51:01Z

Jenkins retest this please

SparkQA · 2016-03-30T13:55:26Z

Test build #54515 has finished for PR 11876 at commit 018eea6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

sitalkedia · 2016-03-30T14:21:15Z

Thanks @srowen, not sure why the test failed, will take a look.

sitalkedia · 2016-04-02T22:59:36Z

@srowen - Thanks for taking a look, updated the diff to fix the test case.

srowen · 2016-04-02T23:09:51Z

Jenkins retest this please

SparkQA · 2016-04-03T01:27:40Z

Test build #54784 has finished for PR 11876 at commit 98eee85.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-04-03T02:17:32Z

Merged to master

srowen reviewed Mar 22, 2016

sitalkedia force-pushed the hiveConf branch from 312ee1f to 88baa9f Compare March 22, 2016 15:58

holdenk reviewed Mar 22, 2016

sitalkedia force-pushed the hiveConf branch from 88baa9f to 318cad6 Compare March 22, 2016 23:59

srowen reviewed Mar 23, 2016

sitalkedia force-pushed the hiveConf branch from 318cad6 to 018eea6 Compare March 24, 2016 16:20

[SPARK-14056] Appends s3 specific configurations and spark.hadoop con…

98eee85

…figurations to hive configuration.

sitalkedia force-pushed the hiveConf branch from 018eea6 to 98eee85 Compare April 2, 2016 18:44

asfgit closed this in 1cf7018 Apr 3, 2016
