Skip to content

Conversation

@sitalkedia
Copy link

What changes were proposed in this pull request?

Appends s3 specific configurations and spark.hadoop configurations to hive configuration.

How was this patch tested?

Tested by running a job on cluster.

…figurations to hive configuration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits: "S3-specific", "a Hadoop". It doesn't need to return a Configuration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also mention the buffer keys that are also added that don't fit into either of those.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! changed it accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra "spark." here in "spark.spark.buffer.size".
This might be a dumb question, but it feels kind of funny that we set these things when a Configuration is created everywhere except one place. Is it because the HiveConf necessarily comes from somewhere else that Spark isn't initializing? Just trying to rationalize treating it specially in this one case

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, removed extra "spark." .

You are right, HiveConf is being initialized in a separate code path which Spark isn't initializing properly. I am not very familiar with hive side of things to comment on why it was done that way. But the TODO in TableReader.scala suggests that it is the right place to initialize the HiveConf.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking reasonable. If we're able to get a comment from @marmbrus about this particular aspect of the change (he put in the todo in 9aadcff ) maybe that would confirm that this does need a special treatment and so this change makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very old TODO, but if I remember correctly, it was a result of me omitting code as I copied logic from Shark. It would be good to understand what the job that was tested on a cluster is doing, and why it needs info in the hive conf. Just because long term we are moving further and further away from using hive code.

No objections though if this is fixing something.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marmbrus - the job run a simple hive query to create a table. From what I understand from the code is the hiveConf is being initialized which is does not include spark.hadoo.* configurations and that hiveConf is being used to initialized the HadoopRDD. So the HadoopRDD does not contain any spark.hadoop.* configurations. This fix is meant to resolve that issue.

@srowen
Copy link
Member

srowen commented Mar 29, 2016

Jenkins test this please

@SparkQA
Copy link

SparkQA commented Mar 29, 2016

Test build #54447 has finished for PR 11876 at commit 018eea6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Mar 30, 2016

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Mar 30, 2016

Test build #54515 has finished for PR 11876 at commit 018eea6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sitalkedia
Copy link
Author

Thanks @srowen, not sure why the test failed, will take a look.

@sitalkedia
Copy link
Author

@srowen - Thanks for taking a look, updated the diff to fix the test case.

@srowen
Copy link
Member

srowen commented Apr 2, 2016

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Apr 3, 2016

Test build #54784 has finished for PR 11876 at commit 98eee85.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Apr 3, 2016

Merged to master

@asfgit asfgit closed this in 1cf7018 Apr 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants