
Conversation

@jerryshao
Contributor

@jerryshao jerryshao commented Mar 17, 2017

What changes were proposed in this pull request?

In the current Spark on YARN code, we obtain tokens from the configured services, but we do not add these tokens to the current user's credentials. As a result, all subsequent operations against these services still require a TGT rather than delegation tokens. This is unnecessary since we already have the tokens, and it also leads to failures in user-impersonation scenarios, because the TGT is granted to the real user, not the proxy user.

So this change puts all the obtained tokens into the current UGI, so that subsequent operations against these services honor the tokens rather than the TGT; this also handles the proxy-user issue mentioned above.
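As a rough sketch of the idea (not the exact patch; the creds object below is assumed to have been populated by the token providers):

    import org.apache.hadoop.security.{Credentials, UserGroupInformation}

    // Sketch only: assume the delegation token providers (HDFS/Hive/HBase) have already
    // written their tokens into this Credentials object.
    val creds = new Credentials()

    // Register the obtained tokens with the current UGI so that subsequent calls to these
    // services authenticate with delegation tokens instead of the TGT.
    UserGroupInformation.getCurrentUser().addCredentials(creds)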

How was this patch tested?

Local verified in secure cluster.

@vanzin @tgravescs @mridulm @dongjoon-hyun please help to review, thanks a lot.

Change-Id: If423f3fdc709ed3284cafc01efd1fe389f635560
@SparkQA

SparkQA commented Mar 17, 2017

Test build #74740 has finished for PR 17335 at commit d31dcb3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Thank you, @jerryshao. I'll test this.

 * example, when using proxying).
 */
private[spark] def doAsRealUser[T](fn: => T): T = {
  val currentUser = UserGroupInformation.getCurrentUser()
Contributor

Hmmm... I'm not so sure this will work in all cases. Can you test this with both spark.sql.hive.metastore.jars and spark.sql.hive.metastore.version set?

The problem is that this class is loaded by Spark's main class loader, while HiveClientImpl comes from a different class loader. So UserGroupInformation might be a different class in certain cases. That's the same reason the HiveClientImpl class does its own loginUserFromKeytab around L110.
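To illustrate the concern (a hypothetical snippet, not Spark code; the jar path is a placeholder): the same class name loaded through two class loaders that don't share a parent yields two distinct classes, so static state such as UGI's login user or credentials is not shared between them.

    import java.net.{URL, URLClassLoader}

    // Hypothetical illustration: two isolated loaders (null parent, no delegation) each
    // define their own copy of the UGI class from the same jar.
    val hadoopJars = Array(new URL("file:///path/to/hadoop-common.jar"))  // placeholder path
    val loaderA = new URLClassLoader(hadoopJars, null)
    val loaderB = new URLClassLoader(hadoopJars, null)

    val ugiName = "org.apache.hadoop.security.UserGroupInformation"
    // Distinct Class objects: credentials added via one loader's UGI are invisible to the other.
    println(loaderA.loadClass(ugiName) == loaderB.loadClass(ugiName))  // prints false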

Contributor Author

I see. Let me give it a try. But I'm guessing this is the only place where the issue can be handled from the Spark side.

Contributor Author

@vanzin I tried with the above two configurations. Though I hit some class-not-found issues in our HDP environment, the metastore connection can be correctly established without the GSSAPI "TGT not found" issue. Tested with spark.sql.hive.metastore.jars=maven and spark.sql.hive.metastore.version=1.2.1 and 2.0.1.

17/03/20 03:35:48 INFO metastore: Trying to connect to metastore with URI thrift://c6402.ambari.apache.org:9083
17/03/20 03:35:48 INFO metastore: Opened a connection to metastore, current connections: 1
17/03/20 03:35:48 INFO metastore: Connected to metastore.

@yaooqinn
Member

The DBs and tables may be created on HDFS as the real user, so the proxy user may have no rights to them, e.g.:

Error: java.lang.RuntimeException: Cannot create staging directory 'hdfs://hz-test01/user/hive/warehouse/hzyaoqin.db/src2/.hive-staging_hive_2017-03-20_22-43-44_189_8479160175818973314-1': Permission denied: user=hzyaoqin, access=WRITE, inode="/user/hive/warehouse/hzyaoqin.db/src2/.hive-staging_hive_2017-03-20_22-43-44_189_8479160175818973314-1":hive:hdfs:drwxr-xr-x

Which means every write-requiring HDFS operation would have to go through doAsRealUser. In that case, proxy users might be able to access other proxy users' data. Did you hit that error while using INSERT/CTAS?

@jerryshao
Contributor Author

Thanks @yaooqinn , that's really an issue here. That was my concern when I made this fix: since we wrap the whole SessionState.start with the real user, all the operations inside start will be executed as the real user; ideally we should only wrap the metastore connection code.
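A rough sketch of that narrower scoping (hypothetical, not the actual patch; it reuses the doAsRealUser helper shown earlier):

    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient

    // Only the metastore connection runs as the real user, so the Kerberos/SASL handshake
    // succeeds; everything else keeps running under the proxy user's UGI.
    val hiveConf = new HiveConf()
    val client = doAsRealUser {
      new HiveMetaStoreClient(hiveConf)
    }
    // HDFS writes (e.g. the .hive-staging_* directories) still happen as the proxy user,
    // so those paths end up owned by the proxy user.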

Change-Id: I84897d0b14fc69a68a70a6341e64c4c0a8188cba
Change-Id: Iaa917493ac596e8497394fa89b900d47a94f7da2
@jerryshao
Contributor Author

jerryshao commented Mar 21, 2017

@yaooqinn , I pushed another approach to fix this issue. I think with this new fix the HDFS folder owner should be the right user (the proxy user).

@SparkQA

SparkQA commented Mar 21, 2017

Test build #74946 has finished for PR 17335 at commit 11a1094.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Change-Id: I6be6be7b1e9a4580e5e1eeab8aac451ea830ef8b
@yaooqinn
Member

With the credentials provided by HiveCredentialProvider and configured via hive.metastore.kerberos.principal, do we still need to re-login with spark.yarn.principal in order to connect to the metastore?

I guess that spark.yarn.principal is used to authenticate to YARN's RM to submit apps, hive.metastore.kerberos.principal to the metastore, and dfs.namenode.kerberos.principal to the NameNode; all these and other principals are used by the Spark driver to connect to different services. Am I right?

@SparkQA

SparkQA commented Mar 21, 2017

Test build #74954 has finished for PR 17335 at commit e9b5580.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

jerryshao commented Mar 21, 2017

@yaooqinn , you only need one principal (spark.yarn.principal, for example "[email protected]") to get authenticated against the different services. The Hive and NN configurations mentioned above are only for those two services; they are not for the user who submits the Spark application.

For the user who launches the Spark application, only spark.yarn.principal can represent that user.
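As a rough sketch (placeholder values, not taken from this PR): the submitting user supplies a single principal and keytab via Spark configuration, while the service principals come from the cluster's Hadoop/Hive configuration.

    import org.apache.spark.SparkConf

    // Placeholder values for illustration only.
    val conf = new SparkConf()
      .set("spark.yarn.principal", "user@EXAMPLE.COM")   // principal of the submitting user
      .set("spark.yarn.keytab", "/path/to/user.keytab")  // keytab for that principal
    // hive.metastore.kerberos.principal and dfs.namenode.kerberos.principal (from
    // hive-site.xml / hdfs-site.xml) identify the metastore and NameNode services;
    // they are not credentials of the submitting user.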

@yaooqinn
Member

I have tested this with my kerberized hdfs and it works for me. LGTM, thanks.

@jerryshao
Contributor Author

Ping @vanzin , mind reviewing again? Thanks a lot.

@jerryshao
Contributor Author

To broaden this issue a bit: currently on the driver side (client mode), issued delegation tokens are not added into the current UGI, which makes follow-up HDFS/metastore/HBase communication still use the TGT instead of delegation tokens. This is unnecessary and should be avoided, since we already obtain the tokens in yarn#Client.

@yaooqinn
Member

Will this patch cause the problem described in SPARK-15754 (https://issues.apache.org/jira/browse/SPARK-15754)?

@jerryshao
Contributor Author

I have no idea about that issue; the description is quite vague ("Resource Manager cancels the Delegation Token after 10 minutes of shutting down the spark context."). I'm not quite sure about the scenario mentioned in that JIRA.

@yaooqinn
Member

@subrotosanyal would you please help describe #13499 in detail? Thanks.

@subrotosanyal

Hi @yaooqinn ,
This is a scenario where Spark is embedded in a client application (spark-client mode).
In the method Client#createContainerLaunchContext(), the credentials (delegation tokens) obtained to run the Spark application are added to the current UserGroupInformation (refer to the deleted line in the PR), which shouldn't be the case. UserGroupInformation is a static global object, and once it is changed at any point of the application, the change is reflected throughout the JVM. Further, the delegation tokens added this way are also passed to the YARN platform (specifically the ResourceManager). The ResourceManager expires an application's tokens after a certain period of time, which leads to expiration of the token held by the client that submitted the Spark job.

The fix removes the code where the Spark-job-specific credentials are added to the current UserGroupInformation.

@jerryshao
Contributor Author

@subrotosanyal would you please elaborate a bit more on this:

The ResourceManager expires an application's tokens after a certain period of time, which leads to expiration of the token held by the client that submitted the Spark job.

What happens when the RM expires the tokens, and when does that happen?

@jerryshao
Contributor Author

jerryshao commented Mar 23, 2017

@subrotosanyal

I'm not sure I understand your scenario correctly. In your case the Spark application is embedded into your own application, and your application keeps working after Spark is stopped. Because the delegation tokens are explicitly cancelled after the YARN app finishes, your subsequent HDFS operations that honor delegation tokens will fail, so you have to use the TGT rather than delegation tokens. Am I right?

I guess it is related to this JIRA (https://issues.apache.org/jira/browse/YARN-2964). It may already have been fixed on the YARN side.

But with your fix, proxy users do not work. To handle your scenario, I think we could deliberately remove all the tokens from the current UGI after the application is finished, so that your subsequent HDFS operations could fall back to the TGT to get new tokens.

@vanzin
Contributor

vanzin commented Mar 23, 2017

So I had to dig up some e-mails to refresh my brain about SPARK-15754. It is not related to YARN-2964 (that one is for things like Oozie, where the same token is used by multiple YARN apps). It's related to YARN cancelling tokens after apps finish (or after a group of apps sharing the same token finishes, in the case of YARN-2964).

So, in the embedded case, something like this:

val sc1 = new SparkContext("yarn-client")
// do stuff
sc1.stop()

// wait a bit

// The following will fail because YARN will have cancelled the old delegation tokens
// which are still in the current UGI object.
val sc2 = new SparkContext("yarn-client")

The problem is caused by Spark adding the tokens to the current UGI, and the UGI API has no way to remove them. So when you start the new context, the code will try to use the tokens in the UGI and fail because they've been cancelled.

Allowing Spark to overwrite the current UGI's credentials seems to fix a bunch of issues, and is obviously fine for everybody using spark-submit. But I wonder if there's a way to avoid that in these applications that embed Spark without requiring them to manage their own delegation tokens.

Let me dig up some code from my e-mail and see if I can reproduce the original issue and find a workaround...

@vanzin
Contributor

vanzin commented Mar 24, 2017

@subrotosanyal

I was able to write some code that should work for your use case even without the fix for SPARK-15754. I reverted that change and ran the following code a few times in the same JVM:

    PrivilegedExceptionAction<Void> action = () -> {
      dumpTokens("before");
      runSpark();
      dumpTokens("after");
      return null;
    };
    UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab);
    ugi.doAs(action);

(Where dumpTokens prints the tokens in the UGI, and runSpark starts a SparkContext and stops it.)

Each iteration starts with no tokens and finishes with an HDFS delegation token, so it seems to have the behavior you want.

With that being said, if reverting the fix for SPARK-15754 fixes the Hive token issue, we should probably do that since there seems to be a way for things to work in the embedded case.

@vanzin
Contributor

vanzin commented Mar 24, 2017

@jerryshao the PR description seems to be out of sync with the current code, can you update it?

@jerryshao
Contributor Author

Thanks @vanzin , I agree with you. The scenario @subrotosanyal mentioned is a bit specialized, so this problem might be better handled outside of Spark.

Sure, I will update it.

@jerryshao jerryshao changed the title [SPARK-19995][Hive][Yarn] Using real user to initialize hive SessionState [SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode Mar 24, 2017
@vanzin
Contributor

vanzin commented Mar 24, 2017

@jerryshao the description is still about the initial version of the patch, not the current code.

@jerryshao
Contributor Author

Sorry about that, @vanzin. I just updated the description, please review again. Thanks a lot.

@vanzin
Contributor

vanzin commented Mar 28, 2017

LGTM, merging to master / 2.1.

asfgit pushed a commit that referenced this pull request Mar 28, 2017
[SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode


Author: jerryshao <[email protected]>

Closes #17335 from jerryshao/SPARK-19995.

(cherry picked from commit 17eddb3)
Signed-off-by: Marcelo Vanzin <[email protected]>
@asfgit asfgit closed this in 17eddb3 Mar 28, 2017
@rajeshcode

Will this patch work for spark-sql --master local mode as well?

In our environment, local mode does not support proxy users, whereas YARN mode looks OK. Do we have a solution for proxy user support in local mode?

@AnhQuanTran

I wonder why I am still facing this problem on Spark 3.2.2. Please tell me how to fix it. Thank you.

https://stackoverflow.com/questions/73984517/spark-thrift-3-2-2-impersonate-user-facing-error-with-metastore-authen-sasl-neg
