[SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode #17335
Conversation
Change-Id: If423f3fdc709ed3284cafc01efd1fe389f635560
Test build #74740 has finished for PR 17335 at commit
Thank you, @jerryshao. I'll test this.
 * example, when using proxying).
 */
private[spark] def doAsRealUser[T](fn: => T): T = {
  val currentUser = UserGroupInformation.getCurrentUser()
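For reference, here is a minimal sketch of how a helper like this is typically completed. The lines after `getCurrentUser()` are not shown in the snippet above, so this continuation is an assumption rather than the actual patch: the closure is run inside the real user's `doAs`.

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

// Sketch only: run the given closure as the real (logged-in) user, which may
// differ from the current user when proxying is in effect.
private def doAsRealUser[T](fn: => T): T = {
  val currentUser = UserGroupInformation.getCurrentUser()
  // getRealUser returns null when the current user is not a proxy user.
  val realUser = Option(currentUser.getRealUser).getOrElse(currentUser)
  realUser.doAs(new PrivilegedExceptionAction[T] {
    override def run(): T = fn
  })
}
```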
Hmmm... I'm not so sure this will work in all cases. Can you test this with both spark.sql.hive.metastore.jars and spark.sql.hive.metastore.version set?
The problem is that this class is loaded by Spark's main class loader, while HiveClientImpl comes from a different class loader, so UserGroupInformation might be a different class in certain cases. It's the same reason why the HiveClientImpl class does its own loginUserFromKeytab around L110.
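To illustrate the class-loader concern with a standalone sketch (not Spark code; the jar path and loader setup are hypothetical): the same class name loaded by two isolated loaders yields two distinct `Class` objects, so static state such as the login user or registered tokens is not shared between them.

```scala
import java.net.{URL, URLClassLoader}

// Hypothetical jar location; in Spark this would come from the isolated
// Hive client classpath (spark.sql.hive.metastore.jars).
val jarUrl = new URL("file:/path/to/hadoop-common.jar")

// Two isolated loaders (parent = null) that both contain hadoop-common.
val loaderA = new URLClassLoader(Array(jarUrl), null)
val loaderB = new URLClassLoader(Array(jarUrl), null)

val ugiA = loaderA.loadClass("org.apache.hadoop.security.UserGroupInformation")
val ugiB = loaderB.loadClass("org.apache.hadoop.security.UserGroupInformation")

// Same fully-qualified name, different Class objects: logins or tokens
// registered through one copy are invisible to the other.
assert(ugiA != ugiB)
```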
I see. Let me give it a try. But I'm guessing this is the only place where the issue can be handled from the Spark side.
@vanzin I tried with the above two configurations. Although I hit some class-not-found issues in our HDP environment, the metastore connection could be established correctly without the GSSAPI "tgt not found" issue. Tried with spark.sql.hive.metastore.jars=maven and spark.sql.hive.metastore.version=1.2.1 and 2.0.1.
17/03/20 03:35:48 INFO metastore: Trying to connect to metastore with URI thrift://c6402.ambari.apache.org:9083
17/03/20 03:35:48 INFO metastore: Opened a connection to metastore, current connections: 1
17/03/20 03:35:48 INFO metastore: Connected to metastore.
The dbs and tbls may be created on HDFS via the real user, so the proxy user may have no rights to things such as ..., which means each ...
Thanks @yaooqinn, that's really an issue here. That was my concern when I had this fix, since we wrap the whole ...
Change-Id: I84897d0b14fc69a68a70a6341e64c4c0a8188cba
Change-Id: Iaa917493ac596e8497394fa89b900d47a94f7da2
@yaooqinn, I pushed another way to fix this issue. I think with this new fix the HDFS folder owner should be the right user (the proxy user).
Test build #74946 has finished for PR 17335 at commit
Change-Id: I6be6be7b1e9a4580e5e1eeab8aac451ea830ef8b
With the creds provided by HiveCredentialProvider and configured by ..., I guess that ...
Test build #74954 has finished for PR 17335 at commit
@yaooqinn, you only need one principal (spark.yarn.principal), for example "[email protected]", to get authenticated to different services. The configurations for Hive and the NN mentioned above are only for those two services; they are not for the user who submits the Spark application. For the user who launches the Spark application, only ...
I have tested this with my kerberized HDFS and it works for me. LGTM, thanks.
Ping @vanzin, mind reviewing again? Thanks a lot.
To broaden this issue a bit: currently on the driver side (client mode), issued delegation tokens are not added into the current UGI. This makes follow-up HDFS/metastore/HBase communication still use the TGT instead of delegation tokens, which is unnecessary and should be avoided, since we already obtain the tokens in yarn#client.
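A minimal sketch of the idea described above (assumed usage, not the actual patch): once the YARN client has collected delegation tokens into a Hadoop `Credentials` object, merge them into the current UGI so that later HDFS/metastore/HBase calls pick them up instead of falling back to the TGT. The method and parameter names below are hypothetical.

```scala
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Sketch: register freshly obtained delegation tokens with the current user.
// `obtainedCreds` stands for the Credentials filled in by the credential
// providers in yarn#client.
def registerDelegationTokens(obtainedCreds: Credentials): Unit = {
  // addCredentials merges the tokens into the current user's credential set,
  // so subsequent FileSystem / metastore clients can authenticate with them.
  UserGroupInformation.getCurrentUser.addCredentials(obtainedCreds)
}
```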
Regarding https://issues.apache.org/jira/browse/SPARK-15754: will this patch cause this problem?
I have no idea about that issue; the description is quite vague ("Resource Manager cancels the Delegation Token after 10 minutes of shutting down the spark context."). I'm not really sure about the scenario mentioned in that JIRA.
@subrotosanyal would you please help describe #13499 in detail? Thanks
Hi @yaooqinn, the fix tries to remove the code where the Spark-job-specific credentials are added to the current ...
@subrotosanyal would you please elaborate more on this:
What will happen when the RM expires the tokens, and when will this happen?
I'm not sure if I understand your scenario correctly. In your case the Spark application is embedded into your own application, and your application keeps working after Spark is stopped. Because the delegation tokens are explicitly expired after the YARN app is finished, your subsequent HDFS operations which honor delegation tokens will fail, so you have to use the TGT rather than delegation tokens. Am I right? I guess it is related to this JIRA (https://issues.apache.org/jira/browse/YARN-2964); it may already be fixed on the YARN side. But with your fix, the proxy user does not work. I think that to handle your scenario, we could deliberately remove all the tokens from the current UGI after the application is finished, so that your subsequent HDFS operations could honor the TGT to get new tokens.
So I had to dig up some e-mails to refresh my memory about SPARK-15754. It is not related to YARN-2964 (that one is for things like Oozie, where the same token is used by multiple YARN apps). It's related to YARN cancelling tokens after apps finish (or after a group of apps sharing the same token, in the case of YARN-2964). So, in the embedded case, something like this: ...

The problem is caused by Spark adding the tokens to the current UGI, and the UGI API has no way to remove them. So when you start the new context, the code will try to use the tokens in the UGI and fail because they've been cancelled. Allowing Spark to overwrite the current UGI's credentials seems to fix a bunch of issues, and is obviously fine for everybody using ...

Let me dig up some code from my e-mail and see if I can reproduce the original issue and find a workaround...
I was able to write some code that should work for your use case even without the fix for SPARK-15754. I reverted that change and ran the following code a few times in the same JVM: ... (where ...)

Each iteration starts with no tokens and finishes with an HDFS delegation token, so it seems to have the behavior you want. That being said, if reverting the fix for SPARK-15754 fixes the Hive token issue, we should probably do that, since there seems to be a way for things to work in the embedded case.
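The actual snippet from that e-mail is not shown above, so the following is only a guess at the pattern being described, under the assumption that each iteration runs inside a freshly logged-in UGI (which starts with a TGT but no delegation tokens, so tokens cancelled by YARN for a previous app are never reused). The principal, keytab, and master values are placeholders.

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical principal/keytab; replace with real values.
val principal = "user@EXAMPLE.COM"
val keytab = "/path/to/user.keytab"

for (_ <- 1 to 3) {
  // A fresh UGI per iteration: it carries a TGT but no delegation tokens.
  val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
  ugi.doAs(new PrivilegedExceptionAction[Unit] {
    override def run(): Unit = {
      val sc = new SparkContext(
        new SparkConf().setAppName("embedded-app").setMaster("yarn"))
      try {
        // ... run jobs; Spark obtains fresh delegation tokens for this app ...
      } finally {
        // YARN may cancel this app's tokens after stop(), but they only live
        // in this iteration's UGI, so the next iteration is unaffected.
        sc.stop()
      }
    }
  })
}
```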
@jerryshao the PR description seems to be out of sync with the current code; can you update it?
Thanks @vanzin, I agree with you. The scenario @subrotosanyal mentioned is a fairly customized one, so this problem might be better handled outside of Spark.

Sure, I will update it.
@jerryshao the description is still about the initial version of the patch, not the current code.
Sorry about that, @vanzin. I just updated the description, please review again. Thanks a lot.
LGTM, merging to master / 2.1.
…g of tokens in yarn client mode

## What changes were proposed in this pull request?

In the current Spark on YARN code, we will obtain tokens from provided services, but we're not going to add these tokens to the current user's credentials. This will make all the following operations to these services still require TGT rather than delegation tokens. This is unnecessary since we already got the tokens, also this will lead to failure in user impersonation scenario, because the TGT is granted by real user, not proxy user. So here changing to put all the tokens to the current UGI, so that following operations to these services will honor tokens rather than TGT, and this will further handle the proxy user issue mentioned above.

## How was this patch tested?

Local verified in secure cluster.

vanzin tgravescs mridulm dongjoon-hyun please help to review, thanks a lot.

Author: jerryshao <[email protected]>

Closes #17335 from jerryshao/SPARK-19995.

(cherry picked from commit 17eddb3)
Signed-off-by: Marcelo Vanzin <[email protected]>
Will this patch work for spark-sql --master local mode as well? In our environment local mode does not support proxy users, whereas YARN mode looks OK. Do we have a solution for proxy user support in local mode?
I wonder why I am still facing this problem on Spark 3.2.2. Please tell me how to fix it. Thank you.
What changes were proposed in this pull request?
In the current Spark on YARN code, we obtain tokens from the provided services, but we do not add these tokens to the current user's credentials. This makes all subsequent operations against these services still require the TGT rather than delegation tokens. That is unnecessary, since we already have the tokens, and it also leads to failures in the user impersonation scenario, because the TGT is granted to the real user, not the proxy user.
This change puts all the tokens into the current UGI, so that subsequent operations against these services honor tokens rather than the TGT; this also handles the proxy user issue mentioned above.
How was this patch tested?
Verified locally in a secure cluster.
@vanzin @tgravescs @mridulm @dongjoon-hyun please help to review, thanks a lot.