This repository was archived by the owner on Oct 23, 2024. It is now read-only.

@skonto

@skonto skonto commented May 9, 2018

What changes were proposed in this pull request?

Fixes `--proxy-user` support in cluster mode on DC/OS.

@susanxhuynh I added the fix so we can discuss the CLI implementation options and agree on an approach.
One basic option is to move the secrets code in this PR to the DC/OS Spark CLI, but that would require
the CLI to download the Spark distribution, run the spark-submit code to generate the delegation tokens (DTs) (without the REST submission part), and then upload them as secrets to the secret store.
I remember that the CLI used to download the distribution in the past.
The design is attached: design.pdf
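A toy Python model of the CLI option described above, for discussion only: it mimics the proposed flow (generate delegation tokens locally, serialize them, upload them as a secret that the driver later reads) with in-memory stand-ins. `generate_delegation_tokens`, `InMemorySecretStore`, the token format, and the secret path are all hypothetical; real code would use Hadoop's token serialization and the DC/OS secret store rather than JSON and a dict.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemorySecretStore:
    """Hypothetical stand-in for the DC/OS secret store."""
    secrets: Dict[str, bytes] = field(default_factory=dict)

    def put(self, path: str, value: bytes) -> None:
        self.secrets[path] = value

    def get(self, path: str) -> bytes:
        return self.secrets[path]


def generate_delegation_tokens(proxy_user: str, services: List[str]) -> Dict[str, str]:
    """Hypothetical stand-in for running the spark-submit token code locally
    without actually submitting the job. Returns service -> opaque token."""
    return {svc: f"dt({proxy_user}@{svc})" for svc in services}


def serialize_tokens(tokens: Dict[str, str]) -> bytes:
    # Toy encoding only; Hadoop writes tokens in its own binary format.
    return base64.b64encode(json.dumps(tokens, sort_keys=True).encode("utf-8"))


def deserialize_tokens(blob: bytes) -> Dict[str, str]:
    return json.loads(base64.b64decode(blob).decode("utf-8"))


def cli_prepare_tokens(store: InMemorySecretStore, proxy_user: str,
                       services: List[str], secret_path: str) -> str:
    # Step 1 of the proposal (downloading the Spark distribution) is not modeled.
    tokens = generate_delegation_tokens(proxy_user, services)  # step 2: generate DTs
    store.put(secret_path, serialize_tokens(tokens))           # step 3: upload as a secret
    return secret_path


# Usage: the CLI uploads the tokens; the driver side later reads them back.
store = InMemorySecretStore()
path = cli_prepare_tokens(store, "alice", ["hdfs"], "/spark/alice-dts")
driver_tokens = deserialize_tokens(store.get(path))
```

Keeping token generation on the client side is what forces the CLI to carry a Spark distribution: the token code lives in spark-submit, so the CLI has to run it before anything reaches the cluster.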

Note: this can be merged directly as is, but it brings in dependencies and will work only with spark-submit in cluster mode.

This patch also fixes:
a) the requirement that the keytab exist locally on the spark-submit side in cluster mode.
b) SPARK-20982 (partially).

How was this patch tested?

Test instructions are attached: README_TESTS.md.txt

Author

@skonto skonto May 9, 2018

The problem here was caused by the broken release: this method was empty, which caused failures whenever HadoopRDD was used!

Good catch

Author

Thanks.

@justinrlee justinrlee requested a review from susanxhuynh May 9, 2018 21:50
Author

@skonto skonto May 9, 2018

This is another issue the customer faced before the proxy user implementation. I follow the YARN approach here: https://github.com/apache/spark/pull/17335/files

@susanxhuynh susanxhuynh left a comment

@skonto Looks good. I left some questions for you.

Good catch

When does `UserGroupInformation.isSecurityEnabled` return true?

Author

When the authentication method is PROXY or KERBEROS rather than SIMPLE.
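A tiny Python model of that rule, for reference. It assumes, as in Hadoop, that the method is read from the `hadoop.security.authentication` setting (default `simple`) and that security counts as enabled for any value other than SIMPLE; `is_security_enabled` is a hypothetical helper, not Hadoop's `UserGroupInformation` code.

```python
def is_security_enabled(conf: dict) -> bool:
    """Toy model: security counts as enabled for any authentication
    method other than SIMPLE (e.g. KERBEROS or PROXY)."""
    # hadoop.security.authentication defaults to "simple" in Hadoop;
    # the comparison is case-insensitive here.
    method = conf.get("hadoop.security.authentication", "simple").strip().upper()
    return method != "SIMPLE"


print(is_security_enabled({}))                                            # False
print(is_security_enabled({"hadoop.security.authentication": "kerberos"}))  # True
```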

Does this get the TGT?

Author

Yes.

So, proxy user works in client mode as well?

Author

Yes, as the test results show.

@skonto
Author

skonto commented May 19, 2018

@susanxhuynh I will update the PR since there are some conflicts. Let me know how I should proceed regarding, for example, the DC/OS Spark CLI, or whether we should support this only through spark-submit.

@skonto skonto changed the title [WIP] --proxy-user fix Support --proxy-user on mesos Jun 7, 2018
@skonto skonto changed the title Support --proxy-user on mesos Support --proxy-user in cluster mode on DC/OS Jun 7, 2018
@justinrlee

TID 10537 and TID 10538

@susanxhuynh

Passing CI now.

@susanxhuynh

Merging as is with spark-submit support only. DC/OS Spark CLI support not included in this PR.

@susanxhuynh susanxhuynh merged commit 3d31341 into d2iq-archive:custom-branch-2.2.1-X Jun 22, 2018
samvantran added a commit that referenced this pull request Aug 29, 2018
samvantran added a commit that referenced this pull request Aug 29, 2018
samvantran added a commit that referenced this pull request Aug 30, 2018
yaooqinn pushed a commit to apache/spark that referenced this pull request Mar 8, 2023
…8s in cluster deploy mode

### What changes were proposed in this pull request?

The PR fixes the authentication failure of the proxy user on the driver side while accessing Kerberized HDFS through a Spark on K8s job. It follows a similar approach to the one used for Mesos: d2iq-archive#26

### Why are the changes needed?

When we try to access Kerberized HDFS through a proxy user in a Spark job running in cluster deploy mode with the Kubernetes resource manager, we encounter an AccessControlException. This is because authentication in the driver is done using the proxy user's tokens, and since the proxy user doesn't have any delegation tokens on the driver, authentication fails.
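To make the failure mode concrete, a toy Python model (not Spark or Hadoop code; `ToyUGI`, `create_proxy_user`, and `read_hdfs` are hypothetical). It assumes, as the paragraph above states, that driver-side authentication relies on the proxy user's delegation tokens, and models the fix as making tokens obtained for the proxy user visible in its credentials. In Hadoop, `UserGroupInformation.addCredentials` is one call that can do that, but this sketch does not claim it is exactly what the patch does.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class AccessControlError(Exception):
    """Toy stand-in for Hadoop's AccessControlException."""


@dataclass
class ToyUGI:
    """Hypothetical, simplified stand-in for UserGroupInformation."""
    user: str
    tokens: Dict[str, str] = field(default_factory=dict)  # service -> delegation token
    real_user: Optional["ToyUGI"] = None

    def do_as(self, action: Callable[["ToyUGI"], T]) -> T:
        # Code run inside do_as sees only this UGI's own credentials.
        return action(self)


def create_proxy_user(name: str, real_user: ToyUGI) -> ToyUGI:
    # Modeled on createProxyUser: the proxy UGI starts with no tokens of its own.
    return ToyUGI(user=name, real_user=real_user)


def read_hdfs(ugi: ToyUGI, service: str = "hdfs") -> str:
    # Toy rule following the PR description: on the driver, HDFS auth uses the
    # calling user's delegation tokens, so a UGI without one is rejected.
    if service not in ugi.tokens:
        raise AccessControlError(f"{ugi.user} has no delegation token for {service}")
    return f"read as {ugi.user}"


# Driver side: tokens obtained for the proxy user at submission time
# (e.g. shipped to the driver as secrets; here just a dict).
launcher = ToyUGI("spark")
shipped_tokens = {"hdfs": "dt-for-alice"}
proxy = create_proxy_user("alice", launcher)

try:
    proxy.do_as(read_hdfs)
    before_fix = "ok"
except AccessControlError:
    before_fix = "AccessControlException"

# The fix, in this model: make the tokens visible to the proxy user's UGI.
proxy.tokens.update(shipped_tokens)
after_fix = proxy.do_as(read_hdfs)
```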

Further details:
https://issues.apache.org/jira/browse/SPARK-25355?focusedCommentId=17532063&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17532063

https://issues.apache.org/jira/browse/SPARK-25355?focusedCommentId=17532135&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17532135

### Does this PR introduce _any_ user-facing change?

Yes, users will now be able to use a proxy user to access Kerberized HDFS with Spark on K8s.

### How was this patch tested?

The patch was tested by:

1. Running a job that accesses Kerberized HDFS with a proxy user, in both cluster mode and client mode, with the Kubernetes resource manager.

2. Running a job that accesses Kerberized HDFS without a proxy user, in both cluster mode and client mode, with the Kubernetes resource manager.

3. Building and running the test GitHub Action: https://github.com/shrprasa/spark/actions/runs/3051203625

Closes #37880 from shrprasa/proxy_user_fix.

Authored-by: Shrikant Prasad <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
yaooqinn pushed a commit to apache/spark that referenced this pull request Mar 8, 2023
(cherry picked from commit b3b3557)
Signed-off-by: Kent Yao <[email protected]>
yaooqinn pushed a commit to apache/spark that referenced this pull request Mar 8, 2023
(cherry picked from commit b3b3557)
Signed-off-by: Kent Yao <[email protected]>
yaooqinn pushed a commit to apache/spark that referenced this pull request Mar 8, 2023
(cherry picked from commit b3b3557)
Signed-off-by: Kent Yao <[email protected]>
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
(cherry picked from commit b3b3557)
Signed-off-by: Kent Yao <[email protected]>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
(cherry picked from commit b3b3557)
Signed-off-by: Kent Yao <[email protected]>

3 participants