[SPARK-23790][Mesos] fix metastore connection issue #20945
Conversation
|
@vanzin @susanxhuynh Please review; this probably needs to be backported to 2.3, since that is where a customer hit the issue. YARN follows a different approach: it adds the tokens to the UGI early, so no TGT is needed later on. Still, when I tried the same approach with Mesos I hit the issue described in the ticket with the HadoopRDD (that RDD seems to be a permanent integration pain point). Not sure if this patch affects YARN at all. |
|
Test build #88751 has finished for PR 20945 at commit
|
|
Test build #88752 has finished for PR 20945 at commit
|
|
I don't think this is right. You do not want to start the session as the real user. That's why you're using a proxy user in the first place - to identify as someone else to external services. Aren't you just missing the delegation token for the proxy user? |
|
@vanzin OK, let me see if I understand correctly: the Spark job's main is run as the proxy user if one exists, and then we use the real user for HiveDelegationTokenProvider just because Hive needs the real user to create the delegation token correctly; it cannot use the proxy user for that. I guess I can use the proxy user for the session state, though. |
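For context, a minimal sketch (not the actual patch) of the UGI pattern being discussed, assuming Hadoop's UserGroupInformation API: the job's main runs inside the proxy user's doAs, while delegation tokens (e.g. for Hive) are obtained as the real, Kerberos-authenticated user and handed to the proxy UGI beforehand.

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

object ProxyUserSketch {
  // Sketch only: `runUserMain` stands in for launching the job's main class.
  def runAsProxy(proxyName: String)(runUserMain: () => Unit): Unit = {
    val realUser = UserGroupInformation.getCurrentUser()  // e.g. the superuser with a TGT
    val proxyUgi = UserGroupInformation.createProxyUser(proxyName, realUser)
    // Tokens obtained as the real user are added to the proxy UGI; the user
    // code then runs entirely under the proxy identity.
    proxyUgi.addCredentials(realUser.getCredentials())
    proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = runUserMain()
    })
  }
}
```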
This will only happen if security is enabled and a proxy user exists.
|
I attached the log files of the last run with the updated PR. Also updated the description. |
|
Test build #88776 has finished for PR 20945 at commit
|
desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
  options ++= Seq("--proxy-user", v)
}
|
|
In cluster mode we need to pass the proxy user to the dispatcher.
|
@vanzin @susanxhuynh I think it's on the right path now. |
|
Test build #88788 has finished for PR 20945 at commit
|
|
Failed unit test: org.apache.spark.launcher.LauncherServerSuite.testAppHandleDisconnect |
|
retest this please |
|
Test build #88794 has finished for PR 20945 at commit
|
val jobCreds = conf.getCredentials()
jobCreds.mergeAll(UserGroupInformation.getCurrentUser().getCredentials())
val userCreds = UserGroupInformation.getCurrentUser().getCredentials()
logInfo(s"Adding user credentials: ${SparkHadoopUtil.get.dumpTokens(userCreds)}")
Don't use dumpTokens in an info message. It's OK to add it at debug level if you want, but I'm not really sure why you're adding this in this PR.
OK, I saw it being used here, so I thought it would be helpful at the info level. The reason I added it there is that I would like to see what credentials the HadoopRDD uses. Credentials are added in different parts of the code base, and understanding what is happening can be confusing when looking at a job's logs; it is not clear to people that HadoopRDD fetches tokens on its own.
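A minimal sketch of the reviewer's suggestion (not the actual patch), assuming the enclosing class mixes in Spark's Logging trait so logDebug is available; logDebug takes a by-name message, so the token dump is only computed when debug logging is enabled:

```scala
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.deploy.SparkHadoopUtil

// Same information as before, but only visible at debug level.
val userCreds = UserGroupInformation.getCurrentUser().getCredentials()
logDebug(s"Adding user credentials: ${SparkHadoopUtil.get.dumpTokens(userCreds)}")
```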
val jobConf = getJobConf()
// add the credentials here as this can be called before SparkContext is initialized
SparkHadoopUtil.get.addCredentials(jobConf)
logInfo(s"HadoopRDD credentials: ${SparkHadoopUtil.get.dumpTokens(jobConf.getCredentials)}")
Same here.
Commented above.
desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
  options ++= Seq("--proxy-user", v)
}
This looks a little odd. How's a cluster mode app run in Mesos?
Basically what I want to know:
- which process starts the driver
- what user that process is running as, and which user will the driver process run as
- what kerberos credentials does it have and how are they managed
The gist is that running the Spark driver in client mode (which I think is how the driver in cluster mode is eventually started?) with a proxy user is a weird combination. It means the user code running in that driver has access to the credentials of the more privileged user - and could in turn use those to run anything as any other user...
In comparison, YARN + cluster mode + proxy user starts the YARN application as the proxy user. So the user code, which only runs in a YARN container, has no access to the privileged credentials, which only exist in the launcher.
On DC/OS, the Spark DC/OS CLI, which supports Kerberos and keytab paths, submits jobs directly to the Mesos REST API on the Mesos dispatcher side. The keytabs are stored in the DC/OS secret store before the job is launched, and they are mounted into the container before the container is launched.
Thus, the idea here is to store the keytab for the superuser in the secret store, so that the Spark driver, which is eventually launched in client mode within the cluster, can log in to Kerberos and impersonate another user. The driver will start the Spark job's main as a proxy user (as usual) and will use the superuser credentials to impersonate the passed proxy user.
The driver is started by the Mesos dispatcher, and the dispatcher does not have any access to keytabs; it just passes the Spark config options. The driver can access a secret only if it is allowed to (this is controlled by DC/OS labels).
The OS user used by the container depends on the setup, but that user should be the appropriate one.
Right now DC/OS has switched back to root for Spark containers; previously it used nobody, but users can customize the image to add their own users anyway.
You can change the user by passing --conf spark.mesos.driverEnv.SPARK_USER=.
Spark on Mesos uses that value, if defined, when setting up Mesos tasks for the executors, for example.
In containerized environments this adds extra headaches.
As a whole this is not that different from running in client mode, because in client mode as well I need to access the superuser's credentials somehow. The whole concept is moved inside a container, and then the environment (DC/OS) should make sure that the same user is consistent from the submit side all the way into the container and enforce restrictions. That is the intention here.
The other option, to mimic YARN, would be for spark-submit to upload a locally created DT (to the secret store) in the cluster and for the driver to use that for impersonation. But this is not how things work on DC/OS deployments, as Michael mentioned in the past here: https://issues.apache.org/jira/browse/SPARK-16742; you may not even have access to the keytab on the launcher side. YARN has a different approach for that, as you mentioned.
At the end of the day, if impersonation also includes launching the driver container as the proxy user, then that can be supported with this PR by setting the appropriate user, but the container will still have access to the superuser's credentials, which is not OK. On the other hand, if impersonation for Mesos starts within Spark at the integration level with the Hadoop ecosystem (actually it starts with launching the user's main as that user), then I don't see how this PR differs from Mesos client mode with impersonation enabled.
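For illustration, a hypothetical SparkConf snippet combining the settings mentioned above (the user names are placeholders; spark.mesos.proxyUser is the option read in the snippet under review, and spark.mesos.driverEnv.* sets environment variables for the driver container):

```scala
import org.apache.spark.SparkConf

// Hypothetical values, for illustration only.
val conf = new SparkConf()
  // OS user inside the driver container, picked up by Spark on Mesos when
  // setting up tasks:
  .set("spark.mesos.driverEnv.SPARK_USER", "nobody")
  // User to impersonate; forwarded to the driver as --proxy-user:
  .set("spark.mesos.proxyUser", "alice")
```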
@susanxhuynh feel free to add more on how DC/OS (Mesos) handles the multi-tenancy story in general and user identity management.
> The driver will start the SparkJob's main as a proxy user (as usual) and will use the superuser credentials to impersonate the passed proxy user.
That's a big problem, because, as I said, that makes the super user credentials available to untrusted user code. How do you prevent the user's Spark app from using those credentials?
On YARN cluster mode the super user's credentials are never available to the user application. (On client mode they are, but really, if you're using --proxy-user in client mode you're missing the point, or I hope you know what you're doing.)
Basically, you have a problem here you need to solve.
You either have to require kerberos creds on the launcher side, so you can upload DTs in cluster mode, or you need some level of separation between the code that launches the driver and the driver itself. The current system you have here is not secure at all - any user can just impersonate any other user, since they have access to the super user's credentials.
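To make the concern concrete, a small illustrative sketch (not code from this PR) of why client mode plus --proxy-user exposes the privileged credentials: inside the proxy user's doAs, user code still lives in the same process as the real, privileged login and can walk back to it through the UGI API.

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

// Illustrative only: what untrusted user code could do when it runs in the
// same JVM as the superuser's Kerberos login.
object WhyClientModeProxyUserIsRisky {
  def insideUserCode(): Unit = {
    val current = UserGroupInformation.getCurrentUser() // the proxy UGI
    val privileged = current.getRealUser()              // the superuser's UGI, same process
    // With the superuser's login at hand, the code can impersonate anyone the
    // superuser is allowed to proxy for.
    val other = UserGroupInformation.createProxyUser("someoneElse", privileged)
    other.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = { /* act as "someoneElse" against Hadoop services */ }
    })
  }
}
```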
> My problem here is that you're making spark-submit + proxy user + client mode the official way to run Spark on Mesos in cluster mode, and now you're basically exposing everyone to that security issue.
Yes, because the assumption was that client mode was safe. There is no warning about this, especially for end users, and I just started looking into the Hadoop security part.
Anyway, good to know; I will get back with an update. Thanks for the comments; discussing via comments is hard sometimes...
> Yes, because the assumption was that client mode was safe. There is no warning about this
Could probably use something in the documentation - warnings printed to logs are easily ignored. Still, there are legitimate uses for client mode + proxy user, but I don't think this is one of them.
What are the legitimate uses if it is not safe? Like knowing what your code does so it's OK, e.g. spark-shell?
> require the launcher to have a kerberos login, and send DTs to the application. a.k.a. what Spark-on-YARN does.
> in the code that launches the driver on the Mesos side, create the DTs in a safe context (e.g. not as part of the spark-submit invocation) and provide them to the Spark driver using the HADOOP_TOKEN_FILE_LOCATION env var.
For the first option: when I ran the Hive examples with YARN (EMR) in cluster mode (without a TGT) it did fail, but it didn't require any credentials (no Spark code does that; it's Hadoop code). I got:
(Mechanism level: Failed to find any Kerberos tgt)
Coming from this line:
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:550)
at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:156)
So I'm not sure what you mean here, unless you mean exactly that, and then, if that check passes, create the DTs at the launcher anyway.
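For reference, a minimal sketch of the second option quoted above, i.e. handing tokens to the driver through HADOOP_TOKEN_FILE_LOCATION (illustrative only; Hadoop's UGI also loads this file on its own when the login user is created):

```scala
import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Illustrative sketch: a driver explicitly loading delegation tokens from the
// file pointed to by HADOOP_TOKEN_FILE_LOCATION and attaching them to its UGI.
object LoadTokensFromEnv {
  def load(hadoopConf: Configuration): Unit = {
    sys.env.get("HADOOP_TOKEN_FILE_LOCATION").foreach { path =>
      val creds = Credentials.readTokenStorageFile(new File(path), hadoopConf)
      UserGroupInformation.getCurrentUser().addCredentials(creds)
    }
  }
}
```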
> What are the legitimate uses if it is not safe? Like knowing what your code does so it's OK
Yes. For example you can run some trusted code as a less-privileged user, so that you don't accidentally do something stupid as a super user.
> (Mechanism level: Failed to find any Kerberos tgt)
That means you don't have any credentials (neither a tgt nor a dt). I don't know EMR so I don't know how to use it (with or without kerberos).
> That means you don't have any credentials (neither a tgt nor a dt). I don't know EMR so I don't know how to use it (with or without kerberos).
Yes, the intention was to check where the Spark on YARN code fails when there is no TGT (I removed it with kdestroy). I am using it with Kerberos.
|
@skonto Basic question: in your example above, which user does the "krb5cc_65534" ticket cache belong to? The superuser or the proxy-user ("nobody")? |
|
@susanxhuynh AFAIK the cache holds the ticket for the superuser, since he needs to create a DT from his TGT for nobody in order to impersonate nobody; the superuser has the right to impersonate. The ticket cache replaces the need to kinit with the superuser's keytab. I had to rename it because I am running within a container as user nobody anyway (I didn't want to add a superuser in the container just for testing). My superuser is hive, which does not exist in the DC/OS Spark container or on the DC/OS nodes.
In the above example the hadoop user has a ticket cache whose suffix is his uid. On the other hand, the cache contains a principal for nobody; it could be anything. As long as the ticket cache has a valid principal for user X and Kerberos is used, the Hadoop libraries will see user X as the authenticated one. If I were to use a TGT with the nobody user then I would get:
Nobody is just an example here. You can use any other user as long as you have a superuser to impersonate him. |
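A small sketch of the flow described above (illustrative only; the cache path, principal, and realm are placeholders): the superuser logs in from an existing ticket cache instead of a keytab, and then builds a proxy UGI for the user to impersonate.

```scala
import org.apache.hadoop.security.UserGroupInformation

// Illustrative only: no kinit with a keytab is needed if a valid ticket cache
// for the superuser is already present.
object TicketCacheLogin {
  def superUserImpersonating(targetUser: String): UserGroupInformation = {
    val superUgi = UserGroupInformation.getUGIFromTicketCache(
      "/tmp/krb5cc_65534",   // the renamed ticket cache mentioned above
      "hive@EXAMPLE.COM")    // superuser principal; realm is a placeholder
    UserGroupInformation.createProxyUser(targetUser, superUgi)
  }
}
```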
Option 1 above: spark-submit (the launcher) could create the DTs locally as a utility function.
Option 2 above: the dispatcher would create the DTs and pass them to the driver's container as secrets. That means it should be able to fetch the superuser's TGT from the secret store, create the delegation tokens in an isolated manner (other drivers could be launched in parallel), and store them back in the secret store so that the driver to be launched can use them. Again, this would require the Mesos dispatcher to be integrated with DC/OS APIs; for example, to access the secret store you need to pass an auth token and call a specific API: https://docs.mesosphere.com/1.11.0/administration/secrets/secrets-api/.
Option 3: spark-submit (the second one, running in client mode within the container), before it runs the user's main, could create the DTs, save them to the local filesystem, point to them with HADOOP_TOKEN_FILE_LOCATION, and then remove the TGT (/tmp/krb5cc_uid), like kdestroy, so user code cannot use it to impersonate anyone. Could the user code fetch the TGT secret again from the secret store? It could, if it has access to the /spark service's secrets: https://docs.mesosphere.com/services/spark/2.3.0-2.2.1-2/limitations/. @susanxhuynh would it be possible to constrain this, or can all OS users within a driver's container access all secrets given an auth token?
Option 4: fix SPARK-20982 and pass DTs to the dispatcher in binary format, then store them in the secret store. The driver can then pick them up at launch time.
Thoughts? I am inclined to do 3 here if it is safe (minimal work); a sketch of that flow is below. 1 is better, but the UX is ruined. 3 and 4 would bring unwanted dependencies, unless we fix this only at the DC/OS level. I checked but didn't see a Mesos HTTP API for the secret store. |
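A hedged sketch of what option 3 could look like inside the in-container spark-submit (illustrative only; the object name, parameters, and cache path are placeholders, and this is not what the PR implements): obtain tokens as the superuser, write them where HADOOP_TOKEN_FILE_LOCATION will point, then destroy the TGT cache.

```scala
import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.Credentials

// Illustrative sketch of option 3. The driver would then be started with
// HADOOP_TOKEN_FILE_LOCATION=tokenFile and no TGT cache left behind.
object Option3Sketch {
  def prepareTokens(hadoopConf: Configuration, tokenFile: String, renewer: String): Unit = {
    val creds = new Credentials()
    // Fetch HDFS delegation tokens as the currently logged-in superuser.
    FileSystem.get(hadoopConf).addDelegationTokens(renewer, creds)
    // Persist them where the driver's HADOOP_TOKEN_FILE_LOCATION will point.
    creds.writeTokenStorageFile(new Path(tokenFile), hadoopConf)
    // Remove the superuser's ticket cache so user code cannot reuse it
    // (equivalent to kdestroy; the path is a placeholder).
    new File("/tmp/krb5cc_uid").delete()
  }
}
```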
|
(1) seems the most secure. How do we handle keytabs today in cluster mode in pure Mesos? Is it the same situation -- the keytab gets sent over an HTTP connection to the dispatcher? (3) Yes, the TGT secret would still be available from the secret store. There's currently no constraint based on an OS user. |
|
(1) We have a problem here, I agree, and yes, it is more secure not to have the TGT anywhere near the user's code.
(3) The proxy user doAs in spark-submit uses the Java security manager and at the end of the day calls: I think (wild guess) we could restrict access to both the /tmp/... file for the TGT and the URL pointing to the secret store. Of course there is always JNI and native code which could bypass this, I guess, or maybe Runtime.exec(), or not? Can the security manager sandbox such cases? It seems yes for the latter:
PS. Right now this can only be solved easily on the DC/OS side, I guess. |
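To make the speculation above concrete, a rough illustrative sketch (not a recommendation, and not code from this PR) of the kind of restriction being wondered about: a SecurityManager that refuses reads of the superuser's ticket cache from user code. Note that the SecurityManager mechanism is deprecated in recent JDKs, and JNI or native code could still bypass it.

```scala
// Illustrative only: deny read access to a given ticket cache path.
object DenyTicketCacheRead {
  def install(ticketCachePath: String): Unit = {
    System.setSecurityManager(new java.lang.SecurityManager {
      override def checkRead(file: String): Unit = {
        if (file == ticketCachePath) {
          throw new SecurityException(s"access to $ticketCachePath is blocked")
        }
      }
      // Permissive everywhere else, purely for the sake of the sketch.
      override def checkPermission(perm: java.security.Permission): Unit = ()
    })
  }
}
```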
|
SPARK-20982 doesn't look particularly hard to fix. I don't understand the differences between plain Mesos and DC/OS so a lot of the things you're saying are over my head. I'm just concerned with the code that is present here in the Spark repo doing the right thing w.r.t. security, assuming whatever service it's talking to is secure. |
|
@vanzin Sure, we will try to comply. The thing is, pure Mesos does not have an API for secrets; only DC/OS has one, and we cannot bring that API into the Spark project. Otherwise I would just implement option 1, as with YARN, and everyone would be happy and secure ;) |
|
@susanxhuynh @vanzin So generating DTs at the first spark-submit and then using an HTTP API should be good enough, although environments like K8s or DC/OS usually have a CLI utility to do the job. That means only a few configuration options would need to be passed, like the API URI and some token for authentication (I assume). No real dependencies. This would require spark-submit to be able to access the secret store's API (it depends). |
|
@susanxhuynh Unfortunately I cannot unify the APIs even for DC/OS: 1.10.x is different from 1.11.x (https://docs.mesosphere.com/services/spark/2.3.0-2.2.1-2/security/) and the code depends on this (I played a bit with the DC/OS secret store API), not to mention other APIs out there. This would require a generic secrets API at the pure Mesos level (like in K8s), so I don't see a viable solution for now, unless I manage to restrict access to the TGT in client mode and essentially make it safe. |
|
@vanzin Here is the fix that works for DC/OS: d2iq-archive#26. It implements YARN's approach. |
|
I actually can't close this, only you can. If the DC/OS libraries are open source and something people can pull in by changing But otherwise it'd be a little awkward to add the code to Spark. |
|
@vanzin Correct, I will close it. The dependency is on a specific secret store API, so it's mostly HTTP calls which are DC/OS-specific... |
What changes were proposed in this pull request?
How was this patch tested?
This was manually tested with a secured HDFS and by running the Spark Hive examples, both in client and cluster mode.
In cluster mode this was tested with a ticket cache by passing the following args: