[SPARK-23790][Mesos] fix metastore connection issue #20945
Conversation
|
@vanzin @susanxhuynh Please review; this probably needs to be backported to 2.3, since that is where a customer hit the issue. YARN follows a different approach: it adds the tokens to the UGI early, so no TGT is needed later on. Still, when I tried the same approach with Mesos I hit the issue described in the ticket with the HadoopRDD (that RDD seems to be a permanent integration pain point). Not sure if this patch affects YARN at all. |
|
Test build #88751 has finished for PR 20945 at commit
|
|
Test build #88752 has finished for PR 20945 at commit
|
|
I don't think this is right. You do not want to start the session as the real user. That's why you're using a proxy user in the first place - to identify as someone else to external services. Aren't you just missing the delegation token for the proxy user? |
|
@vanzin OK, let me see if I understand correctly: the Spark job's main is run as the proxy user if one exists, and then we use the real user for HiveDelegationTokenProvider just because Hive needs the real user to create the delegation token correctly; it cannot use the proxy user for that. I guess I can use the proxy user for the session state, though. |
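For context, a minimal sketch (not the actual patch) of the UGI pattern being discussed, assuming Hadoop's UserGroupInformation API: the job's main runs inside the proxy user's doAs, while delegation tokens (e.g. for Hive) are obtained as the real, Kerberos-authenticated user and handed to the proxy UGI beforehand.

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

object ProxyUserSketch {
  // Sketch only: `runUserMain` stands in for launching the job's main class.
  def runAsProxy(proxyName: String)(runUserMain: () => Unit): Unit = {
    val realUser = UserGroupInformation.getCurrentUser()  // e.g. the superuser with a TGT
    val proxyUgi = UserGroupInformation.createProxyUser(proxyName, realUser)
    // Tokens obtained as the real user are added to the proxy UGI; the user
    // code then runs entirely under the proxy identity.
    proxyUgi.addCredentials(realUser.getCredentials())
    proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = runUserMain()
    })
  }
}
```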
This will only happen if security is enabled and a proxy user exists.
|
I attached the log files of the last run with the updated PR. Also updated the description. |
|
Test build #88776 has finished for PR 20945 at commit
|
desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
  options ++= Seq("--proxy-user", v)
}
|
|
In cluster mode we need to pass the proxy user to the dispatcher.
|
@vanzin @susanxhuynh I think it's on the right path now. |
|
Test build #88788 has finished for PR 20945 at commit
|
|
Failed unit test: org.apache.spark.launcher.LauncherServerSuite.testAppHandleDisconnect |
|
retest this please |
|
Test build #88794 has finished for PR 20945 at commit
|
val jobCreds = conf.getCredentials()
jobCreds.mergeAll(UserGroupInformation.getCurrentUser().getCredentials())
val userCreds = UserGroupInformation.getCurrentUser().getCredentials()
logInfo(s"Adding user credentials: ${SparkHadoopUtil.get.dumpTokens(userCreds)}")
Don't use dumpTokens in an info message. It's OK to add it at debug level if you want, but I'm not really sure why you're adding this in this PR.
OK, I saw it being used here, so I thought it would be helpful at the info level. The reason I added it there is that I would like to see what credentials the HadoopRDD uses. Credentials are added in different parts of the code base, and understanding what is happening can be confusing when looking at a job's logs; it is not clear to people that HadoopRDD fetches tokens on its own.
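A minimal sketch of the reviewer's suggestion (not the actual patch), assuming the enclosing class mixes in Spark's Logging trait so logDebug is available; logDebug takes a by-name message, so the token dump is only computed when debug logging is enabled:

```scala
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.deploy.SparkHadoopUtil

// Same information as before, but only visible at debug level.
val userCreds = UserGroupInformation.getCurrentUser().getCredentials()
logDebug(s"Adding user credentials: ${SparkHadoopUtil.get.dumpTokens(userCreds)}")
```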
val jobConf = getJobConf()
// add the credentials here as this can be called before SparkContext is initialized
SparkHadoopUtil.get.addCredentials(jobConf)
logInfo(s"HadoopRDD credentials: ${SparkHadoopUtil.get.dumpTokens(jobConf.getCredentials)}")
Same here.
Commented above.
desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
  options ++= Seq("--proxy-user", v)
}
This looks a little odd. How's a cluster mode app run in Mesos?
Basically what I want to know:
- which process starts the driver
- what user that process is running as, and which user will the driver process run as
- what kerberos credentials does it have and how are they managed
The gist is that running the Spark driver in client mode (which I think is how the driver in cluster mode is eventually started?) with a proxy user is a weird combination. It means the user code running in that driver has access to the credentials of the more privileged user - and could in turn use those to run anything as any other user...
In comparison, YARN + cluster mode + proxy user starts the YARN application as the proxy user. So the user code, which only runs in a YARN container, has no access to the privileged credentials, which only exist in the launcher.
On DC/OS, the Spark DC/OS CLI, which supports Kerberos and keytab paths, submits jobs directly to the Mesos REST API on the Mesos dispatcher side. The keytabs are stored in the DC/OS secret store before the job is launched, and they are mounted into the container before the container is launched.
Thus, the idea here is to store the keytab for the superuser in the secret store, so that the Spark driver, which is eventually launched in client mode within the cluster, can log in to Kerberos and impersonate another user. The driver will start the Spark job's main as a proxy user (as usual) and will use the superuser credentials to impersonate the passed proxy user.
The driver is started by the Mesos dispatcher, and the dispatcher does not have any access to keytabs; it just passes the Spark config options. The driver can access a secret only if it is allowed to (this is controlled by DC/OS labels).
The OS user used by the container depends on the setup, but that user should be the appropriate one.
Right now DC/OS has switched back to root for Spark containers; previously it used nobody, but users can customize the image to add their own users anyway.
You can change the user by passing --conf spark.mesos.driverEnv.SPARK_USER=.
Spark on Mesos uses that value, if defined, when setting up Mesos tasks for the executors, for example.
In containerized environments this adds extra headaches.
As a whole this is not that different from running in client mode, because in client mode as well I need to access the superuser's credentials somehow. The whole concept is moved inside a container, and then the environment (DC/OS) should make sure that the same user is consistent from the submit side all the way into the container and enforce restrictions. That is the intention here.
The other option, to mimic YARN, would be for spark-submit to upload a locally created DT (to the secret store) in the cluster and for the driver to use that for impersonation. But this is not how things work on DC/OS deployments, as Michael mentioned in the past here: https://issues.apache.org/jira/browse/SPARK-16742; you may not even have access to the keytab on the launcher side. YARN has a different approach for that, as you mentioned.
At the end of the day, if impersonation also includes launching the driver container as the proxy user, then that can be supported with this PR by setting the appropriate user, but the container will still have access to the superuser's credentials, which is not OK. On the other hand, if impersonation for Mesos starts within Spark at the integration level with the Hadoop ecosystem (actually it starts with launching the user's main as that user), then I don't see how this PR differs from Mesos client mode with impersonation enabled.
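For illustration, a hypothetical SparkConf snippet combining the settings mentioned above (the user names are placeholders; spark.mesos.proxyUser is the option read in the snippet under review, and spark.mesos.driverEnv.* sets environment variables for the driver container):

```scala
import org.apache.spark.SparkConf

// Hypothetical values, for illustration only.
val conf = new SparkConf()
  // OS user inside the driver container, picked up by Spark on Mesos when
  // setting up tasks:
  .set("spark.mesos.driverEnv.SPARK_USER", "nobody")
  // User to impersonate; forwarded to the driver as --proxy-user:
  .set("spark.mesos.proxyUser", "alice")
```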
@susanxhuynh feel free to add more on how DC/OS (Mesos) handles the multi-tenancy story in general and user identity management.
> The driver will start the SparkJob's main as a proxy user (as usual) and will use the superuser credentials to impersonate the passed proxy user.
That's a big problem, because, as I said, that makes the super user credentials available to untrusted user code. How do you prevent the user's Spark app from using those credentials?
On YARN cluster mode the super user's credentials are never available to the user application. (On client mode they are, but really, if you're using --proxy-user in client mode you're missing the point, or I hope you know what you're doing.)
Basically, you have a problem here you need to solve.
You either have to require kerberos creds on the launcher side, so you can upload DTs in cluster mode, or you need some level of separation between the code that launches the driver and the driver itself. The current system you have here is not secure at all - any user can just impersonate any other user, since they have access to the super user's credentials.
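To make the concern concrete, a small illustrative sketch (not code from this PR) of why client mode plus --proxy-user exposes the privileged credentials: inside the proxy user's doAs, user code still lives in the same process as the real, privileged login and can walk back to it through the UGI API.

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

// Illustrative only: what untrusted user code could do when it runs in the
// same JVM as the superuser's Kerberos login.
object WhyClientModeProxyUserIsRisky {
  def insideUserCode(): Unit = {
    val current = UserGroupInformation.getCurrentUser() // the proxy UGI
    val privileged = current.getRealUser()              // the superuser's UGI, same process
    // With the superuser's login at hand, the code can impersonate anyone the
    // superuser is allowed to proxy for.
    val other = UserGroupInformation.createProxyUser("someoneElse", privileged)
    other.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = { /* act as "someoneElse" against Hadoop services */ }
    })
  }
}
```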
> My problem here is that you're making spark-submit + proxy user + client mode the official way to run Spark on Mesos in cluster mode, and now you're basically exposing everyone to that security issue.
Yes, because the assumption was that client mode was safe. There is no warning about this, especially for end users, and I just started looking into the Hadoop security part.
Anyway, good to know; I will get back with an update. Thanks for the comments; discussing via comments is hard sometimes...
> Yes, because the assumption was that client mode was safe. There is no warning about this
Could probably use something in the documentation - warnings printed to logs are easily ignored. Still, there are legitimate uses for client mode + proxy user, but I don't think this is one of them.
What are the legitimate uses if it is not safe? Like knowing what your code does so it's OK, e.g. spark-shell?
> require the launcher to have a kerberos login, and send DTs to the application. a.k.a. what Spark-on-YARN does.
> in the code that launches the driver on the Mesos side, create the DTs in a safe context (e.g. not as part of the spark-submit invocation) and provide them to the Spark driver using the HADOOP_TOKEN_FILE_LOCATION env var.
For the first option: when I ran the Hive examples with YARN (EMR) in cluster mode (without a TGT) it did fail, but it didn't require any credentials (no Spark code does that; it's Hadoop code). I got:
(Mechanism level: Failed to find any Kerberos tgt)
Coming from this line:
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:550)
at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:156)
So I'm not sure what you mean here, unless you mean exactly that, and then, if that check passes, create the DTs at the launcher anyway.
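For reference, a minimal sketch of the second option quoted above, i.e. handing tokens to the driver through HADOOP_TOKEN_FILE_LOCATION (illustrative only; Hadoop's UGI also loads this file on its own when the login user is created):

```scala
import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Illustrative sketch: a driver explicitly loading delegation tokens from the
// file pointed to by HADOOP_TOKEN_FILE_LOCATION and attaching them to its UGI.
object LoadTokensFromEnv {
  def load(hadoopConf: Configuration): Unit = {
    sys.env.get("HADOOP_TOKEN_FILE_LOCATION").foreach { path =>
      val creds = Credentials.readTokenStorageFile(new File(path), hadoopConf)
      UserGroupInformation.getCurrentUser().addCredentials(creds)
    }
  }
}
```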
> What are the legitimate uses if it is not safe? Like knowing what your code does so it's OK
Yes. For example you can run some trusted code as a less-privileged user, so that you don't accidentally do something stupid as a super user.
> (Mechanism level: Failed to find any Kerberos tgt)
That means you don't have any credentials (neither a tgt nor a dt). I don't know EMR so I don't know how to use it (with or without kerberos).
> That means you don't have any credentials (neither a tgt nor a dt). I don't know EMR so I don't know how to use it (with or without kerberos).
Yes, the intention was to check where the Spark on YARN code fails when there is no TGT (I removed it with kdestroy). I am using it with Kerberos.
|
@skonto Basic question: in your example above, which user does the "krb5cc_65534" ticket cache belong to? The superuser or the proxy-user ("nobody")? |
|
@susanxhuynh AFAIK the cache holds the ticket for the superuser, since he needs to create a DT from his TGT for nobody in order to impersonate nobody; the superuser has the right to impersonate. The ticket cache replaces the need to kinit with the superuser's keytab. I had to rename it because I am running within a container as user nobody anyway (I didn't want to add a superuser in the container just for testing). My superuser is hive, which does not exist in the DC/OS Spark container or on the DC/OS nodes.
In the above example the hadoop user has a ticket cache whose suffix is his uid. On the other hand, the cache contains a principal for nobody; it could be anything. As long as the ticket cache has a valid principal for user X and Kerberos is used, the Hadoop libraries will see user X as the authenticated one. If I were to use a TGT with the nobody user then I would get:
Nobody is just an example here. You can use any other user as long as you have a superuser to impersonate him. |
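A small sketch of the flow described above (illustrative only; the cache path, principal, and realm are placeholders): the superuser logs in from an existing ticket cache instead of a keytab, and then builds a proxy UGI for the user to impersonate.

```scala
import org.apache.hadoop.security.UserGroupInformation

// Illustrative only: no kinit with a keytab is needed if a valid ticket cache
// for the superuser is already present.
object TicketCacheLogin {
  def superUserImpersonating(targetUser: String): UserGroupInformation = {
    val superUgi = UserGroupInformation.getUGIFromTicketCache(
      "/tmp/krb5cc_65534",   // the renamed ticket cache mentioned above
      "hive@EXAMPLE.COM")    // superuser principal; realm is a placeholder
    UserGroupInformation.createProxyUser(targetUser, superUgi)
  }
}
```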
Option 1 above: spark-submit (the launcher) could create the DTs locally as a utility function.
Option 2 above: the dispatcher would create the DTs and pass them to the driver's container as secrets. That means it should be able to fetch the superuser's TGT from the secret store, create the delegation tokens in an isolated manner (other drivers could be launched in parallel), and store them back in the secret store so that the driver to be launched can use them. Again, this would require the Mesos dispatcher to be integrated with DC/OS APIs; for example, to access the secret store you need to pass an auth token and call a specific API: https://docs.mesosphere.com/1.11.0/administration/secrets/secrets-api/.
Option 3: spark-submit (the second one, running in client mode within the container), before it runs the user's main, could create the DTs, save them to the local filesystem, point to them with HADOOP_TOKEN_FILE_LOCATION, and then remove the TGT (/tmp/krb5cc_uid), like kdestroy, so user code cannot use it to impersonate anyone. Could the user code fetch the TGT secret again from the secret store? It could, if it has access to the /spark service's secrets: https://docs.mesosphere.com/services/spark/2.3.0-2.2.1-2/limitations/. @susanxhuynh would it be possible to constrain this, or can all OS users within a driver's container access all secrets given an auth token?
Option 4: fix SPARK-20982 and pass DTs to the dispatcher in binary format, then store them in the secret store. The driver can then pick them up at launch time.
Thoughts? I am inclined to do 3 here if it is safe (minimal work); a sketch of that flow is below. 1 is better, but the UX is ruined. 3 and 4 would bring unwanted dependencies, unless we fix this only at the DC/OS level. I checked but didn't see a Mesos HTTP API for the secret store. |
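A hedged sketch of what option 3 could look like inside the in-container spark-submit (illustrative only; the object name, parameters, and cache path are placeholders, and this is not what the PR implements): obtain tokens as the superuser, write them where HADOOP_TOKEN_FILE_LOCATION will point, then destroy the TGT cache.

```scala
import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.Credentials

// Illustrative sketch of option 3. The driver would then be started with
// HADOOP_TOKEN_FILE_LOCATION=tokenFile and no TGT cache left behind.
object Option3Sketch {
  def prepareTokens(hadoopConf: Configuration, tokenFile: String, renewer: String): Unit = {
    val creds = new Credentials()
    // Fetch HDFS delegation tokens as the currently logged-in superuser.
    FileSystem.get(hadoopConf).addDelegationTokens(renewer, creds)
    // Persist them where the driver's HADOOP_TOKEN_FILE_LOCATION will point.
    creds.writeTokenStorageFile(new Path(tokenFile), hadoopConf)
    // Remove the superuser's ticket cache so user code cannot reuse it
    // (equivalent to kdestroy; the path is a placeholder).
    new File("/tmp/krb5cc_uid").delete()
  }
}
```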
|
(1) seems the most secure. How do we handle keytabs today in cluster mode in pure Mesos? Is it the same situation -- the keytab gets sent over an HTTP connection to the dispatcher? (3) Yes, the TGT secret would still be available from the secret store. There's currently no constraint based on an OS user. |
|
(1) We have a problem here, I agree, and yes, it is more secure not to have the TGT anywhere near the user's code.
(3) The proxy user doAs in spark-submit uses the Java security manager and at the end of the day calls: I think (wild guess) we could restrict access to both the /tmp/... file for the TGT and the URL pointing to the secret store. Of course there is always JNI and native code which could bypass this, I guess, or maybe Runtime.exec(), or not? Can the security manager sandbox such cases? It seems yes for the latter:
PS. Right now this can only be solved easily on the DC/OS side, I guess. |
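To make the speculation above concrete, a rough illustrative sketch (not a recommendation, and not code from this PR) of the kind of restriction being wondered about: a SecurityManager that refuses reads of the superuser's ticket cache from user code. Note that the SecurityManager mechanism is deprecated in recent JDKs, and JNI or native code could still bypass it.

```scala
// Illustrative only: deny read access to a given ticket cache path.
object DenyTicketCacheRead {
  def install(ticketCachePath: String): Unit = {
    System.setSecurityManager(new java.lang.SecurityManager {
      override def checkRead(file: String): Unit = {
        if (file == ticketCachePath) {
          throw new SecurityException(s"access to $ticketCachePath is blocked")
        }
      }
      // Permissive everywhere else, purely for the sake of the sketch.
      override def checkPermission(perm: java.security.Permission): Unit = ()
    })
  }
}
```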
|
SPARK-20982 doesn't look particularly hard to fix. I don't understand the differences between plain Mesos and DC/OS so a lot of the things you're saying are over my head. I'm just concerned with the code that is present here in the Spark repo doing the right thing w.r.t. security, assuming whatever service it's talking to is secure. |
|
@vanzin Sure, we will try to comply. The thing is, pure Mesos does not have an API for secrets; only DC/OS has one, and we cannot bring that API into the Spark project. Otherwise I would just implement option 1, as with YARN, and everyone would be happy and secure ;) |
|
@susanxhuynh @vanzin So generating DTs at the first spark-submit and then using an HTTP API should be good enough, although environments like K8s or DC/OS usually have a CLI utility to do the job. That means only a few configuration options would need to be passed, like the API URI and some token for authentication (I assume). No real dependencies. This would require spark-submit to be able to access the secret store's API (it depends). |
|
@susanxhuynh Unfortunately I cannot unify the APIs even for DC/OS: 1.10.x is different from 1.11.x (https://docs.mesosphere.com/services/spark/2.3.0-2.2.1-2/security/) and the code depends on this (I played a bit with the DC/OS secret store API), not to mention other APIs out there. This would require a generic secrets API at the pure Mesos level (like in K8s), so I don't see a viable solution for now, unless I manage to restrict access to the TGT in client mode and essentially make it safe. |
|
@vanzin Here is the fix that works for DC/OS: d2iq-archive#26. It implements YARN's approach. |
|
I actually can't close this, only you can. If the DC/OS libraries are open source and something people can pull in by changing But otherwise it'd be a little awkward to add the code to Spark. |
|
@vanzin Correct, I will close it. The dependency is on a specific secret store API, so it's mostly HTTP calls which are DC/OS-specific... |
What changes were proposed in this pull request?
How was this patch tested?
This was manually tested with a secured HDFS and by running the Spark Hive examples, both in client and cluster mode.
In cluster mode this was tested with a ticket cache by passing the following args: