[SPARK-23790][Mesos] fix metastore connection issue #20945
HadoopRDD.scala

```diff
@@ -197,6 +197,7 @@ class HadoopRDD[K, V](
     val jobConf = getJobConf()
     // add the credentials here as this can be called before SparkContext initialized
     SparkHadoopUtil.get.addCredentials(jobConf)
+    logInfo(s"HadoopRDD credentials: ${SparkHadoopUtil.get.dumpTokens(jobConf.getCredentials)}")
     val allInputSplits = getInputFormat(jobConf).getSplits(jobConf, minPartitions)
     val inputSplits = if (ignoreEmptySplits) {
       allInputSplits.filter(_.getLength > 0)
```

Contributor (on the logInfo line): Same here.

Contributor (Author): Commented above.
MesosClusterScheduler.scala

```diff
@@ -506,6 +506,10 @@ private[spark] class MesosClusterScheduler(
       options ++= Seq("--class", desc.command.mainClass)
     }

+    desc.conf.getOption("spark.mesos.proxyUser").foreach { v =>
+      options ++= Seq("--proxy-user", v)
+    }
+
     desc.conf.getOption("spark.executor.memory").foreach { v =>
       options ++= Seq("--executor-memory", v)
     }
```
Contributor (on the --proxy-user option): This looks a little odd. How is a cluster-mode app run on Mesos? Basically, what I want to know:

The gist is that running the Spark driver in client mode (which, I think, is how the driver of a cluster-mode app is eventually started?) with a proxy user is a weird combination. It means the user code running in that driver has access to the credentials of the more privileged user, and could in turn use those to run anything as any other user. In comparison, YARN + cluster mode + proxy user starts the YARN application as the proxy user, so the user code, which only runs in a YARN container, has no access to the privileged credentials; those exist only in the launcher.

Contributor (Author): On DC/OS, the Spark DC/OS CLI, which supports Kerberos and keytab paths, submits jobs directly to the Mesos REST API on the Mesos dispatcher side. The keytabs are stored in the DC/OS secret store before the job is launched and are mounted into the container before the container starts. The other option, to mimic YARN, would be for spark-submit to upload a locally created delegation token (to the secret store) in the cluster and have the driver use that for impersonation. But this is not how things work on DC/OS deployments; as Michael mentioned in the past in https://issues.apache.org/jira/browse/SPARK-16742, you may not even have access to the keytab on the launcher side. YARN has a different approach for that, as you mentioned.

Contributor (Author): @susanxhuynh feel free to add more on how DC/OS (Mesos) handles the multi-tenancy story in general, and user identity management.

Contributor: That's a big problem, because, as I said, it makes the superuser's credentials available to untrusted user code. How do you prevent the user's Spark app from using those credentials? In YARN cluster mode the superuser's credentials are never available to the user application. (In client mode they are, but really, if you're using …)

Contributor: Basically, you have a problem here that you need to solve. You either have to require Kerberos credentials on the launcher side, so you can upload delegation tokens in cluster mode, or you need some level of separation between the code that launches the driver and the driver itself. The current system you have here is not secure at all: any user can impersonate any other user, since they have access to the superuser's credentials.

Contributor (Author): Yes, because the assumption was that client mode was safe. There is no warning about this, especially for end users, and I have just started looking into the Hadoop security part.

Contributor: Could probably use something in the documentation; warnings printed to logs are easily ignored. Still, there are legitimate uses for client mode + proxy user, but I don't think this is one of them.

Contributor (Author): What are the legitimate uses if it is not safe? Like knowing what your code does, so it's OK, e.g. spark-shell?

For the first option: when I ran the Hive examples with YARN (EMR) in cluster mode (without a TGT), it did fail, but it didn't require any credentials (no Spark code does that; it's Hadoop code). I got:

Coming from this line:

So I am not sure what you mean here, unless you mean doing that check and then, if it passes, creating the delegation tokens at the launcher anyway.

Contributor: Yes. For example, you can run some trusted code as a less-privileged user, so that you don't accidentally do something stupid as a superuser.

That means you don't have any credentials (neither a TGT nor a delegation token). I don't know EMR, so I don't know how to use it (with or without Kerberos).

Contributor (Author): Yes, that was the intention: to check where Spark-on-YARN code fails when there is no TGT (I removed it with kdestroy). I am using it with Kerberos.
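For background on the mechanism this thread is arguing about, here is a minimal sketch (not code from this PR; the object and method names are made up) of Hadoop proxy-user impersonation with UserGroupInformation: whoever holds the real, privileged user's credentials can run code as any user the hadoop.proxyuser.* rules allow, which is exactly the exposure the reviewer describes.

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper, only to illustrate the mechanism under discussion.
object ProxyUserSketch {
  def runAs[T](proxiedUser: String)(body: => T): T = {
    // The "real" user: whoever is logged in with privileged Kerberos credentials.
    val realUser = UserGroupInformation.getLoginUser
    // A proxy UGI for `proxiedUser`, backed by the real user's credentials.
    val proxyUgi = UserGroupInformation.createProxyUser(proxiedUser, realUser)
    // Everything inside doAs runs with the identity of `proxiedUser`.
    proxyUgi.doAs(new PrivilegedExceptionAction[T] {
      override def run(): T = body
    })
  }
}
```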
Contributor (Author): In cluster mode we need to pass the proxy user to the dispatcher.
```diff
@@ -521,6 +525,7 @@ private[spark] class MesosClusterScheduler(
     // --conf
     val replicatedOptionsBlacklist = Set(
+      "spark.mesos.proxyUser",
       "spark.jars", // Avoids duplicate classes in classpath
       "spark.submit.deployMode", // this would be set to `cluster`, but we need client
       "spark.master" // this contains the address of the dispatcher, not master
```
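A minimal sketch of why the blacklist matters, assuming (as the `// --conf` marker suggests) that the dispatcher replicates the remaining submitted conf entries onto the rebuilt spark-submit command; the object and helper names below are illustrative, not the scheduler's actual code.

```scala
// Sketch only: replicate submitted conf entries as --conf flags, skipping keys
// that are handled explicitly elsewhere in the generated command.
object ReplicateConfSketch {
  val replicatedOptionsBlacklist: Set[String] = Set(
    "spark.mesos.proxyUser",   // forwarded explicitly as --proxy-user instead
    "spark.jars",              // avoids duplicate classes in the classpath
    "spark.submit.deployMode", // would say "cluster", but the driver runs in client mode
    "spark.master"             // holds the dispatcher address, not the Mesos master
  )

  def confOptions(submittedConf: Map[String, String]): Seq[String] =
    submittedConf
      .filter { case (key, _) => !replicatedOptionsBlacklist.contains(key) }
      .toSeq
      .flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
}

// Example: the proxy-user key is dropped from the replicated --conf list.
// ReplicateConfSketch.confOptions(Map(
//   "spark.executor.memory" -> "2g",
//   "spark.mesos.proxyUser" -> "alice"))
// == Seq("--conf", "spark.executor.memory=2g")
```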
MesosCoarseGrainedSchedulerBackend.scala

```diff
@@ -31,6 +31,7 @@ import org.apache.mesos.Protos.{TaskInfo => MesosTaskInfo, _}
 import org.apache.mesos.SchedulerDriver

 import org.apache.spark.{SecurityManager, SparkConf, SparkContext, SparkException, TaskState}
+import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.deploy.mesos.config._
 import org.apache.spark.internal.config
 import org.apache.spark.launcher.{LauncherBackend, SparkAppHandle}
```
```diff
@@ -62,6 +63,8 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
   private lazy val hadoopDelegationTokenManager: MesosHadoopDelegationTokenManager =
     new MesosHadoopDelegationTokenManager(conf, sc.hadoopConfiguration, driverEndpoint)

+  private val isProxyUser = SparkHadoopUtil.get.isProxyUser(UserGroupInformation.getCurrentUser())
+
   // Blacklist a slave after this many failures
   private val MAX_SLAVE_FAILURES = 2
```
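As an orienting assumption (not taken from this diff), a check like SparkHadoopUtil.isProxyUser presumably reduces to inspecting the current Hadoop UGI, roughly as follows; the object and method names here are hypothetical.

```scala
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.security.UserGroupInformation.AuthenticationMethod

// Hypothetical stand-in for the proxy-user check: a UGI created via
// createProxyUser reports PROXY as its authentication method and carries
// a non-null "real user" underneath it.
object ProxyUserCheckSketch {
  def looksLikeProxyUser(ugi: UserGroupInformation): Boolean =
    ugi.getAuthenticationMethod == AuthenticationMethod.PROXY || ugi.getRealUser != null
}
```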
```diff
@@ -194,8 +197,12 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
     super.start()

     if (sc.deployMode == "client") {
+      if (isProxyUser) {
+        fetchHadoopDelegationTokens()
+      }
       launcherBackend.connect()
     }
     val startedBefore = IdHelper.startedBefore.getAndSet(true)

     val suffix = if (startedBefore) {
```
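To make "fetch delegation tokens" concrete, here is a rough, generic sketch at the Hadoop API level (assumed shape, not Spark's MesosHadoopDelegationTokenManager or the fetchHadoopDelegationTokens call above): a secure service, HDFS in this example, issues tokens into a Credentials bag that the current, possibly proxied, user then carries.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Illustrative only; the object and method names are made up for this sketch.
object FetchTokensSketch {
  def fetchHdfsTokens(hadoopConf: Configuration, renewer: String): Credentials = {
    val creds = new Credentials()
    // Ask HDFS to issue delegation tokens for the current (possibly proxy) user.
    FileSystem.get(hadoopConf).addDelegationTokens(renewer, creds)
    // Attach the tokens to the current UGI so later Hadoop calls can use them.
    UserGroupInformation.getCurrentUser.addCredentials(creds)
    creds
  }
}
```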
Contributor: Don't use dumpTokens in an info message. It's OK to add it as debug if you want, but I'm not really sure why you're adding this in this PR.

Contributor (Author): OK. I saw it being used here, so I thought it would be helpful at the info level. The reason I added it there is that I would like to see what credentials the HadoopRDD uses. There are different parts of the code base where credentials are added, and understanding what is happening can be confusing when looking at the logs of a job. It is not clear to people that HadoopRDD fetches tokens on its own.
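A minimal sketch of the reviewer's alternative, assuming the line stays where it is inside HadoopRDD so that logDebug from Spark's Logging trait is in scope; the Logging helpers take the message by name, so the token dump is only evaluated when debug logging is enabled.

```scala
// Sketch of the suggested change in place inside HadoopRDD (not a standalone
// program): demote the token dump from INFO to DEBUG.
logDebug(s"HadoopRDD credentials: ${SparkHadoopUtil.get.dumpTokens(jobConf.getCredentials)}")
```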