
Conversation

@mgummelt commented Jul 3, 2017

What changes were proposed in this pull request?

Add Kerberos Support to Mesos. This includes kinit and --keytab support, but does not include delegation token renewal.

How was this patch tested?

Manually against a Secure DC/OS Apache HDFS cluster.

@SparkQA commented Jul 3, 2017

Test build #79111 has finished for PR 18519 at commit a9d8998.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 4, 2017

Test build #79112 has finished for PR 18519 at commit 5c59daa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member):
Not a big deal but could we fix the PR title to be a bit more descriptive?

@mgummelt changed the title from "[SPARK-16742] kerberos" to "[SPARK-16742] Mesos Kerberos Support" on Jul 4, 2017
@mgummelt (Author) commented Jul 4, 2017

Whoops, fixed.

@vanzin (Contributor) left a comment:

Did a quick first pass. Is there anything here that's unit testable?

 * a layer over the different cluster managers and deploy modes that Spark supports.
 */
-object SparkSubmit extends CommandLineUtils {
+object SparkSubmit extends CommandLineUtils with Logging {
vanzin (Contributor):

You can't do this. This breaks the logging configuration of spark-shell and other shells (which is WARN by default instead of INFO).


import org.apache.hadoop.security.Credentials

class CredentialsSerializer {
vanzin (Contributor):

private[spark]? Also feels like this could be just a couple of methods in SparkHadoopUtil instead of a new separate class.
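For illustration, a rough sketch of the refactor being suggested, folding (de)serialization into SparkHadoopUtil as plain methods (the method names are assumptions; the stream calls are Hadoop's standard Credentials API):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

import org.apache.hadoop.security.Credentials

// Hypothetical members of SparkHadoopUtil, replacing CredentialsSerializer.
def serializeCredentials(creds: Credentials): Array[Byte] = {
  val byteStream = new ByteArrayOutputStream()
  // Hadoop's own wire format for tokens and secret keys.
  creds.writeTokenStorageToStream(new DataOutputStream(byteStream))
  byteStream.toByteArray
}

def deserializeCredentials(tokenBytes: Array[Byte]): Credentials = {
  val creds = new Credentials()
  creds.readTokenStorageStream(new DataInputStream(new ByteArrayInputStream(tokenBytes)))
  creds
}
```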

  }

  def deserialize(tokenBytes: Array[Byte]): Credentials = {
    val tokensBuf = new java.io.ByteArrayInputStream(tokenBytes)
vanzin (Contributor):

Why not import it like other classes?
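In other words, a sketch of the cleanup: import the class at the top of the file and drop the inline qualification:

```scala
import java.io.ByteArrayInputStream

// ...

val tokensBuf = new ByteArrayInputStream(tokenBytes)
```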

logInfo(s"Adding ${creds.numberOfTokens()} tokens and ${creds.numberOfSecretKeys()} secret" +
s"keys to the current user's credentials.")

UserGroupInformation.getCurrentUser().addCredentials(creds)
vanzin (Contributor):

This looks like SparkHadoopUtil.addCurrentUserCredentials. That's only implemented on YarnSparkHadoopUtil for historical reasons, but since we dropped Hadoop 1.x support, the implementation can move to core/ now, and you'd avoid this copy of that code.
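A minimal sketch of that move, assuming the existing YARN implementation carries over unchanged into core's SparkHadoopUtil:

```scala
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Shared by all cluster managers now that Hadoop 1.x support is gone.
def addCurrentUserCredentials(credentials: Credentials): Unit = {
  UserGroupInformation.getCurrentUser.addCredentials(credentials)
}
```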

 private[spark]
-class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: RpcEnv)
+class CoarseGrainedSchedulerBackend(
+  scheduler: TaskSchedulerImpl,
vanzin (Contributor):

nit: one more indent level.

"messages.")))
}


vanzin (Contributor):

nit: not needed

* of requesting a delta of executors risks double counting new executors when there are
* insufficient resources to satisfy the first request. We make the assumption here that the
* cluster manager will eventually fulfill all requests when resources free up.
*
vanzin (Contributor):

nit: leave these as they were.

sc.env.rpcEnv,
Some(new HadoopDelegationTokenManager(
sc.conf,
SparkHadoopUtil.get.newConfiguration(sc.conf))))
vanzin (Contributor):

sc.hadoopConfiguration?
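That is, the suggestion is to reuse the SparkContext's existing Hadoop configuration rather than constructing a fresh one:

```scala
Some(new HadoopDelegationTokenManager(sc.conf, sc.hadoopConfiguration))
```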

@ArtRand commented Jul 18, 2017

Hello @vanzin, I'm taking over Michael's Spark duties at Mesosphere and will be addressing the comments on this PR. I should have the revisions done in the next day or so. Thanks for your patience.

@SparkQA commented Jul 21, 2017

Test build #79842 has finished for PR 18519 at commit 8662057.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 21, 2017

Test build #79847 has finished for PR 18519 at commit f903e6f.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 21, 2017

Test build #79849 has finished for PR 18519 at commit e6a7357.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 24, 2017

Test build #79914 has finished for PR 18519 at commit 4ba8bab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ArtRand commented Jul 24, 2017

@vanzin all green, ready for another look.

@ArtRand commented Aug 4, 2017

@vanzin any thoughts on this and the related discussion?

@vanzin (Contributor) commented Aug 4, 2017

I've been pretty busy; I can probably get to this sometime next week.

val shortUserName = UserGroupInformation.getCurrentUser.getShortUserName
val key = s"spark.hadoop.${YarnConfiguration.RM_PRINCIPAL}"
// scalastyle:off println
printStream.println(s"Setting ${key} to ${shortUserName}")
vanzin (Contributor):

Do you want this to be printed out every time someone runs spark-submit? Sounds a bit noisy.

Reply:

It only prints when UserGroupInformation.isSecurityEnabled and I think it's useful information whenever a job is run.
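A sketch of the guard being referred to (the surrounding SparkSubmit context is elided and the exact placement assumed):

```scala
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.yarn.conf.YarnConfiguration

if (UserGroupInformation.isSecurityEnabled) {
  // Only reached on secure (Kerberized) clusters.
  val shortUserName = UserGroupInformation.getCurrentUser.getShortUserName
  val key = s"spark.hadoop.${YarnConfiguration.RM_PRINCIPAL}"
  // scalastyle:off println
  printStream.println(s"Setting $key to $shortUserName")
  // scalastyle:on println
}
```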

scheduler: TaskSchedulerImpl,
val rpcEnv: RpcEnv,
hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager])
extends ExecutorAllocationClient with SchedulerBackend with Logging
vanzin (Contributor):

nit: unindent this line one level.

class CoarseGrainedSchedulerBackend(
scheduler: TaskSchedulerImpl,
val rpcEnv: RpcEnv,
hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager])
vanzin (Contributor):

I'm a little torn on having this as a constructor argument. It seems cleaner at first, but it makes the constructors of sub-classes (like the Mesos one) rather ugly.

How about having a protected val hadoopDelegationTokenManager = None and overriding it where needed? That makes initialization in the sub-class more readable.
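A rough sketch of that pattern (constructors and other members elided):

```scala
// In CoarseGrainedSchedulerBackend: a do-nothing default.
protected val hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] = None

// In MesosCoarseGrainedSchedulerBackend: override only where tokens are needed.
override val hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] =
  Some(new HadoopDelegationTokenManager(sc.conf, sc.hadoopConfiguration))
```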

@ArtRand commented Aug 11, 2017

@vanzin Fixed this up. Please have a look. Thanks.

-class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: RpcEnv)
-  extends ExecutorAllocationClient with SchedulerBackend with Logging
-{
+class CoarseGrainedSchedulerBackend(
vanzin (Contributor):

You're not changing anything here now, are you?

<scope>test</scope>
</dependency>

<dependency>
vanzin (Contributor):

Is this really needed?

I don't see you adding specific tests for this, so I wonder why you need the explicit dependency when other modules that depend on spark-core don't.

Reply:

Yes, MesosClusterManagerSuite creates a MesosCoarseGrainedSchedulerBackend which contains a HadoopDelegationTokenManager... etc.

vanzin (Contributor):

Hmm, ok... the credential manager code should be safe when Hive classes aren't present, but if there's a problem in that area it's not your fault.

override val hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] =
Some(new HadoopDelegationTokenManager(sc.conf, sc.hadoopConfiguration))

override val hadoopDelegationCreds: Option[Array[Byte]] = getHadoopDelegationCreds()
vanzin (Contributor):

No need to override this guy.

with org.apache.mesos.Scheduler
with MesosSchedulerUtils {

override val hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] =
vanzin (Contributor):

protected

@ArtRand commented Aug 11, 2017

Hello @vanzin, thanks for the reviews; I believe I've addressed your comments. I also added support for the user to pass a ticket-granting ticket instead of a keytab. It's a very small change, and I tested it against kerberized HDFS. Thanks.

@SparkQA commented Aug 11, 2017

Test build #80551 has finished for PR 18519 at commit 63ca4db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: RpcEnv)

@vanzin (Contributor) commented Aug 11, 2017

> I also added support for the user to pass a ticket-granting ticket instead of a keytab

It'd be better to avoid adding new features after the patch has been reviewed and is mostly ready for checking in.

In this case, the feature you added isn't necessary: UserGroupInformation automatically loads the Kerberos ticket cache from its default location, or you can set KRB5CCNAME in your environment if you want to use a custom location.
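For illustration, a hedged sketch of the behavior described above (the cache path and principal below are made-up examples):

```scala
import org.apache.hadoop.security.UserGroupInformation

// With Kerberos enabled, the login user picks up the default ticket cache
// (or the file named by the KRB5CCNAME environment variable) automatically,
// with no extra Spark-side code.
val ugi = UserGroupInformation.getLoginUser

// Reading from an explicit, non-default cache is also possible:
val fromCache = UserGroupInformation.getUGIFromTicketCache(
  "/tmp/krb5cc_1000",    // hypothetical cache path
  "alice@EXAMPLE.COM")   // hypothetical principal
```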

@SparkQA commented Aug 11, 2017

Test build #80552 has finished for PR 18519 at commit 4a86186.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 12, 2017

Test build #80565 has finished for PR 18519 at commit 857cf31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 12, 2017

Test build #80566 has finished for PR 18519 at commit 1d7ddbd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 12, 2017

Test build #80573 has finished for PR 18519 at commit 4c77d54.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ArtRand commented Aug 13, 2017

Hello @vanzin, point taken; I reverted the change. The environment-variable trick works as well, thanks for that.

@SparkQA commented Aug 13, 2017

Test build #80574 has finished for PR 18519 at commit 685e976.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@volatile protected var currentExecutorIdCounter = 0

// hadoop token manager used by some sub-classes (e.g. Mesos)
protected var hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] = None
vanzin (Contributor):

This should be a val. Just override it in the subclass.

protected var hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] = None

// Hadoop delegation tokens to be sent to the executors.
protected var hadoopDelegationCreds: Option[Array[Byte]] = None
vanzin (Contributor):

This should be a val; you don't need to set it in the subclass. Because there might be some initialization order issue, it might need to be a lazy val.

vanzin (Contributor):

So, another option here is have this be a val and have hadoopDelegationTokenManager be a def. The latter means there's no initialization order issue (so no need for the lazy val hack). Since this is really only done once, that should be fine.
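A short sketch of that arrangement (member names taken from the diff; the enclosing class is elided):

```scala
// A def is evaluated on each use, so there is no initialization-order
// hazard and no need for the lazy-val workaround.
protected def hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] = None

// Safe to compute eagerly during construction, because the def above is
// well-defined even before sub-class fields are initialized.
protected val hadoopDelegationCreds: Option[Array[Byte]] = getHadoopDelegationCreds()
```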

@vanzin (Contributor) commented Aug 17, 2017

LGTM pending tests.

I'm not yet super happy with the internal API in CoarseGrainedSchedulerBackend; sorry for going back and forth on that, but I guess when another implementation starts using that functionality we'll have a better idea of what it should look like.

@vanzin (Contributor) commented Aug 17, 2017

(I did some basic local testing on secure YARN, just in case, and it looks good.)

@SparkQA commented Aug 17, 2017

Test build #80802 has finished for PR 18519 at commit c3050d1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class FeatureHasher(@Since("2.3.0") override val uid: String) extends Transformer
  • sealed abstract class SummaryBuilder
  • case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends LeafNode
  • case class HiveTableRelation(

@vanzin (Contributor) commented Aug 17, 2017

Merging to master.

@asfgit closed this in bfdc361 on Aug 17, 2017
@rvesse (Member) commented Aug 31, 2017

@ArtRand Any plans to add delegation token renewal under Mesos in the future?

@jerryshao (Contributor):
@ArtRand @vanzin, does this only work in client deploy mode? Am I understanding correctly? I don't see any code that ships tokens from the local client to a remote driver.

susanxhuynh pushed a commit to d2iq-archive/spark that referenced this pull request Jan 8, 2018
Add Kerberos Support to Mesos. This includes kinit and --keytab support, but does not include delegation token renewal.

Manually against a Secure DC/OS Apache HDFS cluster.

Author: ArtRand <[email protected]>
Author: Michael Gummelt <[email protected]>

Closes apache#18519 from mgummelt/SPARK-16742-kerberos.