
Conversation

@mgummelt commented Jul 3, 2017

What changes were proposed in this pull request?

Add Kerberos Support to Mesos. This includes kinit and --keytab support, but does not include delegation token renewal.

How was this patch tested?

Manually against a Secure DC/OS Apache HDFS cluster.

@SparkQA commented Jul 3, 2017

Test build #79111 has finished for PR 18519 at commit a9d8998.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 4, 2017

Test build #79112 has finished for PR 18519 at commit 5c59daa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member):
Not a big deal but could we fix the PR title to be a bit more descriptive?

@mgummelt changed the title from "[SPARK-16742] kerberos" to "[SPARK-16742] Mesos Kerberos Support" on Jul 4, 2017
@mgummelt (Author) commented Jul 4, 2017

Whoops, fixed.

@vanzin (Contributor) left a comment:

Did a quick first pass. Is there anything here that's unit testable?

 * a layer over the different cluster managers and deploy modes that Spark supports.
 */
-object SparkSubmit extends CommandLineUtils {
+object SparkSubmit extends CommandLineUtils with Logging {
vanzin (Contributor):

You can't do this. This breaks the logging configuration of spark-shell and other shells (which is WARN by default instead of INFO).


import org.apache.hadoop.security.Credentials

class CredentialsSerializer {
vanzin (Contributor):

private[spark]? Also feels like this could be just a couple of methods in SparkHadoopUtil instead of a new separate class.
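For illustration, a rough sketch of the refactor being suggested, folding (de)serialization into SparkHadoopUtil as plain methods (the method names are assumptions; the stream calls are Hadoop's standard Credentials API):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

import org.apache.hadoop.security.Credentials

// Hypothetical members of SparkHadoopUtil, replacing CredentialsSerializer.
def serializeCredentials(creds: Credentials): Array[Byte] = {
  val byteStream = new ByteArrayOutputStream()
  // Hadoop's own wire format for tokens and secret keys.
  creds.writeTokenStorageToStream(new DataOutputStream(byteStream))
  byteStream.toByteArray
}

def deserializeCredentials(tokenBytes: Array[Byte]): Credentials = {
  val creds = new Credentials()
  creds.readTokenStorageStream(new DataInputStream(new ByteArrayInputStream(tokenBytes)))
  creds
}
```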

  }

  def deserialize(tokenBytes: Array[Byte]): Credentials = {
    val tokensBuf = new java.io.ByteArrayInputStream(tokenBytes)
vanzin (Contributor):

Why not import it like other classes?
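In other words, a sketch of the cleanup: import the class at the top of the file and drop the inline qualification:

```scala
import java.io.ByteArrayInputStream

// ...

val tokensBuf = new ByteArrayInputStream(tokenBytes)
```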

logInfo(s"Adding ${creds.numberOfTokens()} tokens and ${creds.numberOfSecretKeys()} secret" +
s"keys to the current user's credentials.")

UserGroupInformation.getCurrentUser().addCredentials(creds)
vanzin (Contributor):

This looks like SparkHadoopUtil.addCurrentUserCredentials. That's only implemented on YarnSparkHadoopUtil for historical reasons, but since we dropped Hadoop 1.x support, the implementation can move to core/ now, and you'd avoid this copy of that code.
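A minimal sketch of that move, assuming the existing YARN implementation carries over unchanged into core's SparkHadoopUtil:

```scala
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Shared by all cluster managers now that Hadoop 1.x support is gone.
def addCurrentUserCredentials(credentials: Credentials): Unit = {
  UserGroupInformation.getCurrentUser.addCredentials(credentials)
}
```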

 private[spark]
-class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: RpcEnv)
+class CoarseGrainedSchedulerBackend(
+  scheduler: TaskSchedulerImpl,
vanzin (Contributor):

nit: one more indent level.

"messages.")))
}


vanzin (Contributor):

nit: not needed

* of requesting a delta of executors risks double counting new executors when there are
* insufficient resources to satisfy the first request. We make the assumption here that the
* cluster manager will eventually fulfill all requests when resources free up.
*
vanzin (Contributor):

nit: leave these as they were.

sc.env.rpcEnv,
Some(new HadoopDelegationTokenManager(
sc.conf,
SparkHadoopUtil.get.newConfiguration(sc.conf))))
vanzin (Contributor):

sc.hadoopConfiguration?
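That is, the suggestion is to reuse the SparkContext's existing Hadoop configuration rather than constructing a fresh one:

```scala
Some(new HadoopDelegationTokenManager(sc.conf, sc.hadoopConfiguration))
```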

@ArtRand commented Jul 18, 2017

Hello @vanzin, I'm taking over Michael's Spark duties at Mesosphere and will be addressing the comments on this PR. I should have the revisions done in the next day or so. Thanks for your patience.

@SparkQA commented Jul 21, 2017

Test build #79842 has finished for PR 18519 at commit 8662057.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 21, 2017

Test build #79847 has finished for PR 18519 at commit f903e6f.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 21, 2017

Test build #79849 has finished for PR 18519 at commit e6a7357.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 24, 2017

Test build #79914 has finished for PR 18519 at commit 4ba8bab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ArtRand commented Jul 24, 2017

@vanzin all green, ready for another look.

@ArtRand commented Aug 4, 2017

@vanzin any thoughts on this and the related discussion?

@vanzin (Contributor) commented Aug 4, 2017

I've been pretty busy; I can probably get to this sometime next week.

val shortUserName = UserGroupInformation.getCurrentUser.getShortUserName
val key = s"spark.hadoop.${YarnConfiguration.RM_PRINCIPAL}"
// scalastyle:off println
printStream.println(s"Setting ${key} to ${shortUserName}")
vanzin (Contributor):

Do you want this to be printed out every time someone runs spark-submit? Sounds a bit noisy.

Reply:

It only prints when UserGroupInformation.isSecurityEnabled and I think it's useful information whenever a job is run.
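A sketch of the guard being referred to (the surrounding SparkSubmit context is elided and the exact placement assumed):

```scala
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.yarn.conf.YarnConfiguration

if (UserGroupInformation.isSecurityEnabled) {
  // Only reached on secure (Kerberized) clusters.
  val shortUserName = UserGroupInformation.getCurrentUser.getShortUserName
  val key = s"spark.hadoop.${YarnConfiguration.RM_PRINCIPAL}"
  // scalastyle:off println
  printStream.println(s"Setting $key to $shortUserName")
  // scalastyle:on println
}
```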

scheduler: TaskSchedulerImpl,
val rpcEnv: RpcEnv,
hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager])
extends ExecutorAllocationClient with SchedulerBackend with Logging
vanzin (Contributor):

nit: unindent this line one level.

class CoarseGrainedSchedulerBackend(
scheduler: TaskSchedulerImpl,
val rpcEnv: RpcEnv,
hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager])
vanzin (Contributor):

I'm a little torn on having this as a constructor argument. It seems cleaner at first, but it makes the constructors of sub-classes (like the Mesos one) rather ugly.

How about having a protected val hadoopDelegationTokenManager = None and overriding it where needed? That makes initialization in the sub-class more readable.
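A rough sketch of that pattern (constructors and other members elided):

```scala
// In CoarseGrainedSchedulerBackend: a do-nothing default.
protected val hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] = None

// In MesosCoarseGrainedSchedulerBackend: override only where tokens are needed.
override val hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] =
  Some(new HadoopDelegationTokenManager(sc.conf, sc.hadoopConfiguration))
```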

@ArtRand commented Aug 11, 2017

@vanzin Fixed this up. Please have a look. Thanks.

-class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: RpcEnv)
-  extends ExecutorAllocationClient with SchedulerBackend with Logging
-{
+class CoarseGrainedSchedulerBackend(
vanzin (Contributor):

You're not changing anything here now, are you?

<scope>test</scope>
</dependency>

<dependency>
vanzin (Contributor):

Is this really needed?

I don't see you adding specific tests for this, so I wonder why you need the explicit dependency when other modules that depend on spark-core don't.

Reply:

Yes, MesosClusterManagerSuite creates a MesosCoarseGrainedSchedulerBackend which contains a HadoopDelegationTokenManager... etc.

vanzin (Contributor):

Hmm, ok... the credential manager code should be safe when Hive classes aren't present, but if there's a problem in that area it's not your fault.

override val hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] =
Some(new HadoopDelegationTokenManager(sc.conf, sc.hadoopConfiguration))

override val hadoopDelegationCreds: Option[Array[Byte]] = getHadoopDelegationCreds()
vanzin (Contributor):

No need to override this guy.

with org.apache.mesos.Scheduler
with MesosSchedulerUtils {

override val hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] =
vanzin (Contributor):

protected

@ArtRand commented Aug 11, 2017

Hello @vanzin, thanks for the reviews; I believe I've addressed your comments. I also added support for the user to pass a ticket-granting ticket instead of a keytab. It's a very small change, and I tested it against kerberized HDFS. Thanks.

@SparkQA commented Aug 11, 2017

Test build #80551 has finished for PR 18519 at commit 63ca4db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: RpcEnv)

@vanzin (Contributor) commented Aug 11, 2017

> I also added support for the user to pass a ticket-granting ticket instead of a keytab

It'd be better to avoid adding new features after the patch has been reviewed and is mostly ready for checking in.

In this case, the feature you added isn't necessary: UserGroupInformation automatically loads the Kerberos ticket cache from its default location, or you can set KRB5CCNAME in your environment if you want to use a custom location.
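For illustration, a hedged sketch of the behavior described above (the cache path and principal below are made-up examples):

```scala
import org.apache.hadoop.security.UserGroupInformation

// With Kerberos enabled, the login user picks up the default ticket cache
// (or the file named by the KRB5CCNAME environment variable) automatically,
// with no extra Spark-side code.
val ugi = UserGroupInformation.getLoginUser

// Reading from an explicit, non-default cache is also possible:
val fromCache = UserGroupInformation.getUGIFromTicketCache(
  "/tmp/krb5cc_1000",    // hypothetical cache path
  "alice@EXAMPLE.COM")   // hypothetical principal
```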

@SparkQA commented Aug 11, 2017

Test build #80552 has finished for PR 18519 at commit 4a86186.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 12, 2017

Test build #80565 has finished for PR 18519 at commit 857cf31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 12, 2017

Test build #80566 has finished for PR 18519 at commit 1d7ddbd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 12, 2017

Test build #80573 has finished for PR 18519 at commit 4c77d54.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ArtRand commented Aug 13, 2017

Hello @vanzin, point taken; I reverted the change. The environment-variable trick works as well, thanks for that.

@SparkQA commented Aug 13, 2017

Test build #80574 has finished for PR 18519 at commit 685e976.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@volatile protected var currentExecutorIdCounter = 0

// hadoop token manager used by some sub-classes (e.g. Mesos)
protected var hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] = None
vanzin (Contributor):

This should be a val. Just override it in the subclass.

protected var hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] = None

// Hadoop delegation tokens to be sent to the executors.
protected var hadoopDelegationCreds: Option[Array[Byte]] = None
vanzin (Contributor):

This should be a val; you don't need to set it in the subclass. Because there might be some initialization order issue, it might need to be a lazy val.

vanzin (Contributor):

So, another option here is have this be a val and have hadoopDelegationTokenManager be a def. The latter means there's no initialization order issue (so no need for the lazy val hack). Since this is really only done once, that should be fine.
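A short sketch of that arrangement (member names taken from the diff; the enclosing class is elided):

```scala
// A def is evaluated on each use, so there is no initialization-order
// hazard and no need for the lazy-val workaround.
protected def hadoopDelegationTokenManager: Option[HadoopDelegationTokenManager] = None

// Safe to compute eagerly during construction, because the def above is
// well-defined even before sub-class fields are initialized.
protected val hadoopDelegationCreds: Option[Array[Byte]] = getHadoopDelegationCreds()
```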

@vanzin (Contributor) commented Aug 17, 2017

LGTM pending tests.

I'm not yet super happy with the internal API in CoarseGrainedSchedulerBackend; sorry for going back and forth on that, but I guess when another implementation starts using that functionality we'll have a better idea of what it should look like.

@vanzin (Contributor) commented Aug 17, 2017

(I did some basic local testing on secure YARN, just in case, and it looks good.)

@SparkQA commented Aug 17, 2017

Test build #80802 has finished for PR 18519 at commit c3050d1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class FeatureHasher(@Since("2.3.0") override val uid: String) extends Transformer
  • sealed abstract class SummaryBuilder
  • case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends LeafNode
  • case class HiveTableRelation(

@vanzin (Contributor) commented Aug 17, 2017

Merging to master.

@asfgit closed this in bfdc361 on Aug 17, 2017
@rvesse (Member) commented Aug 31, 2017

@ArtRand Any plans to add delegation token renewal under Mesos in the future?

@jerryshao (Contributor):
@ArtRand @vanzin, does this only work in client deploy mode? Am I understanding correctly? I don't see any code that ships tokens from the local client to a remote driver.

susanxhuynh pushed a commit to d2iq-archive/spark that referenced this pull request Jan 8, 2018
Add Kerberos Support to Mesos. This includes kinit and --keytab support, but does not include delegation token renewal.

Manually against a Secure DC/OS Apache HDFS cluster.

Author: ArtRand <[email protected]>
Author: Michael Gummelt <[email protected]>

Closes apache#18519 from mgummelt/SPARK-16742-kerberos.