[SPARK-4666] Improve YarnAllocator's parsing of "memory overhead" param #3525
Changes from all commits
```diff
@@ -20,6 +20,7 @@ package org.apache.spark.scheduler.local

 import java.nio.ByteBuffer

 import scala.concurrent.duration._
+import scala.language.postfixOps

 import akka.actor.{Actor, ActorRef, Props}
```
Review thread on the added `import scala.language.postfixOps` line:

**Contributor (author):** Saw this while compiling:

**Member:** Yes, that's the right thing, if you think that the use of postfix ops is justified.
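Background on that exchange: since Scala 2.10, postfix operator notation (calling an argumentless method without a dot, as in `500 millis`) emits a feature warning unless `scala.language.postfixOps` is in scope, which is presumably the compiler output referenced above. A minimal sketch of the behavior (my own illustration, not code from this PR):

```scala
import scala.concurrent.duration._
// Without this import, the postfix call below produces a "postfix operator"
// feature warning in Scala 2.10+ (an error under -Xfatal-warnings).
import scala.language.postfixOps

object PostfixExample extends App {
  val timeout = 500 millis // postfix notation for 500.millis
  println(timeout)         // prints "500 milliseconds"
}
```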
```diff
@@ -980,33 +980,76 @@ private[spark] object Utils extends Logging {
     )
   }

+  private val TB = 1L << 40
+  private val GB = 1L << 30
+  private val MB = 1L << 20
+  private val KB = 1L << 10
+
+  private val scaleCharToFactor: Map[Char, Long] = Map(
+    'b' -> 1L,
+    'k' -> KB,
+    'm' -> MB,
+    'g' -> GB,
+    't' -> TB
+  )
+
   /**
-   * Convert a Java memory parameter passed to -Xmx (such as 300m or 1g) to a number of megabytes.
-   */
-  def memoryStringToMb(str: String): Int = {
+   * Convert a Java memory parameter passed to -Xmx (such as "300m" or "1g") to a number of
+   * megabytes (or other byte-scale denominations as specified by @outputScaleChar).
+   *
+   * For @defaultInputScaleChar and @outputScaleChar, valid values are: 'b' (bytes), 'k'
+   * (kilobytes), 'm' (megabytes), 'g' (gigabytes), and 't' (terabytes).
+   *
+   * @param str String to parse an amount of memory out of
+   * @param defaultInputScaleChar if no "scale" is provided on the end of @str (i.e. @str is a
+   *                              plain numeric value), assume this scale (default: 'b' for
+   *                              'bytes')
+   * @param outputScaleChar express the output in this scale, i.e. number of bytes, kilobytes,
+   *                        megabytes, or gigabytes.
+   */
+  def parseMemoryString(
+      str: String,
+      defaultInputScaleChar: Char = 'b',
+      outputScaleChar: Char = 'm'): Long = {
+
     val lower = str.toLowerCase
-    if (lower.endsWith("k")) {
-      (lower.substring(0, lower.length-1).toLong / 1024).toInt
-    } else if (lower.endsWith("m")) {
-      lower.substring(0, lower.length-1).toInt
-    } else if (lower.endsWith("g")) {
-      lower.substring(0, lower.length-1).toInt * 1024
-    } else if (lower.endsWith("t")) {
-      lower.substring(0, lower.length-1).toInt * 1024 * 1024
-    } else { // no suffix, so it's just a number in bytes
-      (lower.toLong / 1024 / 1024).toInt
-    }
+    val lastChar = lower(lower.length - 1)
+    val (num, inputScaleChar) =
+      if (lastChar.isDigit) {
+        (lower.toLong, defaultInputScaleChar)
+      } else {
+        (lower.substring(0, lower.length - 1).toLong, lastChar)
+      }
+
+    (for {
+      inputScale <- scaleCharToFactor.get(inputScaleChar)
+      outputScale <- scaleCharToFactor.get(outputScaleChar)
+    } yield {
+      inputScale * num / outputScale
+    }).getOrElse(
+      throw new IllegalArgumentException(
+        "Invalid memory string or scale: %s, %s, %s".format(
+          str,
+          defaultInputScaleChar,
+          outputScaleChar
+        )
+      )
+    )
   }
```
Review thread on the `(for {` line:

**Member:** Why is a `for` construction used here? Just to handle the invalid in/out scale param? My personal taste would be to just check that the `Option[Long]` exists directly and do away with it. I can't tell how much that's just me versus how the kids talk in Scala these days. A looping construct surprised me, as there is no loop.
**Contributor (author):** So, it's not a "looping construct" so much as a "mapping" one, commonly used on `Option`s; a `for` comprehension over `Option`s desugars to chained `flatMap`/`map` calls. However, it does get better as the number of chained `Option`s grows. Of course, handling such situations using a `match` works too. I prefer all of these to, say, a straw-man version that doesn't take advantage of any of the nice things that come from using `Option`s, but I'll defer to you on the approach you like best.
**Member:** Yeah, I get what it does. I was comparing in my mind to... Here's another instance where I wouldn't mind hearing another opinion, as it's a good, more general style question.
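To ground the style question, here is a standalone sketch (mine, not from the PR) of the three shapes under discussion: the `for` comprehension, the `flatMap`/`map` chain it desugars to, and the direct-check style the reviewer describes:

```scala
object OptionStyles {
  val scaleCharToFactor: Map[Char, Long] =
    Map('b' -> 1L, 'k' -> (1L << 10), 'm' -> (1L << 20))

  // 1. for-comprehension: Some(result) only if both lookups succeed, else None.
  def viaFor(in: Char, out: Char, num: Long): Option[Long] =
    for {
      inScale  <- scaleCharToFactor.get(in)
      outScale <- scaleCharToFactor.get(out)
    } yield inScale * num / outScale

  // 2. What the compiler desugars #1 into: flatMap/map over the Options.
  def viaFlatMap(in: Char, out: Char, num: Long): Option[Long] =
    scaleCharToFactor.get(in).flatMap { inScale =>
      scaleCharToFactor.get(out).map(outScale => inScale * num / outScale)
    }

  // 3. Checking that the Options exist directly, as the reviewer suggested.
  def viaChecks(in: Char, out: Char, num: Long): Option[Long] = {
    val inScale = scaleCharToFactor.get(in)
    val outScale = scaleCharToFactor.get(out)
    if (inScale.isDefined && outScale.isDefined) {
      Some(inScale.get * num / outScale.get)
    } else {
      None
    }
  }
}
```

All three return the same results; the trade-off is that the first two scale more gracefully as the number of chained `Option`s grows, which is the contributor's point above.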
```diff
+  /**
+   * Wrapper for @parseMemoryString taking default arguments and returning an int, which is safe
+   * since we are converting to a number of megabytes.
+   */
+  def memoryStringToMb(str: String): Int = memoryStringToMb(str, defaultInputScale = 'b')
+  def memoryStringToMb(str: String, defaultInputScale: Char = 'b'): Int =
+    parseMemoryString(str, defaultInputScale, 'm').toInt
+
   /**
    * Convert a quantity in bytes to a human-readable string such as "4.0 MB".
    */
   def bytesToString(size: Long): String = {
-    val TB = 1L << 40
-    val GB = 1L << 30
-    val MB = 1L << 20
-    val KB = 1L << 10
-
     val (value, unit) = {
       if (size >= 2*TB) {
         (size.asInstanceOf[Double] / TB, "TB")

@@ -1047,7 +1090,7 @@ private[spark] object Utils extends Logging {
    * Convert a quantity in megabytes to a human-readable string such as "4.0 MB".
    */
   def megabytesToString(megabytes: Long): String = {
-    bytesToString(megabytes * 1024L * 1024L)
+    bytesToString(megabytes * MB)
   }

@@ -1905,11 +1948,6 @@ private[spark] object Utils extends Logging {
     method.invoke(obj, values.toSeq: _*)
   }

-  // Limit of bytes for total size of results (default is 1GB)
-  def getMaxResultSize(conf: SparkConf): Long = {
-    memoryStringToMb(conf.get("spark.driver.maxResultSize", "1g")).toLong << 20
-  }

   /**
    * Return the current system LD_LIBRARY_PATH name
    */
```
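Assuming the new code behaves as written, a few example values (my reading of the diff, not tests from the PR) illustrate the `inputScale * num / outputScale` conversion, including the truncation that integer division introduces when converting to a coarser scale:

```scala
Utils.parseMemoryString("300m")                               // 300L  (MB -> MB, default output scale)
Utils.parseMemoryString("1g")                                 // 1024L (GB -> MB)
Utils.parseMemoryString("2048", defaultInputScaleChar = 'k')  // 2L    (no suffix, so treated as KB)
Utils.parseMemoryString("1g", outputScaleChar = 'b')          // 1073741824L
Utils.parseMemoryString("500k")                               // 0L    (500 * 1024 / (1 << 20) truncates)
Utils.parseMemoryString("100x")                               // throws IllegalArgumentException ('x' is not a known scale)
```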
Review thread on the removed `getMaxResultSize` helper:

**Member:** Hm, why is this property special-cased here? The methods in this class are generally generic.

**Contributor (author):** I weighed two options:

1. a method on `Utils` that takes a `SparkConf` as a parameter and thinly wraps a method call on said `SparkConf`
2. a method on `SparkConf` that delegates to another method on `SparkConf`

...and felt like the latter was better/cleaner. My feeling was that a kitchen-sink / generically-named `Utils` class that wraps methods for `SparkConf` (and possibly other classes?) to maintain an illusion of simplicity in the `SparkConf` API is not helping code clarity. Of course, this is subjective and I'm open to putting it back in `Utils`, lmk.

**Contributor (author):** NB: that was an answer to why this property is special-cased here, as opposed to over in `Utils`. You may be more interested in the question of why it's special-cased at all, but that seems reasonable to me given the couple of magic strings and its greater-than-1 call sites (namely: 2).

**Member:** I would have picked `Utils`, I suppose. Or is there somewhere less generic to put this that covers the 2 call sites? Other opinions?

**Contributor (author):** Option 3 could be: put such methods in a `SparkConfUtils` object that would be less prone to kitchen-sink-iness. I'm impartial; I'll let you make the call between these three.

**Member:** I'd like another set of eyes on the change from @pwendell or @JoshRosen. The reason I hesitate before merging is the large number of small changes and merge-conflict potential, although I do definitely think the variable names are improved by the suffix. For this particular issue, maybe expose just `getMemory` from this class, and inline the two simple calls to it that currently use `getMaxResultSize`? Writing `getMemory("spark.driver.maxResultSize", "1g", outputScale = 'b')` in two places isn't bad versus constructing a new home for it.

**Contributor (author):** Hm, I'd vote we put it back in `Utils` over un-factoring those two calls that are doing the same thing as one another.
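Purely for illustration, a sketch of the "option 3" `SparkConfUtils` home floated above. This is hypothetical (the thread ends leaning toward putting the method back in `Utils`); the object name and placement are assumptions, not anything in the PR:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.util.Utils

// Hypothetical home for SparkConf-reading helpers, sketched only to show the
// less kitchen-sink-y alternative discussed above. Not part of this PR; it
// would also need to live under org.apache.spark, since Utils is private[spark].
object SparkConfUtils {
  // Limit of bytes for total size of results (default is 1GB).
  def getMaxResultSize(conf: SparkConf): Long =
    Utils.parseMemoryString(conf.get("spark.driver.maxResultSize", "1g"), outputScaleChar = 'b')
}
```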