Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
use lines per method as split threshold instead of chars per method
  • Loading branch information
kiszk committed Oct 4, 2017
commit 87578dbbd3a497a5962a9c04bd8843e977ccc0fb
Original file line number Diff line number Diff line change
Expand Up @@ -772,20 +772,21 @@ class CodegenContext {
foldFunctions: Seq[String] => String = _.mkString("", ";\n", ";")): String = {
val blocks = new ArrayBuffer[String]()
val blockBuilder = new StringBuilder()
val maxCharacters = SQLConf.get.maxCharsPerFunction
var length = 0
val maxLines = SQLConf.get.maxCodegenLinesPerFunction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag does not take an effect at runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? Could you elaborate this review comment?

Copy link
Member

@gatorsmile gatorsmile Aug 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set the conf to a new value at runtime, can you get the value from SQLConf.get.maxCodegenLinesPerFunction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I had another interpretation that "this value may not change performance".
Let me check this while I did it like other flags.

Copy link
Member Author

@kiszk kiszk Aug 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Depends on calling context, it may take an effect or not. Should we pass SQLConf to this method? I am thinking whether we can pass SQLConf to a constructor or not.
Is there any other thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this option effective by executing SparkEnv.get.conf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile Since to make it configurable takes long time, can we do it using hard-coded parameter?
Even in this case, this PR makes better since the estimation does not include characters for comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiszk You know, I am just afraid new regression could be introduced due to this change. Sorry for the delay. I really do not have a better solution. I kind of agree on your original solution. Just exclude the characters for comment. At least, it becomes better and take a less risk to hit a regression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile I understand your concern about the possibility of new performance regression. I will use the original threshold (max characters) as hard-coded value.

var line = 0
for (code <- expressions) {
// We can't know how many bytecode will be generated, so use the length of source code
// We can't know how many bytecode will be generated, so use the line of source code
// as metric. A method should not go beyond 8K, otherwise it will not be JITted, should
// also not be too small, or it will have many function calls (for wide table), see the
// results in BenchmarkWideTable.
if (length > maxCharacters) {
if (line > maxLines) {
blocks += blockBuilder.toString()
blockBuilder.clear()
length = 0
line = 0
}
blockBuilder.append(code)
length += CodeFormatter.stripExtraNewLinesAndComments(code).length
val lineOfCode = CodeFormatter.stripExtraNewLinesAndComments(code).count(_ == '\n')
line += (if (lineOfCode == 0) 1 else lineOfCode)
}
blocks += blockBuilder.toString()

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -586,14 +586,14 @@ object SQLConf {
.intConf
.createWithDefault(CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT)

val CODEGEN_MAX_CHARS_PER_FUNCTION = buildConf("spark.sql.codegen.maxCharactersPerFunction")
val CODEGEN_MAX_LINES_PER_FUNCTION = buildConf("spark.sql.codegen.maxCodegenLinesPerFunction")
.internal()
.doc("The maximum characters of a single Java function generated by codegen. " +
.doc("The maximum lines of a single Java function generated by codegen. " +
"When the generated function exceeds this threshold, the multiple statements, " +
"whose characters are less than the value, are splited into a function. " +
"The default value 1024 is the max length of byte code JIT supported.")
"whose lines are less than the value, are splited into a function. " +
"The default value 100 is the max length of byte code JIT supported.")
.intConf
.createWithDefault(1024)
.createWithDefault(100)

val FILES_MAX_PARTITION_BYTES = buildConf("spark.sql.files.maxPartitionBytes")
.doc("The maximum number of bytes to pack into a single partition when reading files.")
Expand Down Expand Up @@ -1070,7 +1070,7 @@ class SQLConf extends Serializable with Logging {

def hugeMethodLimit: Int = getConf(WHOLESTAGE_HUGE_METHOD_LIMIT)

def maxCharsPerFunction: Int = getConf(CODEGEN_MAX_CHARS_PER_FUNCTION)
def maxCodegenLinesPerFunction: Int = getConf(CODEGEN_MAX_LINES_PER_FUNCTION)

def tableRelationCacheSize: Int =
getConf(StaticSQLConf.FILESOURCE_TABLE_RELATION_CACHE_SIZE)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,12 +42,16 @@ class BenchmarkWideTable extends BenchmarkBase {
/**
* Here are some numbers with different split threshold:
*
* Split threshold methods Rate(M/s) Per Row(ns)
* 10 400 0.4 2279
* 100 200 0.6 1554
* 1k 37 0.9 1116
* 8k 5 0.5 2025
* 64k 1 0.0 21649
* Split threshold Rate(M/s) Per Row(ns)
* 10 0.5 2131.3
* 20 0.5 2073.7
* 40 0.5 2085.2
* 64 0.5 2012.2
* 80 0.5 2112.2
* 100 0.5 1984.0
* 128 0.5 2097.9
* 256 0.5 2038.9
* 1024 0.5 2045.2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this threshold impact this test case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile Sorry, I updated the result. This threshold impacts this test case.

*/
}
}