[SPARK-9058][SQL] Split projectionCode if it is too large for JVM #7418

viirya · 2015-07-15T09:22:14Z

JIRA: https://issues.apache.org/jira/browse/SPARK-9058

The JVM limits the code size of a single method (64 KB of bytecode). If the generated projectionCode is too large, we split it into multiple functions so that the JVM does not fail.
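
As a rough illustration of the idea (not the code this PR actually generates), the sketch below writes the same projection twice: once as a single inlined method and once split across helper methods. The class name, the method names, and the use of `int[]` as the row type are all assumptions made for the example.

```java
// Hypothetical stand-in for a generated projection. The row is modeled as a
// plain int[] rather than Spark's row type, purely for illustration.
final class SplitProjectionShape {
    private final int[] out = new int[4];

    // Before: every column expression is inlined into one method body,
    // so the body grows with the number of projected columns.
    int[] applyInlined(int[] in) {
        out[0] = in[0] + 1;
        out[1] = in[1] * 2;
        out[2] = in[2] - 3;
        out[3] = in[3] + in[0];
        return out;
    }

    // After: the same statements are spread over helper methods, so no
    // single method body grows with the total number of columns.
    int[] applySplit(int[] in) {
        apply0(in);
        apply1(in);
        return out;
    }

    private void apply0(int[] in) {
        out[0] = in[0] + 1;
        out[1] = in[1] * 2;
    }

    private void apply1(int[] in) {
        out[2] = in[2] - 3;
        out[3] = in[3] + in[0];
    }
}
```

Both entry points compute the same output row; only the distribution of statements across methods differs.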

SparkQA · 2015-07-15T10:48:43Z

Test build #37349 has finished for PR 7418 at commit d715fd5.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2015-07-15T15:38:37Z

This patch seems to duplicate #7076.
Why did you open a new PR?

maropu · 2015-07-15T15:42:05Z

...main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala

How did you arrive at the number '50' for the JVM code size limit?

viirya · 2015-07-15T15:52:34Z

This was created for the new JIRA ticket. I didn't notice there was already a related one.

SparkQA · 2015-07-15T15:57:14Z

Test build #37363 has finished for PR 7418 at commit 8775359.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

chenghao-intel · 2015-07-16T03:11:26Z

...main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala

Make it private final? Also, instead of declaring the argument type as Object, can we declare it as InternalRow?

Since codegen aims to inline the execution, we should probably avoid adding type-casting overhead.
Better still, could we put every 50 (say) expressions into a single generated function?
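
To make the suggestion concrete, here is a minimal sketch contrasting a helper that takes `Object` and casts with one that takes a typed row. `SketchRow` is a hypothetical stand-in for `InternalRow`, and the class and method names are assumptions for the example, not the PR's generated code.

```java
// Hypothetical stand-in for Spark's InternalRow; only what the sketch needs.
interface SketchRow {
    int getInt(int ordinal);
}

final class TypedHelperSketch {
    private int result;

    // Untyped variant: the helper has to cast its argument on every call.
    private final void applyUntyped(Object i) {
        SketchRow row = (SketchRow) i;
        result = row.getInt(0) + row.getInt(1);
    }

    // Typed variant, as suggested in the review: no cast inside the helper.
    private final void applyTyped(SketchRow i) {
        result = i.getInt(0) + i.getInt(1);
    }

    int viaUntyped(SketchRow row) {
        applyUntyped(row);
        return result;
    }

    int viaTyped(SketchRow row) {
        applyTyped(row);
        return result;
    }
}
```

The two variants return the same value; the typed one simply moves the type check to the call site's compile-time signature.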

SparkQA · 2015-07-16T04:31:09Z

Test build #37444 has finished for PR 7418 at commit 7435454.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-07-16T06:07:28Z

Test build #37461 has finished for PR 7418 at commit 12d3794.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-07-16T08:09:03Z

Test build #37464 has finished for PR 7418 at commit b8e274e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

chenghao-intel · 2015-07-17T00:37:17Z

My concern with this fix is that it may cause a performance issue if we add a function call for each projection field, since codegen aims to inline the logic into a single function.
Can you provide benchmark results for this fix?

chenghao-intel · 2015-07-17T00:39:32Z

Or at least we could group several projection fields into one generated function and provide a configuration option to enable the grouping.

viirya · 2015-07-17T02:39:11Z

@chenghao-intel This PR already groups a fixed number of projection columns (currently 50) into a function and calls it. It addresses the problem caused by a single, very long function with too many projection columns inlined (more than 100, as reported in the JIRA ticket).

If there are fewer projection columns than that number, no column grouping is done and the code works as before.

I think this should be a very rare use case. We shouldn't add a configuration for this problem because it is not a feature. It is a bug fix that makes this kind of projection work.
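
The grouping described above can be sketched roughly as follows. This is an illustrative generator under stated assumptions, not the PR's implementation: `ProjectionSplitter`, the `apply_N` helper names, and `mutableRow` are hypothetical, and treating exactly `groupSize` columns as the inline case is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class ProjectionSplitter {
    // Returns the Java source of the generated methods: apply() first,
    // followed by any helper methods. groupSize must be positive.
    static List<String> generate(List<String> columnCode, int groupSize) {
        List<String> methods = new ArrayList<>();
        if (columnCode.size() <= groupSize) {
            // Few columns: keep everything inline, with no extra function.
            methods.add("public Object apply(Object i) {\n"
                + String.join("\n", columnCode)
                + "\nreturn mutableRow;\n}");
            return methods;
        }
        StringBuilder calls = new StringBuilder();
        List<String> helpers = new ArrayList<>();
        for (int start = 0, group = 0; start < columnCode.size();
                start += groupSize, group++) {
            int end = Math.min(start + groupSize, columnCode.size());
            String name = "apply_" + group; // hypothetical helper name
            calls.append(name).append("(i);\n");
            helpers.add("private void " + name + "(Object i) {\n"
                + String.join("\n", columnCode.subList(start, end))
                + "\n}");
        }
        // apply() now only dispatches to the helpers.
        methods.add("public Object apply(Object i) {\n" + calls
            + "return mutableRow;\n}");
        methods.addAll(helpers);
        return methods;
    }
}
```

With 120 column snippets and a group size of 50, this yields one `apply` method plus three helpers; with 49 snippets it yields a single inlined `apply`.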

rxin · 2015-07-17T05:11:20Z

@viirya let's use #7076 since it was first submitted and had a test case. If @saurfang doesn't respond, let's merge this one. Thanks.

viirya · 2015-07-17T05:53:35Z

@rxin sure, no problem.

rxin · 2015-07-17T06:08:10Z

Thanks for understanding. Do you mind closing this one first? If we need to merge this one, let's reopen it.

viirya · 2015-07-17T06:26:28Z

ok. no problem.

chenghao-intel · 2015-07-17T06:30:14Z

@viirya, I mean that you always have at least one group (fewer than 50 fields), which probably causes performance issues for cases like:
select a+b from src, as the overhead of the function invocation is heavy compared to a+b itself.

viirya · 2015-07-17T06:48:44Z

@chenghao-intel no, if there are fewer than 50 columns, the generated Java code is the same as before. No extra function is added.

chenghao-intel · 2015-07-17T07:05:16Z

...main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala

Oh, right! Here. Thanks for the explanation.

Split projectionCode if it is too large for JVM.

d715fd5

Add semicolon.

8775359

maropu reviewed Jul 15, 2015

Merge remote-tracking branch 'upstream/master' into fix_codegen_size

7435454

Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala

chenghao-intel reviewed Jul 16, 2015

For comment.

12d3794

Fix it.

b8e274e

viirya closed this Jul 17, 2015

chenghao-intel mentioned this pull request Jul 17, 2015

[SPARK-8443][SQL] Split GenerateMutableProjection Codegen due to JVM Code Size Limits #7076

Closed

chenghao-intel reviewed Jul 17, 2015

viirya deleted the fix_codegen_size branch December 27, 2023 18:32

5 participants