Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Jul 15, 2015

JIRA: https://issues.apache.org/jira/browse/SPARK-9058

There is a limit for code size in a function in JVM. If the generated projectionCode is too large, we split it to multiple functions and avoid causing JVM failed.

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #37349 has finished for PR 7418 at commit d715fd5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Jul 15, 2015

This patch seems to be duplicated with #7076.
Why you make a new PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you decide this number '50' for the JVM code size limitation?

@viirya
Copy link
Member Author

viirya commented Jul 15, 2015

This is created for the new JIRA ticket. I didn't notice there is already a related one.

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #37363 has finished for PR 7418 at commit 8775359.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it private final? And also instead of the make the argument type as Object, can we make it as InternalRow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the codegen aim to inline the execution, probably we'd better not to increase the overhead for type casting.
Even, we'd better to put every 50(says) expressions into a single codegen function?

@SparkQA
Copy link

SparkQA commented Jul 16, 2015

Test build #37444 has finished for PR 7418 at commit 7435454.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 16, 2015

Test build #37461 has finished for PR 7418 at commit 12d3794.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 16, 2015

Test build #37464 has finished for PR 7418 at commit b8e274e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Copy link
Contributor

My concern on this fix is, that probably cause performance issue, if we add a function call for each of the projection field, as codegen aim to inline the logic into a single function.
Can you give a benchmark result for this fixing?

@chenghao-intel
Copy link
Contributor

Or at least, we can group several project fields in a codegened function, and give a configuration to enable the codegen grouping.

@viirya
Copy link
Member Author

viirya commented Jul 17, 2015

@chenghao-intel This pr already groups a certain number of projection columns (now it is 50) into a function and call it. It is due to the problem caused by a very long single function inlined with too many projection columns (as reported in the JIRA ticket, it is more than 100).

If the projection columns is less than the number, it doesn't doing column grouping and works as before.

I think this should be a very rare use case. For this problem, we shouldn't add a configuration because it is not a feature or function. It is a bug fixing to make this kind of projection work.

@rxin
Copy link
Contributor

rxin commented Jul 17, 2015

@viirya let's use #7076 since it was first submitted and had a test case. If @saurfang doesn't respond, let's merge this one. Thanks.

@viirya
Copy link
Member Author

viirya commented Jul 17, 2015

@rxin sure, no problem.

@rxin
Copy link
Contributor

rxin commented Jul 17, 2015

Thanks for understanding. Do you mind closing this one first? If we need to merge this one, let's reopen it.

@viirya
Copy link
Member Author

viirya commented Jul 17, 2015

ok. no problem.

@viirya viirya closed this Jul 17, 2015
@chenghao-intel
Copy link
Contributor

@viirya, I mean you at least have a single group (less than 50 fields), it probably cause performance issues for case like:
select a+b from src, as the overhead of function invoke is heavy compare to the a+b itself.

@viirya
Copy link
Member Author

viirya commented Jul 17, 2015

@chenghao-intel no, if there are less 50 columns, the generated java codes are as the same as the before. No extra function is added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right! Here. Thanks for explanation.

@viirya viirya deleted the fix_codegen_size branch December 27, 2023 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants