[SPARK-9058][SQL] Split projectionCode if it is too large for JVM #7418
Test build #37349 has finished for PR 7418 at commit
This patch seems to duplicate #7076.
How did you decide on the number 50 for the JVM code size limit?
This was created for the new JIRA ticket. I didn't notice there was already a related one.
Test build #37363 has finished for PR 7418 at commit
Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala
Make it private final? Also, instead of declaring the argument type as Object, can we make it InternalRow?
Since codegen aims to inline the execution, we had better not add overhead for type casting.
Better still, should we put every 50 (say) expressions into a single generated function?
Test build #37444 has finished for PR 7418 at commit
Test build #37461 has finished for PR 7418 at commit
Test build #37464 has finished for PR 7418 at commit
My concern with this fix is that it will probably cause a performance issue if we add a function call for each projection field, since codegen aims to inline the logic into a single function.
Or at least, we can group several projection fields into one generated function, and provide a configuration option to enable the codegen grouping.
@chenghao-intel This PR already groups a certain number of projection columns (currently 50) into a function and calls it. It addresses the problem caused by a very long single function inlined with too many projection columns (more than 100, as reported in the JIRA ticket). If the number of projection columns is below that threshold, no column grouping is done and it works as before. I think this should be a very rare use case. We shouldn't add a configuration option for this, because it is not a feature or a function. It is a bug fix to make this kind of projection work.
@rxin sure, no problem.
Thanks for understanding. Do you mind closing this one first? If we need to merge this one, let's reopen it.
OK, no problem.
@viirya, I mean you have at least a single group (fewer than 50 fields), which will probably cause performance issues for a case like:
@chenghao-intel No, if there are fewer than 50 columns, the generated Java code is the same as before. No extra function is added.
Oh, right! Here. Thanks for the explanation.
JIRA: https://issues.apache.org/jira/browse/SPARK-9058
There is a limit on the code size of a single function in the JVM. If the generated
projectionCode is too large, we split it into multiple functions to avoid a JVM failure.
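For readers unfamiliar with the idea, here is a minimal sketch of the grouping discussed above. It is not the code from this PR: the class `ProjectionSplitter`, the helper names `apply_0`, `apply_1`, ..., the `mutableRow` variable, and the handling of a column count exactly equal to the group size are all assumptions made for illustration. It only shows how per-column code snippets could be emitted inline when there are few of them, and spread across helper functions when there are many, so that no single generated method grows past the JVM limit (a method's bytecode must stay under 64 KB).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the generator used in Spark.
class ProjectionSplitter {

    /** Splits the snippets into consecutive groups of at most groupSize. */
    static List<List<String>> group(List<String> snippets, int groupSize) {
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < snippets.size(); i += groupSize) {
            int end = Math.min(i + groupSize, snippets.size());
            groups.add(snippets.subList(i, end));
        }
        return groups;
    }

    /** Returns generated Java source text for the projection. */
    static String generate(List<String> snippets, int groupSize) {
        StringBuilder src = new StringBuilder();
        src.append("public Object apply(Object row) {\n");
        // Assumption: a column count at or below the threshold stays fully inlined.
        if (snippets.size() <= groupSize) {
            for (String s : snippets) {
                src.append("  ").append(s).append("\n");
            }
            src.append("  return mutableRow;\n}\n");
            return src.toString();
        }
        // Otherwise apply() only calls one helper per group.
        List<List<String>> groups = group(snippets, groupSize);
        for (int g = 0; g < groups.size(); g++) {
            src.append("  apply_").append(g).append("(row);\n");
        }
        src.append("  return mutableRow;\n}\n");
        // Each helper holds the inlined code of one group of columns.
        for (int g = 0; g < groups.size(); g++) {
            src.append("private void apply_").append(g).append("(Object row) {\n");
            for (String s : groups.get(g)) {
                src.append("  ").append(s).append("\n");
            }
            src.append("}\n");
        }
        return src.toString();
    }
}
```

With a group size of 50, a projection of 120 columns would yield three helpers under this sketch, while a projection of 3 columns would produce the same single inlined function as before.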