[SPARK-26066][SQL] Move truncatedString to sql/catalyst and add spark.sql.debug.maxToStringFields conf #23039
Conversation
The PR extracts a part of the changes from #22429.

@hvanhovell @gatorsmile @HyukjinKwon @cloud-fan Could you take a look at this PR, please?
@MaxGekk, I think the main purpose of this PR is rather to introduce the spark.sql.debug.maxToStringFields configuration.
@MaxGekk. One PR should have one theme with a proper title. We frequently search by commit title. Please don't split this PR into two sub-PRs.
Test build #98841 has finished for PR 23039 at commit
@HyukjinKwon @dongjoon-hyun I have renamed this PR. Is the new title fine for you?
Sure, I will not.

How about the following?
[SPARK-26066][SQL] Move truncatedString to sql/catalyst and add spark.sql.debug.maxToStringFields conf
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Test build #98915 has finished for PR 23039 at commit
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.util.truncatedString

class UtilsSuite extends SparkFunSuite {
UtilsSuite -> UtilSuite since the package is org.apache.spark.sql.util. Previously, it was in Utils.scala.
Test build #98927 has finished for PR 23039 at commit
dongjoon-hyun left a comment:
+1, LGTM.
@HyukjinKwon @kiszk Do you have any objections to this PR?
@gatorsmile @hvanhovell Could you look at the PR, please? I extracted a piece of the changes from #22429.
Retest this please.
Test build #99030 has finished for PR 23039 at commit
Test build #99106 has finished for PR 23039 at commit
There are two failures, but they seem to be irrelevant to this PR.
Retest this please.
Test build #99121 has finished for PR 23039 at commit
Merged to master. Thank you, @MaxGekk, @HyukjinKwon and @kiszk.
[SPARK-26066][SQL] Move truncatedString to sql/catalyst and add spark.sql.debug.maxToStringFields conf

## What changes were proposed in this pull request?

In the PR, I propose:
- a new SQL config `spark.sql.debug.maxToStringFields` to control the maximum number of fields up to which `truncatedString` cuts its input sequences.
- moving `truncatedString` out of `core` to `sql/catalyst` because it is used only in the `sql/catalyst` packages for restricting the number of fields converted to strings from `TreeNode` and expressions of `StructType`.

## How was this patch tested?

Added a test to `QueryExecutionSuite` to check that `spark.sql.debug.maxToStringFields` impacts the behavior of `truncatedString`.

Closes apache#23039 from MaxGekk/truncated-string-catalyst.

Lead-authored-by: Maxim Gekk <[email protected]>
Co-authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
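For illustration, here is a minimal, self-contained sketch of the kind of truncation described in the commit message above. `TruncatedStringSketch` and `truncatedStringSketch` are hypothetical names written for this example; this is not the actual Spark `truncatedString` implementation, whose exact signature and output format may differ.

```scala
// Illustrative sketch only: mimics the behavior described above (keep at most
// `maxFields` elements and report how many were dropped). It is NOT the actual
// Spark implementation of truncatedString.
object TruncatedStringSketch {

  def truncatedStringSketch[T](
      seq: Seq[T],
      start: String,
      sep: String,
      end: String,
      maxFields: Int): String = {
    if (seq.length > maxFields) {
      val dropped = seq.length - maxFields
      // Keep the first maxFields elements and note how many were cut off.
      seq.take(maxFields).mkString(start, sep, s"$sep... $dropped more fields$end")
    } else {
      seq.mkString(start, sep, end)
    }
  }

  def main(args: Array[String]): Unit = {
    // With maxFields = 2, only the first two fields are rendered.
    println(truncatedStringSketch(Seq("a", "b", "c", "d"), "[", ", ", "]", 2))
    // prints: [a, b, ... 2 more fields]
  }
}
```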
The PR puts in a limit on the size of a debug string generated for a tree node. Helps to fix out of memory errors when large plans have huge debug strings. In addition to SPARK-26103, this should also address SPARK-23904 and SPARK-25380. An alternative solution was proposed in apache#23076, but that solution doesn't address all the cases that can cause a large query. This limit is only on calls to treeString that don't pass a Writer, which makes it play nicely with apache#22429, apache#23018 and apache#23039. Full plans can be written to files, but truncated plans will be used when strings are held in memory, such as for the UI.

- A new configuration parameter called spark.sql.debug.maxPlanLength was added to control the length of the plans.
- When plans are truncated, "..." is printed to indicate that it isn't a full plan.
- A warning is printed out the first time a truncated plan is displayed. The warning explains what happened and how to adjust the limit.

Unit tests were created for the new SizeLimitedWriter. Also, a unit test for TreeNode was created that checks that a long plan is correctly truncated.

Closes apache#23169 from DaveDeCaprio/text-plan-size.

Lead-authored-by: Dave DeCaprio <[email protected]>
Co-authored-by: David DeCaprio <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
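As a rough illustration of the size-limit idea described in that commit, a writer wrapper can cap the number of characters it forwards. The `CappedWriter` class below is hypothetical and is not the actual `SizeLimitedWriter` from apache#23169, which may signal truncation differently (for example, via an exception or a logged warning).

```scala
import java.io.{StringWriter, Writer}

// Hypothetical sketch of a size-limited writer: it forwards characters to an
// underlying Writer until a character budget is exhausted, then writes "..."
// once and silently drops the rest.
class CappedWriter(underlying: Writer, maxChars: Int) extends Writer {
  private var written = 0
  private var truncated = false

  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    val budget = math.max(0, maxChars - written)
    val toWrite = math.min(len, budget)
    if (toWrite > 0) underlying.write(cbuf, off, toWrite)
    if (toWrite < len && !truncated) {
      underlying.write("...") // mark that this is not the full plan
      truncated = true
    }
    written += len
  }

  override def flush(): Unit = underlying.flush()
  override def close(): Unit = underlying.close()
}

object CappedWriterExample {
  def main(args: Array[String]): Unit = {
    val out = new StringWriter()
    val capped = new CappedWriter(out, maxChars = 10)
    capped.write("a very long plan string that would blow up the UI")
    capped.flush()
    println(out.toString) // first 10 characters followed by "..."
  }
}
```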
What changes were proposed in this pull request?

In the PR, I propose:
- a new SQL config `spark.sql.debug.maxToStringFields` to control the maximum number of fields up to which `truncatedString` cuts its input sequences.
- moving `truncatedString` out of `core` to `sql/catalyst` because it is used only in the `sql/catalyst` packages for restricting the number of fields converted to strings from `TreeNode` and expressions of `StructType`.

How was this patch tested?

Added a test to `QueryExecutionSuite` to check that `spark.sql.debug.maxToStringFields` impacts the behavior of `truncatedString`.
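A hedged usage sketch: assuming `spark.sql.debug.maxToStringFields` can be supplied like any other SQL conf when building a session (this thread does not spell out exactly how it is set), a small value should shorten the field lists rendered in plan and schema strings.

```scala
import org.apache.spark.sql.SparkSession

// Usage sketch (assumption: the conf can be passed at session build time like
// any other SQL conf). With a small value, wide schemas and plans should be
// rendered with only the first few fields followed by a truncation marker.
object MaxToStringFieldsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.debug.maxToStringFields", "3") // conf introduced by this PR
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 2, 3, 4, 5, 6)).toDF("a", "b", "c", "d", "e", "f")
    // The textual plan should show a truncated field list instead of all six columns.
    println(df.queryExecution.toString)

    spark.stop()
  }
}
```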