[SPARK-26066][SQL] Move truncatedString to sql/catalyst and add spark.sql.debug.maxToStringFields conf #23039
Conversation
The PR extracts a part of the changes from #22429.

@hvanhovell @gatorsmile @HyukjinKwon @cloud-fan Could you take a look at this PR, please?
@MaxGekk, I think the main purpose of this PR is rather to introduce the spark.sql.debug.maxToStringFields configuration.
@MaxGekk. One PR should have one theme with a proper title. We frequently search by commit title. Please don't split this PR into two sub-PRs.
Test build #98841 has finished for PR 23039 at commit
@HyukjinKwon @dongjoon-hyun I have renamed this PR. Is the new title fine for you?
Sure, I will not.

How about the following?
[SPARK-26066][SQL] Move truncatedString to sql/catalyst and add spark.sql.debug.maxToStringFields conf
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Test build #98915 has finished for PR 23039 at commit
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.util.truncatedString

class UtilsSuite extends SparkFunSuite {
UtilsSuite -> UtilSuite since the package is org.apache.spark.sql.util. Previously, it was in Utils.scala.
Test build #98927 has finished for PR 23039 at commit
dongjoon-hyun left a comment:
+1, LGTM.
@HyukjinKwon @kiszk Do you have any objections to this PR?
@gatorsmile @hvanhovell Could you look at the PR, please? I extracted a piece of the changes from #22429.
Retest this please.
Test build #99030 has finished for PR 23039 at commit
Test build #99106 has finished for PR 23039 at commit
There are two failures, but they seem to be irrelevant to this PR.
Retest this please.
Test build #99121 has finished for PR 23039 at commit
Merged to master. Thank you, @MaxGekk, @HyukjinKwon and @kiszk.
[SPARK-26066][SQL] Move truncatedString to sql/catalyst and add spark.sql.debug.maxToStringFields conf

## What changes were proposed in this pull request?

In the PR, I propose:
- a new SQL config `spark.sql.debug.maxToStringFields` to control the maximum number of fields up to which `truncatedString` cuts its input sequences.
- moving `truncatedString` out of `core` to `sql/catalyst` because it is used only in the `sql/catalyst` packages for restricting the number of fields converted to strings from `TreeNode` and expressions of `StructType`.

## How was this patch tested?

Added a test to `QueryExecutionSuite` to check that `spark.sql.debug.maxToStringFields` impacts the behavior of `truncatedString`.

Closes apache#23039 from MaxGekk/truncated-string-catalyst.

Lead-authored-by: Maxim Gekk <[email protected]>
Co-authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
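For illustration, here is a minimal, self-contained sketch of the kind of truncation described in the commit message above. `TruncatedStringSketch` and `truncatedStringSketch` are hypothetical names written for this example; this is not the actual Spark `truncatedString` implementation, whose exact signature and output format may differ.

```scala
// Illustrative sketch only: mimics the behavior described above (keep at most
// `maxFields` elements and report how many were dropped). It is NOT the actual
// Spark implementation of truncatedString.
object TruncatedStringSketch {

  def truncatedStringSketch[T](
      seq: Seq[T],
      start: String,
      sep: String,
      end: String,
      maxFields: Int): String = {
    if (seq.length > maxFields) {
      val dropped = seq.length - maxFields
      // Keep the first maxFields elements and note how many were cut off.
      seq.take(maxFields).mkString(start, sep, s"$sep... $dropped more fields$end")
    } else {
      seq.mkString(start, sep, end)
    }
  }

  def main(args: Array[String]): Unit = {
    // With maxFields = 2, only the first two fields are rendered.
    println(truncatedStringSketch(Seq("a", "b", "c", "d"), "[", ", ", "]", 2))
    // prints: [a, b, ... 2 more fields]
  }
}
```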
The PR puts in a limit on the size of a debug string generated for a tree node. Helps to fix out of memory errors when large plans have huge debug strings. In addition to SPARK-26103, this should also address SPARK-23904 and SPARK-25380. An alternative solution was proposed in apache#23076, but that solution doesn't address all the cases that can cause a large query. This limit is only on calls to treeString that don't pass a Writer, which makes it play nicely with apache#22429, apache#23018 and apache#23039. Full plans can be written to files, but truncated plans will be used when strings are held in memory, such as for the UI.

- A new configuration parameter called spark.sql.debug.maxPlanLength was added to control the length of the plans.
- When plans are truncated, "..." is printed to indicate that it isn't a full plan.
- A warning is printed out the first time a truncated plan is displayed. The warning explains what happened and how to adjust the limit.

Unit tests were created for the new SizeLimitedWriter. Also, a unit test for TreeNode was created that checks that a long plan is correctly truncated.

Closes apache#23169 from DaveDeCaprio/text-plan-size.

Lead-authored-by: Dave DeCaprio <[email protected]>
Co-authored-by: David DeCaprio <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
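As a rough illustration of the size-limit idea described in that commit, a writer wrapper can cap the number of characters it forwards. The `CappedWriter` class below is hypothetical and is not the actual `SizeLimitedWriter` from apache#23169, which may signal truncation differently (for example, via an exception or a logged warning).

```scala
import java.io.{StringWriter, Writer}

// Hypothetical sketch of a size-limited writer: it forwards characters to an
// underlying Writer until a character budget is exhausted, then writes "..."
// once and silently drops the rest.
class CappedWriter(underlying: Writer, maxChars: Int) extends Writer {
  private var written = 0
  private var truncated = false

  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    val budget = math.max(0, maxChars - written)
    val toWrite = math.min(len, budget)
    if (toWrite > 0) underlying.write(cbuf, off, toWrite)
    if (toWrite < len && !truncated) {
      underlying.write("...") // mark that this is not the full plan
      truncated = true
    }
    written += len
  }

  override def flush(): Unit = underlying.flush()
  override def close(): Unit = underlying.close()
}

object CappedWriterExample {
  def main(args: Array[String]): Unit = {
    val out = new StringWriter()
    val capped = new CappedWriter(out, maxChars = 10)
    capped.write("a very long plan string that would blow up the UI")
    capped.flush()
    println(out.toString) // first 10 characters followed by "..."
  }
}
```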
What changes were proposed in this pull request?

In the PR, I propose:
- a new SQL config `spark.sql.debug.maxToStringFields` to control the maximum number of fields up to which `truncatedString` cuts its input sequences.
- moving `truncatedString` out of `core` to `sql/catalyst` because it is used only in the `sql/catalyst` packages for restricting the number of fields converted to strings from `TreeNode` and expressions of `StructType`.

How was this patch tested?

Added a test to `QueryExecutionSuite` to check that `spark.sql.debug.maxToStringFields` impacts the behavior of `truncatedString`.
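A hedged usage sketch: assuming `spark.sql.debug.maxToStringFields` can be supplied like any other SQL conf when building a session (this thread does not spell out exactly how it is set), a small value should shorten the field lists rendered in plan and schema strings.

```scala
import org.apache.spark.sql.SparkSession

// Usage sketch (assumption: the conf can be passed at session build time like
// any other SQL conf). With a small value, wide schemas and plans should be
// rendered with only the first few fields followed by a truncation marker.
object MaxToStringFieldsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.debug.maxToStringFields", "3") // conf introduced by this PR
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 2, 3, 4, 5, 6)).toDF("a", "b", "c", "d", "e", "f")
    // The textual plan should show a truncated field list instead of all six columns.
    println(df.queryExecution.toString)

    spark.stop()
  }
}
```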