Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Nov 14, 2018

What changes were proposed in this pull request?

In the PR, I propose:

  • new SQL config spark.sql.debug.maxToStringFields to control maximum number fields up to which truncatedString cuts its input sequences.
  • Moving truncatedString out of core to sql/catalyst because it is used only in the sql/catalyst packages for restricting number of fields converted to strings from TreeNode and expressions ofStructType.

How was this patch tested?

Added a test to QueryExecutionSuite to check that spark.sql.debug.maxToStringFields impacts to behavior of truncatedString.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 14, 2018

The PR extracts a part of changes from #22429

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 14, 2018

@hvanhovell @gatorsmile @HyukjinKwon @cloud-fan Could you take a look at this PR, please.

@HyukjinKwon
Copy link
Member

@MaxGekk, I think the main purpose of this PR is rather to introduce spark.sql.debug.maxToStringFields .. let's fix PR description and title for that.

@dongjoon-hyun
Copy link
Member

@MaxGekk . One PR should have one theme with a proper title. We frequently search by commit title. Please don't split this PR into two sub PRs.

@SparkQA
Copy link

SparkQA commented Nov 15, 2018

Test build #98841 has finished for PR 23039 at commit 36de047.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk changed the title [SPARK-26066][SQL] Moving truncatedString to sql/catalyst [SPARK-26066][SQL] Using spark.sql.debug.maxToStringFields in truncatedString moved to sql/catalyst Nov 15, 2018
@MaxGekk MaxGekk changed the title [SPARK-26066][SQL] Using spark.sql.debug.maxToStringFields in truncatedString moved to sql/catalyst [SPARK-26066][SQL] New SQL config in truncatedString moved to sql/catalyst Nov 15, 2018
@MaxGekk MaxGekk changed the title [SPARK-26066][SQL] New SQL config in truncatedString moved to sql/catalyst [SPARK-26066][SQL] Using new SQL config spark.sql.debug.maxToStringFields in truncatedString moved to sql/catalyst Nov 15, 2018
@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 15, 2018

let's fix PR description and title for that.
One PR should have one theme with a proper title.

@HyukjinKwon @dongjoon-hyun I have renamed this PR. Is new title fine for you?

We frequently search by commit title. Please don't split this PR into two sub PRs.

Sure, I will not.

@dongjoon-hyun
Copy link
Member

How about the following?

Move `truncatedString` to `sql/catalyst` and add `spark.sql.debug.maxToStringFields` conf

@MaxGekk MaxGekk changed the title [SPARK-26066][SQL] Using new SQL config spark.sql.debug.maxToStringFields in truncatedString moved to sql/catalyst [SPARK-26066][SQL] Move truncatedString to sql/catalyst and add spark.sql.debug.maxToStringFields conf Nov 15, 2018
@MaxGekk MaxGekk changed the title [SPARK-26066][SQL] Move truncatedString to sql/catalyst and add spark.sql.debug.maxToStringFields conf [SPARK-26066][SQL] Move truncatedString to sql/catalyst and add spark.sql.debug.maxToStringFields conf Nov 15, 2018
@SparkQA
Copy link

SparkQA commented Nov 16, 2018

Test build #98915 has finished for PR 23039 at commit 7180c2e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.util.truncatedString

class UtilsSuite extends SparkFunSuite {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UtilsSuite -> UtilSuite since the package is org.apache.spark.sql.util. Previously, it was in Utils.scala.

@SparkQA
Copy link

SparkQA commented Nov 17, 2018

Test build #98927 has finished for PR 23039 at commit 082254d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 18, 2018

@HyukjinKwon @kiszk Do you have any objections for this PR?

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 19, 2018

@gatorsmile @hvanhovell Could you look at the PR, please. I extracted a piece of changes from #22429.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Nov 20, 2018

Test build #99030 has finished for PR 23039 at commit 082254d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 21, 2018

Test build #99106 has finished for PR 23039 at commit e90f888.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

There are two failures, but it seems to be irrelevant to this PR.

Test Result (2 failures / +2)
org.apache.spark.repl.SingletonReplSuite.(It is not a test it is a sbt.testing.SuiteSelector)
org.apache.spark.sql.streaming.continuous.ContinuousSuite.query without test harness

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Nov 21, 2018

Test build #99121 has finished for PR 23039 at commit e90f888.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Merged to master. Thank you, @MaxGekk and @HyukjinKwon , @kiszk .

@asfgit asfgit closed this in 81550b3 Nov 21, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
….sql.debug.maxToStringFields conf

## What changes were proposed in this pull request?

In the PR, I propose:
- new SQL config `spark.sql.debug.maxToStringFields` to control maximum number fields up to which `truncatedString` cuts its input sequences.
- Moving `truncatedString` out of `core` to `sql/catalyst` because it is used only in the `sql/catalyst` packages for restricting number of fields converted to strings from `TreeNode` and expressions of`StructType`.

## How was this patch tested?

Added a test to `QueryExecutionSuite` to check that `spark.sql.debug.maxToStringFields` impacts to behavior of `truncatedString`.

Closes apache#23039 from MaxGekk/truncated-string-catalyst.

Lead-authored-by: Maxim Gekk <[email protected]>
Co-authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@MaxGekk MaxGekk deleted the truncated-string-catalyst branch August 17, 2019 13:33
dvallejo pushed a commit to Telefonica/spark that referenced this pull request Aug 31, 2021
The PR puts in a limit on the size of a debug string generated for a tree node.  Helps to fix out of memory errors when large plans have huge debug strings.   In addition to SPARK-26103, this should also address SPARK-23904 and SPARK-25380.  AN alternative solution was proposed in apache#23076, but that solution doesn't address all the cases that can cause a large query.  This limit is only on calls treeString that don't pass a Writer, which makes it play nicely with apache#22429, apache#23018 and apache#23039.  Full plans can be written to files, but truncated plans will be used when strings are held in memory, such as for the UI.

- A new configuration parameter called spark.sql.debug.maxPlanLength was added to control the length of the plans.
- When plans are truncated, "..." is printed to indicate that it isn't a full plan
- A warning is printed out the first time a truncated plan is displayed. The warning explains what happened and how to adjust the limit.

Unit tests were created for the new SizeLimitedWriter.  Also a unit test for TreeNode was created that checks that a long plan is correctly truncated.

Closes apache#23169 from DaveDeCaprio/text-plan-size.

Lead-authored-by: Dave DeCaprio <[email protected]>
Co-authored-by: David DeCaprio <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants