Skip to content

Conversation

@gruuya
Copy link
Contributor

@gruuya gruuya commented Aug 1, 2024

Which issue does this PR close?

Relates to #9373 and #9375.

Rationale for this change

I'm dealing with a situation where we have deeply nested plans, which we want to execute and stream the data into storage (Parquet/Delta), and we're hitting the stack overflow problem observed in the aforementioned issues.

Since this is on a write path we don't really need the analyzer/optimizer rules (I think), which are a part of the problem due to tree node recursion that takes place there. This is not an issue, since those can easily be opted out of via with_analyzer_rules/with_optimizer_rules.

However, the tightest bottleneck as per lldb is actually ApplyFunctionRewrites, which can't be opted out of, even though after #11155 it has no rewrite rules by default.

What changes are included in this PR?

Make ApplyFunctionRewrites simply bail out of the plan transformation/rewrite if it has no rules to apply (the default case presently).

Are these changes tested?

I wanted to add a test that checks for reference equity of the in/out plans but then recalled AnalyzerRule::analyze takes ownership of it.

So the only test is that I see a higher stack overflow threshold with this change.

Are there any user-facing changes?

None.

@github-actions github-actions bot added the optimizer Optimizer rules label Aug 1, 2024
@jayzhan211
Copy link
Contributor

let expr_to_function: Arc<dyn AnalyzerRule + Send + Sync> =
Arc::new(ApplyFunctionRewrites::new(self.function_rewrites.clone()));
let rules = std::iter::once(&expr_to_function).chain(self.rules.iter());

Is it better to opt it out here?

@gruuya
Copy link
Contributor Author

gruuya commented Aug 2, 2024

let expr_to_function: Arc<dyn AnalyzerRule + Send + Sync> =
Arc::new(ApplyFunctionRewrites::new(self.function_rewrites.clone()));
let rules = std::iter::once(&expr_to_function).chain(self.rules.iter());

Is it better to opt it out here?

Makes sense to me, pushed the update.

@gruuya gruuya force-pushed the short-circuit-fn-rewrite branch from 8fa74b9 to 81bb6d7 Compare August 2, 2024 05:57
@gruuya gruuya force-pushed the short-circuit-fn-rewrite branch from 81bb6d7 to 82daf94 Compare August 2, 2024 06:59
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Aug 2, 2024
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jayzhan211
Copy link
Contributor

Thanks @gruuya

@jayzhan211 jayzhan211 merged commit df4e6cc into apache:main Aug 2, 2024
@gruuya gruuya deleted the short-circuit-fn-rewrite branch August 2, 2024 09:56
@alamb
Copy link
Contributor

alamb commented Aug 5, 2024

However, the tightest bottleneck as per lldb is actually ApplyFunctionRewrites, which can't be opted out of, even though after #11155 it has no rewrite rules by default.

Maybe we should simply remove this API -- people can write their own rules directly 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants