[SPARK-15752] [SQL] Optimize metadata only query that has an aggregate whose children are deterministic project or filter operators. #13494
Closed
Changes from 4 commits
Commits
52 commits
2ca2c38 init commit (lianhuiwang)
edea710 fix unit test (lianhuiwang)
8426522 Merge branch 'apache-master' into metadata-only (lianhuiwang)
153293e fix unit test (lianhuiwang)
7dfb743 update (lianhuiwang)
68e6d6d Revert "fix unit test" (lianhuiwang)
595ef36 Revert "fix unit test" (lianhuiwang)
7d7ece0 Merge branch 'apache-master' into metadata-only (lianhuiwang)
2e55a9d Merge branch 'apache-master' into metadata-only (lianhuiwang)
b2b6eba update (lianhuiwang)
c5a291e Merge branch 'apache-master' into metadata-only (lianhuiwang)
6404c1f update opt for core (lianhuiwang)
1bb5812 refactor (lianhuiwang)
7e3729e add ut (lianhuiwang)
fbf5d61 fix ut (lianhuiwang)
3411fd6 fix project (lianhuiwang)
aefab7f address comments (lianhuiwang)
c5ccdea fix cube/rollup (lianhuiwang)
ae6cf9f fix style (lianhuiwang)
159331b refactor (lianhuiwang)
3a1438b refactor (lianhuiwang)
c0a7d59 update (lianhuiwang)
a4045ca add comments (lianhuiwang)
0a023e7 fix minor (lianhuiwang)
a9b38ab rename (lianhuiwang)
a5ea995 update (lianhuiwang)
1bed08d fix monir (lianhuiwang)
a22e962 refactor (lianhuiwang)
41fef2c update (lianhuiwang)
bd53678 Merge branch 'apache-master' into metadata-only (lianhuiwang)
88f7308 update (lianhuiwang)
2568193 add ut (lianhuiwang)
26a97f4 address comments (lianhuiwang)
4297f9f update name (lianhuiwang)
1a65aa7 address comments (lianhuiwang)
d5e0df4 update (lianhuiwang)
9d6dd76 update2 (lianhuiwang)
9cb01d8 update (lianhuiwang)
3e2687d doc improve (cloud-fan)
2b4faf3 update (cloud-fan)
88fd3bf Merge pull request #2 from cloud-fan/metadata-only (lianhuiwang)
a894bb7 delete cases (lianhuiwang)
9546b40 Merge branch 'metadata-only' of https://github.com/lianhuiwang/spark … (lianhuiwang)
85b695b update ut (lianhuiwang)
bcfe8e5 Merge branch 'master' of https://github.com/apache/spark into metadat… (lianhuiwang)
67211be Merge branch 'master' of https://github.com/apache/spark into metadat… (lianhuiwang)
501f93b address commetns (lianhuiwang)
8ee2a8c refactor (lianhuiwang)
d888c85 fix minor (lianhuiwang)
ff16509 update (lianhuiwang)
358ad13 remove duplicate code (lianhuiwang)
030776a fix minor (lianhuiwang)
| Diff line change |
|---|
| `@@ -258,6 +258,11 @@ object SQLConf {` |
| `       .booleanConf` |
| `       .createWithDefault(false)` |
| |
| `+  val OPTIMIZER_METADATA_ONLY = SQLConfigBuilder("spark.sql.optimizer.metadataOnly")` |
| `+    .doc("When true, enable the metadata-only query optimization.")` |
| `+    .booleanConf` |
| `+    .createWithDefault(false)` |
| `+` |
| `   val NATIVE_VIEW = SQLConfigBuilder("spark.sql.nativeView")` |
| `     .internal()` |
| `     .doc("When true, CREATE VIEW will be handled by Spark SQL instead of Hive native commands. " +` |
| `@@ -599,6 +604,8 @@ private[sql] class SQLConf extends Serializable with CatalystConf with Logging {` |
| |
| `   def metastorePartitionPruning: Boolean = getConf(HIVE_METASTORE_PARTITION_PRUNING)` |
| |
| `+  def optimizerMetadataOnly: Boolean = getConf(OPTIMIZER_METADATA_ONLY)` |
| `+` |
| `   def nativeView: Boolean = getConf(NATIVE_VIEW)` |
| |
| `   def wholeStageEnabled: Boolean = getConf(WHOLESTAGE_CODEGEN_ENABLED)` |
What if this partition has more than one data file?
In this PR, spark.sql.optimizer.metadataOnly defaults to false, so a user who needs this feature must set spark.sql.optimizer.metadataOnly=true.
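For readers who want to try the flag, here is a minimal sketch of opting in. It assumes a Spark 2.x `SparkSession` running locally with `spark-sql` on the classpath; the object name and app name are placeholders, and only the configuration key itself comes from the diff above.

```scala
import org.apache.spark.sql.SparkSession

object MetadataOnlyOptIn {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to demonstrate the setting.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("metadata-only-opt-in") // placeholder name
      .getOrCreate()

    // The key is the one added to SQLConf in this PR; it is off by default.
    spark.conf.set("spark.sql.optimizer.metadataOnly", "true")

    // The same setting expressed as a SQL command.
    spark.sql("SET spark.sql.optimizer.metadataOnly=true")

    // Read the value back to confirm what the session sees.
    println(spark.conf.get("spark.sql.optimizer.metadataOnly"))

    spark.stop()
  }
}
```

Setting the flag per session, rather than globally, keeps the opt-in scoped to the queries a user has checked.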
I think the optimizer should never affect the correctness of the query result. If this optimization is too hard to implement with the current code base, we should improve the code base first instead of rushing in a partial implementation.
Yes, I will think this over some more and then add a metadataOnly optimizer to the optimizer list. Thanks.
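One possible reading of the correctness concern raised in this thread can be sketched in plain Scala. This is a toy model, not the PR's implementation: the `Table` type and both helper functions are hypothetical. It compares answering a query over the partition column by scanning rows against answering it from partition metadata alone, where each partition contributes exactly one value no matter how many files or rows it holds.

```scala
object MetadataOnlyModel {
  // Hypothetical model: partition value -> data files, each file a sequence of rows.
  type Row = String
  type Table = Map[String, Seq[Seq[Row]]]

  // Baseline: read every row of every file and emit its partition value.
  def scanPartitionColumn(t: Table): Seq[String] =
    t.toSeq.flatMap { case (part, files) => files.flatten.map(_ => part) }

  // Metadata-only: one value per partition, taken from the catalog; no file is read.
  def metadataPartitionColumn(t: Table): Seq[String] = t.keys.toSeq

  def main(args: Array[String]): Unit = {
    val table: Table = Map(
      "2016-01-01" -> Seq(Seq("a", "b"), Seq("c")), // two data files
      "2016-01-02" -> Seq(Seq("d"))                 // one data file
    )
    val scanned = scanPartitionColumn(table)
    val fromMeta = metadataPartitionColumn(table)

    // Aggregates that ignore duplicates agree between the two plans.
    assert(scanned.distinct.sorted == fromMeta.distinct.sorted)
    assert(scanned.max == fromMeta.max)

    // A row count does not: the metadata-only plan sees one value per partition.
    assert(scanned.size == 4 && fromMeta.size == 2)
  }
}
```

Under this model the rewrite is only safe for aggregates whose result does not depend on how many times each partition value appears, which is one way to make the reviewer's point about correctness concrete.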