-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-21973][SQL] Add an new option to filter queries in TPC-DS #19188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is not directly related to this pr though, I added this log here cuz this change is much trivial and I think this log helps to easily check which query fails.
|
Test build #81633 has finished for PR 19188 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: class -> variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Do we need queries =?
9d9eff2 to
322c335
Compare
|
Test build #81637 has finished for PR 19188 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add an argument, instead of using the SQLConf? See #18592 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
ok, I'll update in a day! Thanks! |
9eedcd5 to
12767bc
Compare
12767bc to
be1a199
Compare
| benchmark.addCase(name) { i => | ||
| spark.sql(queryString).collect() | ||
| } | ||
| logInfo(s"\n\n===== TPCDS QUERY BENCHMARK OUTPUT FOR $name =====\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #19188 (comment)
| // If `--query-filter` defined, filters the queries that this option selects | ||
| val queriesToRun = if (benchmarkArgs.queryFilter.nonEmpty) { | ||
| val queries = tpcdsQueries.filter { case queryName => | ||
| benchmarkArgs.queryFilter.contains(queryName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add case insensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, I like the idea.
| benchmarkArgs.queryFilter.contains(queryName) | ||
| } | ||
| if (queries.isEmpty) { | ||
| throw new RuntimeException("Bad query name filter: " + benchmarkArgs.queryFilter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Empty queries to run. Bad query name filter: " + benchmarkArgs.queryFilter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| /** | ||
| * Benchmark to measure TPCDS query performance. | ||
| * To run this: | ||
| * spark-submit --class <this class> <spark sql test jar> <TPCDS data location> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update this usage text too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
Also, I manually checked if it worked. |
|
LGTM with few minor comments. |
|
better to add tests for |
|
Yeah, it's good if you can add one. |
|
But looks like current |
|
hmmm. ok, currently, we have two options only, so I feel okay to keep it now. |
|
Test build #81701 has finished for PR 19188 at commit
|
|
Test build #81702 has finished for PR 19188 at commit
|
|
Test build #81703 has finished for PR 19188 at commit
|
|
@gatorsmile could u check? Thanks~ |
| * Benchmark to measure TPCDS query performance. | ||
| * To run this: | ||
| * spark-submit --class <this class> <spark sql test jar> <TPCDS data location> | ||
| * spark-submit --class <this class> <spark sql test jar> --data-location <TPCDS data location> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, but I missed your point. what's correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--data-location <TPCDS data location> [--query-filter Queries to filter]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aha, thanks. better to add optional parameters here? I like a simple example here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I see.
| args = tail | ||
|
|
||
| case ("--query-filter") :: value :: tail => | ||
| queryFilter = value.toLowerCase(Locale.ROOT).split(",").map(_.trim).toSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also make "--data-location" case insensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I mean the option name --data-location need to be case insensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh! missed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SparkSubmitArguments handles options as case-sensitive, so better to make them case-insensitive, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
Test build #81740 has finished for PR 19188 at commit
|
c957e47 to
b543e71
Compare
|
Test build #81745 has finished for PR 19188 at commit
|
|
LGTM |
|
Merging to master. |
What changes were proposed in this pull request?
This pr added a new option to filter TPC-DS queries to run in
TPCDSQueryBenchmark.By default,
TPCDSQueryBenchmarkruns all the TPC-DS queries.This change could enable developers to run some of the TPC-DS queries by this option,
e.g., to run q2, q4, and q6 only:
How was this patch tested?
Manually checked.