Member

maropu commented Sep 11, 2017

What changes were proposed in this pull request?

This PR adds a new option to filter which TPC-DS queries run in TPCDSQueryBenchmark.
By default, TPCDSQueryBenchmark runs all the TPC-DS queries.
With this option, developers can run only a subset of them,
e.g., to run q2, q4, and q6 only:

spark-submit --class <this class> --conf spark.sql.tpcds.queryFilter="q2,q4,q6" --jars <spark sql test jar>

How was this patch tested?

Manually checked.

Member Author

maropu commented Sep 11, 2017

This log line is not directly related to this PR, but I added it here because the change is trivial and I think it makes it easier to check which query fails.

SparkQA commented Sep 11, 2017

Test build #81633 has finished for PR 19188 at commit 4665e85.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class TpcdsQueries(spark: SparkSession, queries: Seq[String], dataLocation: String)

Member

nit: class -> variable?

Member

nit: Do we need queries =?

maropu force-pushed the RunPartialQueriesInTPCDS branch from 9d9eff2 to 322c335 on September 11, 2017 09:19

SparkQA commented Sep 11, 2017

Test build #81637 has finished for PR 19188 at commit 322c335.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Could we add an argument, instead of using the SQLConf? See #18592 (comment)

Member Author

ok

Member

gatorsmile commented Sep 12, 2017

#18592 has been merged. @maropu Could you update this PR? Thanks!

Member Author

maropu commented Sep 12, 2017

ok, I'll update in a day! Thanks!

maropu force-pushed the RunPartialQueriesInTPCDS branch 2 times, most recently from 9eedcd5 to 12767bc on September 13, 2017 02:14
maropu force-pushed the RunPartialQueriesInTPCDS branch from 12767bc to be1a199 on September 13, 2017 02:24

benchmark.addCase(name) { i =>
  spark.sql(queryString).collect()
}
logInfo(s"\n\n===== TPCDS QUERY BENCHMARK OUTPUT FOR $name =====\n")

// If `--query-filter` defined, filters the queries that this option selects
val queriesToRun = if (benchmarkArgs.queryFilter.nonEmpty) {
  val queries = tpcdsQueries.filter { case queryName =>
    benchmarkArgs.queryFilter.contains(queryName)
Member

Add case insensitive?

Member Author

yea, I like the idea.
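
For readers following the case-insensitivity suggestion, here is a minimal sketch of one way it could work, assuming both the filter string and the query names are lower-cased with Locale.ROOT before comparison. The object QueryFilter and its methods normalize and matching are hypothetical names for illustration, not code from this PR.

```scala
import java.util.Locale

// Hypothetical helper, not part of the PR: shows case-insensitive matching only.
object QueryFilter {
  // Turns a comma-separated filter such as "Q2, q4" into Set("q2", "q4").
  def normalize(raw: String): Set[String] =
    raw.toLowerCase(Locale.ROOT).split(",").map(_.trim).filter(_.nonEmpty).toSet

  // Keeps the queries whose lower-cased name appears in the normalized filter.
  def matching(allQueries: Seq[String], filter: Set[String]): Seq[String] =
    allQueries.filter(name => filter.contains(name.toLowerCase(Locale.ROOT)))
}
```

Using Locale.ROOT keeps the lower-casing independent of the JVM's default locale, which matters for names that would otherwise be affected by locale-specific case rules.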

    benchmarkArgs.queryFilter.contains(queryName)
  }
  if (queries.isEmpty) {
    throw new RuntimeException("Bad query name filter: " + benchmarkArgs.queryFilter)
Member

"Empty queries to run. Bad query name filter: " + benchmarkArgs.queryFilter.

Member Author

ok
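
The fail-fast behavior discussed here can be sketched as follows. This is an illustration under assumptions, not the merged code: QuerySelection and select are hypothetical names, and an empty filter is assumed to mean "run everything", matching the default described in the PR description.

```scala
// Hypothetical helper, not part of the PR: selects queries and fails fast on a bad filter.
object QuerySelection {
  def select(allQueries: Seq[String], filter: Set[String]): Seq[String] =
    if (filter.isEmpty) {
      // No filter given: run every query.
      allQueries
    } else {
      val selected = allQueries.filter(filter.contains)
      if (selected.isEmpty) {
        // Message wording follows the reviewer's suggestion above.
        throw new RuntimeException(
          "Empty queries to run. Bad query name filter: " + filter)
      }
      selected
    }
}
```

Throwing before any benchmark case is registered means a mistyped filter is reported immediately instead of producing an empty benchmark run.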

/**
 * Benchmark to measure TPCDS query performance.
 * To run this:
 *  spark-submit --class <this class> <spark sql test jar> <TPCDS data location>
Member

Update this usage text too?

Member Author

ok

Member Author

maropu commented Sep 13, 2017

Also, I manually checked that it works.

Member

viirya commented Sep 13, 2017

LGTM with a few minor comments.

Member Author

maropu commented Sep 13, 2017

Would it be better to add tests for TPCDSQueryBenchmarkArguments? We do have tests for SparkSubmitArguments in SparkSubmitSuite.

Member

viirya commented Sep 13, 2017

Yeah, it's good if you can add one.

Member

viirya commented Sep 13, 2017

But it looks like the current TPCDSQueryBenchmarkArguments is not easy to test in isolation...

Member Author

maropu commented Sep 13, 2017

Hmm, OK. We currently have only two options, so I think it's fine to leave it as is for now.

SparkQA commented Sep 13, 2017

Test build #81701 has finished for PR 19188 at commit 12767bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 13, 2017

Test build #81702 has finished for PR 19188 at commit be1a199.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 13, 2017

Test build #81703 has finished for PR 19188 at commit cc11163.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

maropu commented Sep 13, 2017

@gatorsmile could you check? Thanks!

  * Benchmark to measure TPCDS query performance.
  * To run this:
- *  spark-submit --class <this class> <spark sql test jar> <TPCDS data location>
+ *  spark-submit --class <this class> <spark sql test jar> --data-location <TPCDS data location>
Member

This looks incorrect?

Member Author

Sorry, I missed your point. What would be correct?

Member

--data-location <TPCDS data location> [--query-filter Queries to filter]?

Member Author

Aha, thanks. Is it better to list the optional parameters here? I'd prefer to keep a simple example here.

Member

OK. I see.

      args = tail

    case ("--query-filter") :: value :: tail =>
      queryFilter = value.toLowerCase(Locale.ROOT).split(",").map(_.trim).toSet
Member

Could you also make "--data-location" case insensitive?

Member Author

maropu commented Sep 13, 2017

updated

Member

Sorry, I meant: does the option name --data-location itself need to be case insensitive?

Member Author

oh! missed...

Member Author

SparkSubmitArguments handles options as case-sensitive, so better to make them case-insensitive, too?

Member

I am not sure about that one.

Member Author

ok
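
To make the argument handling in this thread easier to follow, here is a self-contained sketch of list-pattern argument parsing in the style of the `--query-filter` case quoted above. It is an assumption-laden illustration, not the actual TPCDSQueryBenchmarkArguments: BenchmarkArgs, BenchmarkArgsParser, and the error messages are hypothetical, and option names are kept case-sensitive here since the thread leaves that point open.

```scala
import java.util.Locale
import scala.annotation.tailrec

// Hypothetical stand-in for the benchmark's parsed arguments.
final case class BenchmarkArgs(dataLocation: String, queryFilter: Set[String])

object BenchmarkArgsParser {
  def parse(args: Seq[String]): BenchmarkArgs = {
    @tailrec
    def loop(rest: List[String], dataLocation: Option[String], filter: Set[String]): BenchmarkArgs =
      rest match {
        case "--data-location" :: value :: tail =>
          loop(tail, Some(value), filter)
        case "--query-filter" :: value :: tail =>
          // Filter values are lower-cased and trimmed, as in the snippet under review.
          val names = value.toLowerCase(Locale.ROOT).split(",").map(_.trim).filter(_.nonEmpty).toSet
          loop(tail, dataLocation, names)
        case Nil =>
          val location = dataLocation.getOrElse(
            throw new IllegalArgumentException("--data-location is required"))
          BenchmarkArgs(location, filter)
        case unknown :: _ =>
          // Option names are matched exactly (case-sensitive) in this sketch.
          throw new IllegalArgumentException(s"Unknown or incomplete option: $unknown")
      }
    loop(args.toList, None, Set.empty)
  }
}
```

Because the parser is a pure function from a sequence of strings to a case class, it can be exercised directly in a unit test, which speaks to the earlier question about testing the arguments class in isolation.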

SparkQA commented Sep 14, 2017

Test build #81740 has finished for PR 19188 at commit c957e47.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu force-pushed the RunPartialQueriesInTPCDS branch from c957e47 to b543e71 on September 14, 2017 01:15

SparkQA commented Sep 14, 2017

Test build #81745 has finished for PR 19188 at commit b543e71.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

LGTM

@gatorsmile
Member

Merging to master.

asfgit closed this in 8be7e6b on Sep 14, 2017