@@ -65,8 +65,8 @@ object TPCDSQueryBenchmark {
"please modify the value of dataLocation to point to your local TPCDS data")
Member:
It seems we don't need this check.

Member Author:
Yes, this is no longer needed.

val tableSizes = setupTables(dataLocation)
queries.foreach { name =>
val queryString = fileToString(new File(Thread.currentThread().getContextClassLoader
Member:
Please drop import java.io.File.

Member Author:
Dropped.

.getResource(s"tpcds/$name.sql").getFile))
val queryString = resourceToString(s"tpcds/$name.sql", "UTF-8",
Thread.currentThread().getContextClassLoader)
Member:
Do we need to pass the encoding explicitly? I feel it'd be better like:

val queryString = resourceToString(s"tpcds/$name.sql",
  classLoader = Thread.currentThread().getContextClassLoader)

Member Author:
Fixed.


// This is an indirect hack to estimate the size of each query's input by traversing the
// logical plan and adding up the sizes of all tables that appear in the plan. Note that this
@@ -99,6 +99,13 @@ object TPCDSQueryBenchmark {
}

def main(args: Array[String]): Unit = {
if (args.length < 1) {
Member:
Could we also allow another way to run this benchmark?

We can hardcode the value of dataLocation and run it in IntelliJ directly.

Member:
@sarutak kindly ping

Member Author:
We can pass the argument through the run configuration even when we use an IDE like IntelliJ, right?
Or, how about giving dataLocation through a new property?
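
For illustration only, a minimal sketch of the property-based idea floated above, assuming a hypothetical property name spark.sql.tpcds.dataLocation and an args-first fallback order; this is not what the PR actually implements:

object DataLocationResolver {
  // Hypothetical property name, used only for this sketch.
  private val PropKey = "spark.sql.tpcds.dataLocation"

  // Resolve the TPCDS data location from the CLI args, falling back to a JVM system property.
  def resolve(args: Array[String]): Option[String] = {
    args.headOption
      .orElse(Option(System.getProperty(PropKey)))
      .filter(_.nonEmpty)
  }
}

With something like this, an IDE run configuration could supply -Dspark.sql.tpcds.dataLocation=/path/to/tpcds as a VM option instead of a program argument.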

Member (@gatorsmile, Sep 11, 2017):
@sarutak @maropu Could we do something like https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterArguments.scala?

Later, we can also add another argument for outputting the plans of the TPC-DS queries, instead of running the actual queries.

Member Author:
Good idea. I'll add TPCDSQueryBenchmarkArguments.
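
A rough sketch of what an arguments class in the spirit of ApplicationMasterArguments might look like; the --data-location flag, the field names, and the usage text are illustrative assumptions, not the class the follow-up commit actually adds:

// Sketch only: option names and structure are assumed, loosely modeled on
// ApplicationMasterArguments' hand-rolled while-loop parsing.
class TPCDSQueryBenchmarkArguments(val args: Array[String]) {
  var dataLocation: String = null

  parseArgs(args.toList)

  private def parseArgs(inputArgs: List[String]): Unit = {
    var remaining = inputArgs
    while (remaining.nonEmpty) {
      remaining match {
        case "--data-location" :: value :: tail =>
          dataLocation = value
          remaining = tail
        case _ =>
          // scalastyle:off println
          System.err.println(s"Unexpected argument: ${remaining.head}")
          // scalastyle:on println
          printUsageAndExit(1)
      }
    }
    if (dataLocation == null) {
      printUsageAndExit(-1)
    }
  }

  private def printUsageAndExit(exitCode: Int): Unit = {
    // scalastyle:off println
    System.err.println(
      """
        |Usage: spark-submit --class <this class> <spark sql test jar> --data-location <TPCDS data location>
      """.stripMargin)
    // scalastyle:on println
    System.exit(exitCode)
  }
}

A later flag for printing query plans instead of running them, as suggested above, would then just be another case in the same match.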

// scalastyle:off println
println(
"Usage: spark-submit --class <this class> --jars <spark sql test jar> <data location>")
Member:
How about also printing the description below, like:

    if (args.length < 1) {
      // scalastyle:off println
      println(
        s"""
           |Usage: spark-submit --class <this class> <spark sql test jar> <TPCDS data location>
           |
           |In order to run this benchmark, please follow the instructions at
           |https://github.com/databricks/spark-sql-perf/blob/master/README.md to generate the TPCDS data
           |locally (preferably with a scale factor of 5 for benchmarking). Thereafter, the value of
           |dataLocation below needs to be set to the location where the generated data is stored.
         """.stripMargin)
      // scalastyle:on println
      System.exit(1)
    }

Member Author:
Now printing it as you mentioned.

// scalastyle:on println
System.exit(1)
}

// List of all TPC-DS queries
val tpcdsQueries = Seq(
@@ -117,7 +124,7 @@ object TPCDSQueryBenchmark {
// https://github.com/databricks/spark-sql-perf/blob/master/README.md to generate the TPCDS data
// locally (preferably with a scale factor of 5 for benchmarking). Thereafter, the value of
// dataLocation below needs to be set to the location where the generated data is stored.
val dataLocation = ""
val dataLocation = args(0)

tpcdsAll(dataLocation, queries = tpcdsQueries)
}