Closed
Changes from 2 commits
16 changes: 9 additions & 7 deletions core/src/test/scala/org/apache/spark/benchmark/Benchmark.scala
@@ -111,16 +111,17 @@ private[spark] class Benchmark(
// The results are going to be processor specific so it is useful to include that.
out.println(Benchmark.getJVMOSInfo())
out.println(Benchmark.getProcessorName())
- out.printf("%-40s %16s %12s %13s %10s\n", name + ":", "Best/Avg Time(ms)", "Rate(M/s)",
- "Per Row(ns)", "Relative")
- out.println("-" * 96)
+ out.printf("%-40s %16s %12s %13s %10s %13s\n", name + ":", "Best/Avg Time(ms)", "Rate(M/s)",
+ "Per Row(ns)", "Relative", "Stdev (ms)")
Member

Currently, this adds the new value at the end. Can we move it into the Best/Avg Time(ms) group? For example, Best/Avg/Stdev Time(ms)?

Limiting:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative    Stdev (ms)
--------------------------------------------------------------------------------------------------
Top-level column                   231 /  240          4.3         230.7       1.0X            11
Nested column                     1833 / 1957          0.5        1833.1       0.1X            68

Member

I guess Best/Avg/Stdev (ms) will be enough because we use Per Row(ns) already.

Member

@gengliangwang, Feb 28, 2019

@dongjoon-hyun I thought about this, but the numbers might be harder to read that way.
How about making each of them a separate column? E.g.
Best Time(ms) Avg Time(ms) Stdev Time(ms)
I don't have a strong preference here.

Member

If we're going to add it, it doesn't make sense to do it separately at the end. I think best, avg, and stdev should be their own columns now.

Member

Got it, @srowen.

Contributor Author

Yup, I can make it a separate column placed after "avg" and before "rate".

Contributor Author

OK, it looks like this now:

[info] agg w/o group:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] agg w/o group wholestage off                      43309          43591         283         48.4          20.7       1.0X
[info] agg w/o group wholestage on                        1032           1111         111       2032.4           0.5      42.0X

+ out.println("-" * 110)
results.zip(benchmarks).foreach { case (result, benchmark) =>
- out.printf("%-40s %16s %12s %13s %10s\n",
+ out.printf("%-40s %16s %12s %13s %10s %13s\n",
benchmark.name,
"%5.0f / %4.0f" format (result.bestMs, result.avgMs),
"%10.1f" format result.bestRate,
"%6.1f" format (1000 / result.bestRate),
- "%3.1fX" format (firstBest / result.bestMs))
+ "%3.1fX" format (firstBest / result.bestMs),
+ "%5.0f" format result.stdevMs)
}
out.println
// scalastyle:on
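The review thread above settled on separate Best, Avg and Stdev columns placed before Rate, while the code in this hunk still packs best and average into a single `Best/Avg` cell and appends Stdev at the end. A minimal sketch of the agreed layout could look like this. `Row`, `BenchmarkTable`, `header` and `formatRow` are hypothetical names, and the column widths are assumptions, not necessarily what Spark ended up using.

```scala
// Hypothetical row type: one benchmark case with its timing statistics in ms.
case class Row(name: String, bestMs: Double, avgMs: Double, stdevMs: Double, bestRate: Double)

object BenchmarkTable {
  // Assumed column widths: name, best, avg, stdev, rate, per-row, relative.
  private val rowFmt = "%-40s %14s %14s %11s %12s %13s %10s"

  // Header with best, avg and stdev each in their own column, before Rate.
  def header(title: String): String =
    rowFmt.format(title + ":", "Best Time(ms)", "Avg Time(ms)", "Stdev(ms)",
      "Rate(M/s)", "Per Row(ns)", "Relative")

  // firstBestMs is the best time of the first (baseline) case.
  def formatRow(row: Row, firstBestMs: Double): String =
    rowFmt.format(
      row.name,
      "%5.0f".format(row.bestMs),
      "%5.0f".format(row.avgMs),
      "%5.0f".format(row.stdevMs),
      "%10.1f".format(row.bestRate),
      "%6.1f".format(1000 / row.bestRate),       // M rows/s -> ns per row
      "%3.1fX".format(firstBestMs / row.bestMs)) // speed relative to baseline
}
```

Giving each statistic its own right-aligned column avoids the `231 /  240` style of packing two numbers into one cell, which was the readability concern raised in the thread.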
@@ -158,7 +159,8 @@ private[spark] class Benchmark(
// scalastyle:on
val best = runTimes.min
val avg = runTimes.sum / runTimes.size
- Result(avg / 1000000.0, num / (best / 1000.0), best / 1000000.0)
+ val stdev = math.sqrt(runTimes.map(time => math.pow(time - avg, 2)).sum / runTimes.size)
Member

Not that it really matters, but (time - avg) * (time - avg) is fine here and faster than pow.
Super nit, but I'd suggest it's more reasonable to use the sample rather than the population stdev: divide by runTimes.size - 1. I suppose this also means checking that there are at least 2 runs.

Contributor Author

Ah, OK, I agree with you on both. If there aren't enough runs, should we just put "N/A" then?

Member

You can assume (or assert) that there was at least 1 benchmarking run, or none of the metrics mean anything (maybe it's already asserted).

While the sample stdev is not really defined for 1 run, reporting "0" is fine.

Contributor Author

I don't think it's asserted anywhere. I'll add it.

+ Result(avg / 1000000.0, num / (best / 1000.0), best / 1000000.0, stdev / 1000000.0)
}
}
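The reviewer's two suggestions (multiply instead of calling `math.pow`, and divide by `n - 1` for the sample standard deviation, reporting 0 when there is only one run) can be sketched as a small helper. This is an illustration, not the code that was merged: `RunStats` and `sampleStdev` are hypothetical names, and run times are assumed to be `Long` nanosecond values as in the surrounding method. The mean is computed as a `Double` so that the deviations are not truncated.

```scala
object RunStats {
  // Sample standard deviation of run times (nanoseconds in, nanoseconds out).
  // Requires at least one run; a single run yields 0, as suggested in review.
  def sampleStdev(runTimes: Seq[Long]): Double = {
    require(runTimes.nonEmpty, "at least one benchmark run is required")
    if (runTimes.size < 2) 0.0
    else {
      val avg = runTimes.sum.toDouble / runTimes.size
      // (t - avg) * (t - avg) instead of math.pow(t - avg, 2)
      val sumSq = runTimes.map { t => val d = t - avg; d * d }.sum
      // Divide by n - 1 (sample) rather than n (population).
      math.sqrt(sumSq / (runTimes.size - 1))
    }
  }
}
```

Dividing by `n - 1` (Bessel's correction) gives an unbiased estimate of the variance from a small number of iterations. The `require` reflects the point that none of the metrics mean anything without at least one run.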

@@ -191,7 +193,7 @@ private[spark] object Benchmark {
}

case class Case(name: String, fn: Timer => Unit, numIters: Int)
- case class Result(avgMs: Double, bestRate: Double, bestMs: Double)
+ case class Result(avgMs: Double, bestRate: Double, bestMs: Double, stdevMs: Double)

/**
* This should return a user helpful processor information. Getting at this depends on the OS.
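The `Result` fields mix units. The run times appear to be in nanoseconds (they are divided by `1000000.0` to get milliseconds), `bestRate` is rows per microsecond, which equals millions of rows per second, and the printed Per Row(ns) column is derived as `1000 / bestRate`. The sketch below mirrors that arithmetic with named constants. `ResultConversion`, `toResult` and `perRowNs` are hypothetical names, and `num` is assumed to be the number of rows processed per iteration.

```scala
// Mirrors the updated case class in this diff.
case class Result(avgMs: Double, bestRate: Double, bestMs: Double, stdevMs: Double)

object ResultConversion {
  private val NanosPerMilli = 1000000.0
  private val NanosPerMicro = 1000.0

  // num: rows processed per iteration; all times are in nanoseconds.
  def toResult(num: Long, bestNs: Double, avgNs: Double, stdevNs: Double): Result =
    Result(
      avgMs = avgNs / NanosPerMilli,
      // rows per microsecond == millions of rows per second
      bestRate = num / (bestNs / NanosPerMicro),
      bestMs = bestNs / NanosPerMilli,
      stdevMs = stdevNs / NanosPerMilli)

  // One row takes 1 / (rate * 1e6) seconds, i.e. 1000 / rate nanoseconds.
  def perRowNs(r: Result): Double = 1000 / r.bestRate
}
```

For example, 1,000,000 rows with a best time of 20 ms gives a rate of 50 M/s and 20 ns per row.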