Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
address comment
  • Loading branch information
mgaido91 committed Apr 27, 2018
commit 2fa8da744b1726284577deca6c70d184cdae3579
Original file line number Diff line number Diff line change
Expand Up @@ -284,8 +284,9 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSQLContext wi
}

test("SPARK-24013: unneeded compress can cause performance issues with sorted input") {
failAfter(20 seconds) {
assert(sql("select approx_percentile(id, array(0.1)) from range(10000000)").count() == 1)
failAfter(30 seconds) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test looks pretty weird. Can we add some kind of unit test and move this test to PR description and say the perf has improved a lot after this patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is not the best UT, but I couldn't find any better way to test this. If anybody has any idea of a better test, I am happy to follow your right suggestion...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a UT for ApproximatePercentile, and check that after calling add, isCompressed is still false.

checkAnswer(sql("select approx_percentile(id, array(0.1)) from range(10000000)"),
Row(Array(999160)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
With the approx nature of the algorithm, could the exact answer not get flakty through some small changes in code or config? (like e.g. the split of range into tasks, and then different merging of partial aggrs producing slightly different results)
maybe just asserting on collect().length == 1 would do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not the only place where it is checked with an exact answer, so I don't think it is an issue, a small change would anyway require to change many test cases answers. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Yeah, looking at the other tests in this suite it's definitely fine :-).

}
}
}