Closed
[SPARK-11207] Fix random values used by unit tests
Lewuathe committed Oct 25, 2015
commit 59383fd41f1d6b96274c564eb2fb7c96f5ab07e0
@@ -139,8 +139,9 @@ object LinearDataGenerator {
x.foreach { v =>
Member

Once you have the sparsity, randomly choose n = numFeatures * (1 - sparsity) features to keep as non-zero, and zero out the rest.
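
A minimal sketch of that suggestion, not the code in this PR. The object name `SparsifySketch`, the helper `sparsify`, and the choice to round n to the nearest integer are all assumptions made for illustration.

```scala
import scala.util.Random

// Hypothetical helper, not part of LinearDataGenerator.
object SparsifySketch {
  /** Keep n = round(numFeatures * (1 - sparsity)) randomly chosen entries; zero the rest. */
  def sparsify(v: Array[Double], sparsity: Double, rnd: Random): Array[Double] = {
    require(sparsity >= 0.0 && sparsity <= 1.0, "sparsity must be in [0, 1]")
    val numFeatures = v.length
    // Rounding to the nearest integer is an assumption; the comment does not say how to round.
    val numNonZero = math.round(numFeatures * (1.0 - sparsity)).toInt
    // Shuffle the indices and keep the first numNonZero of them as non-zero positions.
    val keep = rnd.shuffle((0 until numFeatures).toList).take(numNonZero).toSet
    Array.tabulate(numFeatures)(i => if (keep.contains(i)) v(i) else 0.0)
  }
}
```

With this approach every row has exactly n non-zero features, whereas the per-element test in the diff gives a count that varies from row to row.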

Member

You could also add variance to the sparsity so that the number of non-zeros is not constant.
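
One possible reading of this, sketched under assumptions: perturb the sparsity with Gaussian noise for each row before computing the non-zero count. The names `SparsityJitterSketch`, `nonZeroCount`, and the `sparsityStd` parameter are hypothetical, and the comment does not specify which distribution to use.

```scala
import scala.util.Random

// Hypothetical helper, not part of LinearDataGenerator.
object SparsityJitterSketch {
  /** Draw a per-row non-zero count centred on numFeatures * (1 - sparsity). */
  def nonZeroCount(numFeatures: Int, sparsity: Double, sparsityStd: Double, rnd: Random): Int = {
    // Gaussian noise is an assumption; any distribution with the desired variance would do.
    val jittered = sparsity + sparsityStd * rnd.nextGaussian()
    // Clamp so the resulting count stays within [0, numFeatures].
    val clamped = math.max(0.0, math.min(1.0, jittered))
    math.round(numFeatures * (1.0 - clamped)).toInt
  }
}
```

Setting `sparsityStd` to 0.0 reduces this to the constant count n = numFeatures * (1 - sparsity).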

var i = 0
val len = v.length
+ val sparceRnd = new Random(seed)
Member

Since you seed rnd and sparceRnd with the same seed, both will generate the same sequence of random numbers, which is not what you want. You should be able to use a single random number generator for both purposes: it will give you uncorrelated random numbers for creating the features and for choosing which columns to zero out.
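
A small illustration of the point about seeds, using only `scala.util.Random`. The object name `SameSeedSketch` and the helper `draws` are hypothetical.

```scala
import scala.util.Random

// Hypothetical demo, not part of LinearDataGenerator.
object SameSeedSketch {
  /** Take n consecutive draws from the given generator. */
  def draws(r: Random, n: Int): Seq[Double] = Seq.fill(n)(r.nextDouble())

  def main(args: Array[String]): Unit = {
    val seed = 42L
    // Two generators created with the same seed replay the same sequence.
    val a = draws(new Random(seed), 5)
    val b = draws(new Random(seed), 5)
    println(a == b) // prints true

    // A single generator keeps advancing, so consecutive batches differ.
    val shared = new Random(seed)
    val features = draws(shared, 5)
    val mask = draws(shared, 5)
    println(features == mask) // prints false
  }
}
```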

Contributor Author

If we use the same random generator both for creating the features and for choosing which columns to zero out, x differs from the current values, which causes unit test failures. Can we change the assertion tolerance or the target values in LinearRegressionSuite?

while (i < len) {
- if (rnd.nextDouble() <= sparcity) {
+ if (sparceRnd.nextDouble() < sparcity) {
v(i) = 0.0
} else {
v(i) = (v(i) - 0.5) * math.sqrt(12.0 * xVariance(i)) + xMean(i)