Skip to content

Conversation

@yanboliang
Copy link
Contributor

@yanboliang yanboliang commented Aug 18, 2017

What changes were proposed in this pull request?

MLlib LinearRegression/LogisticRegression/LinearSVC always standardize the data during training to improve the rate of convergence regardless of standardization is true or false. If standardization is false, we perform reverse standardization by penalizing each component differently to get effectively the same objective function when the training dataset is not standardized. We should keep these comments in the code to let developers understand how we handle it correctly.

How was this patch tested?

Existing tests, only adding some comments in code.

@yanboliang
Copy link
Contributor Author

yanboliang commented Aug 18, 2017

@sethah @srowen Thanks for your great contributions of #17094. I wish you would not mind, I found these annotation section was missing. I think these are very important to let users/developers to understand how we handle standardization, so I add them back. Would you mind to have a look? Thanks. cc @MLnick @WeichenXu123

@SparkQA
Copy link

SparkQA commented Aug 18, 2017

Test build #80839 has finished for PR 18992 at commit 88b20dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@WeichenXu123 WeichenXu123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// to improve the rate of convergence; as a result, we have to
// perform this reverse standardization by penalizing each component
// differently to get effectively the same objective function when
// the training dataset is not standardized.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is an important comment we need keep.

@yanboliang
Copy link
Contributor Author

Merged into master. Thanks.

@asfgit asfgit closed this in c108a5d Aug 22, 2017
@yanboliang yanboliang deleted the SPARK-19762 branch August 22, 2017 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants