Skip to content

Conversation

@yanboliang
Copy link
Contributor

Currently RFormula can only handle label with NumericType or BinaryType (cast it to DoubleType as the label of Linear Regression training), we should also support label of StringType which is needed for Logistic Regression (glm with family = "binomial").
For label of StringType, we should use StringIndexer to transform it to 0-based index.

@SparkQA
Copy link

SparkQA commented Oct 27, 2015

Test build #44437 has finished for PR 9302 at commit 233ce5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

I think further more we should provide a param named family for RFormula to indicate the estimator/model which will be applied to the dataframe transformed by this RFormula transformer, and then we can do more strict label validation check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not happen inside transform. we cannot afford one fit call per transform. We should inside the label indexer into the RFormula.fitinstead of RFormulaModel.transform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently disable this assert due to StringIndexer output inconsistent nullable, I opened SPARK-11478 to track that issue. After it resolved, this assert will be enabled.

@SparkQA
Copy link

SparkQA commented Nov 3, 2015

Test build #44925 has finished for PR 9302 at commit 40b9a27.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Nov 3, 2015

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in f54ff19 Nov 3, 2015
@yanboliang yanboliang deleted the spark-11349 branch November 4, 2015 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants