-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-20114][ML][FOLLOW-UP] spark.ml parity for sequential pattern mining - PrefixSpan #21393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #90953 has finished for PR 21393 at commit
|
| @Since("2.4.0") | ||
| val minSupport = new DoubleParam(this, "minSupport", "the minimal support level of the " + | ||
| "sequential pattern, any pattern that appears more than (minSupport * size-of-the-dataset) " + | ||
| "times will be output", ParamValidators.gt(0.0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gt -> gtEq
| @Since("2.4.0") | ||
| @Experimental | ||
| object PrefixSpan { | ||
| final class PrefixSpan(@Since("2.4.0") override val uid: String) extends Params { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the doc, mention that this class is not yet an Estimator/Transformer and link to findFrequentSequentialPatterns method.
| def this() = this(Identifiable.randomUID("prefixSpan")) | ||
|
|
||
| /** | ||
| * the minimal support level of the sequential pattern, any pattern that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use uppercase for the first char:
"""
Param for the minimal support level (default: 0.1).
Sequential patterns that appear more than (minSupport * size-of-the-dataset) times are identified as frequent sequential patterns.
"""
|
|
||
| /** | ||
| * Set the minSupport parameter. | ||
| * Default is 1.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is wrong. We don't need doc for the setters and getters. Just leave "@group setParam".
| * | ||
| * @group setParam | ||
| */ | ||
| @Since("1.3.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also wrong. The method is new.
| def setMinSupport(value: Double): this.type = set(minSupport, value) | ||
|
|
||
| /** | ||
| * the maximal length of the sequential pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Param for the maximal pattern length (default: `10`).
Just copy the doc from mllib.fpm.PrefixSpan.
|
|
||
| /** @group getParam */ | ||
| @Since("2.4.0") | ||
| def getMaxPatternLength: Double = $(maxPatternLength) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return Int
| def setMaxPatternLength(value: Int): this.type = set(maxPatternLength, value) | ||
|
|
||
| /** | ||
| * The maximum number of items (including delimiters used in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Just copy the doc from mllib.fpm.PrefixSpan.
|
|
||
| /** @group getParam */ | ||
| @Since("2.4.0") | ||
| def getMaxLocalProjDBSize: Double = $(maxLocalProjDBSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long
| minSupport: Double, | ||
| maxPatternLength: Int, | ||
| maxLocalProjDBSize: Long): DataFrame = { | ||
| sequenceCol: String): DataFrame = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not making it a param?
|
Test build #91028 has finished for PR 21393 at commit
|
|
Test build #91027 has finished for PR 21393 at commit
|
|
LGTM. Merged into master. Thanks! Please also update #21265. |
What changes were proposed in this pull request?
Change
PrefixSpaninto a class with param setter/getters.This address issues mentioned here:
#20973 (comment)
How was this patch tested?
UT.
Please review http://spark.apache.org/contributing.html before opening a pull request.