@WeichenXu123
Contributor

What changes were proposed in this pull request?

Change PrefixSpan into a class with param setters and getters.
This addresses the issues mentioned here:
#20973 (comment)
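
For orientation, here is a minimal sketch of how the class-based API might be called, assuming Spark 2.4 or later with spark-mllib on the classpath. The setter names follow the ones discussed in this review. The toy data, the local SparkSession, and the assumption that the input column defaults to the name "sequence" are illustrative and not taken from this patch.

```scala
import org.apache.spark.ml.fpm.PrefixSpan
import org.apache.spark.sql.SparkSession

// Local session for the sketch; in spark-shell a session already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("PrefixSpanSketch")
  .getOrCreate()
import spark.implicits._

// Toy data: each row is one sequence of itemsets.
// The column name "sequence" is assumed to match the default input column.
val df = Seq(
  Seq(Seq(1, 2), Seq(3)),
  Seq(Seq(1), Seq(3, 2), Seq(1, 2)),
  Seq(Seq(1, 2), Seq(5)),
  Seq(Seq(6))
).toDF("sequence")

// Configure through setters, then call the mining method directly.
// The class is not an Estimator/Transformer, so there is no fit/transform here.
val patterns = new PrefixSpan()
  .setMinSupport(0.5)
  .setMaxPatternLength(5)
  .setMaxLocalProjDBSize(32000000L)
  .findFrequentSequentialPatterns(df)

patterns.show(truncate = false)
```

The setters return `this.type`, which is what allows the calls to be chained before `findFrequentSequentialPatterns`.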

How was this patch tested?

Unit tests.

@WeichenXu123
Contributor Author

@mengxr @jkbradley

@SparkQA

SparkQA commented May 22, 2018

Test build #90953 has finished for PR 21393 at commit 0732786.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • final class PrefixSpan(@Since("2.4.0") override val uid: String) extends Params

@Since("2.4.0")
val minSupport = new DoubleParam(this, "minSupport", "the minimal support level of the " +
  "sequential pattern, any pattern that appears more than (minSupport * size-of-the-dataset) " +
  "times will be output", ParamValidators.gt(0.0))
Contributor

gt -> gtEq
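
The difference between the two validators can be sketched as follows, assuming spark-mllib is on the classpath. `ParamValidators.gt` rejects the lower bound itself, while `ParamValidators.gtEq` accepts it, so switching to `gtEq(0.0)` would allow a `minSupport` of exactly 0.0. The variable names are illustrative.

```scala
import org.apache.spark.ml.param.ParamValidators

// Strict check: the value must be greater than 0.0.
val strictlyPositive: Double => Boolean = ParamValidators.gt[Double](0.0)

// Inclusive check: the value must be greater than or equal to 0.0.
val nonNegative: Double => Boolean = ParamValidators.gtEq[Double](0.0)

// The two validators disagree only at the boundary value.
println(s"gt(0.0) accepts 0.0:   ${strictlyPositive(0.0)}")
println(s"gtEq(0.0) accepts 0.0: ${nonNegative(0.0)}")
println(s"both accept 0.1:       ${strictlyPositive(0.1) && nonNegative(0.1)}")
```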

@Since("2.4.0")
@Experimental
object PrefixSpan {
final class PrefixSpan(@Since("2.4.0") override val uid: String) extends Params {
Contributor

In the doc, mention that this class is not yet an Estimator/Transformer and link to findFrequentSequentialPatterns method.

def this() = this(Identifiable.randomUID("prefixSpan"))

/**
 * the minimal support level of the sequential pattern, any pattern that
Contributor

Use uppercase for the first char:

"""
Param for the minimal support level (default: 0.1).
Sequential patterns that appear more than (minSupport * size-of-the-dataset) times are identified as frequent sequential patterns.
"""


/**
 * Set the minSupport parameter.
 * Default is 1.0.
Contributor

The default is wrong. We don't need doc for the setters and getters. Just leave "@group setParam".

 *
 * @group setParam
 */
@Since("1.3.0")
Contributor

This is also wrong. The method is new.

def setMinSupport(value: Double): this.type = set(minSupport, value)

/**
 * the maximal length of the sequential pattern
Contributor

Param for the maximal pattern length (default: `10`).

Just copy the doc from mllib.fpm.PrefixSpan.


/** @group getParam */
@Since("2.4.0")
def getMaxPatternLength: Double = $(maxPatternLength)
Contributor

return Int

def setMaxPatternLength(value: Int): this.type = set(maxPatternLength, value)

/**
 * The maximum number of items (including delimiters used in the
Contributor

Same here. Just copy the doc from mllib.fpm.PrefixSpan.


/** @group getParam */
@Since("2.4.0")
def getMaxLocalProjDBSize: Double = $(maxLocalProjDBSize)
Contributor

Long

minSupport: Double,
maxPatternLength: Int,
maxLocalProjDBSize: Long): DataFrame = {
sequenceCol: String): DataFrame = {
Contributor

Why not make it a param?
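
As a sketch of what the getter-type comments and this suggestion point toward, the block below declares typed params on a hypothetical holder class, assuming spark-mllib is on the classpath. `SequenceMiningParams` is invented for illustration and is not the class in this patch. The default of 10 for `maxPatternLength` comes from the doc suggestion above; the other defaults are assumptions.

```scala
import org.apache.spark.ml.param.{IntParam, LongParam, Param, ParamMap, Params, ParamValidators}
import org.apache.spark.ml.util.Identifiable

// Hypothetical holder class, for illustration only; not the code merged in this PR.
final class SequenceMiningParams(override val uid: String) extends Params {

  def this() = this(Identifiable.randomUID("sequenceMiningParams"))

  val maxPatternLength = new IntParam(this, "maxPatternLength",
    "The maximal length of the sequential pattern.", ParamValidators.gt(0))

  val maxLocalProjDBSize = new LongParam(this, "maxLocalProjDBSize",
    "The maximum number of items allowed in a locally processed projected database.",
    ParamValidators.gt(0))

  // The column name is a param too, instead of a method argument.
  val sequenceCol = new Param[String](this, "sequenceCol",
    "The name of the sequence column in the dataset.")

  // The last two defaults are assumptions chosen for this sketch.
  setDefault(maxPatternLength -> 10, maxLocalProjDBSize -> 32000000L, sequenceCol -> "sequence")

  // Each getter returns the type of its param: Int, Long, String.
  def setMaxPatternLength(value: Int): this.type = set(maxPatternLength, value)
  def getMaxPatternLength: Int = $(maxPatternLength)

  def setMaxLocalProjDBSize(value: Long): this.type = set(maxLocalProjDBSize, value)
  def getMaxLocalProjDBSize: Long = $(maxLocalProjDBSize)

  def setSequenceCol(value: String): this.type = set(sequenceCol, value)
  def getSequenceCol: String = $(sequenceCol)

  override def copy(extra: ParamMap): SequenceMiningParams = defaultCopy(extra)
}

val sketchParams = new SequenceMiningParams().setMaxPatternLength(5)
println(sketchParams.explainParams())
```

With the column name held as a param, the mining method could take only the dataset, and callers would pick the column through a setter like the other settings.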

@SparkQA

SparkQA commented May 23, 2018

Test build #91028 has finished for PR 21393 at commit 6e0c59f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 23, 2018

Test build #91027 has finished for PR 21393 at commit 90d71e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented May 23, 2018

LGTM. Merged into master. Thanks! Please also update #21265.

@asfgit asfgit closed this in df12506 May 23, 2018
@WeichenXu123 WeichenXu123 deleted the fix_prefix_span branch April 24, 2019 21:18