fix python and R test
YY-OnCall committed Jun 13, 2017
commit eb25f287393ccffeb1a5905f7752298808945a2c
3 changes: 2 additions & 1 deletion R/pkg/tests/fulltests/test_mllib_fpm.R
@@ -44,7 +44,8 @@ test_that("spark.fpGrowth", {
   expected_association_rules <- data.frame(
     antecedent = I(list(list("2"), list("3"))),
     consequent = I(list(list("1"), list("1"))),
-    confidence = c(1, 1)
+    confidence = c(1, 1),
+    support = c(0.75, 0.5)
   )
 
   expect_equivalent(expected_association_rules, collect(spark.associationRules(model)))
38 changes: 19 additions & 19 deletions python/pyspark/ml/fpm.py
@@ -186,29 +186,29 @@ class FPGrowth(JavaEstimator, HasItemsCol, HasPredictionCol,
     |[z]                     |
     |[x, z, y, r, q, t, p]   |
     +------------------------+
-    >>> fp = FPGrowth(minSupport=0.2, minConfidence=0.7)
+    >>> fp = FPGrowth(minSupport=0.4, minConfidence=0.7)
     >>> fpm = fp.fit(data)
     >>> fpm.freqItemsets.show(5)
-    +---------+----+
-    |    items|freq|
-    +---------+----+
-    |      [s]|   3|
-    |   [s, x]|   3|
-    |[s, x, z]|   2|
-    |   [s, z]|   2|
-    |      [r]|   3|
-    +---------+----+
+    +------+----+
+    | items|freq|
+    +------+----+
+    |   [s]|   3|
+    |[s, x]|   3|
+    |   [r]|   3|
+    |   [y]|   3|
+    |[y, x]|   3|
+    +------+----+
Member

This seems to change the result quite a bit. Is this expected?

Contributor Author

Set a higher minSupport to avoid the 0.3333333... in the support column.
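
To make the reasoning above concrete, here is a minimal sketch in plain Python (no Spark). It assumes the threshold is applied as `freq / N >= minSupport` and that the doctest data has 6 transactions, which is inferred from the diff (freq 2 giving 0.3333... and freq 3 giving 0.5). The helper `min_count` is hypothetical and only illustrates the arithmetic.

```python
import math


def min_count(min_support, n_transactions):
    """Smallest frequency with freq / n_transactions >= min_support."""
    return math.ceil(min_support * n_transactions)


# Assumption: 6 transactions, inferred from freq 2 <-> 0.3333... and freq 3 <-> 0.5.
N = 6

for min_support in (0.2, 0.4):
    count = min_count(min_support, N)
    # Lowest support any surviving itemset (and hence any rule) can have.
    print(f"minSupport={min_support}: min freq={count}, lowest support={count / N}")
```

With `minSupport=0.2`, itemsets with freq 2 survive and their support is the repeating decimal 2/6. With `minSupport=0.4`, the lowest surviving freq is 3, so every support value in the doctest output is a short decimal such as 0.5.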

     only showing top 5 rows
     >>> fpm.associationRules.show(5)
-    +----------+----------+----------+
-    |antecedent|consequent|confidence|
-    +----------+----------+----------+
-    |    [t, s]|       [y]|       1.0|
-    |    [t, s]|       [x]|       1.0|
-    |    [t, s]|       [z]|       1.0|
-    |       [p]|       [r]|       1.0|
-    |       [p]|       [z]|       1.0|
-    +----------+----------+----------+
+    +----------+----------+----------+-------+
+    |antecedent|consequent|confidence|support|
+    +----------+----------+----------+-------+
+    |       [t]|       [y]|       1.0|    0.5|
+    |       [t]|       [x]|       1.0|    0.5|
+    |       [t]|       [z]|       1.0|    0.5|
+    | [y, t, x]|       [z]|       1.0|    0.5|
+    |       [x]|       [s]|      0.75|    0.5|
+    +----------+----------+----------+-------+
     only showing top 5 rows
     >>> new_data = spark.createDataFrame([(["t", "s"], )], ["items"])
     >>> sorted(fpm.transform(new_data).first().prediction)
4 changes: 2 additions & 2 deletions python/pyspark/ml/tests.py
@@ -1283,8 +1283,8 @@ def test_association_rules(self):
         fpm = fp.fit(self.data)
 
         expected_association_rules = self.spark.createDataFrame(
-            [([3], [1], 1.0), ([2], [1], 1.0)],
-            ["antecedent", "consequent", "confidence"]
+            [([3], [1], 1.0, 0.5), ([2], [1], 1.0, 0.75)],
+            ["antecedent", "consequent", "confidence", "support"]
         )
         actual_association_rules = fpm.associationRules
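
The new expected `support` values can be checked by hand. The sketch below is plain Python, not the Spark implementation. It assumes rule support is the fraction of transactions containing both antecedent and consequent, and it assumes the test fixture holds the four transactions `[1, 2]`, `[1, 2]`, `[1, 2, 3]`, `[1, 3]`, which this diff does not show. The names `freq` and `rule_metrics` are hypothetical.

```python
# Assumption: four transactions mirroring the test fixture (not shown in this diff).
transactions = [{1, 2}, {1, 2}, {1, 2, 3}, {1, 3}]


def freq(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(antecedent, consequent):
    """Return (confidence, support) for the rule antecedent -> consequent."""
    both = freq(antecedent | consequent)
    confidence = both / freq(antecedent)
    support = both / len(transactions)
    return confidence, support


for antecedent, consequent in [({3}, {1}), ({2}, {1})]:
    print(sorted(antecedent), sorted(consequent), rule_metrics(antecedent, consequent))
```

Under these assumptions the rule `[3] -> [1]` has confidence 1.0 and support 0.5, and `[2] -> [1]` has confidence 1.0 and support 0.75, which agrees with the updated expectations in both the Python and R tests.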
