doctest
Signed-off-by: VinceShieh <[email protected]>
VinceShieh authored and yanboliang committed Jul 1, 2017
commit 79dbf81fb26d4c1d85f68d0b9dec73b8810f5157
6 changes: 4 additions & 2 deletions python/pyspark/ml/feature.py
@@ -2100,8 +2100,10 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable, Ja
>>> testData2 = sc.parallelize([Row(id=0, label="a"), Row(id=1, label="d"),
... Row(id=2, label="e")], 2)
>>> dfKeep= spark.createDataFrame(testData2)
- >>> tdK = stringIndexer.setHandleInvalid("keep").fit(stringIndDf).transform(dfKeep)
- >>> itdK = inverter.transform(tdK)
+ >>> modelKeep = stringIndexer.setHandleInvalid("keep").fit(stringIndDf)
+ >>> tdK = modelKeep.transform(dfKeep)
+ >>> itdK = IndexToString(inputCol="indexed", outputCol="label2",
+ ... labels=modelKeep.labels).transform(tdK)
>>> sorted(set([(i[0], str(i[1])) for i in itdK.select(itdK.id, itdK.label2).collect()]),
... key=lambda x: x[0])
[(0, 'a'), (6, 'd'), (6, 'e')]