re-organization of docs + feedback
bllchmbrs committed Dec 8, 2015
commit eb3f99c93d6a91d1d6da1765dbdc96d64ab3bf13
109 changes: 42 additions & 67 deletions docs/ml-guide.md
@@ -613,7 +613,49 @@ for row in selected.collect():

{% endhighlight %}
</div>
</div>

## Example: Saving and Loading a Pipeline

It is often worthwhile to save a model to disk for later use. In Spark 1.6, model import/export functionality was added to the Pipeline API. Most basic transformers are supported, as are some of the more basic ML models, such as the following (a single-model sketch appears after the list):

> Contributor review comment: a model import/export functionality...

* K-Means
* Naive Bayes
* ALS
* Linear Regression
* Logistic Regression
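
For an individual model rather than a whole pipeline, the same `save`/`load` pattern applies. The sketch below is illustrative only: it assumes that `LogisticRegressionModel` implements the `MLWritable`/`MLReadable` interfaces in your Spark version, and the save path is hypothetical.

{% highlight scala %}
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}

// Assumes a `training` DataFrame like the one built in the earlier example.
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.01)
val lrModel = lr.fit(training)

// Save the fitted model to a (hypothetical) path, then load it back.
lrModel.save("/tmp/spark-lr-model")
val restoredModel = LogisticRegressionModel.load("/tmp/spark-lr-model")
{% endhighlight %}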

Below is an example of how a pipeline can be persisted and loaded, using the model we trained above.
<div class="codetabs">

<div data-lang="scala" markdown="1">
{% highlight scala %}
import org.apache.spark.ml.PipelineModel

// Fit the model as we did in the previous example.
val model = pipeline.fit(training)
// Now save the fitted PipelineModel to disk.
model.save("/tmp/spark-logistic-regression-model")

// Load the fitted model back in. Note that we load a PipelineModel
// (the fitted pipeline), not the unfitted Pipeline estimator.
val loadedModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
// or equivalently:
// val loadedModel = PipelineModel.read.load("/tmp/spark-logistic-regression-model")

val test = sqlContext.createDataFrame(Seq(
(4L, "spark i j k"),
(5L, "l m n"),
(6L, "mapreduce spark"),
(7L, "apache hadoop")
)).toDF("id", "text")

// Make predictions on test documents
loadedModel.transform(test)
.select("id", "text", "probability", "prediction")
.collect()
.foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>
println(s"($id, $text) --> prob=$prob, prediction=$prediction")
}
{% endhighlight %}
</div>
</div>
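
Note that `save` will throw an error if the target path already exists. A minimal follow-up sketch, assuming the `MLWriter.overwrite()` API from Spark 1.6:

{% highlight scala %}
// Overwrite an existing path instead of failing.
model.write.overwrite().save("/tmp/spark-logistic-regression-model")

// The loaded PipelineModel exposes its stages for inspection.
loadedModel.stages.foreach(stage => println(stage.getClass.getSimpleName))
{% endhighlight %}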

## Example: model selection via cross-validation
@@ -952,70 +994,3 @@ model.transform(test)
</div>

</div>

> Member review comment: What's the difference here? I'm confused why this diff is showing up (but wanted to make sure it was not a weird character or something in the line).

## Example: Saving and Loading a Previously Created Model Pipeline

It is often worthwhile to save a model to disk for later use. In Spark 1.6, similar model import/export functionality was added to the Pipeline API. Most basic transformers are supported, as are some of the more basic ML models, such as:

* K-Means
* Naive Bayes
* ALS
* Linear Regression
* Logistic Regression

Below is an example of how a pipeline can be persisted and loaded; this version includes the full training code from the earlier example.
<div data-lang="scala" markdown="1">
{% highlight scala %}
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.sql.Row

// Prepare training documents from a list of (id, text, label) tuples.
val training = sqlContext.createDataFrame(Seq(
(0L, "a b c d e spark", 1.0),
(1L, "b d", 0.0),
(2L, "spark f g h", 1.0),
(3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.
val tokenizer = new Tokenizer()
.setInputCol("text")
.setOutputCol("words")
val hashingTF = new HashingTF()
.setNumFeatures(1000)
.setInputCol(tokenizer.getOutputCol)
.setOutputCol("features")
val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.01)

val pipeline = new Pipeline()
.setStages(Array(tokenizer, hashingTF, lr))

// Fit the pipeline to the training documents, then save the fitted model to disk.
val model = pipeline.fit(training)
model.save("/tmp/spark-logistic-regression-model")

// Load the fitted model back in.
val loadedModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
// or equivalently:
// val loadedModel = PipelineModel.read.load("/tmp/spark-logistic-regression-model")

val test = sqlContext.createDataFrame(Seq(
(4L, "spark i j k"),
(5L, "l m n"),
(6L, "mapreduce spark"),
(7L, "apache hadoop")
)).toDF("id", "text")

// Make predictions on test documents.
loadedModel.transform(test)
.select("id", "text", "probability", "prediction")
.collect()
.foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>
println(s"($id, $text) --> prob=$prob, prediction=$prediction")
}
{% endhighlight %}
</div>