[SPARK-2309][MLlib] Generalize the binary logistic regression into multinomial logistic regression #1379
Conversation
|
QA tests have started for PR 1379. This patch merges cleanly. |
|
QA results for PR 1379: |
|
Jenkins, retest this please. |
|
I think it fails because the Apache license header is not in the test data file. As you suggested, I'll change it so the data is generated at runtime. I'd like to hear general feedback first; I'll make the tests pass tomorrow. Thanks. |
|
QA tests have started for PR 1379. This patch DID NOT merge cleanly! |
|
QA results for PR 1379: |
|
It is easier to review if it passes the tests. @SparkQA shows new public classes and interface changes. Could you remove the data file and generate some synthetic data for unit tests? Thanks! |
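As an illustration only (not code from this PR), a tiny runtime generator for such a unit test might look like the sketch below; the object name and the use of `LabeledPoint` are assumptions.

```scala
import scala.util.Random
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical helper: generates synthetic multinomial data at runtime,
// so no license-headed data file needs to be checked into the repository.
object MultinomialDataGenerator {
  def generate(numPoints: Int, numFeatures: Int, numClasses: Int, seed: Long): Seq[LabeledPoint] = {
    val rnd = new Random(seed)
    // One random weight vector per class.
    val weights = Array.fill(numClasses, numFeatures)(rnd.nextGaussian())
    (0 until numPoints).map { _ =>
      val x = Array.fill(numFeatures)(rnd.nextGaussian())
      // Label = class whose linear score is largest (noise-free for simplicity).
      val scores = weights.map(w => w.zip(x).map { case (wi, xi) => wi * xi }.sum)
      val label = scores.indexOf(scores.max).toDouble
      LabeledPoint(label, Vectors.dense(x))
    }
  }
}
```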
|
@mengxr Is there any problem with asfgit? This PR is not finished yet, so why does asfgit say it's merged into apache:master? |
|
... I have no idea. Let me check. |
|
@pwendell I didn't see |
|
What is the current state of the PR? Can't see any changes in the code... |
|
@BigCrunsh I'm working on this. Let's see if we can merge in Spark 1.2 |
|
@dbtsai Hi! What is the current state of the PR? I would like to download and test it. Could you point me to the sources? |
|
Apparently, I've found this implementation: https://github.com/dbtsai/spark/tree/dbtsai-mlor. It did work on my examples, producing reasonable results. Could you comment on the following: why is the number of parameters (weights) equal to (num_features + 1) * (num_classes - 1)? I would expect (num_features + 1) * (num_classes), as it is here, for example: http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression |
|
@avulanov I will merge this in Spark 1.3; sorry for the delay, I've been very busy recently. Yes, the branch you found should work, but it cannot be merged cleanly into upstream, and I'm working on that. You can try that branch for now. Also, that branch doesn't use LBFGS as the optimizer, so the convergence rate will be slow. Basically, you can model the whole problem with (num_features + 1) * (num_classes) parameters, but then the solution is not unique. You can choose one of the classes as the base class to make the solution unique; I chose the first class as the base class. See |
|
@dbtsai Thanks for the explanation! Do I understand correctly that if I want to get (num_features + 1) * (num_classes) parameters from your model, I need to concatenate a vector of length (num_features + 1) with zeros at the beginning of the vector that your model returns with |
|
No, in the algorithm I already model the problem as in http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297/24 , so there will always be only (num_features + 1) * (num_classes - 1) parameters. Of course, you can choose any transformation to over-parameterize it, see |
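For readers following the thread, a sketch of the pivoted parameterization being discussed (class 0 as the base class, $x$ augmented with a constant 1 for the intercept, $K$ = num_classes) is, up to notation:

$$
P(y = 0 \mid x) = \frac{1}{1 + \sum_{j=1}^{K-1} e^{x^{\top} w_j}}, \qquad
P(y = k \mid x) = \frac{e^{x^{\top} w_k}}{1 + \sum_{j=1}^{K-1} e^{x^{\top} w_j}}, \quad k = 1, \dots, K-1.
$$

Only the $K-1$ vectors $w_1, \dots, w_{K-1}$ (each of length num_features + 1) are free parameters; prepending a zero vector $w_0 = 0$ recovers the over-parameterized softmax form with $K$ weight vectors.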
|
@dbtsai I've tried your implementation with |
|
@avulanov Sure, it would be interesting to see the comparison. Let me know the result once you have it. I'm going to get it merged in 1.3, so it will be easier to use in the future. |
|
@dbtsai Here are the results of my tests:
It seems that ANN is almost 2x faster (with the mentioned settings), though its accuracy is 1.6% lower. The difference in accuracy can be explained by the fact that ANN uses the (half) squared error cost function instead of cross entropy, and no softmax; the latter two are supposed to be better for classification. |
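For context, the two per-example objectives being contrasted, written for predicted class outputs $\hat{p}_k$ and a one-hot target $y$, are roughly:

$$
L_{\text{cross-entropy}} = -\sum_{k=1}^{K} y_k \log \hat{p}_k
\qquad \text{vs.} \qquad
L_{\text{squared}} = \frac{1}{2} \sum_{k=1}^{K} \left(\hat{p}_k - y_k\right)^2 .
$$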
|
@avulanov I did a couple of performance tunings in the MLOR gradient calculation in my company's proprietary implementation, which is about 4x faster than the open-source one on GitHub that you tested. I'm trying to open-source it and merge it into Spark soon. (PS: simple polynomial expansion with MLOR increased the mnist8m accuracy from 86% to 94% in my experiment. See Prof. C.J. Lin's talk: https://www.youtube.com/watch?v=GCIJP0cLSmU ) |
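As a rough illustration of the kind of feature expansion mentioned (not code from the PR; the helper below is hypothetical and limited to degree 2 on dense features):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical degree-2 polynomial expansion: keep the original terms and add
// all pairwise products. The expanded points are then trained with MLOR as
// usual, e.g. trainingData.map(expandDegree2).
def expandDegree2(p: LabeledPoint): LabeledPoint = {
  val x = p.features.toArray
  val pairwise = for {
    i <- x.indices
    j <- i until x.length
  } yield x(i) * x(j)
  LabeledPoint(p.label, Vectors.dense(x ++ pairwise))
}
```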
|
@avulanov Nice tests! A few comments:
|
|
@dbtsai 1) Could you elaborate on what kind of optimizations you did? Perhaps they could be applied to the broader MLlib, which would be beneficial. 2) Do you know why our ANN implementation was faster than the MLOR you shared? This could also be interesting in terms of MLlib optimization. 3) Did you mean fitting an n-th degree polynomial instead of a linear function? Thanks for the link, it looks very interesting! |
|
@jkbradley Thank you! They took some time.
|
|
|
@dbtsai Thank you, I look forward to your code so I can run benchmarks. Thanks again for the video! I enjoyed it, especially the Q&A after the talk. At 51:23 Prof. C.J. Lin mentions that "we released a dataset of about 600 gigabytes". Do you know where I can download it? It should be quite a challenging workload for classification in Spark! Update: is it this one? http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#splice-site |
|
@avulanov I remember C.J. Lin said he posted the 600GB dataset on his website. |
|
@dbtsai Hi! Did you have a chance to check our implementation and send me the optimized one? |
|
@avulanov I haven't checked your implementation yet, but the optimized MLOR is ready for you to test. Can you try the following gradient?

// Imports, assuming the class is compiled outside org.apache.spark.mllib.optimization.
import org.apache.spark.annotation.DeveloperApi
import org.apache.spark.mllib.linalg.{DenseVector, Vector, Vectors}
import org.apache.spark.mllib.optimization.Gradient

@DeveloperApi
class LogisticGradient extends Gradient {

  override def compute(data: Vector, label: Double, weights: Vector): (Vector, Double) = {
    val gradient = Vectors.zeros(weights.size)
    val loss = compute(data, label, weights, gradient)
    (gradient, loss)
  }

  override def compute(
      data: Vector,
      label: Double,
      weights: Vector,
      cumGradient: Vector): Double = {
    assert((weights.size % data.size) == 0)
    val dataSize = data.size

    // n is the number of weight blocks; (n + 1) is the number of classes,
    // since class 0 is the base (pivot) class and carries no weights.
    val n = weights.size / dataSize
    val numerators = Array.ofDim[Double](n)
    var denominator = 0.0
    var margin = 0.0

    val weightsArray = weights match {
      case dv: DenseVector => dv.values
      case _ =>
        throw new IllegalArgumentException(
          s"weights only supports dense vector but got type ${weights.getClass}.")
    }
    val cumGradientArray = cumGradient match {
      case dv: DenseVector => dv.values
      case _ =>
        throw new IllegalArgumentException(
          s"cumGradient only supports dense vector but got type ${cumGradient.getClass}.")
    }

    // First pass: compute the margin x^T w_i for each non-base class,
    // remembering the margin of the labeled class and accumulating the
    // shared softmax denominator.
    var i = 0
    while (i < n) {
      var sum = 0.0
      data.foreachActive { (index, value) =>
        if (value != 0.0) sum += value * weightsArray((i * dataSize) + index)
      }
      if (i == label.toInt - 1) margin = sum
      numerators(i) = math.exp(sum)
      denominator += numerators(i)
      i += 1
    }

    // Second pass: add this example's gradient contribution into cumGradient,
    // one class block at a time.
    i = 0
    while (i < n) {
      val multiplier = numerators(i) / (denominator + 1.0) - {
        if (label != 0.0 && label == i + 1) 1.0 else 0.0
      }
      data.foreachActive { (index, value) =>
        if (value != 0.0) cumGradientArray(i * dataSize + index) += multiplier * value
      }
      i += 1
    }

    // Loss: log(1 + sum_i exp(margin_i)) minus the margin of the true class
    // (the base class 0 has margin 0, so nothing is subtracted for it).
    if (label > 0.0) {
      math.log1p(denominator) - margin
    } else {
      math.log1p(denominator)
    }
  }
} |
|
@avulanov PS, you can just replace the gradient function without making any other changes. Let me know how much performance gain you see; I'm very interested in this. Thanks. |
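A minimal sketch of how such a gradient plugs into MLlib's LBFGS optimizer, assuming `training: RDD[(Double, Vector)]` of (label, features) pairs already exists, with `numFeatures` and `numClasses` known and arbitrary hyperparameters (append a constant 1.0 to the features first if an intercept is wanted):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, SquaredL2Updater}

// The flat weight vector holds one block of numFeatures weights per non-base class.
val initialWeights = Vectors.dense(new Array[Double](numFeatures * (numClasses - 1)))

val (weights, lossHistory) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),   // the multinomial gradient shown above
  new SquaredL2Updater(),   // L2 regularization
  10,                       // number of corrections
  1e-4,                     // convergence tolerance
  100,                      // max number of iterations
  0.01,                     // regularization parameter
  initialWeights)
```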
|
@dbtsai Thank you! Should I use the latest Spark with this Gradient? |
|
Yes, |
|
@dbtsai |
|
@avulanov The new branch is not finished yet. You need to rebase https://github.com/dbtsai/spark/tree/dbtsai-mlor to master, and just replace the gradient function. |
|
@dbtsai I did a local experiment on mnist, and your new implementation seems to be more than 2x faster than the previous one! I am going to run bigger experiments. In the meantime, could you tell me whether the optimizations you did are applicable to the ANN Gradient? That would be extremely helpful for us. https://github.com/bgreeven/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala#L467 |
|
New results of the experiments with the optimized ANN and MLOR are below. I used the same cluster of 6 machines with 12 workers in total, the mnist8m dataset for training, and the standard mnist test set converted to 784 attributes.
The ANN became ~3x and the MLOR ~10x faster (!) than before. The current MLOR is ~60% faster than the current ANN. I assume the ANN has the following overheads: 1) it uses back-propagation, so there are two matrix-vector multiplications, on the forward and backward passes; 2) it rolls the parameters stored in matrices into vector form. I would be happy to know how these overheads can be reduced. We can't compare with the previously obtained accuracy because I used a different test set. |
|
@avulanov That is a very encouraging benchmark result on a real-world cluster setup. Since I've been on vacation recently, I haven't actually deployed the new code and benchmarked it on our cluster. Great to see such a huge 10x performance gain (actually bigger than I expected; in my local single-machine testing I only saw a 2~4x difference). What optimizations did you do in your ANN implementation? The same things as in MLOR? @mengxr Is it possible to reopen this closed PR on GitHub? There is a lot of useful discussion here, and I don't want to open another PR. I think I'm mostly done except for the unit tests, and I can push the code for review now, before our meeting. (PS, the new code is more general than the binary one, and has the same performance in the binary special case in my local testing.) |
|
@dbtsai I used my old implementation of the matrix form of back-propagation and made sure that it properly uses the stride of the matrices in Breeze. Also, I optimized rolling the parameters into a vector, combined with an in-place update of the cumulative sum. |
|
@dbtsai BTW, have you thought about batch processing of input vectors, i.e. stacking N vectors into a matrix and performing the computation with this matrix instead of vector by vector? With native BLAS enabled this might improve performance. |
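A rough Breeze sketch of the batching idea (names and layout are illustrative, not from either implementation): stacking a block of feature vectors as columns lets one BLAS GEMM call replace N matrix-vector products.

```scala
import breeze.linalg.DenseMatrix

// Build a numFeatures x batchSize matrix whose columns are the feature vectors
// of one batch (Breeze DenseMatrix is column-major, so the flattened arrays
// land column by column).
def stack(batch: Array[Array[Double]], numFeatures: Int): DenseMatrix[Double] = {
  new DenseMatrix(numFeatures, batch.length, batch.flatten)
}

// With W a (numClasses - 1) x numFeatures weight matrix, all margins of the
// batch come from a single matrix-matrix multiply:
// margins(k, j) = w_k . x_j for class k and example j.
def batchMargins(W: DenseMatrix[Double], X: DenseMatrix[Double]): DenseMatrix[Double] = {
  W * X
}
```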
|
@avulanov I've thought about that. However, @mengxr told me that they had an intern run this kind of experiment last year and didn't see a significant performance gain. I'm thinking of implementing the whole gradient function in native code/SIMD, batching the input vectors into a matrix, since for MLOR the computation of the objective function is very expensive. |
|
@dbtsai I did batching for artificial neural networks and the performance improved ~5x #1290 (comment) |
#1379 was automatically closed by asfgit, and GitHub cannot reopen it once it's closed, so this will be the new PR. Binary Logistic Regression can be extended to Multinomial Logistic Regression by running K-1 independent Binary Logistic Regression models. The following formula is implemented: http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297/25

Author: DB Tsai <[email protected]>

Closes #3833 from dbtsai/mlor and squashes the following commits:

4e2f354 [DB Tsai] triger jenkins
697b7c9 [DB Tsai] address some feedback
4ce4d33 [DB Tsai] refactoring
ff843b3 [DB Tsai] rebase
f114135 [DB Tsai] refactoring
4348426 [DB Tsai] Addressed feedback from Sean Owen
a252197 [DB Tsai] first commit
Currently, there is no multi-class classifier in MLlib. Logistic regression can be extended to a multinomial classifier straightforwardly.
The following formula will be implemented.
http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297/25
Note: In multi-class mode there are multiple intercepts, so we don't use the single intercept in GeneralizedLinearModel; instead, all the intercepts are folded into the weights. This introduces some inconsistency: in binary mode, the intercept cannot be specified by users, but in multinomial mode, since the intercepts are combined into the weights, users can specify them. @mengxr Should we just deprecate the intercept and keep everything in weights? It makes sense from an optimization point of view, and it also makes the interface cleaner. Thanks.
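As an illustration of what folding the intercepts into the weights amounts to (a sketch, not code from the patch): each feature vector gets a constant 1.0 appended, and the flat weight vector then has (numFeatures + 1) entries per non-base class, the last of which plays the role of that class's intercept.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helper: append a bias term so the intercept becomes just
// another weight. With K classes and class 0 as the base class, the flat
// weight vector then has (numFeatures + 1) * (K - 1) entries.
def appendBias(features: Vector): Vector = {
  Vectors.dense(features.toArray :+ 1.0)
}
```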