Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
fb74645
Initial commit and skeleton for NonlinearMinimizer
Jan 30, 2015
a7ee059
Merge branch 'qp' of https://github.com/debasish83/breeze into nlqp
Jan 31, 2015
679bb5f
Skeleton for approximate eigen value calculation
Feb 2, 2015
3d80b31
Copyright message for NonlinearMinimizer
Feb 2, 2015
3781e37
merge with qp branch; NOTICE file updated
Feb 3, 2015
536886d
Initial checkin for PowerMethod and PowerMethodTest;Eigen value extra…
Feb 3, 2015
f98bd80
Compilation fixes to LBFGS eigenMin and eigenMax
Feb 4, 2015
ae795b6
Power Method merged; NonlinearMinimizer now supports preserving histo…
Feb 11, 2015
51b4224
Generating PQN.CompactHessian from BFGS.ApproximateInverseHessian not…
Feb 11, 2015
ee697bf
Linear Regression formulation added for comparisons
Feb 11, 2015
ce8638f
Fixed LBFGS.maxEigen using power law on CompactHessian
Feb 12, 2015
bbc3edd
Merge branch 'qp' of https://github.com/debasish83/breeze into nlqp
Feb 17, 2015
f85ff86
Merge branch 'qp' of https://github.com/debasish83/breeze into nlqp
Feb 22, 2015
e3a61a9
Added a proximal interface to ProjectQuasiNewton solver; Added projec…
Feb 23, 2015
928de32
probability simplex benchmark
Feb 24, 2015
91f2e17
After experimentation NonlinearMinimizer now users PQN/OWLQN and supp…
Feb 28, 2015
33d28ff
Add testcases for Least square variants
Mar 1, 2015
6cba897
merge with upstream
Mar 1, 2015
9bef354
I dunno.
dlwh Mar 1, 2015
18c7789
PQN fixes from David's fix_pqn branch; added strong wolfe line search…
Mar 2, 2015
43794c0
Unused import from FirstOrderMinimizer; PQN migrated to Strong Wolfe …
Mar 5, 2015
e2c1db8
Used BacktrackingLineSearch in SPG and PQN; Updated NonlinearMinimize…
Mar 5, 2015
defaff5
NonlinearMinimizer println changed to nl from pqn
Mar 5, 2015
610027f
Updated with cforRange in proximal operations
Mar 7, 2015
8c6a6c8
BacktrackingLineSearch takes an initfval;OWLQN, PQN and SPG updated t…
Mar 7, 2015
b4d86e8
Merge branch 'master' of https://github.com/scalanlp/breeze into nlqp
Mar 7, 2015
3a6fc97
infiniteIteration API in FirstOrderMinimizer takes initialState;PQN b…
Mar 11, 2015
8533ada
migrate LBFGS Eigen calculation to https://github.com/debasish83/bree…
Mar 11, 2015
a0bbd33
cleaned up minEigen call from QuadraticMinimizer
Mar 11, 2015
40a45a8
NonlinearMinimizer inner iterations through BFGS cleaned
Mar 12, 2015
7308c7a
Updated contributions in README.md
Mar 12, 2015
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
merge with qp branch; NOTICE file updated
  • Loading branch information
Debasish Das committed Feb 3, 2015
commit 3781e3741e6aa7e9515f16b5dae7a89d98a08123
6 changes: 5 additions & 1 deletion NOTICE
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,10 @@ Copyright (c) 2014 by Debasish Das (Verizon), all rights reserved.

===============================================================================

QuadraticMinimizer class in package breeze.optimize.proximal is distributed with Copyright (c)
2014, Debasish Das (Verizon), all rights reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, all rights reserved has no legal meaning in modern copyright law.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PQN can do simplex projection but it does not separate f(x) and g(z)....what we really want to do is to solve f (x) through a blocked cyclic coordinate descent using bfgs and satisfy g (z) through proximal operator...that's our version of parameter server (distributed solver)...I am still thinking if we can replace coordinate descent with distributed bfgs by using some tricks...if I use owlqn or pqn I have to change code in all of them...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I looked at the paper...that's the same algorithm I am implementing....it is a proximal algorithm....may be we can plugin to pqn as well....i dont know pqn that well...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the PQN paper and it is possible to stick all the proximal operators inside PQN...I have tested bounds with PQN already and it works well compared to default ADMM.

If it holds for ProximalL1() compared to OWLQN then I am good to use PQN as the core of NonlinearMinimizer...PQN right now accepts closure of form DenseVector[Double] => DenseVector[Double]. Would it be fine if I change the signature to (x: DenseVector[Double], rho: Double) => DenseVector[Double] ?

This is in-tune to generic proximal algorithms:
minimize g(x) + rho||x - v||_{2}^{2}

g(x) can be constraints here: x \in C
g(x) can be L1 here as well through soft-thresholding I think

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm totally open to that, but I'm not quite sure how it will be used. PQN
won't manipulate it, right? Why not have your driver specify it in a new
instance of PQN?

-- David

On Tue, Feb 17, 2015 at 12:01 PM, Debasish Das [email protected]
wrote:

In NOTICE
#364 (comment):

@@ -1 +1,17 @@
Breeze is distributed under an Apache License V2.0 (See LICENSE)
+
+===============================================================================
+
+Proximal algorithms outlined in Proximal.scala (package breeze.optimize.quadratic)
+are based on https://github.com/cvxgrp/proximal (see LICENSE for details) and distributed with
+Copyright (c) 2014 by Debasish Das (Verizon), all rights reserved.
+
+===============================================================================
+
+QuadraticMinimizer class in package breeze.optimize.proximal is distributed with Copyright (c)
+2014, Debasish Das (Verizon), all rights reserved.

I read the PQN paper and it is possible to stick all the proximal
operators inside PQN...I have tested bounds with PQN already and it works
well compared to default ADMM.

If it holds for ProximalL1() compared to OWLQN then I am good to use PQN
as the core of NonlinearMinimizer...PQN right now accepts closure of form
DenseVector[Double] => DenseVector[Double]. Would it be fine if I change
the signature to (x: DenseVector[Double], rho: Double) =>
DenseVector[Double] ?

This is in-tune to generic proximal algorithms:
minimize g(x) + rho||x - v||_{2}^{2}

g(x) can be constraints here: x \in C
g(x) can be L1 here as well through soft-thresholding I think


Reply to this email directly or view it on GitHub
https://github.com/scalanlp/breeze/pull/364/files#r24848414.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not an interface like:

trait SeparableDiffFunction[T] {
def apply(x: T): IndexedSeq[(Double, T)]
}

?

maybe with a method to turn it into a normal DiffFunction given an OpAdd
for T?

On Sat, Feb 21, 2015 at 7:53 AM, Debasish Das [email protected]
wrote:

In NOTICE
#364 (comment):

@@ -1 +1,17 @@
Breeze is distributed under an Apache License V2.0 (See LICENSE)
+
+===============================================================================
+
+Proximal algorithms outlined in Proximal.scala (package breeze.optimize.quadratic)
+are based on https://github.com/cvxgrp/proximal (see LICENSE for details) and distributed with
+Copyright (c) 2014 by Debasish Das (Verizon), all rights reserved.
+
+===============================================================================
+
+QuadraticMinimizer class in package breeze.optimize.proximal is distributed with Copyright (c)
+2014, Debasish Das (Verizon), all rights reserved.

@mengxr https://github.com/mengxr
I added a driver which creates new instance of PQN and maps Proximal
operators to Projection operators..I can handle L1 and probability simplex
as well...Will update the code soon...Note that a variant of PQN can do
proximal quasi newton but it's still research-y...Several recent papers
focused on it http://arxiv.org/pdf/1206.1623v13

Basically with PQN as the solver running on each worker, ADMM will do
distributed/multi-core consensus on top of it...For separable functions
(like logistic) it will look like this:
minimize F1(x1) + F2(x2) + ...Fn(xn)
s.t x1 = z, x2 = z, ..., xn = z
Here F1 is composite function which satisfies both f(x) and constraint
g(z).

From Spark's perspective the master only keeps one vector z and all the
solver memory is moved to workers. With 16 GB RAM, we can still do logistic
with 2 x 10^9 x 8 byte = 2B features with theoretical guarantees.

For non-separable function (neural net) it will be more fun...we will come
to it after handling separable functions.
The baseline will be block coordinate descent with PQN on each block until
we come up with something better (Gauss Sidel Iterations)
http://opt.kyb.tuebingen.mpg.de/papers/opt2013_submission_1.pdf

Of course in breeze we are not allowed to use RDD, but does it make sense
to define an interface for ADMM based consensus solvers ? We can discuss in
person as well..


Reply to this email directly or view it on GitHub
https://github.com/scalanlp/breeze/pull/364/files#r25123016.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me add the experimental version with RDD in it and you can help defining the clean interface...I feel we will need different interfaces for separable and non-separable functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the PQN driver and the projection operators for L1(x) <= c and ProbabilitySimplex...Somehow in my experiments I am unable to reproduce the results from Figure 1 of PQN paper...OWLQN is first run to extract the parameter lambda_L1(x_) to generate the constraint L1(x) <= c for PQN

runMain breeze.optimize.proximal.NonlinearMinimizer 500 0.4 0.99

Elastic Net with L1 and L2 regularization.

Linear Regression:
Issues:

  1. Max Iter-ed at 500
  2. Worse than OWLQN and naive ADMM in Objective

owlqn 678.072 ms iters 173
pqnSparseTime 30600.365 ms iters 500
owlqnObj -145.38669700980395 pqnSparseObj -135.15057488775597

Logistic Regression:
Cons:

  1. Runtime higher than OWLQN
  2. Worse Objective than OWLQN and naive ADMM

owlqn 187.894 ms iters 74 pqn 758.073 ms iters 28
objective owlqnLogistic 52.32713379333781 pqnLogistic 81.37098398138012

I am debugging the code further but any pointers will be great. I don't think this is the expected behavior from PQN on L1/ProbabilitySimplex constraint as per paper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random question: does it seem to be line searching a lot (relative to
OWLQN, or runs with a box constraint?)

On Mon, Feb 23, 2015 at 2:09 PM, Debasish Das [email protected]
wrote:

In NOTICE
#364 (comment):

@@ -1 +1,17 @@
Breeze is distributed under an Apache License V2.0 (See LICENSE)
+
+===============================================================================
+
+Proximal algorithms outlined in Proximal.scala (package breeze.optimize.quadratic)
+are based on https://github.com/cvxgrp/proximal (see LICENSE for details) and distributed with
+Copyright (c) 2014 by Debasish Das (Verizon), all rights reserved.
+
+===============================================================================
+
+QuadraticMinimizer class in package breeze.optimize.proximal is distributed with Copyright (c)
+2014, Debasish Das (Verizon), all rights reserved.

I added the PQN driver and the projection operators for L1(x) <= c and
ProbabilitySimplex...Somehow in my experiments I am unable to reproduce the
results from Figure 1 of PQN paper...OWLQN is first run to extract the
parameter lambda_L1(x_) to generate the constraint L1(x) <= c for PQN

runMain breeze.optimize.proximal.NonlinearMinimizer 500 0.4 0.99

Elastic Net with L1 and L2 regularization.

Linear Regression:
Issues:

  1. Max Iter-ed at 500
  2. Worse than OWLQN and naive ADMM in Objective

owlqn 678.072 ms iters 173
pqnSparseTime 30600.365 ms iters 500
owlqnObj -145.38669700980395 pqnSparseObj -135.15057488775597

Logistic Regression:
Cons:

  1. Runtime higher than OWLQN
  2. Worse Objective than OWLQN and naive ADMM

owlqn 187.894 ms iters 74 pqn 758.073 ms iters 28
objective owlqnLogistic 52.32713379333781 pqnLogistic 81.37098398138012

I am debugging the code further but any pointers will be great. I don't
think this is the expected behavior from PQN on L1/ProbabilitySimplex
constraint as per paper.


Reply to this email directly or view it on GitHub
https://github.com/scalanlp/breeze/pull/364/files#r25207778.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my L1 projection has bugs in it...L1(x) is built using ProbabilitySimplex and PQN is beating naive ADMM and at par with QuadraticMinimizer for ProbabilitySimplex.

I will take a closer look into L1 projection for bugs but for ADMM based multicore/distributed consensus I will most likely choose OWLQN for elastic net and PQN for other constraints.

Here are the results with ProbabilitySimplex:

minimize f(x) s.t 1'x = c, x >= 0, c = 1

Linear Regression with ProbabilitySimplex, f(x) = ||Ax - b||^2

Objective pqn 85.34613832959954 nl 84.95320179604967 qp 85.33114863196366
Constraint pqn 0.9999999999999997 nl 1.001707509961072 qp 1.000000000000004
time pqn 150.552 ms nl 15847.125 ms qp 96.105 ms

Logistic Regression with ProbabilitySimplex, f(x) logistic loss from mllib

Objective pqn 257.6058563777358 nl 257.4025971846134
Constraint pqn 0.9999999999999998 nl 1.0007230450802203
time pqn 94.19 ms nl 25160.317 ms


===============================================================================

NonlinearMinimizer class in package breeze.optimize.quadratic is distributed with Copyright (c)
2015, Debasish Das (Verizon), all rights reserved.

Original file line number Diff line number Diff line change
Expand Up @@ -15,12 +15,12 @@
* limitations under the License.
*/

package breeze.optimize.quadratic
package breeze.optimize.linear

import breeze.linalg.DenseVector
import breeze.linalg.DenseMatrix
import breeze.linalg.axpy
import breeze.linalg.{DenseMatrix, DenseVector, axpy}
import breeze.optimize.proximal.QpGenerator
import breeze.stats.distributions.Rand
import breeze.util.Implicits._
/**
* Object used to solve nonnegative least squares problems using a modified
* projected gradient method.
Expand All @@ -30,13 +30,14 @@ class NNLS(val maxIters: Int = -1) {
type BDM = DenseMatrix[Double]
type BDV = DenseVector[Double]

case class State(x: BDV, grad: BDV, res: BDV, iterations: Int, converged: Boolean)
case class State(x: BDV, grad: BDV, dir: BDV, lastDir: BDV, res: BDV,
lastNorm: Double, lastWall: Int, iter: Int, converged: Boolean)

// find the optimal unconstrained step
def steplen(ata: BDM, dir: BDV, res: BDV,
tmp: BDV): Double = {
val top = dir.dot(res)
tmp := ata*dir
tmp := ata * dir
// Push the denominator upward very slightly to avoid infinities and silliness
top / (tmp.dot(dir) + 1e-20)
}
Expand All @@ -56,47 +57,45 @@ class NNLS(val maxIters: Int = -1) {
* projected gradient method. That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.
*
* We solve the problem
* min_x 1/2 x' ata x' - x'atb
* subject to x >= 0
* min_x 1/2 x' ata x' - x'atb
* subject to x >= 0
*
* The method used is similar to one described by Polyak (B. T. Polyak, The conjugate gradient
* method in extremal problems, Zh. Vychisl. Mat. Mat. Fiz. 9(4)(1969), pp. 94-112) for bound-
* constrained nonlinear programming. Polyak unconditionally uses a conjugate gradient
* direction, however, while this method only uses a conjugate gradient direction if the last
* iteration did not cause a previously-inactive constraint to become active.
*/
def minimize(ata: DenseMatrix[Double], atb: DenseVector[Double]) : DenseVector[Double] = {
iterations(ata, atb).x
}

def iterations(ata: DenseMatrix[Double], atb: DenseVector[Double]) : State = {
def initialState(ata: DenseMatrix[Double], atb: DenseVector[Double], n: Int): State = {
require(ata.cols == ata.rows, s"NNLS:iterations gram matrix must be symmetric")
require(ata.rows == atb.length, s"NNLS:iterations gram matrix rows must be same as length of linear term")

val n = atb.length

val tmp = DenseVector.zeros[Double](n)
val grad = DenseVector.zeros[Double](n)
val x = DenseVector.zeros[Double](n)
val x = DenseVector.zeros[Double](n)
val dir = DenseVector.zeros[Double](n)
val lastDir = DenseVector.zeros[Double](n)
val res = DenseVector.zeros[Double](n)
val lastNorm = 0.0
val lastWall = 0
State(x, grad, dir, lastDir, res, lastNorm, lastWall, 0, false)
}

val iterMax = if (maxIters < 0) Math.max(400, 20 * n) else maxIters
def iterations(ata: DenseMatrix[Double],
atb: DenseVector[Double]): Iterator[State] =
Iterator.iterate(initialState(ata, atb, atb.length)) { state =>
import state._
val n = atb.length
val tmp = DenseVector.zeros[Double](atb.length)

var lastNorm = 0.0
var iterno = 0
var lastWall = 0 // Last iteration when we hit a bound constraint.
var i = 0
val iterMax = if (maxIters < 0) Math.max(400, 20 * n) else maxIters

while (iterno < iterMax) {
// find the residual
res := ata*x
res := ata * x
res -= atb
grad := res

// project the gradient
i = 0
var i = 0
while (i < n) {
if (grad.data(i) > 0.0 && x.data(i) == 0.0) {
grad.data(i) = 0.0
Expand All @@ -105,13 +104,13 @@ class NNLS(val maxIters: Int = -1) {
}
val ngrad = grad.dot(grad)
dir := grad

// use a CG direction under certain conditions
var step = steplen(ata, grad, res, tmp)
var ndir = 0.0
val nx = x.dot(x)
if (iterno > lastWall + 1) {

if (iter > lastWall + 1) {
val alpha = ngrad / lastNorm
axpy(alpha, lastDir, dir)
val dstep = steplen(ata, dir, res, tmp)
Expand All @@ -128,36 +127,44 @@ class NNLS(val maxIters: Int = -1) {
}

// terminate?
if (stop(step, ndir, nx)) {
return State(x, grad, res, iterno, true)
}

// don't run through the walls
i = 0
while (i < n) {
if (step * dir.data(i) > x.data(i)) {
step = x.data(i) / dir.data(i)
if (stop(step, ndir, nx) || iter > iterMax)
State(x, grad, dir, lastDir, res, lastNorm, lastWall, iter + 1, true)
else {
// don't run through the walls
i = 0
while (i < n) {
if (step * dir.data(i) > x.data(i)) {
step = x.data(i) / dir.data(i)
}
i = i + 1
}
i = i + 1
}

// take the step
i = 0
while (i < n) {
if (step * dir.data(i) > x.data(i) * (1 - 1e-14)) {
x.data(i) = 0
lastWall = iterno
} else {
x.data(i) -= step * dir.data(i)
var nextWall = lastWall

// take the step
i = 0
while (i < n) {
if (step * dir.data(i) > x.data(i) * (1 - 1e-14)) {
x.data(i) = 0
nextWall = iter
} else {
x.data(i) -= step * dir.data(i)
}
i = i + 1
}
i = i + 1
lastDir := dir
val nextNorm = ngrad
State(x, grad, dir, lastDir, res, nextNorm, nextWall, iter + 1, false)
}
}.takeUpToWhere(_.converged)

iterno = iterno + 1
lastDir := dir
lastNorm = ngrad
}
State(x, grad, res, iterno, false)
def minimizeAndReturnState(ata: DenseMatrix[Double],
atb: DenseVector[Double]): State = {
iterations(ata, atb).last
}

def minimize(ata: DenseMatrix[Double], atb: DenseVector[Double]): DenseVector[Double] = {
minimizeAndReturnState(ata, atb).x
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,9 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package breeze.optimize.quadratic
package breeze.optimize.proximal

object Constraint extends Enumeration {
type Constraint = Value
val SMOOTH, POSITIVE, BOUNDS, SPARSE, EQUALITY = Value
val SMOOTH, POSITIVE, BOX, SPARSE, EQUALITY = Value
}
Original file line number Diff line number Diff line change
Expand Up @@ -15,12 +15,12 @@
* limitations under the License.
*/

package breeze.optimize.quadratic
package breeze.optimize.proximal

import breeze.linalg.{norm, DenseVector}
import breeze.optimize.quadratic.Constraint._
import breeze.optimize.{LBFGS, DiffFunction}
import breeze.linalg.{DenseVector, norm}
import breeze.optimize.{DiffFunction, LBFGS}
import breeze.stats.distributions.Rand

import scala.math._

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
* Library of Proximal Algorithms adapted from https://github.com/cvxgrp/proximal
*/

package breeze.optimize.quadratic
package breeze.optimize.proximal

import scala.math.max
import scala.math.min
Expand Down Expand Up @@ -122,7 +122,7 @@ case class ProximalL1() extends Proximal {
def prox(x: DenseVector[Double], rho: Double) = {
var i = 0
while (i < x.length) {
x.update(i, max(0, x(i) - rho) - max(0, -x(i) - rho))
x.update(i, max(0, x(i) - lambda/rho) - max(0, -x(i) - lambda/rho))
i = i + 1
}
}
Expand Down Expand Up @@ -243,4 +243,4 @@ case class ProximalLp(c: DenseVector[Double]) extends Proximal {
i = i + 1
}
}
}
}
Loading
You are viewing a condensed version of this merge commit. You can view the full changes here.