Merged
31 commits
fb74645
Initial commit and skeleton for NonlinearMinimizer
Jan 30, 2015
a7ee059
Merge branch 'qp' of https://github.com/debasish83/breeze into nlqp
Jan 31, 2015
679bb5f
Skeleton for approximate eigen value calculation
Feb 2, 2015
3d80b31
Copyright message for NonlinearMinimizer
Feb 2, 2015
3781e37
merge with qp branch; NOTICE file updated
Feb 3, 2015
536886d
Initial checkin for PowerMethod and PowerMethodTest;Eigen value extra…
Feb 3, 2015
f98bd80
Compilation fixes to LBFGS eigenMin and eigenMax
Feb 4, 2015
ae795b6
Power Method merged; NonlinearMinimizer now supports preserving histo…
Feb 11, 2015
51b4224
Generating PQN.CompactHessian from BFGS.ApproximateInverseHessian not…
Feb 11, 2015
ee697bf
Linear Regression formulation added for comparisons
Feb 11, 2015
ce8638f
Fixed LBFGS.maxEigen using power law on CompactHessian
Feb 12, 2015
bbc3edd
Merge branch 'qp' of https://github.com/debasish83/breeze into nlqp
Feb 17, 2015
f85ff86
Merge branch 'qp' of https://github.com/debasish83/breeze into nlqp
Feb 22, 2015
e3a61a9
Added a proximal interface to ProjectQuasiNewton solver; Added projec…
Feb 23, 2015
928de32
probability simplex benchmark
Feb 24, 2015
91f2e17
After experimentation NonlinearMinimizer now users PQN/OWLQN and supp…
Feb 28, 2015
33d28ff
Add testcases for Least square variants
Mar 1, 2015
6cba897
merge with upstream
Mar 1, 2015
9bef354
I dunno.
dlwh Mar 1, 2015
18c7789
PQN fixes from David's fix_pqn branch; added strong wolfe line search…
Mar 2, 2015
43794c0
Unused import from FirstOrderMinimizer; PQN migrated to Strong Wolfe …
Mar 5, 2015
e2c1db8
Used BacktrackingLineSearch in SPG and PQN; Updated NonlinearMinimize…
Mar 5, 2015
defaff5
NonlinearMinimizer println changed to nl from pqn
Mar 5, 2015
610027f
Updated with cforRange in proximal operations
Mar 7, 2015
8c6a6c8
BacktrackingLineSearch takes an initfval;OWLQN, PQN and SPG updated t…
Mar 7, 2015
b4d86e8
Merge branch 'master' of https://github.com/scalanlp/breeze into nlqp
Mar 7, 2015
3a6fc97
infiniteIteration API in FirstOrderMinimizer takes initialState;PQN b…
Mar 11, 2015
8533ada
migrate LBFGS Eigen calculation to https://github.com/debasish83/bree…
Mar 11, 2015
a0bbd33
cleaned up minEigen call from QuadraticMinimizer
Mar 11, 2015
40a45a8
NonlinearMinimizer inner iterations through BFGS cleaned
Mar 12, 2015
7308c7a
Updated contributions in README.md
Mar 12, 2015
6 changes: 5 additions & 1 deletion NOTICE
@@ -2,7 +2,7 @@ Breeze is distributed under an Apache License V2.0 (See LICENSE)

===============================================================================

-Proximal algorithms outlined in Proximal.scala (package breeze.optimize.quadratic)
+Proximal algorithms outlined in Proximal.scala (package breeze.optimize.proximal)
are based on https://github.com/cvxgrp/proximal (see LICENSE for details) and distributed with
Copyright (c) 2014 by Debasish Das (Verizon), all rights reserved.

@@ -11,3 +11,7 @@ Copyright (c) 2014 by Debasish Das (Verizon), all rights reserved.
QuadraticMinimizer class in package breeze.optimize.proximal is distributed with Copyright (c)
2014, Debasish Das (Verizon), all rights reserved.

Member

FWIW, "all rights reserved" has no legal meaning in modern copyright law.

Contributor Author

PQN can do simplex projection, but it does not separate f(x) and g(z). What we really want is to solve f(x) through blocked cyclic coordinate descent using BFGS and satisfy g(z) through a proximal operator. That's our version of a parameter server (distributed solver). I am still thinking about whether we can replace coordinate descent with distributed BFGS using some tricks. If I use OWLQN or PQN, I have to change code in all of them.

Contributor Author

Also, I looked at the paper and it's the same algorithm I am implementing: a proximal algorithm. Maybe we can plug it into PQN as well, though I don't know PQN that well.

Contributor Author

I read the PQN paper and it is possible to put all the proximal operators inside PQN. I have already tested bounds with PQN, and it works well compared to default ADMM.

If the same holds for ProximalL1() compared to OWLQN, then I am happy to use PQN as the core of NonlinearMinimizer. PQN currently accepts a closure of the form DenseVector[Double] => DenseVector[Double]. Would it be fine if I changed the signature to (x: DenseVector[Double], rho: Double) => DenseVector[Double]?

This is in line with generic proximal algorithms:
minimize g(x) + rho||x - v||_{2}^{2}

g(x) can be a constraint here: x \in C
g(x) can also be L1 here, through soft-thresholding, I think
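
As a rough illustration of the proposed `(x, rho) => result` signature, here is a minimal sketch. It uses plain `Array[Double]` instead of Breeze's `DenseVector[Double]` so that it runs without dependencies, and the names `ProximalSketch`, `Prox`, `proxL1`, and `proxBox` are hypothetical, not part of Breeze. Taking the objective exactly as written above, `g(x) + rho||x - v||_2^2` (no factor of 1/2), the case `g(x) = lambda * ||x||_1` reduces to soft-thresholding at `lambda / (2 * rho)`, and the indicator of a box reduces to clipping, independent of `rho`.

```scala
object ProximalSketch {
  // Hypothetical stand-in for the proposed closure type; Array[Double]
  // replaces DenseVector[Double] to keep the sketch dependency-free.
  type Prox = (Array[Double], Double) => Array[Double]

  // Prox of g(x) = lambda * ||x||_1 under  g(x) + rho * ||x - v||_2^2.
  // Setting the subgradient to zero gives soft-thresholding at lambda / (2 * rho).
  def proxL1(lambda: Double): Prox = (v, rho) => {
    val t = lambda / (2.0 * rho)
    v.map(vi => math.signum(vi) * math.max(math.abs(vi) - t, 0.0))
  }

  // Prox of the indicator of the box {x : lo <= x_i <= hi}.
  // For an indicator function the prox is the projection, so rho is ignored.
  def proxBox(lo: Double, hi: Double): Prox = (v, _) =>
    v.map(vi => math.min(math.max(vi, lo), hi))
}
```

With this shape, a constraint and an L1 penalty are interchangeable from the solver's point of view: both are values of the same function type, and only the L1 case actually reads `rho`.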

Member

I'm totally open to that, but I'm not quite sure how it will be used. PQN
won't manipulate it, right? Why not have your driver specify it in a new
instance of PQN?

-- David


Member

Why not an interface like:

trait SeparableDiffFunction[T] {
def apply(x: T): IndexedSeq[(Double, T)]
}

?

maybe with a method to turn it into a normal DiffFunction given an OpAdd
for T?
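
A minimal sketch of how that suggestion could look, under stated assumptions: the trait is copied from the comment, while the helper `calculate`, the example `leastSquares`, and the explicit `add` parameter are hypothetical. The `add` function stands in for an `OpAdd` implementation for `T`, so the sketch compiles without Breeze.

```scala
object SeparableSketch {
  // The trait as proposed in the comment: one (value, gradient) pair per block.
  trait SeparableDiffFunction[T] {
    def apply(x: T): IndexedSeq[(Double, T)]
  }

  // Hypothetical helper: collapse the blocks into a single (value, gradient)
  // pair by summing them. `add` stands in for Breeze's OpAdd for T.
  // Throws on an empty block list, because reduce needs at least one element.
  def calculate[T](f: SeparableDiffFunction[T], x: T)(add: (T, T) => T): (Double, T) =
    f(x).reduce { (a, b) => (a._1 + b._1, add(a._2, b._2)) }

  // Example: f(x) = sum_i (x - c_i)^2 for scalar x; each term is one block.
  def leastSquares(cs: IndexedSeq[Double]): SeparableDiffFunction[Double] =
    new SeparableDiffFunction[Double] {
      def apply(x: Double): IndexedSeq[(Double, Double)] =
        cs.map(c => ((x - c) * (x - c), 2.0 * (x - c)))
    }
}
```

Keeping the blocks separate until the final `reduce` is what would let a distributed driver evaluate each block on a different worker and combine only the results.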

On Sat, Feb 21, 2015 at 7:53 AM, Debasish Das wrote:

@mengxr https://github.com/mengxr
I added a driver which creates a new instance of PQN and maps proximal operators to projection operators. I can handle L1 and probability simplex as well. Will update the code soon. Note that a variant of PQN can do proximal quasi-Newton, but it's still research-y. Several recent papers have focused on it: http://arxiv.org/pdf/1206.1623v13

Basically, with PQN as the solver running on each worker, ADMM will do distributed/multi-core consensus on top of it. For separable functions (like logistic) it will look like this:
minimize F1(x1) + F2(x2) + ... + Fn(xn)
s.t. x1 = z, x2 = z, ..., xn = z
Here each Fi is a composite function which accounts for both f(x) and the constraint g(z).

From Spark's perspective the master only keeps one vector z and all the solver memory is moved to the workers. With 16 GB RAM, we can still do logistic with 2B features (2 x 10^9 x 8 bytes = 16 GB), with theoretical guarantees.

For non-separable functions (neural nets) it will be more fun. We will come to that after handling separable functions.
The baseline will be block coordinate descent with PQN on each block until we come up with something better (Gauss-Seidel iterations):
http://opt.kyb.tuebingen.mpg.de/papers/opt2013_submission_1.pdf

Of course, in Breeze we are not allowed to use RDD, but does it make sense to define an interface for ADMM-based consensus solvers? We can discuss in person as well.
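
For reference, the consensus problem above is usually solved with the scaled-form ADMM iteration sketched below. This is the standard textbook iteration, not necessarily what this PR implements. The scaled dual variables $u_i$, the iteration index $k$, and the $\rho/2$ scaling are notation introduced here; $n$ is the number of blocks from the formulation above, and each $F_i$ is assumed to already contain the constraint, so the $z$-update is a plain average.

```latex
\begin{aligned}
x_i^{k+1} &= \arg\min_{x_i}\; F_i(x_i) + \tfrac{\rho}{2}\,\lVert x_i - z^{k} + u_i^{k} \rVert_2^2, \qquad i = 1, \dots, n \\
z^{k+1}   &= \frac{1}{n} \sum_{i=1}^{n} \bigl( x_i^{k+1} + u_i^{k} \bigr) \\
u_i^{k+1} &= u_i^{k} + x_i^{k+1} - z^{k+1}
\end{aligned}
```

In this picture the $x_i$-update is the step a per-worker solver such as PQN would carry out, while the master only needs the vector $z$.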



Contributor Author

Let me add the experimental version with RDD in it, and you can help define a clean interface. I feel we will need different interfaces for separable and non-separable functions.

Contributor Author

I added the PQN driver and the projection operators for L1(x) <= c and ProbabilitySimplex. Somehow, in my experiments I am unable to reproduce the results from Figure 1 of the PQN paper. OWLQN is run first to extract the parameter lambda_L1(x_), which is used to generate the constraint L1(x) <= c for PQN.

runMain breeze.optimize.proximal.NonlinearMinimizer 500 0.4 0.99

Elastic Net with L1 and L2 regularization.

Linear Regression:
Issues:

  1. Hit the maximum of 500 iterations
  2. Worse than OWLQN and naive ADMM in Objective

owlqn 678.072 ms iters 173
pqnSparseTime 30600.365 ms iters 500
owlqnObj -145.38669700980395 pqnSparseObj -135.15057488775597

Logistic Regression:
Cons:

  1. Runtime higher than OWLQN
  2. Worse Objective than OWLQN and naive ADMM

owlqn 187.894 ms iters 74 pqn 758.073 ms iters 28
objective owlqnLogistic 52.32713379333781 pqnLogistic 81.37098398138012

I am debugging the code further, but any pointers would be great. I don't think this is the expected behavior of PQN on an L1/ProbabilitySimplex constraint, as per the paper.

Member

Random question: does it seem to be line searching a lot (relative to
OWLQN, or runs with a box constraint?)


Contributor Author

I think my L1 projection has bugs in it. L1(x) is built using ProbabilitySimplex, and for ProbabilitySimplex PQN is beating naive ADMM and is on par with QuadraticMinimizer.

I will take a closer look at the L1 projection for bugs, but for ADMM-based multicore/distributed consensus I will most likely choose OWLQN for elastic net and PQN for other constraints.

Here are the results with ProbabilitySimplex:

minimize f(x) s.t 1'x = c, x >= 0, c = 1

Linear Regression with ProbabilitySimplex, f(x) = ||Ax - b||^2

Objective pqn 85.34613832959954 nl 84.95320179604967 qp 85.33114863196366
Constraint pqn 0.9999999999999997 nl 1.001707509961072 qp 1.000000000000004
time pqn 150.552 ms nl 15847.125 ms qp 96.105 ms

Logistic Regression with ProbabilitySimplex, f(x) logistic loss from mllib

Objective pqn 257.6058563777358 nl 257.4025971846134
Constraint pqn 0.9999999999999998 nl 1.0007230450802203
time pqn 94.19 ms nl 25160.317 ms
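
For readers unfamiliar with the constraint `1'x = c, x >= 0`, the sketch below shows one common way to project onto it (the sort-based method), along with an L1-ball projection built on top of it, as the comment describes. This is not the code from the PR: the names `SimplexSketch`, `projectSimplex`, and `projectL1Ball` are hypothetical, and plain `Array[Double]` is used instead of `DenseVector[Double]` to avoid a Breeze dependency.

```scala
object SimplexSketch {
  // Euclidean projection of v onto {x : sum(x) = c, x >= 0} using the
  // sort-based method. Assumes c > 0 and a non-empty v.
  def projectSimplex(v: Array[Double], c: Double = 1.0): Array[Double] = {
    // Sort a copy in descending order.
    val asc = v.clone()
    java.util.Arrays.sort(asc)
    val u = asc.reverse
    // Running sums: cumsum(j) = u(0) + ... + u(j).
    val cumsum = u.scanLeft(0.0)(_ + _).tail
    // Largest 0-based index j with u(j) - (cumsum(j) - c) / (j + 1) > 0.
    // Index 0 always qualifies when c > 0, so rho >= 0.
    val rho = u.indices.lastIndexWhere(j => u(j) - (cumsum(j) - c) / (j + 1) > 0)
    val theta = (cumsum(rho) - c) / (rho + 1)
    v.map(vi => math.max(vi - theta, 0.0))
  }

  // Projection onto the L1 ball {x : ||x||_1 <= c}: points already inside are
  // returned unchanged; otherwise project |v| onto the simplex and restore signs.
  def projectL1Ball(v: Array[Double], c: Double = 1.0): Array[Double] = {
    val absV = v.map(x => math.abs(x))
    if (absV.sum <= c) v.clone()
    else {
      val w = projectSimplex(absV, c)
      v.zip(w).map { case (vi, wi) => math.signum(vi) * wi }
    }
  }
}
```

A quick sanity check for any such projection is the one visible in the "Constraint" rows above: the entries of the result should sum to `c` up to floating-point error.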


===============================================================================

NonlinearMinimizer class in package breeze.optimize.proximal is distributed with Copyright (c)
2015, Debasish Das (Verizon), all rights reserved.
2 changes: 1 addition & 1 deletion README.md
@@ -100,7 +100,7 @@ Contributions from:
* Chris Stucchio (@stucchio)
* Xiangrui Meng (@mengxr)
* Gabriel Schubiner (@gabeos)

* Debasish Das (@debasish83)


And others (contact David Hall if you've contributed code and aren't listed).
@@ -124,7 +124,7 @@ trait DenseMatrixMultiplyStuff extends DenseMatrixOps
// square: LUSolve
val X = DenseMatrix.zeros[Double](V.rows, V.cols)
X := V
-LUSolve(X,A)
+LUSolve(X, A)
X
} else {
// non-square: QRSolve
@@ -137,15 +137,13 @@ trait DenseMatrixMultiplyStuff extends DenseMatrixOps
/** X := A \ X, for square A */
def LUSolve(X: DenseMatrix[Double], A: DenseMatrix[Double]): DenseMatrix[Double] = {

-require(X.offset == 0)
-require(A.offset == 0)
val piv = new Array[Int](A.rows)
val newA = A.copy
assert(!newA.isTranspose)

val info: Int = {
val info = new intW(0)
-lapack.dgesv(A.rows, X.cols, newA.data, newA.majorStride, piv, X.data, X.majorStride, info)
+lapack.dgesv(A.rows, X.cols, newA.data, newA.offset, newA.majorStride, piv, 0, X.data, X.offset, X.majorStride, info)
info.`val`
}

16 changes: 8 additions & 8 deletions math/src/main/scala/breeze/linalg/support/CanMapValues.scala
@@ -51,14 +51,14 @@ trait CanMapValuesLowPrio {
object CanMapValues extends CanMapValuesLowPrio {
class HandHold[From, ValueType]

/*
implicit def canMapSelf[V, V2]: CanMapValues[V, V, V2, V2] = {
new CanMapValues[V, V, V2, V2] {
def map(from: V, fn: (V) => V2) = fn(from)
def mapActive(from: V, fn: (V) => V2) = fn(from)
}
}
*/
implicit def canMapSelfDouble[V2]: CanMapValues[Double, Double, V2, V2] = canMapSelf[Double, V2]
implicit def canMapSelfInt[V2]: CanMapValues[Int, Int, V2, V2] = canMapSelf[Int, V2]
implicit def canMapSelfFloat[V2]: CanMapValues[Float, Float, V2, V2] = canMapSelf[Float, V2]
implicit def canMapSelfLong[V2]: CanMapValues[Long, Long, V2, V2] = canMapSelf[Long, V2]
implicit def canMapSelfShort[V2]: CanMapValues[Short, Short, V2, V2] = canMapSelf[Short, V2]
implicit def canMapSelfByte[V2]: CanMapValues[Byte, Byte, V2, V2] = canMapSelf[Byte, V2]
implicit def canMapSelfChar[V2]: CanMapValues[Char, Char, V2, V2] = canMapSelf[Char, V2]


type Op[From, A, B, To] = CanMapValues[From, A, B, To]

@@ -8,7 +8,8 @@ package breeze.optimize
*
* @author dlwh
*/
-class BacktrackingLineSearch(maxIterations: Int = 20,
+class BacktrackingLineSearch(initfval: Double,
+maxIterations: Int = 20,
shrinkStep: Double = 0.5,
growStep: Double = 2.1,
cArmijo: Double = 1E-4,
@@ -24,7 +25,8 @@ class BacktrackingLineSearch(maxIterations: Int = 20,
require(cWolfe < 1.0)
def iterations(f: DiffFunction[Double], init: Double = 1.0): Iterator[State] = {
val (f0, df0) = f.calculate(0.0)
-val (initfval, initfderiv) = f.calculate(init)
+val initfderiv = f.calculate(init)._2
+//val (initfval, initfderiv) = f.calculate(init)
Iterator.iterate( (State(init, initfval, initfderiv), false, 0)) { case (state@State(alpha, fval, fderiv), _, iter) =>
val multiplier = if(fval > f0 + alpha * df0 * cArmijo) {
shrinkStep
10 changes: 6 additions & 4 deletions math/src/main/scala/breeze/optimize/FirstOrderMinimizer.scala
@@ -1,7 +1,7 @@
package breeze.optimize

import breeze.linalg.norm
-import breeze.math.{MutableEnumeratedCoordinateField, MutableCoordinateField, MutableFiniteCoordinateField, NormedModule}
+import breeze.math.{MutableEnumeratedCoordinateField, MutableFiniteCoordinateField, NormedModule}
import breeze.optimize.FirstOrderMinimizer.ConvergenceReason
import breeze.stats.distributions.{RandBasis, ThreadLocalRandomGenerator}
import breeze.util.Implicits._
@@ -108,10 +108,11 @@ abstract class FirstOrderMinimizer[T, DF<:StochasticDiffFunction[T]](maxIter: In
f.calculate(x)
}

-def infiniteIterations(f: DF, init: T): Iterator[State] = {
+def infiniteIterations(f: DF, state: State): Iterator[State] = {
var failedOnce = false
val adjustedFun = adjustFunction(f)
-Iterator.iterate(initialState(adjustedFun,init)) { state => try {
+
+Iterator.iterate(state) { state => try {
val dir = chooseDescentDirection(state, adjustedFun)
val stepSize = determineStepSize(state, adjustedFun, dir)
logger.info(f"Step Size: $stepSize%.4g")
@@ -141,7 +142,8 @@ abstract class FirstOrderMinimizer[T, DF<:StochasticDiffFunction[T]](maxIter: In
}

def iterations(f: DF, init: T): Iterator[State] = {
-infiniteIterations(f, init).takeUpToWhere(_.converged)
+val adjustedFun = adjustFunction(f)
+infiniteIterations(f, initialState(adjustedFun, init)).takeUpToWhere(_.converged)
}

def minimize(f: DF, init: T): T = {
8 changes: 2 additions & 6 deletions math/src/main/scala/breeze/optimize/LBFGS.scala
@@ -19,12 +19,12 @@ package breeze.optimize
import breeze.linalg._
import breeze.linalg.operators.OpMulMatrix
import breeze.math.MutableInnerProductModule
import breeze.optimize.linear.PowerMethod
import breeze.util.SerializableLogging


/**
* Port of LBFGS to Scala.
*
*
* Special note for LBFGS:
* If you use it in published work, you must cite one of:
* * J. Nocedal. Updating Quasi-Newton Matrices with Limited Storage
@@ -80,7 +80,6 @@ class LBFGS[T](maxIter: Int = -1, m: Int=10, tolerance: Double=1E-9)
throw new StepSizeUnderflow
alpha
}

}

object LBFGS {
@@ -140,13 +139,10 @@ object LBFGS {
}
}


implicit def multiplyInverseHessian[T](implicit vspace: MutableInnerProductModule[T, Double]):OpMulMatrix.Impl2[ApproximateInverseHessian[T], T, T] = {
new OpMulMatrix.Impl2[ApproximateInverseHessian[T], T, T] {
def apply(a: ApproximateInverseHessian[T], b: T): T = a * b
}

}

}

2 changes: 1 addition & 1 deletion math/src/main/scala/breeze/optimize/OWLQN.scala
@@ -70,7 +70,7 @@ class OWLQN[K, T](maxIter: Int, m: Int, l1reg: K => Double, tolerance: Double)(i
adjv -> (adjgrad dot dir)
}
}
-val search = new BacktrackingLineSearch(shrinkStep= if(iter < 1) 0.1 else 0.5)
+val search = new BacktrackingLineSearch(state.value, shrinkStep= if(iter < 1) 0.1 else 0.5)
val alpha = search.minimize(ff, if(iter < 1) .5/norm(state.grad) else 1.0)

alpha
112 changes: 55 additions & 57 deletions math/src/main/scala/breeze/optimize/ProjectedQuasiNewton.scala
@@ -1,8 +1,24 @@
package breeze.optimize

/*
Copyright 2015 David Hall, Debasish Das

Licensed under the Apache License, Version 2.0 (the "License")
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

import breeze.linalg._
import breeze.collection.mutable.RingBuffer
import breeze.math.{MutableInnerProductModule, MutableVectorField}
import breeze.math.MutableInnerProductModule
import breeze.util.SerializableLogging

// Compact representation of an n x n Hessian, maintained via L-BFGS updates
@@ -66,32 +82,32 @@ class ProjectedQuasiNewton(tolerance: Double = 1e-6,
val m: Int = 10,
val initFeas: Boolean = false,
val testOpt: Boolean = true,
val maxNumIt: Int = 500,
maxIter: Int = -1,
val maxSrchIt: Int = 50,
val gamma: Double = 1e-4,
val projection: DenseVector[Double] => DenseVector[Double] = identity)
(implicit space: MutableInnerProductModule[DenseVector[Double],Double])
extends FirstOrderMinimizer[DenseVector[Double], DiffFunction[DenseVector[Double]]](maxIter = maxNumIt, tolerance = tolerance) with Projecting[DenseVector[Double]] with SerializableLogging {
val innerOptimizer = new SpectralProjectedGradient[DenseVector[Double], DiffFunction[DenseVector[Double]]](
testOpt = true,
extends FirstOrderMinimizer[DenseVector[Double], DiffFunction[DenseVector[Double]]](maxIter = maxIter, tolerance = tolerance) with Projecting[DenseVector[Double]] with SerializableLogging {
type BDV = DenseVector[Double]

val innerOptimizer = new SpectralProjectedGradient[BDV](
tolerance = tolerance,
maxIter = 500,
maxIter = 50,
bbMemory = 5,
initFeas = true,
minImprovementWindow = 10,
projection = projection
)

type History = CompactHessian


protected def initialHistory(f: DiffFunction[DenseVector[Double]], init: DenseVector[Double]): History = {
new CompactHessian(m)
}

override protected def adjust(newX: DenseVector[Double], newGrad: DenseVector[Double], newVal: Double):(Double,DenseVector[Double]) = (newVal,-projectedVector(newX, -newGrad))
override protected def adjust(newX: DenseVector[Double], newGrad: DenseVector[Double], newVal: Double):(Double,DenseVector[Double]) = (newVal,projectedVector(newX, -newGrad))

private def computeGradient(x: DenseVector[Double], g: DenseVector[Double]): DenseVector[Double] = projectedVector(x, -g)
private def computeGradientNorm(x: DenseVector[Double], g: DenseVector[Double]): Double = norm(computeGradient(x, g),Double.PositiveInfinity)

protected def chooseDescentDirection(state: State, fn: DiffFunction[DenseVector[Double]]): DenseVector[Double] = {
import state._
@@ -101,72 +117,54 @@ class ProjectedQuasiNewton(tolerance: Double = 1e-6,
// Update the limited-memory BFGS approximation to the Hessian
//B.update(y, s)
// Solve subproblem; we use the current iterate x as a guess
val subprob = new ProjectedQuasiNewton.QuadraticSubproblem(fn, state.adjustedValue, x, grad, history)
val p = innerOptimizer.minimize(new CachedDiffFunction(subprob), x)
p - x
val subprob = new ProjectedQuasiNewton.QuadraticSubproblem(state.adjustedValue, x, grad, history)
val spgResult = innerOptimizer.minimizeAndReturnState(new CachedDiffFunction(subprob), x)
logger.info(f"ProjectedQuasiNewton: outerIter ${state.iter} innerIters ${spgResult.iter}")
spgResult.x - x
// time += subprob.time
}
}


protected def determineStepSize(state: State, fn: DiffFunction[DenseVector[Double]], dir: DenseVector[Double]): Double = {
if (state.iter == 0)
return scala.math.min(1.0, 1.0 / norm(state.grad,1.0))
val dirnorm = norm(dir, Double.PositiveInfinity)
if(dirnorm < 1E-10) return 0.0
import state._
// Backtracking line-search
var accepted = false
var lambda = 1.0
val gTd = grad dot dir
var srchit = 0

do {
val candx = x + dir * lambda
val candf = fn.valueAt(candx)
val suffdec = gamma * lambda * gTd

if (testOpt && srchit > 0) {
logger.debug(f"PQN: SrchIt $srchit%4d: f $candf%-10.4f t $lambda%-10.4f\n")
}

if (candf < state.adjustedValue + suffdec) {
accepted = true
} else if (srchit >= maxSrchIt) {
accepted = true
} else {
lambda *= 0.5
srchit = srchit + 1
}
} while (!accepted)

if (srchit >= maxSrchIt) {
logger.info("PQN: Line search cannot make further progress")
throw new LineSearchFailed(norm(state.grad,Double.PositiveInfinity), norm(dir, Double.PositiveInfinity))
}
lambda
/**
* Given a direction, perform a Strong Wolfe Line Search
*
* TO DO: Compare performance with Cubic Interpolation based line search from Mark's PQN paper
*
* @param state the current state
* @param f The objective
* @param dir The step direction
* @return stepSize
*/
protected def determineStepSize(state: State, f: DiffFunction[DenseVector[Double]], dir: DenseVector[Double]) = {
val x = state.x
val grad = state.grad

val ff = LineSearch.functionFromSearchDirection(f, x, dir)

Member

Do we not need to project inside the line search?

Contributor Author

The Matlab code and the paper do not project inside the PQN line search.

Member

Ah, right. alpha is probably always <= 1, so it's safe.
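
The reasoning behind "it's safe" can be spelled out as follows. The assumptions are that the feasible set $C$ is convex, the current iterate $x$ is feasible, and the direction is $d = p - x$ with $p \in C$ the point returned by the inner SPG solve. The symbols $C$, $p$, and $d$ are notation introduced here, not names from the code.

```latex
x + \alpha d \;=\; x + \alpha\,(p - x) \;=\; (1 - \alpha)\,x + \alpha\,p \;\in\; C
\qquad \text{for all } \alpha \in [0, 1]
```

Any step size in $[0, 1]$ therefore yields a convex combination of two feasible points and stays feasible. The guarantee does not extend to $\alpha > 1$. Note that `takeStep` in this diff applies `projection` to `state.x + dir * stepSize` in any case.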

val search = new BacktrackingLineSearch(state.value, maxIterations = maxSrchIt, shrinkStep= if(state.iter < 1) 0.1 else 0.5)
var alpha = if(state.iter == 0.0) min(1.0, 1.0/norm(dir)) else 1.0
alpha = search.minimize(ff, alpha)

if(alpha * norm(grad) < 1E-10) throw new StepSizeUnderflow

alpha
}


protected def takeStep(state: State, dir: DenseVector[Double], stepSize: Double): DenseVector[Double] = {
projection(state.x + dir * stepSize)
}


protected def updateHistory(newX: DenseVector[Double], newGrad: DenseVector[Double], newVal: Double, f: DiffFunction[DenseVector[Double]], oldState: State): History = {
import oldState._
val s = newX - oldState.x
val y = newGrad - oldState.grad
val s = newX - x
val y = newGrad - grad
oldState.history.updated(y, s)
}

}

object ProjectedQuasiNewton {
object ProjectedQuasiNewton extends SerializableLogging {
// Forms a quadratic model around fun, the argmin of which is then a feasible
// quasi-Newton descent direction
class QuadraticSubproblem(fun: DiffFunction[DenseVector[Double]],
fk: Double,
class QuadraticSubproblem(fk: Double,
xk: DenseVector[Double],
gk: DenseVector[Double],
B: CompactHessian) extends DiffFunction[DenseVector[Double]] {
2 changes: 1 addition & 1 deletion math/src/main/scala/breeze/optimize/Projecting.scala
@@ -1,6 +1,6 @@
package breeze.optimize

-import breeze.math.{Module, NormedVectorSpace}
+import breeze.math.{Module}

trait Projecting[T] {
def projection: T => T