Sunday, July 7, 2013

Comparison of misc optimization methods available in R

Introduction

In this article I would like to describe the results of my comparison of several popular optimization algorithms available in R. First of all, it is useful for educational purposes (becoming familiar with the usage of optimization methods in R, etc.). Second, it helps to understand the "power" of optimization methods, their strengths, and their weaknesses.
 
Disclaimer. The article does not cover all optimization methods available for R. Nor does it claim that one algorithm is better than another. It just presents the results of my personal investigation.
 

Test functions

 
As benchmarks I chose the following test functions:
 
  • Sphere function
  • Rosenbrock function
  • Rastrigin function
  • Styblinski-Tang function
 
These functions are widely used for testing optimization algorithms. They are characterized by a complex non-linear shape with many local minima but only one (or a few) global ones. It is considered difficult to find the global minimum of these functions.
 
In addition, these functions can be generalized to the multidimensional case. This is important, since I am going to test the optimization methods with different numbers of optimization variables to understand how the methods behave when the number of dimensions is low / high.
 
Below is more detailed information about each function.

Sphere function

Here is the R code for the Sphere function (two- and multi-dimensional variants):

# Two-dimensional variant (used for plotting)
sphere = function(x, y) {
 x^2 + y^2
}

# Multi-dimensional variant (used for optimization)
sphere.md = function(x) {
 sum(x*x)
}
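 
A quick check that the two variants agree:

sphere(1, 2)        # 5
sphere.md(c(1, 2))  # 5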
 
Boundaries for function arguments: -Inf..+Inf
 
Global minimum: f(0, ..., 0) = 0
 
Here is a 3D plot of this function:

 
 
Note: here and below I use the rgl package to draw the 3D plots of the functions. The code of the drawing function is below:

library(rgl)
 
plot_func_3D = function(func, from, to, by) {
 x = seq(from, to, by)
 y = seq(from, to, by)
 z = outer(x, y, func)  # evaluate the function on the grid
 persp3d(x, y, z, col = "red", alpha = 0.7)
}
 
plot_func_3D(sphere, -1, 1, 0.1)
plot_func_3D(rosenbrock, -2, 2, 0.1)
plot_func_3D(rastrigin, -1, 1, 0.1)
plot_func_3D(styblinski_tang, -5, 5, 0.1)

Rosenbrock function

Here is the R code for the Rosenbrock function (two- and multi-dimensional variants):
 

rosenbrock = function(x, y) {
 (1-x)^2+100*(y-x^2)^2
}

rosenbrock.md = function(x) {
 n = length(x)
 sum( (x[1:(n-1)]-1)^2 + 100*(x[2:n] - x[1:(n-1)]^2)^2 )
}


Boundaries for function arguments: -Inf..+Inf
 
Global minimum: f(1, ..., 1) = 0
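 
A quick sanity check of the minimum:

# the multi-dimensional Rosenbrock function is zero at (1, ..., 1)
rosenbrock.md(rep(1, 5))  # 0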

 
Here is a 3D plot of this function:
 
Rastrigin function

Here is the R code for the Rastrigin function (two- and multi-dimensional variants):

rastrigin = function(x, y) {
 20 + x^2 - 10*cos(2*pi*x) + y^2 - 10*cos(2*pi*y)
}

rastrigin.md = function(x) {
 length(x)*10 + sum(x^2 - 10*cos(2*pi*x))
}
Boundaries for function arguments: -Inf..+Inf; however, the search domain is usually restricted to -5.12..5.12
 
Global minimum: f(0, ..., 0) = 0
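 
A similar check at the origin:

# the constant term cancels the cosine sum at x = 0
rastrigin.md(rep(0, 5))  # 0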

 
Here is a 3D plot of this function:
 
 

Styblinski-Tang function

Here is the R code for the Styblinski-Tang function (two- and multi-dimensional variants):

styblinski_tang = function(x, y) {
 0.5*(x^4 - 16*x^2 + 5*x + y^4 - 16*y^2 + 5*y)
}

styblinski_tang.md = function(x) {
 0.5*sum(x^4 - 16*x^2 + 5*x)
}


Boundaries for function arguments: -Inf..+Inf
 
Global minimum: f(-2.903534, ..., -2.903534) = -39.16599n, where n is the number of function arguments
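 
A quick check of the linear scaling with dimension:

# approximately -39.16599 * 5 = -195.83
styblinski_tang.md(rep(-2.903534, 5))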

 
Here is a 3D plot of this function:
 
 

Optimization methods

I chose to test the following optimization methods:
 
  • Nelder-Mead method (NM)
  • Broyden–Fletcher–Goldfarb–Shanno method (BFGS)
  • Conjugate gradients method (CG)
  • Simulated annealing (SA)
  • Particle swarm optimization (PSO)
  • Genetic algorithm (GA)
 
Of course, there are many other optimization methods available in R (here is the full list), but I preferred to use these as a starting point.
 
Below is more detailed information about each method.

Nelder-Mead method (NM)

This method is available through the optim function from the stats package when it is called with the method argument set to "Nelder-Mead".

Method parameters:
  • alpha (1.0 by default), beta (0.5), and gamma (2.0) - the reflection, contraction, and expansion scaling factors (see the sketch below)
  • Initial values for optimization
  • Stop criteria - iterations count, tolerance
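 
As an illustration, here is a sketch of setting these factors explicitly through optim's control list (alpha, beta, and gamma are documented control options for the Nelder-Mead method):

# Nelder-Mead with explicit reflection (alpha), contraction (beta),
# and expansion (gamma) factors; the values below are the defaults
res = optim(par = rep(1e6, 2), fn = sphere.md, method = "Nelder-Mead",
            control = list(alpha = 1.0, beta = 0.5, gamma = 2.0, maxit = 1e3))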

Here is R code which illustrates the usage of Nelder-Mead optimization:

 

res1 = optim(par = rep(1e6, 2), sphere.md, method = "Nelder-Mead", control = list(maxit = 1e3))
res2 = optim(par = rep(1e6, 2), rastrigin.md, method = "Nelder-Mead", control = list(maxit = 1e3))
res3 = optim(par = rep(1e6, 2), rosenbrock.md, method = "Nelder-Mead", control = list(maxit = 1e3))
res4 = optim(par = rep(1e6, 2), styblinski_tang.md, method = "Nelder-Mead", control = list(maxit = 1e3))
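 
optim returns a list; its documented fields include the located parameters, the objective value, and a convergence code:

# inspecting the result of an optim run
res1$par          # parameter values at the found minimum
res1$value        # objective function value at that point
res1$convergence  # 0 indicates successful completion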

Broyden–Fletcher–Goldfarb–Shanno method (BFGS)

This method is available through the optim function from the stats package when it is called with the method argument set to "BFGS".

Here is code which illustrates how to run BFGS optimization in R:
 

res1 = optim(par = rep(1e6, 2), sphere.md, method = "BFGS", control = list(maxit = 1e3))
res2 = optim(par = rep(1e6, 2), rastrigin.md, method = "BFGS", control = list(maxit = 1e3))
res3 = optim(par = rep(1e6, 2), rosenbrock.md, method = "BFGS", control = list(maxit = 1e3))
res4 = optim(par = rep(1e6, 2), styblinski_tang.md, method = "BFGS", control = list(maxit = 1e3))
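 
By default optim approximates the gradient for BFGS numerically; an analytic gradient can be supplied via the gr argument. A minimal sketch for the Sphere function (sphere.grad is my own helper, not part of the original code):

# analytic gradient of sum(x^2) is 2*x
sphere.grad = function(x) {
 2*x
}

res = optim(par = rep(1e6, 2), fn = sphere.md, gr = sphere.grad,
            method = "BFGS", control = list(maxit = 1e3))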

Conjugate gradients method (CG)

This method is available through the optim function from the stats package when it is called with the method argument set to "CG".

Here is R code which illustrates how to optimize a function using the CG method:


res1 = optim(par = rep(1e6, 2), sphere.md, method = "CG", control = list(maxit = 1e3))
res2 = optim(par = rep(1e6, 2), rastrigin.md, method = "CG", control = list(maxit = 1e3))
res3 = optim(par = rep(1e6, 2), rosenbrock.md, method = "CG", control = list(maxit = 1e3))
res4 = optim(par = rep(1e6, 2), styblinski_tang.md, method = "CG", control = list(maxit = 1e3))
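 
The update formula for CG can be chosen via the documented type control option: 1 for Fletcher-Reeves (the default), 2 for Polak-Ribiere, and 3 for Beale-Sorenson. For example:

# CG with the Polak-Ribiere update formula
res = optim(par = rep(1e6, 2), fn = sphere.md, method = "CG",
            control = list(maxit = 1e3, type = 2))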

Simulated annealing (SA)

This method is available through the optim function from the stats package when it is called with the method argument set to "SANN". There is also a separate package, GenSA, which implements a generalized simulated annealing algorithm.
 
Method parameters:
  • Initial values, lower and upper bounds
  • Stop criteria
  • Initial temperature
  • Etc.

Here is code which illustrates the usage of simulated annealing via the optim function:


res = optim(par = rep(1e6, 2), fn = sphere.md, method = "SANN", control = list(maxit = 1e3))
res = optim(par = rep(1e6, 2), fn = rastrigin.md, method = "SANN", control = list(maxit = 1e3))
res = optim(par = rep(1e6, 2), fn = rosenbrock.md, method = "SANN", control = list(maxit = 1e3))
res = optim(par = rep(1e6, 2), fn = styblinski_tang.md, method = "SANN", control = list(maxit = 1e3))
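 
Note that for "SANN" the maxit control gives the total number of function evaluations; there is no other stopping criterion. The starting temperature and the number of evaluations at each temperature can be tuned via the documented temp and tmax controls:

# SANN with an explicit starting temperature (temp) and
# number of evaluations per temperature step (tmax)
res = optim(par = rep(1e6, 2), fn = rastrigin.md, method = "SANN",
            control = list(maxit = 1e4, temp = 10, tmax = 10))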


And here is the code for the GenSA function:

library(GenSA)
 
res = GenSA(par = rep(1e6, 2), fn = rastrigin.md, lower = rep(-1e6, 2), upper = rep(1e6, 2), control = list(maxit = 1e3, smooth = FALSE))
res = GenSA(par = rep(1e6, 2), fn = sphere.md, lower = rep(-1e6, 2), upper = rep(1e6, 2), control = list(maxit = 1e3, smooth = TRUE))
res = GenSA(par = rep(1e6, 2), fn = rosenbrock.md, lower = rep(-1e6, 2), upper = rep(1e6, 2), control = list(maxit = 1e3, smooth = FALSE))
res = GenSA(par = rep(1e6, 2), fn = styblinski_tang.md, lower = rep(-1e6, 2), upper = rep(1e6, 2), control = list(maxit = 1e3, smooth = FALSE))
 

Particle swarm optimization (PSO)

This method is available through the psoptim function from the pso package.

Method parameters:
  • Initial values, lower and upper bounds
  • Stop criteria - tolerance, iterations count, evaluations count, etc.
  • The swarm size (see the sketch below)
  • The exponent for calculating the number of informants
  • Etc.
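 
For example, a sketch of tuning the swarm through psoptim's control list (s is the documented swarm-size option and k the exponent for the number of informants; the values below are illustrative):

# PSO with an explicit swarm size (s) and informant exponent (k)
res = psoptim(par = rep(1e6, 2), fn = rastrigin.md, lower = -1e6, upper = 1e6,
              control = list(maxit = 1e3, s = 40, k = 3))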
 
Here is R code which illustrates how to find a function minimum by using the PSO method:

library(pso)
 
res1 = psoptim(par = rep(1e6, 2), fn = sphere.md, lower = -1e6, upper = 1e6, control = list(maxit = 1e3))
res2 = psoptim(par = rep(1e6, 2), fn = rastrigin.md, lower = -1e6, upper = 1e6, control = list(maxit = 1e3))
res3 = psoptim(par = rep(1e6, 2), fn = rosenbrock.md, lower = -1e6, upper = 1e6, control = list(maxit = 1e3))
res4 = psoptim(par = rep(1e6, 2), fn = styblinski_tang.md, lower = -1e6, upper = 1e6, control = list(maxit = 1e3))
 
 

Genetic algorithm (GA)

This method is available through the rbga function from the genalg package.

Method parameters:
  • Initial values, lower and upper bounds
  • Population size (see the sketch below)
  • Mutation chance - the chance that a gene in the chromosome mutates
  • Elitism - the number of chromosomes which are kept in the next generation
  • Stop criteria - iterations count
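 
For example, a sketch of setting these through rbga's arguments (popSize, mutationChance, elitism, and iters; the values are illustrative):

# GA with an explicit population size, mutation chance, and elitism
res = rbga(evalFunc = sphere.md, stringMin = rep(-1e6, 2), stringMax = rep(1e6, 2),
           popSize = 100, mutationChance = 0.05, elitism = 5, iters = 1e3)
 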
Here is R code which illustrates how to find a function minimum by using a genetic algorithm:
 

library(genalg)
 
res = rbga(evalFunc = sphere.md, stringMin = rep(-1e6, 2), stringMax = rep(1e6, 2), iters = 1e3)
res = rbga(evalFunc = rastrigin.md, stringMin = rep(-1e6, 2), stringMax = rep(1e6, 2), iters = 1e3)
res = rbga(evalFunc = rosenbrock.md, stringMin = rep(-1e6, 2), stringMax = rep(1e6, 2), iters = 1e3)
res = rbga(evalFunc = styblinski_tang.md, stringMin = rep(-1e6, 2), stringMax = rep(1e6, 2), iters = 1e3)
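 
rbga returns an object describing the final generation; assuming its documented evaluations and population fields, the best solution can be extracted like this:

# best objective value and corresponding variables from the final generation
best.idx = which.min(res$evaluations)
res$evaluations[best.idx]   # best objective function value found
res$population[best.idx, ]  # variable values of the best chromosome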

Comparison results

I chose to run the optimization methods with the following parameters:
 
  • Stop criteria - max iteration count is set to 1,000
  • Lower and upper bounds - from -1,000,000 to 1,000,000
  • Initial value - 1,000,000
  • Variables count - 2, 5, 50, and 500
 
Note: I would have preferred not to specify initial values and lower / upper bounds at all, but these are mandatory parameters for some methods. So I decided to specify a "wide enough" range and a completely wrong initial value to see how each method copes with this.
 
All other parameters use their default values.
 
And last but not least, below I treat an optimization result as "good" if the absolute difference between the global minimum and the value returned by the optimization method is less than 1. I say that a method improves the initial value if, after optimization, the value becomes closer to the global minimum; in that case I calculate the optimization error as the absolute difference between the global minimum of the function and the value returned by the optimization. Otherwise I treat the optimization result as "bad".
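 
For reference, here is a minimal sketch of the kind of loop used to produce the results below (shown for BFGS on the Rastrigin function; the same pattern applies to the other methods and functions):

# run one method on one test function for each problem size
for (n in c(2, 5, 50, 500)) {
  res = optim(par = rep(1e6, n), fn = rastrigin.md,
              method = "BFGS", control = list(maxit = 1e3))
  err = abs(res$value - 0)  # the global minimum of Rastrigin is 0
  cat(sprintf("n = %d, error = %g\n", n, err))
}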
 

Variables count = 2

Surprisingly, the Nelder-Mead method did not show good results even in the case of two variables. For the Sphere and Rastrigin functions the error is about 1e3; for the Rosenbrock and ST functions the error is 1e14 and higher. Increasing the max iteration count does not improve the results.
 
The BFGS method showed good results for the ST and Sphere functions (error less than 1); for the Rastrigin function the error is about 1e3, and for the Rosenbrock function it is about 1e14. Increasing the max iteration count slightly improves the results.
 
The CG method showed good results for the ST, Sphere, and Rastrigin functions and bad results for the Rosenbrock function (error about 1e6).
 
Simulated annealing. It is interesting that the results shown by the GenSA function and by SANN-optim are drastically different: GenSA showed good results for all functions, while SANN-optim showed bad results in all cases (error 1e12 and higher).
 
PSO method showed good results for all functions.
 
The genetic algorithm slightly improved the results for the Rastrigin (error 1e4), Rosenbrock (1e5), and Sphere (1e3) functions and did not show any improvement for the ST function.
 
Variables count = 5
 
As expected, the situation did not change dramatically when the number of variables was increased from 2 to 5. Nelder-Mead, BFGS, CG, PSO, and SA showed roughly the same results as before. The genetic algorithm showed slightly worse results.

Variables count = 50

Nelder-Mead - the same (bad) results as in the previous cases.
 
BFGS - good results for the Rastrigin and Sphere functions, bad results for the Rosenbrock function, and some improvement for the ST function (error 1e2).
 
CG - good results for the Sphere function, some improvement for the ST and Rastrigin functions (error 1e2), and insignificant improvement for the Rosenbrock function (error 1e5).
 
GenSA - the method now takes much more time to complete, but for the Rastrigin and Sphere functions the results are still good.
 
PSO - the results became much worse, although the method showed some improvement for the Sphere and Rastrigin functions.
 
GA - bad results in all cases.
 
Variables count = 500
 
Nelder-Mead - the same (bad) results as in the previous cases.
 
BFGS - good results for the Sphere and Rastrigin functions, no improvement for the Rosenbrock and ST functions.
 
CG - good results for the Sphere function, some improvement for the Rastrigin and ST functions.
 
GenSA - n/a (execution takes too much time).
 
PSO - bad results in all cases.
 
Genetic algorithm - bad results in all cases.
 

Conclusion

The BFGS and CG methods showed relatively good results both when the number of variables is small and when it is high.
 
It is strange, but the Nelder-Mead method did not work for me at all. Either it is not applicable to such functions, or I used it in a wrong way.
 
Simulated annealing showed good results when the number of variables is small, but as the number of variables grows, its execution time increases significantly.
 
The PSO method showed good results when the number of variables is small and bad results when it is high.
 
The genetic algorithm improves the objective function value somewhat, but only when the number of variables is small. Also, it does not look like it converges to the global minimum.
