In this small article I would like to collect and to describe briefly popular techniques and approaches which can be used to improve performance of Monte-Carlo calculations. Computational performance is one of the key factors for successful Monte-Carlo methods application, since results accuracy is growing with inverse ratio to square root of numbers of simulations (in classical case). So if you need to improve accuracy twice, you will have to perform four times more simulations. On other hand, the fact that Monte-Carlo method produces "inaccurate" results may be used to use "shortcuts" and improve calculation performance in some cases.
I've divided all such techniques in the four following groups:
- Parallelization. The natural approach to improve speed up Monte-Carlo simulations is to span calculations across multiple threads on the single on the same computer, and (or) acrsoss multiple computers.
- Vectorization (aka array programming). This is programming technique which generalizes operations on scalar numbers, so that they can be applied to numeric vectors and matrices. Again, since Monte-Carlo method implies application of same operations to the numbers across different simulations, it's quite naturally to use vectorization to improve performance. The main benefit is that we avoid overhead on calling math operations multiple times and instead perform them once for all input data. The main disadvantage of this approach is that it requires development effort to write vectorized code. Also the potential problem is that there are algorithms which are not easy to vectorize (e.g. such as Brent's method). But anyway potential performance gain from vectorization is huge and I consider this technique as one of the key ones for any high-performance Monte-Carlo application. The are several hardware/software solutions which allows to fully use benefits of vectorization:
- General-Purpose Computing on Graphical Processing Units (GP-GPU). For example, Compute Unified Device Architecture (CUDA) from Nvidia. Here Monte-Carlo calculations are performed on GPU's and then results are passed back to CPU. Code written for CUDA on single GPU, then can be relatively easy scaled across grip of GPU's. The one potential drawback is time required to exchange data between GPU and CPU. In some cases, bandwidth between GPU and CPU may be bottle neck, so performance gain may be much smaller then expected.
- Single-Instruction Multiple-Data (SIMD) computation on CPU. For example, Streaming SIMD Extensions from Intel. This technique is similar to GPU processing, but in this case calculations are done on CPU itself by using special extended CPU instructions set which allows to apply SIMD math operations to numeric vectors. Still, usually size of this numeric vectors is much smaller than in case of GPU (e.g. 4 single-floating point numbers, sometimes 8 or even 16), so performance gain is also smaller. On the other hand, since computations are done on CPU side, no time is required to exchange data with GPU. SSE instructions has been introduced by Intel relatively long ago (as well as by AMD with 3D Now!) and now almost all modern general purpose CPU's supports them.
- Field Programmable Gate Arrays (FPGA).
- General Coding Optimizations. Obviously, Monte-Carlo applications code can be optimized by using general-purpose approaches as any other programming app. I would not describe all possible approaches here, but would like to mention only couple of them:
- Usage of optimized math functions. Usually, Monte-Carlo applications use quite limited set of math functions. For example, in finance area these are usually such operations as exp(), log(), sqrt(). These functions are available in standard libraries for any programming language. But such standard implementations usually work quite slow, although provide results of high accuracy. The alternative may be to used alternative implementations which may produce lower accuracy results in some cases, but work much faster. Such trade off (accuracy vs performance) may be OK for Monte-Carlo applications which are by desing produce approximate results only. For example, refer to this article with description of fast & compact implementation of exponent function.
- Carefully choose data types. E.g. if double-precision numbers are twise bigger than single-precision floating point numbers, thus their usage will require more time to keep and process data. Depending on chosen target accuracy of Monte-Carlo simulations, it may make sense to use single floats instead of double ones.
- Variance Reduction Techniques. This is group of techniques which allows to decrease variance (~error) of Monte-Carlo simulations and thus improve their accuracy. Such techniques may be useful for app performance as well, since, for example, they may allow to reach better accuracy by using the same numbers of simulations.
- Quasi Monte Carlo. In contrast to traditional Monte-Carlo which uses regular pseudo-random numbers, Quasi MC uses so called low-discrepancy numbers which on one hand share some characteristics of random numbers, but on other hands allows to get better simulation results due to better coverage of probability space. There are many various methods to generate such sequences - Sobol, Halton, Torus - and there are several similar approaches such as Latin hypercube sampling.
- Control Variates. This technique requires introduction of new special control variable which is correlated with simulated random variable but has now expected value. Then resulting simulations can be corrected by using difference between simulated control variable and its expected value. This allows to decrease overall results variance significantly. The trickiest part of this approach is choose of control variate. In some case, it may be relatively easy (e.g. use stock spot price as control variate for option pricing), in others - it may be much harder.
- Antithetic Variates. Simulations accuracy is improved by using in additional to normal (pseudo-)random numbers their antithetic path. Since normal and antithetic path have large negative correlation, resulting variance is much lower.
- Stopping Criteria. This approach doesn't allow to decrease variance of results, instead it allows to stop simulations when required accuracy has been achieved or when results are stable enough. This allows to stop simulating even when target numbers of simulations has not been achieved yet.
- Multe Level Monte Carlo. This approach is similar to control variate somehow, but instead of introduction of single control variate, it is proposed to introduce "cascade" of such variables to get better accuracy.
- Roulette / Splitting. This is so-called population splitting technique which allows to spend more time simulating important points, then unimportant ones. It is applicable in case when it is possible to identify whether particular simulation is important for final result or now. So that it is not important, then it makes sense to terminate it and don't spend computation efforts anymore.
- Common Random Numbers.
- Importance Sampling.