Sunday, June 18, 2017

Comparing approaches to correlated random number generation

Introduction

Correlated random number generation is a crucial part of market data simulation and thus one of the important functions within Monte-Carlo risk engines. The most popular approaches here are Cholesky decomposition, singular value decomposition (SVD), and Eigen decomposition (aka spectral decomposition). Each of these approaches has its own advantages and disadvantages. In this article I would like to perform a small comparison of these methods on real-life data.


Approaches to Correlated Random Number Generation

In general, correlated random number generation consists of two steps:

1. Decomposition of the correlation matrix C into a factor U such that:

C = UᵀU

2. Then correlated random numbers can be generated by using the U matrix as follows:

corr_rnd = rnd × U

where rnd is a matrix whose columns are independent series of standard normal random numbers, and corr_rnd contains the correlated series.
So, let's imagine that we have a variable corr_matrix holding the correlation matrix:
corr_matrix = matrix(c(1.0, 0.3, 0.6,
                       0.3, 1.0, 0.4,
                       0.6, 0.4, 1.0),
                     nrow = 3, ncol = 3)

And we have a matrix rnd with 3 independent series of standard normal random numbers:
rnd = matrix(rnorm(10000 * 3), nrow = 10000, ncol = 3)


Then, for Cholesky decomposition this approach looks as follows:
u = chol(corr_matrix)
corr_rnd = rnd %*% u
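
Note that chol() returns the upper-triangular factor, so t(u) %*% u reproduces corr_matrix, matching the C = UᵀU convention above; a quick check (a sketch, not from the original post):

all.equal(t(u) %*% u, corr_matrix)   # should be TRUE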

For SVD:
svd_m = svd(corr_matrix)
u = svd_m$u %*% diag(sqrt(svd_m$d)) %*% t(svd_m$v)
corr_rnd = rnd %*% u

For Eigen decomposition:
e = eigen(corr_matrix, symmetric = T)
u = diag(sqrt(e$values)) %*% t(e$vectors)   # so that t(u) %*% u reproduces corr_matrix
corr_rnd = rnd %*% u
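
Whichever decomposition is used, the sample correlations of corr_rnd should come back close to corr_matrix, up to Monte-Carlo noise on 10,000 draws; a quick check (a sketch, not from the original post):

round(cor(corr_rnd), 2)                  # should be close to corr_matrix
max(abs(cor(corr_rnd) - corr_matrix))    # overall deviation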

Correlation Matrix vs Covariance Matrix

By the way, an interesting question here is which matrix to use for correlating random numbers: the correlation matrix or the covariance matrix.

Since the correlation matrix standardizes the values, the usual advice is to use the covariance matrix when the variables' scales are similar and the correlation matrix when the scales differ. With the covariance matrix the generated numbers carry each variable's variance as well; with the correlation matrix the draws are standardized and can be rescaled by per-variable volatilities afterwards (see the sketch below).
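
A minimal sketch of the two equivalent routes, assuming hypothetical per-variable volatilities vols (not part of the original post):

vols = c(0.2, 0.15, 0.3)   # assumed standard deviations, for illustration only
cov_matrix = diag(vols) %*% corr_matrix %*% diag(vols)

# Route 1: decompose the covariance matrix directly
cov_rnd = rnd %*% chol(cov_matrix)

# Route 2: decompose the correlation matrix, then rescale by volatilities;
# both routes produce draws with the same distribution (covariance = cov_matrix)
cov_rnd2 = (rnd %*% chol(corr_matrix)) %*% diag(vols)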

Comparison Approach

The comparison will be done using the following approach:

1. Retrieve live market data and calculate a correlation matrix from them.

2. Generate correlated random numbers based on different subsets of the market factors retrieved in step #1.

3. Estimate the quality of the generated random numbers by comparing their correlations with the original correlations.

4. Estimate the time required for the calculations.

Market Data Retrieval

I've chosen equity prices for about 900 tickers from Yahoo Finance as real-life market data.

I've downloaded these prices using the quantmod package (NB: version 0.4-9 must be used to accommodate the latest changes in the Yahoo Finance API).

The list of tickers to use can be retrieved here.

Then I've calculated log returns of the prices and computed the correlation matrix.

The code looks as follows:
library(quantmod)
library(zoo)
library(data.table)

tickers = unique(toupper(readLines("tickers.txt")))

get_prices = function(ticker, from = as.Date("2015-03-01"), to = as.Date("2017-03-01")) {
  ticks = getSymbols(ticker, auto.assign = F, from = from, to = to)
  ticks = data.table(date = as.Date(index(ticks)), coredata(ticks))
  setnames(ticks, c("date", "open", "high", "low", "close", "volume", "adjusted"))
  ticks[, .(date = date, price = close), keyby = date]
}

# Use the S&P 500 trading dates as the common calendar
valid_dates = get_prices("^gspc")$date

prices = sapply(tickers, function(ticker) {
  tryCatch({
    print(which(ticker == tickers))
    raw_prices = get_prices(ticker)
    prices = raw_prices[J(valid_dates)]
    if (is.na(prices[1, price])) return(NULL)
    # Fill missing quotes with the last observed price
    prices[, price := na.locf(price, na.rm = F)]
    # Log returns
    diff(log(prices$price))
  }, error = function(e) {
    NULL
  })
})
prices = do.call(cbind, prices)

# Drop constant series (zero standard deviation breaks cor())
prices = prices[, -which(sapply(seq_len(ncol(prices)), function(i) sd(prices[, i])) == 0)]
prices_corr_matrix = cor(prices)
The resulting correlation matrix looks as follows (the first 50 rows & columns):


Results & Conclusions

In order to compare the different approaches, the following function is used. It compares the original correlation matrix, calculated on the historical equity returns, with the correlation matrix calculated on the random numbers after transformation.

corr_matrix_compare = function(m1, m2) { abs(sum(m1 - m2)) / length(m1) }
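
For illustration, here is a minimal sketch of how one case of the comparison might be wired together (an assumed harness, not the author's exact code); decompose is any function returning a factor u with t(u) %*% u equal to its input, such as chol:

run_case = function(corr_matrix, decompose, n = 10000) {
  tryCatch({
    rnd = matrix(rnorm(n * ncol(corr_matrix)), nrow = n)
    # Time the decomposition and generation step
    elapsed = system.time(corr_rnd <- rnd %*% decompose(corr_matrix))["elapsed"]
    list(status = "ok",
         res = corr_matrix_compare(corr_matrix, cor(corr_rnd)),
         time = elapsed)
  }, error = function(e) list(status = "failed", res = NA, time = NA))
}

# Example: Cholesky on the first 100 equities
run_case(prices_corr_matrix[1:100, 1:100], chol)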

I've run calculations for cases with 3, 10, 20, 30, 40, 50, 100, 200, 250, 300, 350, 400, 450, 500, 550, 600, and 868 equities.

The results are as follows (the column status shows whether the calculation was successful or failed; the column res shows the accuracy calculated by the function specified above; and the column time shows how many seconds the calculation took):


The accuracy of the different approaches looks as follows:



Some conclusions:

  1. In general, all the methods work fine at smaller dimension sizes. As expected, Cholesky decomposition started to fail at larger dimensions, once the correlation matrix became non-positive-definite. Interestingly, Eigen decomposition also fails at larger dimension counts (a common workaround for non-positive-definite matrices is sketched after this list).
  2. From a performance point of view, Cholesky demonstrated the best results.
  3. SVD worked fine and produced results in all cases.
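
As a side note: when Cholesky is preferred for its speed but the estimated matrix is non-positive-definite, a common workaround is to project it onto the nearest valid correlation matrix first, e.g. with Matrix::nearPD. A sketch, assuming the Matrix package (this was not part of the original comparison):

library(Matrix)

# Project the estimate onto the nearest positive-definite correlation matrix,
# then factor it as usual
fixed_corr = as.matrix(nearPD(prices_corr_matrix, corr = TRUE)$mat)
u = chol(fixed_corr)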
