SHLee AI Financial Model: ETF Tracking Error Minimization using R code

This post explains how to set up ETF tracking error (TE) minimization and introduces R packages that perform (sparse) index tracking. An ETF (exchange-traded fund) is a fund listed and traded on an exchange. An ETF tries to mimic or follow a target benchmark index (BM) such as the S&P 500. Choosing the portfolio that follows the BM as closely as possible is called tracking error (TE) minimization.

Index Tracking

An ETF selects a small subset of the constituents of the BM index to mimic it. Since the ETF does not contain all constituents of the BM index (which would be full replication), a tracking error (TE) arises. Furthermore, the optimal subset is not fixed but changes with market developments, so periodic rebalancing is required.

We will use the ROI optimization package; for detailed information, refer to the following post.

Using NEOS Optimization Solver in R code

The number of constituents of the BM index is so large that full replication is often impractical because of transaction costs and liquidity problems. Index tracking therefore means finding the optimal combination of a subset of securities that minimizes the tracking error, and its objective function is formulated as follows.
\[\begin{align} TE = \frac{1}{T} \sum_{t=1}^{T} \left( \sum_{i=1}^{N} w_i r_{it} - R_t^I \right)^2 \end{align}\] Here, \(R_t^I\) and \(r_{it}\) are the time \(t\) returns of the BM index and its constituents respectively, and \(w_i\) is the weight of constituent \(i\).
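As a quick illustration of the TE formula above, the following base R sketch computes it for a given weight vector. The returns are simulated, and the names `tracking_error`, `w_true` and `r_index` are hypothetical helpers introduced only for this example, not part of any package.

```r
# Empirical tracking error: mean squared gap between portfolio and index returns.
tracking_error <- function(R, w, r_index) {
  stopifnot(ncol(R) == length(w), nrow(R) == length(r_index))
  port <- as.vector(R %*% w)   # portfolio return in each period
  mean((port - r_index)^2)     # (1/T) * sum of squared deviations
}

set.seed(1)
T_obs <- 250; N <- 5
R <- matrix(rnorm(T_obs * N, sd = 0.01), T_obs, N)  # simulated constituent returns
w_true  <- c(0.4, 0.3, 0.2, 0.1, 0)                 # hypothetical index weights
r_index <- as.vector(R %*% w_true)                  # index built from those weights

te_exact <- tracking_error(R, w_true, r_index)          # same weights as the index
te_equal <- tracking_error(R, rep(1 / N, N), r_index)   # naive equal weights
```

Holding exactly the index weights gives a TE of zero, while any other allocation, such as equal weights, gives a positive TE.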

Using vector-matrix notation, the above problem is reformulated with its constraints as follows. \[\begin{align} &\min_{w} \frac{1}{T} || Rw - R^I ||_2^2 \\ \text{subject to}& \\ &e^T w = 1 \\ &\eta_i Z_i \leq w_i \leq \delta_i Z_i \\ &\sum_{i=1}^{N} Z_i = K \\ &Z_i = 0 \quad \text{or} \quad 1, \quad i=1,2,...,N \end{align}\] Here, \(N\) is the number of constituents of the BM index and \(K\) is the number of constituents of the ETF. \(R^I=(R_1^I,R_2^I,…,R_T^I )^T\) is a \(T×1\) vector of BM index returns and \(R=(R_1,R_2,…,R_N)\) is a \(T×N\) matrix formed by horizontally concatenating the \(T×1\) vectors \(R_i=(r_{i1},r_{i2},…,r_{iT} )^T\). \(w=(w_1,w_2,…,w_N )^T\) is an \(N×1\) vector of allocation weights and \(e\) is an \(N×1\) vector of ones.

Looking at the constraints above, the first is the so-called budget constraint, which means all capital is invested in the ETF portfolio. The second sets the lower and upper bounds for the allocation weights. The third is a cardinality constraint: each \(Z_i\) takes the value 0 or 1 and they sum to \(K\). This constraint means that only \(K\) of the \(N\) securities are held.
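The three constraints can be checked mechanically for any candidate pair \((w, Z)\). The following is a minimal base R sketch; the function name `is_feasible`, the tolerance and the example bounds (\(\eta = 0.05\), \(\delta = 0.6\)) are assumptions chosen for illustration.

```r
# Check the budget, bound and cardinality constraints for a candidate (w, Z).
is_feasible <- function(w, Z, K, eta, delta, tol = 1e-8) {
  budget      <- abs(sum(w) - 1) <= tol                          # e'w = 1
  bounds      <- all(w >= eta * Z - tol & w <= delta * Z + tol)  # eta*Z <= w <= delta*Z
  cardinality <- all(Z %in% c(0, 1)) && sum(Z) == K              # exactly K names held
  c(budget = budget, bounds = bounds, cardinality = cardinality)
}

Z      <- c(1, 0, 1, 1, 0)          # hold securities 1, 3 and 4
w_ok   <- c(0.5, 0, 0.3, 0.2, 0)    # weight only where Z = 1
w_leak <- c(0.5, 0.1, 0.2, 0.2, 0)  # puts weight on a security with Z = 0

chk_ok   <- is_feasible(w_ok,   Z, K = 3, eta = 0.05, delta = 0.6)
chk_leak <- is_feasible(w_leak, Z, K = 3, eta = 0.05, delta = 0.6)
```

Note how the bound constraint ties \(w_i\) to \(Z_i\): when \(Z_i = 0\) both bounds collapse to zero, so `w_leak` fails the bound check even though it satisfies the budget.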

However, this is a difficult problem because the cardinality constraint makes it NP-hard; in other words, \(\sum_{i=1}^{N} Z_i = K\) turns it into a high-dimensional discrete problem. Finding the exact optimum with mixed integer programming in effect requires searching over all combinations, and the number of combinations is too large to enumerate. Because only a few of the \(N\) weights are nonzero, this problem is also called the sparse index tracking problem. More recently, Fengmin, Xu, and Xue (2015) suggest \(L_{1/2}\) regularization for this problem.
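To get a feel for the combinatorial growth, the base R sketch below counts subsets with `choose()` and then brute-forces a tiny example. It is an illustration under simplifying assumptions: the data are simulated, the weight bounds are dropped so that only the budget constraint remains (which allows a closed-form KKT solution), and `fit_budget_ls` and `best_subset` are hypothetical helper names.

```r
n_half <- choose(30, 15)    # 15-stock subsets of a 30-stock universe
n_full <- choose(386, 30)   # 30-stock subsets of the full 386-stock universe

# Least squares with only the budget constraint sum(w) = 1, via its KKT system.
fit_budget_ls <- function(X, y) {
  n   <- ncol(X)
  kkt <- rbind(cbind(2 * crossprod(X), rep(1, n)), c(rep(1, n), 0))
  rhs <- c(2 * crossprod(X, y), 1)
  solve(kkt, rhs)[seq_len(n)]
}

# Exhaustive search over every K-subset of columns (feasible only for tiny N).
best_subset <- function(X, y, K) {
  subsets <- combn(ncol(X), K)
  te <- apply(subsets, 2, function(idx) {
    Xs <- X[, idx, drop = FALSE]
    mean((Xs %*% fit_budget_ls(Xs, y) - y)^2)
  })
  list(subset = subsets[, which.min(te)], te = min(te), n_subsets = ncol(subsets))
}

set.seed(42)
X <- matrix(rnorm(200 * 6, sd = 0.01), 200, 6)        # simulated returns, N = 6
y <- as.vector(X[, c(1, 3, 5)] %*% c(0.5, 0.3, 0.2))  # index uses stocks 1, 3, 5
res <- best_subset(X, y, K = 3)
```

With \(N = 6\) and \(K = 3\) there are only 20 subsets, so the search recovers the three stocks that make up the simulated index. The same loop is hopeless for realistic \(N\) and \(K\), which is why regularization-based approaches are used instead.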

In this post, we use the sparseIndexTracking R package for sparse index tracking and the ROI.plugin.ecos R package for index tracking, and then compare the two results.

Second-order conic programming (SOCP)

For index tracking, we use the ROI and ROI.plugin.ecos which provide a solver for the second-order cone programming (SOCP).

What is a SOCP and what is the relationship between SOCP and index tracking?

A second-order cone programming (SOCP) problem is a convex optimization problem in which a linear function is minimized over the intersection of an affine linear manifold with the Cartesian product of second-order cones.

The index tracking problem can be rewritten in SOCP format, and ROI.plugin.ecos and other conic solvers need the SOCP format as input. Therefore we need to transform our tracking error minimization problem into a second-order cone programming problem.

We present the original and transformed problems in turn. The concept of SOCP is easy to see in the context of the index tracking problem, in which we try to mimic the benchmark index by minimizing the tracking error.

The original TE problem is

\[\begin{align} &\min_{w} \sqrt{\sum_{t=1}^{T} \left( R_t^I - \sum_{i=1}^{N} w_i r_{it} \right)^2} \\ \text{subject to}& \\ &e^T w = 1 \\ &w \geq 0 \\ \end{align}\]
Here, \(w = (w_1 , w_2 , ..., w_N) \) and \(r = (r_1, r_2, ..., r_N) \).

The transformed TE problem as SOCP is

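\[\begin{align} &\min_{w} t \\ \text{subject to}& \\ &\sqrt{\sum_{t=1}^{T} \left( R_t^I - \sum_{i=1}^{N} w_i r_{it} \right)^2} \le t \\ &e^T w = 1+t \\ &w \geq 0 \\ \end{align}\]
Here, \(w = (w_1 , w_2 , ..., w_N, t) \) and \(r = (r_1, r_2, ..., r_N, 1) \). Because the extended \(w\) contains \(t\), the constraint \(e^T w = 1+t\) is simply the budget constraint \(\sum_{i=1}^{N} w_i = 1\) written for the extended vector. The \(t\) on the right-hand side of the first constraint is the auxiliary variable, not the time index of the sum.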

It is worth noting that the definitions of \(w\) and \(r\) differ between the two formulations. The second also includes \(t\) as a decision variable and treats the objective function of the first as an additional constraint. For convenience, both formulations omit \(\frac{1}{T}\), since it is a constant, and take a square root so that the objective has the norm form required by a second-order cone.

Although the formal definition of SOCP looks somewhat difficult, its key feature is easy to see in the two formulations above. The bottom line is that the convex objective function is moved into a constraint and the original objective is replaced by a linear function.
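The cone constraint can be verified numerically without any solver. The base R sketch below builds the same matrix layout that is passed to a conic solver, assuming the usual convention that a conic constraint requires \(rhs - Ax\) to lie in the cone. The data are simulated and `in_soc` is a hypothetical helper.

```r
set.seed(7)
nobs <- 8; nX <- 3
X <- matrix(rnorm(nobs * nX, sd = 0.01), nobs, nX)   # simulated constituent returns
y <- as.vector(X %*% c(0.5, 0.3, 0.2)) + rnorm(nobs, sd = 0.001)

# Decision vector x = (w_1, ..., w_nX, t); the slack rhs - A x equals (t, y - X w).
A   <- rbind(c(rep(0, nX), -1), cbind(X, 0))
rhs <- c(0, y)

# Second-order cone membership: first element >= Euclidean norm of the rest.
in_soc <- function(s, tol = 1e-12) s[1] + tol >= sqrt(sum(s[-1]^2))

w       <- c(0.4, 0.4, 0.2)                # any weights that sum to one
te_norm <- sqrt(sum((y - X %*% w)^2))      # original objective evaluated at w

slack_tight <- rhs - A %*% c(w, te_norm)        # t equal to the norm: on the boundary
slack_low   <- rhs - A %*% c(w, 0.5 * te_norm)  # t below the norm: outside the cone
```

For a fixed \(w\), the smallest \(t\) that keeps the slack inside the cone is exactly the value of the original objective, so minimizing \(t\) over the cone reproduces the original problem.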

R package

Using ROI and ROI.plugin.ecos, we can perform tracking error minimization. In this case, however, there is no cardinality constraint, so we need to select the subset of securities in advance and write the problem in SOCP format.

In contrast, the sparseIndexTracking R package is easy to use since it takes the data y and X directly as arguments. It also handles the cardinality constraint through the regularization parameter (\(\lambda\)): the higher the \(\lambda\), the more the coefficients are shrunk towards zero.

R code

The following R code implements the two index tracking problems. We use the data embedded in the sparseIndexTracking R package. For expositional purposes, we assume a universe of 30 stocks, because it is difficult to present the results as a table or figure when using all 386 stocks. After covering the main points, we also deal with the case of all 386 stocks.

#==============================================================
# Financial Econometrics & Derivatives, ML/DL 
# using R, Python, Keras, Tensorflow  
# by Sang-Heon Lee 
#
# https://shleeai.blogspot.com
#--------------------------------------------------------------
# Index Tracking Error Minimization 
# using ROI.ecos and sparseIndexTracking
#==============================================================
 
graphics.off()  # clear all graphs
rm(list = ls()) # remove all objects from the workspace
    
library(sparseIndexTracking)
library(ROI)
library(ROI.plugin.ecos)
    
#------------------------------------------------
# Data
#------------------------------------------------
 
    # load stock index data
    data(INDEX_2010)
    y = as.vector(INDEX_2010$SP500)
    X = as.matrix(INDEX_2010$X)
    
    # comment out the next line to use the full data
    X <- X[,1:30]
    
    nobs = length(y); nX = ncol(X)
 
#------------------------------------------------
# 1) Using ROI and ROI.ecos
#------------------------------------------------
    
    #--------------------------------------------
    # the original form
    #--------------------------------------------
    # w  = c( w1,  w2,  w3)' 
    # Xn = c(Xn1, Xn2, Xn3)
    #
    # min sqrt( (y1 - X1'*w)^2 + (y2 - X2'*w)^2 
    #         + (y3 - X3'*w)^2 + (y4 - X4'*w)^2 
    #         + (y5 - X5'*w)^2
    # )
    # s.t.
    #      w1 + w2 + w3 = 1
    #      w1, w2, w3 >= 0
    #--------------------------------------------
    
    #--------------------------------------------
    # --> Rewritten into the SOCP form
    #--------------------------------------------
    # w  = c( w1,  w2,  w3, t)' 
    # Xn = c(Xn1, Xn2, Xn3, 1)
    #
    # minimize t
    # s.t.
    #      sqrt( (y1 - X1'*w)^2 + (y2 - X2'*w)^2 
    #          + (y3 - X3'*w)^2 + (y4 - X4'*w)^2 
    #          + (y5 - X5'*w)^2
    #      ) <= t
    #      w1 + w2 + w3 = 1
    #      w1, w2, w3 >= 0
    #--------------------------------------------
    
    #--------------------------------------------
    # Index tracking error minimization
    # using second order cone programming
    #--------------------------------------------
    
    A <- rbind(c( rep(0,nX), -1), cbind(X,0))
    
    soc <- OP(objective   = L_objective(c(rep(0,nX), 1)),
              constraints = c(
                  C_constraint(A, K_soc(nobs+1), c(0,y)),
                  L_constraint(c(rep(1,nX), 0), "==", 1))
    )
    
    soc_sol <- ROI_solve(soc, solver = "ecos")
    wgt_roi <- soc_sol$solution[1:nX]
    
#------------------------------------------------
# 2) Using sparseIndexTracking
#------------------------------------------------
        
    # fit portfolio under error measure ETE 
    # (Empirical Tracking Error)
    
    # Unconstrained
    # wgt_sps <- spIndexTrack(X, y, lambda = 1e-180, u = 1, 
    #                        measure = 'ete', thres = 1e-180)
    
    # Constrained
     wgt_sps <- spIndexTrack(X, y, lambda = 1e-7, 
                             u = 1, measure = 'ete')
 
#------------------------------------------------
# 3) Comparison for allocation weights
#------------------------------------------------
    
    round(cbind(wgt_roi, wgt_sps),4)
 

With the arguments for an unconstrained problem (\(\lambda=1e-180\)) and a subset of stocks (\(n=30\)) as the assumed universe, running the above R code gives the following weight allocations for the two approaches: ROI with ROI.plugin.ecos and sparseIndexTracking. Since effectively no regularization is applied, the two results are the same up to rounding in the fourth decimal place.

> #------------------------------------------------
> # 3) Comparison for allocation weights
> #------------------------------------------------
>     
>     round(cbind(wgt_roi, wgt_sps),4)
                   wgt_roi wgt_sps
1436513D UN Equity  0.0270  0.0270
1500785D UN Equity  0.0220  0.0220
1518855D US Equity  0.0319  0.0319
9876566D UN Equity  0.0607  0.0607
A UN Equity         0.0149  0.0149
AA UN Equity        0.0426  0.0426
AAPL UW Equity      0.0444  0.0444
ABC UN Equity       0.0151  0.0151
ABT UN Equity       0.1330  0.1330
ADBE UW Equity      0.0114  0.0114
ADM UN Equity       0.0127  0.0127
ADP UW Equity       0.1440  0.1440
ADSK UW Equity      0.0113  0.0113
AEE UN Equity       0.0453  0.0453
AEP UN Equity       0.0158  0.0159
AES UN Equity       0.0074  0.0074
AET UN Equity       0.0132  0.0132
AFL UN Equity       0.0413  0.0413
AGN UN Equity       0.0145  0.0146
AIG UN Equity       0.0002  0.0002
AIV UN Equity       0.0452  0.0452
AIZ UN Equity       0.0202  0.0202
AKAM UW Equity      0.0000  0.0000
ALL UN Equity       0.0348  0.0348
ALTR UW Equity      0.0172  0.0172
AMAT UW Equity      0.0336  0.0336
AMGN UW Equity      0.0411  0.0411
AMP UN Equity       0.0503  0.0503
AMT UN Equity       0.0437  0.0437
AMZN UW Equity      0.0051  0.0051
 

For sparse index tracking, with the arguments for a constrained problem (\(\lambda=1e-6\)) and a subset of stocks (\(n=30\)) as the assumed universe, running the above R code gives the following weight allocations for the two approaches: ROI with ROI.plugin.ecos and sparseIndexTracking. We can easily see that sparse index tracking exhibits a selection effect because \(\lambda=1e-6\) invokes regularization.

> #------------------------------------------------
> # 3) Comparison for allocation weights
> #------------------------------------------------
>     
>     round(cbind(wgt_roi, wgt_sps),4)
                   wgt_roi wgt_sps
1436513D UN Equity  0.0270  0.0397
1500785D UN Equity  0.0220  0.0000
1518855D US Equity  0.0319  0.0379
9876566D UN Equity  0.0607  0.0656
A UN Equity         0.0149  0.0000
AA UN Equity        0.0426  0.0445
AAPL UW Equity      0.0444  0.0510
ABC UN Equity       0.0151  0.0000
ABT UN Equity       0.1330  0.1598
ADBE UW Equity      0.0114  0.0000
ADM UN Equity       0.0127  0.0000
ADP UW Equity       0.1440  0.1783
ADSK UW Equity      0.0113  0.0000
AEE UN Equity       0.0453  0.0652
AEP UN Equity       0.0158  0.0000
AES UN Equity       0.0074  0.0000
AET UN Equity       0.0132  0.0000
AFL UN Equity       0.0413  0.0473
AGN UN Equity       0.0145  0.0000
AIG UN Equity       0.0002  0.0000
AIV UN Equity       0.0452  0.0543
AIZ UN Equity       0.0202  0.0000
AKAM UW Equity      0.0000  0.0000
ALL UN Equity       0.0348  0.0418
ALTR UW Equity      0.0172  0.0000
AMAT UW Equity      0.0336  0.0507
AMGN UW Equity      0.0411  0.0499
AMP UN Equity       0.0503  0.0595
AMT UN Equity       0.0437  0.0543
AMZN UW Equity      0.0051  0.0000
 

The two figures below show the weight allocations in the two cases. When there is no regularization for the cardinality constraint, the two results are the same.

[Figure: weight allocations from ROI and sparseIndexTracking]

When there is regularization for the cardinality constraint, the two results differ since sparse index tracking selects a subset of securities from the universe of 30.

When we use all 386 securities, the following two figures are obtained.

In the above case with all data, we can observe some discrepancies in the allocation weights, but the overall distributions of weights are similar. With so many variables, numerical errors are likely to accumulate.

For more precise results, we think further investigation with varying hyperparameters (\(\lambda\) and so on) is also needed.
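One simple way to start such an investigation is to sweep \(\lambda\) over a grid and record how many stocks remain in the portfolio together with the in-sample tracking error. The sketch below reuses the data and the `spIndexTrack()` call from the code above. The grid of \(\lambda\) values and the cutoff `1e-6` for counting a weight as active are arbitrary assumptions, not recommended settings.

```r
library(sparseIndexTracking)

data(INDEX_2010)
y <- as.vector(INDEX_2010$SP500)
X <- as.matrix(INDEX_2010$X)[, 1:30]   # same 30-stock universe as above

lambdas <- c(1e-8, 1e-7, 1e-6, 1e-5)   # hypothetical grid, for illustration only

lambda_sweep <- t(sapply(lambdas, function(lam) {
  w <- spIndexTrack(X, y, lambda = lam, u = 1, measure = "ete")
  c(lambda   = lam,
    n_active = sum(w > 1e-6),              # stocks with a non-negligible weight
    te       = mean((X %*% w - y)^2))      # in-sample empirical tracking error
}))

lambda_sweep
```

Each row of `lambda_sweep` corresponds to one value of \(\lambda\), which makes the trade-off between the number of holdings and the tracking error visible at a glance.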

These two approaches are complementary because sparse index tracking selects variables that are statistically significant, not necessarily those that are economically significant. \(\blacksquare\)
