Hurst Exponent
Pairs trading literature use the Hurst exponent frequently since it gives an simple and intuitive indicator for the behavior of stock returns. Using S&P 500 returns, let's learn how to estimate it using R code manually and then use R package conveniently.
If a variance of a time series is linear with time, it is considered a random walk since variance increases with time. We can not know the future value of it in this case.
If the variance is decreasing with time, roughly speaking, it reverts to its mean in the future since its variance is smaller in the future.
Finally, if the variance is increasing exponentially or geometrically or faster than time, it go away from its mean with a trend.
We assume that these three kinds of behaviors of a time series is summarized as one scalar \(H\) (Hurst exponent) which takes on 0.5 as a baseline of random walk. We can think of the following equation.
\[\begin{align} var(y_t) \sim t^{2H} \end{align}\]
Hurst exponent always range between 0 and 1 and based on the value of H
- H < 0.5 : mean-reversion to the long-term mean (anti-persistent)
- H = 0.5 : random walk
- H > 0.5 : strong trend and long memory effects (persistent)
Splitting a time series into sub series
Before calculating the Hurst exponent, let's assume that whole points in time can be expressed by using the sub divisions or sub time series.
\[\begin{align} & \text{points in time using (whole) time index } t \\ = &[1,2,3,...,L-2, L-1, L] \\ \\ = &[1,2,...,L_s-1,L_s, L_s+1, ... , L-1, L] \\ \\ & \text{points in time using sub time index } i \\ &\text{ within sub series } s \\ = &\underbrace{[1,2,..,L_s=\frac{L}{N_s}]}_{s=1} \underbrace{[1,2,..,L_s=\frac{L}{N_s}]}_{s=2} \\ &... \underbrace{[1,2,..,L_s=\frac{L}{N_s}]}_{s=N_s-1} \underbrace{[1,2,..,L_s=\frac{L}{N_s}]}_{s=N_s} \end{align}\]
\(t\) = (whole) time index
\(s\) = sub series of length \(L_s\)
\(i\) = (sub) time index within each sub series \(s\)
Rescaled Range Analysis
According to Hurst (1951), the rescaled range (R/S) analysis is the range of partial sums of deviations from their means, which is rescaled by their standard derivations.
|
The R/S statistics is known to follow the following relationship asymptotically.
\[\begin{align} (R/S)_{L_s} \sim c \times {L_s}^H \end{align}\] or \[\begin{align} E[(R/S)_{L_s}] = c \times {L_s}^H \end{align}\]
where \(c\) is a constant which is independent of \(L_s\) and \(H\) is called the Hurst exponent. \(H\) can be estimated by using a simple linear regression over a sample of increasing time horizons since \(L_s\) is a length of sub series.
\[\begin{align} \log (R/S)_{L_s} = \log c + H \log L_s \end{align}\]
\(L_s\) : a length of sub series
As can be seen the above equation, the number of data points of the regression is dependent on the length of sub series (\(L_s\)), which is not fixed but variable. It seems, however, that there is no absolute rule to choose which set of lengths of sub series. Instead, most research seems to use the following dyadic approach.
The above procedure is repeated for different values of \(L_s\), which is typically selected as \(L_s = L, \frac{L}{2^1}, \frac{L}{2^2}, \frac{L}{2^3}, ..., \frac{L}{2^k}(\approx 2^3=8) )\). The minimum length of eight is typically chosen for the length of the smallest sub series but it is not a must.
R code
The following R code estimates the Hurst exponent of of S&P 500 index returns for the sample period from 1/1/2000 to 8/1/2022.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 | #========================================================# # Quantitative Financial Econometrics & Derivatives # ML/DL using R, Python, Tensorflow by Sang-Heon Lee # # https://shleeai.blogspot.com #--------------------------------------------------------# # Hurst Exponent #========================================================# graphics.off(); rm(list = ls()) library(pracma) # hurstexp library(quantmod) # getSymbols library(xts) #------------------------------------------------ # read S&P 500 index and calculate daily returns #------------------------------------------------ sdate <- as.Date("1999-12-31") edate <- as.Date("2022-08-01") getSymbols("^GSPC", from=sdate, to=edate) price <- GSPC[,6] return <- dailyReturn(price) df.stock <- cbind(time=time(price), as.data.frame(price), as.data.frame(return)) rownames(df.stock) <- NULL colnames(df.stock) <- c("time", "SPC.price", "SPC.return") #================================================= # calculation of Hurst exponent #================================================= # # time (t) : [1,2,............................,L] # range s : 1 2 Ns # time(i,s) : [1,..,L/Ns][1,..,L/Ns]...[1,..,L/Ns] # # sub series # number of sub series = s # length of sub series = L/Ns = Ls # # For each value of Ls, R/S is calculated # # When (R/S)_Ls and Ls pairs are obtained, # linear regression is estimated on log of that pairs. # Hurst exponent is the slope coefficient. # #================================================= #------------------------------------------------ # calculation of Rescaled Ranges #------------------------------------------------ # delete first zero x <- df.stock$SPC.return[-1] L <- length(x) # minimum length of sub series = around 8 vLs <- round(L/2^(0:9)) # length of sub series # output container df.rs <- data.frame(RS=NA, Ls = vLs) # each the length of sub series (Ls) for(k in 1:length(vLs)) { # select length of sub series, Ls Ls <- vLs[k] # number of returns or length of sub series Ns <- round(L/Ls,0) # rescaled range container rs <- rep(NA, Ns) # each range (sub series) for(i in 1:Ns) { # ith sub series x_is <- x[(Ls*(i-1)+1):(Ls*i)] # when i == Ns, some NA will occur. x_is <- x_is[!is.na(x_is)] # mean, stdev, deviation, cumulative sum ms <- mean(x_is) ss <- sd(x_is) ds <- x_is - ms zs <- cumsum(ds) # rescaled range rs[i] <- (max(zs)-min(zs))/ss } # save Ls and average of RS pairs df.rs[k,] <- c(mean(rs), Ls) } df.rs #------------------------------------------------ # estimation of Hurst exponent #------------------------------------------------ log.fit <- lm(log(RS)~log(Ls), data=df.rs) summary(log.fit) log.fit$coefficients[2] #------------------------------------------------ # Using R package simply # see Empirical Hurst exponent #------------------------------------------------ hurstexp(x) | cs |
The following estimation results show that the Hurst exponent of S&P 500 index returns is estimated as 0.5486468 and indicates a persistent behavior or long memory for the sample period. It is, however, not far from 0.5. Of course, this result can be changed a little with different sample periods. Therefore some more experiments are necessary for a robustness check.
Our estimate is similar to that of hurstexp() function in pracma R package. I don't know exactly what the differences between 5 methods in hurstexp() function are. It is likely that Empirical Hurst exponent (0.535896) is estimated by using a similar estimation procedure of ours. I think that hurstexp() function can use more detailed approach to divide or split a time series than ours.
Concluding Remarks
This post explains how to estimate the Hurst exponent which is usually used when implementing pairs trading strategy. We can find that there is some subtle manipulation of the length of sub series. For this reason, hurstexp() function in pracma R package is recommended to use.
Reference
Hurst, H. E. (1951). The long-term storage capacity of reservoirs: an experimental study Transaction of the American Society of Civil Engineers 116, 770-799.
No comments:
Post a Comment