Hurst Exponent using R code

This post explains how to estimate the Hurst exponent which indicates characteristics of a time series : mean-reversion, random walk, and trending with long memory using S&P 500 index returns.


Hurst Exponent


Pairs trading literature use the Hurst exponent frequently since it gives an simple and intuitive indicator for the behavior of stock returns. Using S&P 500 returns, let's learn how to estimate it using R code manually and then use R package conveniently.
Hurst Exponent using R code

If a variance of a time series is linear with time, it is considered a random walk since variance increases with time. We can not know the future value of it in this case.

If the variance is decreasing with time, roughly speaking, it reverts to its mean in the future since its variance is smaller in the future.

Finally, if the variance is increasing exponentially or geometrically or faster than time, it go away from its mean with a trend.

We assume that these three kinds of behaviors of a time series is summarized as one scalar \(H\) (Hurst exponent) which takes on 0.5 as a baseline of random walk. We can think of the following equation.

\[\begin{align} var(y_t) \sim t^{2H} \end{align}\]
Hurst exponent always range between 0 and 1 and based on the value of H

  • H < 0.5 : mean-reversion to the long-term mean (anti-persistent)
  • H = 0.5 : random walk
  • H > 0.5 : strong trend and long memory effects (persistent)


Splitting a time series into sub series


Before calculating the Hurst exponent, let's assume that whole points in time can be expressed by using the sub divisions or sub time series.

\[\begin{align} & \text{points in time using (whole) time index } t \\ = &[1,2,3,...,L-2, L-1, L] \\ \\ = &[1,2,...,L_s-1,L_s, L_s+1, ... , L-1, L] \\ \\ & \text{points in time using sub time index } i \\ &\text{ within sub series } s \\ = &\underbrace{[1,2,..,L_s=\frac{L}{N_s}]}_{s=1} \underbrace{[1,2,..,L_s=\frac{L}{N_s}]}_{s=2} \\ &... \underbrace{[1,2,..,L_s=\frac{L}{N_s}]}_{s=N_s-1} \underbrace{[1,2,..,L_s=\frac{L}{N_s}]}_{s=N_s} \end{align}\]
\(t\) = (whole) time index

\(s\) = sub series of length \(L_s\)

\(i\) = (sub) time index within each sub series \(s\)


Rescaled Range Analysis


According to Hurst (1951), the rescaled range (R/S) analysis is the range of partial sums of deviations from their means, which is rescaled by their standard derivations.

  1. Split a whole period of length \(L\) into \(N_s\) sub periods. Each sub period \(s(=1,...,N_s)\) has an equal length of \(L_s\) and therefore \(L = L_s \times N_s\)

  2. For each sub period \(s(=1,...,N_s)\), a mean (\(M_s\)) and standard deviation (\(S_s\)) are calculated

  3. A demeaned series (\(D_{i,s}\)) is calculated by subtracting the corresponding sub means (\(M_s\)) from the original time series (\(X_{i,s}\)) \[\begin{align} D_{i,s} = X_{i,s} - M_s \end{align}\]
  4. A cumulative series of \(D_{i,s}\) is calculated for \(i=1,2,...,L_s\) \[\begin{align} Y_{i,s} = \sum_{j=1}^{i} {D_{j,s}} \end{align}\]
  5. Find the range \(R_s\) for \(s=1,2,...,N_s\) \[\begin{align} R_s = &\max(Y_{1,s},...,Y_{L_s,s}) \\ &- \min(Y_{1,s},...,Y_{L_s,s}) \end{align}\]
  6. Calculate the mean of the rescaled range \(\frac{R_s}{S_s}\) for all subseries of length \(L_s\) \[\begin{align} (R/S)_{L_s} = \frac{1}{N_s} \sum_{s=1}^{N_s} \frac{R_s}{S_s} \end{align}\]
  7. Repeat the above procedure by changing \(L_s\)


The R/S statistics is known to follow the following relationship asymptotically.

\[\begin{align} (R/S)_{L_s} \sim c \times {L_s}^H \end{align}\] or \[\begin{align} E[(R/S)_{L_s}] = c \times {L_s}^H \end{align}\]
where \(c\) is a constant which is independent of \(L_s\) and \(H\) is called the Hurst exponent. \(H\) can be estimated by using a simple linear regression over a sample of increasing time horizons since \(L_s\) is a length of sub series.

\[\begin{align} \log (R/S)_{L_s} = \log c + H \log L_s \end{align}\]

\(L_s\) : a length of sub series


As can be seen the above equation, the number of data points of the regression is dependent on the length of sub series (\(L_s\)), which is not fixed but variable. It seems, however, that there is no absolute rule to choose which set of lengths of sub series. Instead, most research seems to use the following dyadic approach.

The above procedure is repeated for different values of \(L_s\), which is typically selected as \(L_s = L, \frac{L}{2^1}, \frac{L}{2^2}, \frac{L}{2^3}, ..., \frac{L}{2^k}(\approx 2^3=8) )\). The minimum length of eight is typically chosen for the length of the smallest sub series but it is not a must.


R code


The following R code estimates the Hurst exponent of of S&P 500 index returns for the sample period from 1/1/2000 to 8/1/2022.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
#========================================================#
# Quantitative Financial Econometrics & Derivatives 
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee 
#
# https://shleeai.blogspot.com
#--------------------------------------------------------#
# Hurst Exponent
#========================================================#
 
graphics.off(); rm(list = ls())
 
library(pracma) # hurstexp
library(quantmod) # getSymbols
library(xts)
 
#------------------------------------------------
# read S&P 500 index and calculate daily returns
#------------------------------------------------
 
sdate <- as.Date("1999-12-31")
edate <- as.Date("2022-08-01")
getSymbols("^GSPC"from=sdate, to=edate)
 
price  <- GSPC[,6]
return <- dailyReturn(price)
df.stock <- cbind(time=time(price), 
                  as.data.frame(price), 
                  as.data.frame(return))
rownames(df.stock) <- NULL
colnames(df.stock) <- c("time""SPC.price""SPC.return")
 
#=================================================
# calculation of Hurst exponent
#=================================================
#
# time (t)  : [1,2,............................,L]
# range s   :      1         2             Ns
# time(i,s) : [1,..,L/Ns][1,..,L/Ns]...[1,..,L/Ns]
#
# sub series
# number of sub series = s
# length of sub series = L/Ns = Ls
#
# For each value of Ls, R/S is calculated
#
# When (R/S)_Ls and Ls pairs are obtained, 
# linear regression is estimated on log of that pairs.
# Hurst exponent is the slope coefficient.
#
#=================================================
 
#------------------------------------------------
# calculation of Rescaled Ranges
#------------------------------------------------
 
# delete first zero
<- df.stock$SPC.return[-1
<- length(x)
 
# minimum length of sub series = around 8
vLs <- round(L/2^(0:9)) # length of sub series
 
# output container
df.rs <- data.frame(RS=NA, Ls = vLs)
 
# each the length of sub series (Ls)
for(k in 1:length(vLs)) {
    
    # select length of sub series, Ls
    Ls <- vLs[k]
    
    # number of returns or length of sub series
    Ns <- round(L/Ls,0)
    
    # rescaled range container
    rs <- rep(NA, Ns)
    
    # each range (sub series)
    for(i in 1:Ns) {
 
        # ith sub series
        x_is <- x[(Ls*(i-1)+1):(Ls*i)] 
        
        # when i == Ns, some NA will occur.
        x_is <- x_is[!is.na(x_is)]
        
        # mean, stdev, deviation, cumulative sum
        ms <- mean(x_is)
        ss <- sd(x_is)   
        ds <- x_is - ms
        zs <- cumsum(ds)
        
        # rescaled range
        rs[i] <- (max(zs)-min(zs))/ss
    }
    
    # save Ls and average of RS pairs
    df.rs[k,] <- c(mean(rs), Ls)
}
 
df.rs
 
#------------------------------------------------
# estimation of Hurst exponent
#------------------------------------------------
 
log.fit <- lm(log(RS)~log(Ls), data=df.rs)
summary(log.fit)
log.fit$coefficients[2]
 
#------------------------------------------------
# Using R package simply
# see Empirical Hurst exponent
#------------------------------------------------
hurstexp(x)
 
cs


The following estimation results show that the Hurst exponent of S&P 500 index returns is estimated as 0.5486468 and indicates a persistent behavior or long memory for the sample period. It is, however, not far from 0.5. Of course, this result can be changed a little with different sample periods. Therefore some more experiments are necessary for a robustness check.

Hurst Exponent using R code


Our estimate is similar to that of hurstexp() function in pracma R package. I don't know exactly what the differences between 5 methods in hurstexp() function are. It is likely that Empirical Hurst exponent (0.535896) is estimated by using a similar estimation procedure of ours. I think that hurstexp() function can use more detailed approach to divide or split a time series than ours.


Concluding Remarks


This post explains how to estimate the Hurst exponent which is usually used when implementing pairs trading strategy. We can find that there is some subtle manipulation of the length of sub series. For this reason, hurstexp() function in pracma R package is recommended to use.


Reference


Hurst, H. E. (1951). The long-term storage capacity of reservoirs: an experimental study Transaction of the American Society of Civil Engineers 116, 770-799.


No comments:

Post a Comment