Understanding Haar Wavelet Transform
Roughly speaking, a wavelet transform is a way of representing a signal or time series as a coarse overall trend plus detailed fluctuations. This is done with a sparse trend and a set of detail coefficients. To identify frequencies that are localised in time in a signal or time series, we need to represent the signal in terms of basis functions with compact support, in other words, functions that are non-zero only in a small window and zero elsewhere.
In this post, without complicated mathematics such as sines, cosines, vector spaces or inner products (which will be explained in later posts), we try to understand some notation and characteristics of wavelets through a simple example: the Haar wavelet.
Haar Wavelet Basis Function
Applying the Haar wavelet to a time series means calculating the average of each pair of adjacent values (average or scaling coefficients) and the difference of the first value of each pair from that average (detail coefficients), moving successively from high resolution (the data) to low resolution (the trend). In particular, at each step the process is applied to the scaling coefficients, not to the detail coefficients. This makes the trend of the time series smoother and smoother, with a corresponding set of detail coefficients produced at each step.
Why do we perform successive averages? The purpose of data analysis is to summarize data into reduced information such as an average. Calculating averages successively is a way to find the trend of the data, and the wavelet transform is no exception. Why do we calculate differences (detail coefficients) at each averaging step? Because we need them to recover the original or previous higher-resolution data from the trend. By adding the corresponding difference to, and subtracting it from, each average, data at a higher resolution is obtained.
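The averaging and differencing step described above can be sketched in base R. This is only a minimal sketch: the function name haar_step and the input values are made up for illustration and are not part of any package.

```r
# One averaging/differencing step of the (unnormalised) Haar transform.
# haar_step is a hypothetical helper name used only for this illustration.
haar_step <- function(x) {
  stopifnot(length(x) %% 2 == 0)            # the step works on pairs
  first  <- x[seq(1, length(x), by = 2)]    # first value of each pair
  second <- x[seq(2, length(x), by = 2)]    # second value of each pair
  avg <- (first + second) / 2               # scaling (average) coefficients
  dif <- first - avg                        # detail coefficients
  list(average = avg, detail = dif)
}

# Arbitrary illustrative series
res <- haar_step(c(2, 4, 6, 10))
res$average   # 3 8
res$detail    # -1 -2
```

Only the averages would be passed on to the next step; the details are kept so that the finer series can be rebuilt later.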
Simply put, the resolution means the number of identifiable points in time. When there are 4 daily observations, the resolution is 4. When these 4 values are replaced by their average, the resolution is 1.
Haar Wavelet Calculation
Assume that there is a time series which consists of [9, 7, 3, 5]. Let's calculate the 2-resolution Haar wavelet transform from the 4-resolution data.
\[\begin{align} &\text{Averages (2-resolution)} \\ &8 = (9 + 7)/2 \\ &4 = (3 + 5)/2 \\ \\ &\text{Differences (2-resolution)} \\ &1 = 9 - 8 \\ &-1 = 3 - 4 \end{align}\]
Since 8 and 4 are averages of subsets of the data, they form a trend of the data. To recover the data, it is sufficient to add the corresponding difference to, or subtract it from, each average as follows.
\[\begin{align} &\text{Recover Data} \\ &\text{(2-resolution → 4-resolution)} \\ &9 = 8 + 1 \\ &7 = 8 - 1 \\ &3 = 4 + (-1) \\ &5 = 4 -(-1) \end{align}\]
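The recovery step can be written as a small base R function. This is a sketch under the convention used here (first value = average + difference, second value = average - difference); haar_inverse_step is a hypothetical name.

```r
# Rebuild the finer series from averages and differences.
# haar_inverse_step is a hypothetical helper name for illustration.
haar_inverse_step <- function(avg, dif) {
  stopifnot(length(avg) == length(dif))
  # rbind() stacks the two rows; as.vector() reads column by column,
  # which interleaves (avg + dif) and (avg - dif) pair by pair
  as.vector(rbind(avg + dif, avg - dif))
}

recovered <- haar_inverse_step(c(8, 4), c(1, -1))
recovered   # 9 7 3 5
```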
Now we can make a smoother trend by applying the Haar wavelet transform to the two averages as follows.
\[\begin{align} &\text{Averages (1-resolution)} \\ &6 = (8 + 4)/2 \\ \\ &\text{Differences (1-resolution)} \\ &2 = 8 - 6 \end{align}\]
Although 6 is an average of averages of subsets of the data, it is also the average of the original data ((9+7+3+5)/4 = 24/4 = 6), since its resolution is 1 and it comes from successive averages of equal-sized pairs. It is the global trend of the data. To recover the data from this average and the differences, it is again sufficient to add or subtract the corresponding differences at each level as follows.
\[\begin{align} &\text{Recover Data} \\ &\text{(1-resolution → 2-resolution)} \\ &\color{blue}{8} = 6 + 2 \\ &\color{red}{4} = 6 - 2 \\ \\ &\text{Recover Data} \\ &\text{(2-resolution → 4-resolution)} \\ &9 = \color{blue}{8} + 1 \\ &7 = \color{blue}{8} - 1 \\ &3 = \color{red}{4} + (-1) \\ &5 = \color{red}{4} -(-1) \end{align}\]
Finally, the output of the first application of the Haar wavelet transform to [9,7,3,5] is [8,4,1,-1] and the output of the second application is [6,2,1,-1]. Therefore, to distinguish between these results, we need to specify the resolution explicitly.
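The two outputs can be reproduced with a short base R loop. This is a minimal sketch, not the waveslim implementation; the function name haar_transform and its levels argument are assumptions made for illustration.

```r
# Repeated averaging/differencing, applied only to the averages.
# haar_transform and its 'levels' argument are hypothetical names.
haar_transform <- function(x, levels) {
  stopifnot(levels >= 1, length(x) %% 2^levels == 0)
  approx  <- x
  details <- numeric(0)
  for (j in seq_len(levels)) {
    first  <- approx[seq(1, length(approx), by = 2)]
    second <- approx[seq(2, length(approx), by = 2)]
    avg <- (first + second) / 2
    details <- c(first - avg, details)  # coarser details are placed in front
    approx  <- avg                      # the next step uses the averages only
  }
  c(approx, details)
}

haar_transform(c(9, 7, 3, 5), 1)   # 8 4 1 -1
haar_transform(c(9, 7, 3, 5), 2)   # 6 2 1 -1
```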
The resolution is assumed to be expressed relative to the original data through a power of 2 (\(2^j\)): at level j there are \(2^j\) times fewer points than in the original data.
j (=0,1,...,J) denotes the level of resolution. To be more specific, j=0 is the data itself (the highest resolution), j=1 (half the highest resolution) denotes the first application of the wavelet transform, j=2 the second one, and so on. Accordingly, [9,7,3,5] corresponds to j=0, [8,4,1,-1] to j=1, and [6,2,1,-1] to j=2.
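With this level index, the calculations of the example can be summarised in a general form. The symbols \(x_k\) (data), \(a_{j,k}\) (averages) and \(d_{j,k}\) (differences) are notation assumed here for illustration, with \(k = 1, \dots, N/2^j\) for a series of length \(N\).
\[\begin{align} &\text{Data } (j=0) \\ &a_{0,k} = x_k \\ \\ &\text{Averages and differences at level } j \\ &a_{j,k} = (a_{j-1,2k-1} + a_{j-1,2k})/2 \\ &d_{j,k} = a_{j-1,2k-1} - a_{j,k} \\ \\ &\text{Recover level } j-1 \text{ from level } j \\ &a_{j-1,2k-1} = a_{j,k} + d_{j,k} \\ &a_{j-1,2k} = a_{j,k} - d_{j,k} \end{align}\]
For example, with j=1 and k=1 this gives \(a_{1,1} = (9+7)/2 = 8\) and \(d_{1,1} = 9 - 8 = 1\).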
R code
We can calculate the Haar wavelet transform easily with an R package. There are several useful wavelet packages for R; among them, we use the waveslim package.
```r
#========================================================#
# Quantitative Financial Econometrics & Derivatives
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee
#
# https://shleeai.blogspot.com
#--------------------------------------------------------#
# Haar DWT example
#========================================================#

graphics.off(); rm(list = ls())

library(waveslim)

# resolution as the power of 2
J = 2

# data
data <- c(9, 7, 3, 5)

# Haar wavelet transform
data.haar <- mra(data, "haar", J, "dwt")
nt <- length(names(data.haar))

# plot
x11(width = 18/3.5, height = 6)
par(mfcol=c(nt+1,1), mar=c(5-2,4,4-2,2))
plot(data, ylab="Data", main="Haar Wavelet Transform",
     lwd = 5, xaxt='n', pch=10, col="red",
     ylim = c(min(data)-1, max(data)+1))
axis(side=1, at=1:4, labels=1:4)

for(i in 1:nt) {
    if (i==nt) {
        plot(data.haar[[i]], ylab=names(data.haar)[i],
             lwd = 5, pch=10, col="black", xaxt='n')
    } else {
        plot(data.haar[[i]], ylab=names(data.haar)[i],
             lwd = 5, pch=10, col="blue", xaxt='n',
             ylim = c(min(data.haar[[i]])-1,
                      max(data.haar[[i]])+1))
    }
    axis(side=1, at=1:4, labels=1:4)
}
```
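In the resulting figure, since J=2, S2 is the smooth component (trend or average) and D1 and D2 are the detail components (differences) at levels j=1 and j=2 respectively. Note that mra() returns each component as a series of the same length as the data, so that the components add up to the original series.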
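To see how a trend and details of full length can add back up to the data, here is a rough base R sketch for the example series. It does not call waveslim; the names smooth1, smooth2, detail1 and detail2 are hypothetical, and the sketch only illustrates the idea behind S2, D1 and D2 using the averages computed earlier.

```r
x <- c(9, 7, 3, 5)

avg1 <- c(mean(x[1:2]), mean(x[3:4]))   # pairwise averages: 8 4
avg2 <- mean(avg1)                      # overall average: 6

smooth1 <- rep(avg1, each = 2)          # 8 8 4 4 (trend at level 1)
smooth2 <- rep(avg2, times = length(x)) # 6 6 6 6 (trend at level 2)

detail1 <- x - smooth1                  # 1 -1 -1 1
detail2 <- smooth1 - smooth2            # 2 2 -2 -2

total <- smooth2 + detail2 + detail1
total                                   # 9 7 3 5
```

Each component has the same length as the data, which is why they can be plotted against the same time axis.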
Practical Example
For a practical example, let's apply the Haar wavelet to the S&P 500 stock index with J=8.
```r
#------------------------------------------------
# read S&P 500 index (adjusted closing prices)
#------------------------------------------------
library(waveslim)
library(quantmod) # getSymbols
library(xts)

sdate <- as.Date("2006-04-22")
edate <- as.Date("2022-08-01")
getSymbols("^GSPC", from=sdate, to=edate)

price <- as.data.frame(GSPC[,6])
nobs <- nrow(price)

# data
date <- as.Date(rownames(price))
data <- price$GSPC.Adjusted

# resolution as the power of 2
J = 8

# Haar wavelet transform
data.haar <- mra(data, "haar", J, "dwt")
nt <- length(names(data.haar))

# plot
x11(width = 18/3.5, height = 6)
par(mfcol=c((nt+1)/2,2), mar=c(5-2,4,4-2,2))
matplot(date, data, ylab="Data",
        main="8-level Decomposition of S&P 500 Index",
        lwd = 1, col="red", lty = 1, type = "l",
        ylim = c(min(data)-1, max(data)+1))

for(i in 1:nt) {
    if (i==nt) {
        plot(date, data.haar[[i]], ylab=names(data.haar)[i],
             lwd = 1, col="black", lty = 1, type = "l")
    } else {
        plot(date, data.haar[[i]], ylab=names(data.haar)[i],
             lwd = 1, col="blue", lty = 1, type = "l",
             ylim = c(min(data.haar[[i]])-1,
                      max(data.haar[[i]])+1))
    }
}
```
With J=8, the Haar wavelet transform produces one smooth component or trend (S8) and a set of detail components (D1 to D8).
It is worth noting that the number of observations is 4096, which is \(2^{12}\), since the DWT takes as input a time series whose length is a power of 2. This requirement is due to the operations on pairs. The restriction is relaxed by the MODWT (Maximal Overlap Discrete Wavelet Transform), which will be covered later.
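When a series does not have such a length, one simple workaround is to check the length and keep only the most recent observations. The base R sketch below is an assumption for illustration, not what was done in this post; the helper names is_power_of_two, largest_power_of_two and trim_to_dyadic are hypothetical.

```r
# TRUE if n is 1, 2, 4, 8, ...
is_power_of_two <- function(n) {
  if (n < 1) return(FALSE)
  while (n %% 2 == 0) n <- n / 2   # strip factors of 2
  n == 1
}

# Largest power of 2 that does not exceed n (for n >= 1)
largest_power_of_two <- function(n) {
  p <- 1
  while (p * 2 <= n) p <- p * 2
  p
}

# Keep the most recent observations so that the length is a power of 2
trim_to_dyadic <- function(x) {
  tail(x, largest_power_of_two(length(x)))
}

is_power_of_two(4096)                   # TRUE
is_power_of_two(4100)                   # FALSE
length(trim_to_dyadic(seq_len(4100)))   # 4096
```

Trimming discards data, so it is a design choice rather than a requirement; padding the series is another common option.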
Concluding Remarks
This post explains the discrete Haar wavelet transform and illustrates it with an example using the S&P 500 index. We find that the discrete wavelet transform can represent a signal or time series as a coarse overall trend plus detailed fluctuations.