R code: Back Transform from Caret's preProcess()

This post gives a small R code for the back transformation of caret's preProcess() function, which is not yet implemented in the caret R package. This is useful, for example, when we forecast stock prices using deep learning techniques such as the LSTM, which requires normalized input data, but we want to back-transform the predictions to the original scale.



Reverse Transform from Caret's preProcess()



The caret R package provides a very convenient function, preProcess(), which transforms given data to a normalized or standardized form. However, it does not provide the back (or reverse) transformation.


Transformation


method = "center" \[\begin{align} x^{'} = x - \mu_x \end{align}\]
method = "scale" \[\begin{align} x^{'} = x/\sigma_x \end{align}\]
method = c("center", "scale") \[\begin{align} x^{'} = (x - \mu_x)/\sigma_x \end{align}\]
method = "range", rangeBounds = c(a, b) \[\begin{align} x^{'} = (b-a) \times \frac{x - \min{x}}{\max{x} - \min{x}} + a \end{align}\]
These transformations are performed by the preProcess() function in the caret R package.



preProc <- preProcess(training, 
                      method = c("center", "scale"))
transformed <- predict(preProc, training)
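As a quick sanity check (a minimal sketch with made-up toy data standing in for `training`), the standardized output of predict() can be reproduced by hand from the formula above:

```r
library(caret)

# toy data standing in for the training set
training <- data.frame(x = c(2, 4, 6, 8, 10))

preProc <- preProcess(training, method = c("center", "scale"))
transformed <- predict(preProc, training)

# reproduce the transform manually: x' = (x - mean) / sd
manual <- (training$x - mean(training$x)) / sd(training$x)
all.equal(transformed$x, manual)  # should be TRUE
```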



Back Transformation


method = "center" \[\begin{align} x = x^{'} + \mu_x \end{align}\]
method = "scale" \[\begin{align} x = x^{'}\times\sigma_x \end{align}\]
method = c("center", "scale") \[\begin{align} x = x^{'}\times\sigma_x + \mu_x \end{align}\]
method = "range", rangeBounds = c(a, b) \[\begin{align} x = (x^{'} - a) \times \frac{\max{x} - \min{x}}{b-a} + \min{x} \end{align}\]
These back transformations can be accomplished by the following R code.

#===========================================================
# back transform using the object from the caret preProcess
#===========================================================
 
back_preProc <- function(preProc, df_trans, digits = 10) {
    
    pp <- preProc
    nr <- nrow(df_trans)
    
    if (all(c("center", "scale") %in% names(pp$method))) {
        av <- t(replicate(nr, pp$mean))
        st <- t(replicate(nr, pp$std))
        df <- df_trans*st + av
    } else if ("center" %in% names(pp$method)) {
        av <- t(replicate(nr, pp$mean))
        df <- df_trans + av
    } else if ("scale" %in% names(pp$method)) {
        st <- t(replicate(nr, pp$std))
        df <- df_trans*st
    } else {
        # "range" case: bounds and per-column ranges stored in the object
        a     <- pp$rangeBounds
        x_max <- t(replicate(nr, pp$ranges[2,]))
        x_min <- t(replicate(nr, pp$ranges[1,]))
        df <- (df_trans-a[1])/(a[2]-a[1])*(x_max - x_min) + x_min
    }
    
    return(round(df, digits))
}
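As a quick round-trip check of back_preProc() for the c("center", "scale") case, which the exercise below does not cover (toy data; assumes the function above is already defined):

```r
library(caret)

df <- data.frame(x = 1:10, y = (1:10) * 0.1)

pp <- preProcess(df, method = c("center", "scale"))
df_trans <- predict(pp, df)
df_back  <- back_preProc(pp, df_trans)

# the back transform should recover the original values
all.equal(df_back$x, as.numeric(df$x))
all.equal(df_back$y, df$y)
```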



Exercise


As an exercise, we apply a range transformation between -1 and 1 to training and test sample data.

#========================================================#
# Quantitative ALM, Financial Econometrics & Derivatives 
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee 
#
# https://shleeai.blogspot.com
#--------------------------------------------------------#
# back transform of caret::preProcess
#========================================================#
 
graphics.off(); rm(list = ls())
 
library(caret)
 
#-----------------------------------------
# sample data
#-----------------------------------------
df <- data.frame(x = -10:10, y = -10:10*0.001)
 
#-----------------------------------------
# train/test splitting of data
#-----------------------------------------
# Subsetting rows of a one-column data frame returns a vector.
# To preserve a single-column data frame, use the drop = FALSE option.
df_train <- df[1:15, , drop = FALSE]
df_test  <- df[16:21, , drop = FALSE]
 
 
#-----------------------------------------
# create transform function
#-----------------------------------------
preProc <- preProcess(df_train, method = "range", 
                      rangeBounds = c(-1, 1))
 
#=====================================================
# transform
#=====================================================
df_train_trans <- predict(preProc, df_train)
df_test_trans  <- predict(preProc, df_test)
 
    
#=====================================================
# back transform of train and test data
#=====================================================
df_train_back <- back_preProc(preProc, df_train_trans)
df_test_back  <- back_preProc(preProc, df_test_trans)
 
 
#-----------------------------------------
# print comparisons
#-----------------------------------------
print("========= Train Data =========")
temp <- cbind(df_train, df_train_trans, df_train_back)
colnames(temp) <- c(
    paste0("raw_",   colnames(df_train)),
    paste0("trans_", colnames(df_train_trans)),
    paste0("back_",  colnames(df_train_back)))
print(temp)
    
print("========= Test Data  =========")
temp <- cbind(df_test, df_test_trans, df_test_back)
colnames(temp) <- c(
    paste0("raw_",   colnames(df_test)),
    paste0("trans_", colnames(df_test_trans)),
    paste0("back_",  colnames(df_test_back)))
print(temp)


Comparing the original, transformed, and back-transformed data delivers the expected results.

The upper and lower bounds of the transformed test data are not -1 and 1 since the raw data has a trend. To make this effect visible, I use trending sample data.
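The reason is that min(x) and max(x) in the range formula are estimated from the training sample only, so a test observation outside the training range maps outside [a, b]. A small numeric sketch (made-up numbers):

```r
# suppose the training range is [0, 10] and rangeBounds = c(-1, 1)
a <- -1; b <- 1
x_min <- 0; x_max <- 10

transform_range <- function(x) (b - a) * (x - x_min) / (x_max - x_min) + a

transform_range(10)   # training maximum maps to the upper bound 1
transform_range(12)   # a test value beyond the training maximum maps to 1.4
```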

