R code: Back Transform from Caret's preProcess()

This post gives a small R code for the back transformation of caret's preProcess() function, which is not yet implemented in the caret R package. This is useful, for example, when we forecast stock prices using deep learning techniques such as the LSTM, which requires normalized input data, but we want to back-transform the predictions to the original scale.



Reverse Transform from Caret's preProcess()



The caret R package provides a very convenient function, preProcess(), which transforms given data to a normalized or standardized form. However, it does not provide the back (or reverse) transformation.


Transformation


method = "center" \[\begin{align} x^{'} = x - \mu_x \end{align}\]
method = "scale" \[\begin{align} x^{'} = x/\sigma_x \end{align}\]
method = c("center", "scale") \[\begin{align} x^{'} = (x - \mu_x)/\sigma_x \end{align}\]
method = "range", rangeBounds = c(a, b) \[\begin{align} x^{'} = (b-a) \times \frac{x - \min{x}}{\max{x} - \min{x}} + a \end{align}\]
These transformations are performed by the preProcess() function in the caret R package.



preProc <- preProcess(training, 
                      method = c("center", "scale"))
transformed <- predict(preProc, training)
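As a quick sanity check (a minimal sketch with made-up toy data standing in for `training`), the standardized output of predict() can be reproduced by hand from the formula above:

```r
library(caret)

# toy data standing in for the training set
training <- data.frame(x = c(2, 4, 6, 8, 10))

preProc <- preProcess(training, method = c("center", "scale"))
transformed <- predict(preProc, training)

# reproduce the transform manually: x' = (x - mean) / sd
manual <- (training$x - mean(training$x)) / sd(training$x)
all.equal(transformed$x, manual)  # should be TRUE
```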



Back Transformation


method = "center" \[\begin{align} x = x^{'} + \mu_x \end{align}\]
method = "scale" \[\begin{align} x = x^{'}\times\sigma_x \end{align}\]
method = c("center", "scale") \[\begin{align} x = x^{'}\times\sigma_x + \mu_x \end{align}\]
method = "range", rangeBounds = c(a, b) \[\begin{align} x = (x^{'} - a) \times \frac{\max{x} - \min{x}}{b-a} + \min{x} \end{align}\]
These back transformations can be accomplished by the following R code.

#===========================================================
# back transform using the object from the caret preProcess
#===========================================================
 
back_preProc <- function(preProc, df_trans, digits = 10) {
    
    pp <- preProc
    nr <- nrow(df_trans)
    
    if (all(c("center", "scale") %in% names(pp$method))) {
        av <- t(replicate(nr, pp$mean))
        st <- t(replicate(nr, pp$std))
        df <- df_trans*st + av
    } else if ("center" %in% names(pp$method)) {
        av <- t(replicate(nr, pp$mean))
        df <- df_trans + av
    } else if ("scale" %in% names(pp$method)) {
        st <- t(replicate(nr, pp$std))
        df <- df_trans*st
    } else {
        # "range" case: bounds and per-column ranges stored in the object
        a     <- pp$rangeBounds
        x_max <- t(replicate(nr, pp$ranges[2,]))
        x_min <- t(replicate(nr, pp$ranges[1,]))
        df <- (df_trans-a[1])/(a[2]-a[1])*(x_max - x_min) + x_min
    }
    
    return(round(df, digits))
}
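As a quick round-trip check of back_preProc() for the c("center", "scale") case, which the exercise below does not cover (toy data; assumes the function above is already defined):

```r
library(caret)

df <- data.frame(x = 1:10, y = (1:10) * 0.1)

pp <- preProcess(df, method = c("center", "scale"))
df_trans <- predict(pp, df)
df_back  <- back_preProc(pp, df_trans)

# the back transform should recover the original values
all.equal(df_back$x, as.numeric(df$x))
all.equal(df_back$y, df$y)
```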



Exercise


As an exercise, we apply a range transformation between -1 and 1 to training and test sample data.

#========================================================#
# Quantitative ALM, Financial Econometrics & Derivatives 
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee 
#
# https://shleeai.blogspot.com
#--------------------------------------------------------#
# back transform of caret::preProcess
#========================================================#
 
graphics.off(); rm(list = ls())
 
library(caret)
 
#-----------------------------------------
# sample data
#-----------------------------------------
df <- data.frame(x = -10:10, y = -10:10*0.001)
 
#-----------------------------------------
# train/test splitting of data
#-----------------------------------------
# Subsetting rows of a one-column data frame returns a vector.
# To preserve a single-column data frame, use the drop = FALSE option.
df_train <- df[1:15, , drop = FALSE]
df_test  <- df[16:21, , drop = FALSE]
 
 
#-----------------------------------------
# create transform function
#-----------------------------------------
preProc <- preProcess(df_train, method = "range", 
                      rangeBounds = c(-1, 1))
 
#=====================================================
# transform
#=====================================================
df_train_trans <- predict(preProc, df_train)
df_test_trans  <- predict(preProc, df_test)
 
    
#=====================================================
# back transform of train and test data
#=====================================================
df_train_back <- back_preProc(preProc, df_train_trans)
df_test_back  <- back_preProc(preProc, df_test_trans)
 
 
#-----------------------------------------
# print comparisons
#-----------------------------------------
print("========= Train Data =========")
temp <- cbind(df_train, df_train_trans, df_train_back)
colnames(temp) <- c(
    paste0("raw_",   colnames(df_train)),
    paste0("trans_", colnames(df_train_trans)),
    paste0("back_",  colnames(df_train_back)))
print(temp)
    
print("========= Test Data  =========")
temp <- cbind(df_test, df_test_trans, df_test_back)
colnames(temp) <- c(
    paste0("raw_",   colnames(df_test)),
    paste0("trans_", colnames(df_test_trans)),
    paste0("back_",  colnames(df_test_back)))
print(temp)


Comparing the original, transformed, and back-transformed data delivers the expected results.

The upper and lower bounds of the transformed test data are not -1 and 1 since the raw data has a trend. To make this effect visible, I use trending sample data.
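The reason is that min(x) and max(x) in the range formula are estimated from the training sample only, so a test observation outside the training range maps outside [a, b]. A small numeric sketch (made-up numbers):

```r
# suppose the training range is [0, 10] and rangeBounds = c(-1, 1)
a <- -1; b <- 1
x_min <- 0; x_max <- 10

transform_range <- function(x) (b - a) * (x - x_min) / (x_max - x_min) + a

transform_range(10)   # training maximum maps to the upper bound 1
transform_range(12)   # a test value beyond the training maximum maps to 1.4
```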

