Reverse Transform from Caret's preProcess()
The caret R package provides a very convenient function, preProcess(), which transforms given data into normalized or standardized data. However, it does not provide a back (or reverse) transformation function.
Transformation
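method = "center", "scale", or c("center", "scale"): centering subtracts the mean, scaling divides by the standard deviation, and using both gives the standardized value \[\begin{align} x^{'} = \frac{x - \mu_x}{\sigma_x} \end{align}\]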
method = "range", rangeBounds = c(a, b) \[\begin{align} x^{'} = (b-a) \times \frac{x - \min{x}}{\max{x} - \min{x}} + a \end{align}\]
These transformations are performed with the preProcess() function in the caret R package.
```r
library(caret)

# training: a data frame of numeric columns
preProc <- preProcess(training, method = c("center", "scale"))
transformed <- predict(preProc, training)
```
Back Transformation
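method = "center", "scale", or c("center", "scale"): with "center" only, add the mean back; with "scale" only, multiply by the standard deviation; with both, \[\begin{align} x = x^{'}\times\sigma_x + \mu_x \end{align}\]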
method = "range", rangeBounds = c(a, b) \[\begin{align} x = (x^{'} - a) \times \frac{\max{x} - \min{x}}{b-a} + \min{x} \end{align}\]
These back transformations can be accomplished with the following R code.
```r
#===========================================================
# back transform using the object from the caret preProcess
#===========================================================
back_preProc <- function(preProc, df_trans, digits = 10) {

    pp <- preProc
    nr <- nrow(df_trans)
    methods <- names(pp$method)

    # repeat a per-column vector over nr rows
    # (also works when there is only one column)
    rep_rows <- function(v) {
        matrix(v, nrow = nr, ncol = length(v), byrow = TRUE)
    }

    if (all(c("center", "scale") %in% methods)) {
        df <- df_trans * rep_rows(pp$std) + rep_rows(pp$mean)
    } else if ("center" %in% methods) {
        df <- df_trans + rep_rows(pp$mean)
    } else if ("scale" %in% methods) {
        df <- df_trans * rep_rows(pp$std)
    } else {
        # otherwise assume method = "range"
        a <- pp$rangeBounds
        x_max <- rep_rows(pp$ranges[2, ])
        x_min <- rep_rows(pp$ranges[1, ])
        df <- (df_trans - a[1]) / (a[2] - a[1]) * (x_max - x_min) + x_min
    }

    return(round(df, digits))
}
```
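The range formulas from the two sections above can also be checked without caret. The following is a minimal base-R sketch on toy data. The helper names range_fwd and range_back are hypothetical and are not caret functions.

```r
# Base-R sketch (no caret): range transform to [a, b] and its inverse.
# range_fwd and range_back are hypothetical helper names for illustration.
range_fwd <- function(x, x_min, x_max, a, b) {
    (b - a) * (x - x_min) / (x_max - x_min) + a
}
range_back <- function(x_trans, x_min, x_max, a, b) {
    (x_trans - a) * (x_max - x_min) / (b - a) + x_min
}

x <- c(-10, -3, 0, 4)   # toy data
a <- -1; b <- 1         # target bounds, as in rangeBounds = c(-1, 1)

x_trans <- range_fwd(x, min(x), max(x), a, b)
x_back  <- range_back(x_trans, min(x), max(x), a, b)

print(range(x_trans))         # smallest and largest values map to a and b
print(all.equal(x, x_back))   # round trip recovers the original values
```

The back transformation only needs the stored minimum, maximum, and bounds, which is why back_preProc() can work from the preProcess object alone.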
Exercise
As an exercise, we apply a range transformation between -1 and 1 to training and test sample data.
```r
#========================================================#
# Quantitative Financial Econometrics & Derivatives
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee
#
# https://shleeai.blogspot.com
#--------------------------------------------------------#
# backtransform of caret::preProcess
#========================================================#

graphics.off(); rm(list = ls())

library(caret)

# rm() above clears the workspace, so define (or source)
# back_preProc() from the previous section after this point.

#-----------------------------------------
# sample data
#-----------------------------------------
df <- data.frame(x = -10:10, y = -10:10*0.001)

#-----------------------------------------
# train/test splitting data
#-----------------------------------------
# In case of one-column dataframe, sub rows become a vector.
# To avoid this and preserve a single-column data frame,
# use drop = FALSE option.
df_train <- df[1:15, , drop = FALSE]
df_test  <- df[16:21, , drop = FALSE]

#-----------------------------------------
# create transform function
#-----------------------------------------
preProc <- preProcess(df_train, method = "range",
                      rangeBounds = c(-1, 1))

#=====================================================
# transform
#=====================================================
df_train_trans <- predict(preProc, df_train)
df_test_trans  <- predict(preProc, df_test)

#=====================================================
# back transform of train and test data
#=====================================================
df_train_back <- back_preProc(preProc, df_train_trans)
df_test_back  <- back_preProc(preProc, df_test_trans)

#-----------------------------------------
# print comparisons
#-----------------------------------------
print("========= Train Data =========")
temp <- cbind(df_train, df_train_trans, df_train_back)
colnames(temp) <- c(
    paste0("raw_",   colnames(df_train)),
    paste0("trans_", colnames(df_train_trans)),
    paste0("back_",  colnames(df_train_back)))
print(temp)

print("========= Test Data =========")
temp <- cbind(df_test, df_test_trans, df_test_back)
colnames(temp) <- c(
    paste0("raw_",   colnames(df_test)),
    paste0("trans_", colnames(df_test_trans)),
    paste0("back_",  colnames(df_test_back)))
print(temp)
```
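Comparing the original, transformed, and back-transformed data gives the expected results: the back-transformed values match the raw data.
The upper and lower bounds of the transformed test data are not 1 and -1 because the raw data has a trend: the minimum and maximum come from the training data, and the test values lie beyond that range. I use trending sample data to make this result distinct.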
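The effect on the test bounds can be reproduced by hand for the x column. This is a base-R sketch that assumes the test data is scaled with the training minimum and maximum, as the range formula and pp$ranges in back_preProc() suggest. The helper name scale_with_train is hypothetical.

```r
# Base-R sketch (no caret): scale test data with the training min/max.
# scale_with_train is a hypothetical helper name for illustration.
x_train <- -10:4    # x values in rows 1 to 15 of df
x_test  <- 5:10     # x values in rows 16 to 21 of df
a <- -1; b <- 1     # target bounds, as in rangeBounds = c(-1, 1)

x_min <- min(x_train)
x_max <- max(x_train)
scale_with_train <- function(x) {
    (b - a) * (x - x_min) / (x_max - x_min) + a
}

print(range(scale_with_train(x_train)))   # training data spans [-1, 1]
print(range(scale_with_train(x_test)))    # test data lies above 1
```

Since every test value of x is larger than the training maximum of 4, all transformed test values exceed the upper bound of 1.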