lambda.min, lambda.1se and Cross Validation in Lasso : Continuous Response

This post presents an R code for k-fold cross validation of the lasso in the case of a Gaussian regression (continuous Y). This can be done easily by using the mean squared error. Except for this performance measure, the remaining procedure is the same as the one used in the case of a binomial response.



Cross Validation in Lasso : Gaussian Regression



In the previous post, we implemented an R code for K-fold cross validation of the lasso model with a binomial response.


The main output of this post is the following lasso cross validation figure for the case of a continuous Y variable (top: cv.glmnet(), bottom: our result).

The difference between the previous (categorical Y) post and this (continuous Y) post is the performance measure. The former uses the misclassification rate (MCR) from a confusion matrix, while the latter uses the mean squared error (MSE). Since the same logic applies otherwise, we only have to modify this performance measure in the previous R code.
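For a given \(\lambda\), the validation-fold MSE is simply the average of the squared differences between predicted and observed values. A minimal sketch with hypothetical vectors prd and obs:

prd <- c(1.2, 0.8, 2.5)  # hypothetical predictions on a validation fold
obs <- c(1.0, 1.1, 2.3)  # hypothetical observed responses
mean((prd - obs)^2)      # MSE = 0.05666667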

In fact, we are familiar with the MSE because the textbook linear regression model uses this measure. Since this case is easier than the binomial response, let's turn directly to the R code for this modification.

[Figure: lambda.min, lambda.1se and Cross Validation in Lasso : Gaussian Regression (top: cv.glmnet(), bottom: our result)]

Cross Validation of Lasso with continuous Y variable


In the following R code, we use the built-in example data (QuickStartExample) for simplicity. In particular, we set the arguments family = "gaussian" and type.measure = "mse" for a continuous dependent variable.


#========================================================#
# Quantitative Financial Econometrics & Derivatives 
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee 
#
# https://shleeai.blogspot.com
#--------------------------------------------------------#
# Cross Validation of Lasso : Gaussian Regression
#========================================================#
 
library(glmnet) 
 
graphics.off()  # clear all graphs
rm(list = ls()) # remove all objects from the workspace
 
set.seed(1234)
 
#============================================
# data : x and y
#============================================
data(QuickStartExample) # built-in data
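# NOTE (assumption): in newer versions of glmnet the command above loads
# a list named QuickStartExample instead of creating x and y directly;
# if x and y are missing after data(), uncomment the next line:
# x <- QuickStartExample$x; y <- QuickStartExample$y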
nfolds = 5 # number of folds
 
#============================================
# cross validation by using cv.glmnet
#============================================
cvfit <- cv.glmnet(
    x, y, family = "gaussian",
    type.measure = "mse",
    nfolds = nfolds,
    keep = TRUE  # returns foldid
)
 
# two lambda from cv.glmnet
cvfit$lambda.min; cvfit$lambda.1se
x11(); plot(cvfit)
 
#============================================
# cross validation by hand
#============================================
# get the vector of fold ids used in cv.glmnet
# so that we can replicate its result exactly
# (fold assignment is otherwise random)
foldid <- cvfit$foldid # from cv.glmnet with keep = TRUE
 
# candidate lambda range
fit      <- glmnet(x, y, family = "gaussian")
v.lambda <- fit$lambda
nla      <- length(v.lambda)
    
m.mse <- matrix(0, nrow = nfolds, ncol=nla)
 
#-------------------------------
# iteration over all folds
#-------------------------------
for (i in 1:nfolds) {
    # training   fold : tr
    # validation fold : va
    
    ifd <- which(foldid==i) # i-th fold
    tr.x <- x[-ifd,]; tr.y <- y[-ifd]
    va.x <- x[ifd,];  va.y <- y[ifd]
    
    # estimation using training fold
    fit <- glmnet(tr.x, tr.y, family = "gaussian",
                  lambda = v.lambda)
    # prediction on validation fold
    prd <- predict(fit, newx = va.x, type = "response")
        
    # mean squared error for each lambda
    for(c in 1:nla) {
      m.mse[i,c] <- mean((prd[,c]-va.y)^2)
    }
}
# average mse
v.mse <- colMeans(m.mse)
# save manual cross validation output
cv.out <- data.frame(lambda = v.lambda, 
    log_lambda = log(v.lambda), mse = v.mse)
    
#-------------------------------
# lambda.min
#-------------------------------
no_lambda_min <- which.min(cv.out$mse)
cv.out$lambda[no_lambda_min]
 
#-------------------------------
# lambda.1se
#-------------------------------
# standard error of mse
v.mse_se <- apply(m.mse,2,sd)/sqrt(nfolds)
# se of min lambda
mse_se_la_min <- v.mse_se[no_lambda_min]
# lambda.1se
max(cv.out$lambda[
    cv.out$mse < min(cv.out$mse) + mse_se_la_min])
 
#-------------------------------
# graph for cross validation
#-------------------------------
x11(); matplot(x = cv.out$log_lambda, 
    y=cbind(cv.out$mse, cv.out$mse+v.mse_se,
                        cv.out$mse-v.mse_se), 
    lty = "solid", col = c("blue","red","green"),
    type=c("p","l","l"), pch = 16, lwd = 3)
    


Running the above R code produces the following two \(\lambda\)s from the two approaches (cv.glmnet() and our implementation). Except for the use of the mean squared error, the calculation of lambda.min and lambda.1se is the same as in the binomial response case. The two cross validation figures are omitted because we have already seen them at the beginning of this post.


> #-------------------------------------
> # from cv.glmnet()
> # cvfit$lambda.min; cvfit$lambda.1se
> #-------------------------------------
lambda.min : 0.08307327
lambda.1se : 0.1451729
 
> #-------------------------------------
> # from our implementation
> #-------------------------------------
lambda.min : 0.08307327
lambda.1se : 0.1451729
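For reference, the 1-SE rule applied above selects the largest \(\lambda\) whose cross validation error stays within one standard error of the error at \(\lambda_{min}\):

\[ \lambda_{1se} = \max \left\{ \lambda : \mathrm{CV}(\lambda) \le \mathrm{CV}(\lambda_{min}) + \mathrm{SE}(\lambda_{min}) \right\} \]

This is exactly what the max(cv.out$lambda[...]) line in the code above computes.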
 



Concluding Remarks


In this post, we implemented an R code for lasso cross validation with a continuous dependent variable by a small modification of the binomial response case. \(\blacksquare\)

