R : Diebold-Mariano Test

This post shows how to perform the Diebold-Mariano test using the multDM R package.



Diebold-Mariano test



The Diebold-Mariano (DM) test is the most commonly used tool for evaluating the significance of differences in forecasting accuracy. It is an asymptotic z-test of the hypothesis that the mean of the loss differential series
\[\begin{align} d_t = L(\epsilon_{1t}) - L(\epsilon_{2t}) \end{align}\] is zero, where the loss function \(L\) can be any of various measures such as the absolute loss or the squared loss.

The null hypothesis is that the two forecasts have the same accuracy, and the alternative hypothesis is that they differ in accuracy:
\[\begin{align} &H_0 : E(d_t) = 0 \\ &H_1 : E(d_t) \ne 0 \end{align}\] Under the null hypothesis, the DM test statistic follows an asymptotic \(N(0,1)\) distribution. The \(H_0\) of no difference is rejected if the DM test statistic falls outside the range \(-z_{\alpha/2}\) to \(z_{\alpha/2}\), that is, if \(|DM| > z_{\alpha/2}\).
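
In its standard form, the DM statistic is simply a t-ratio of the sample mean of the loss differential, where \(\widehat{V}(\bar{d})\) denotes a consistent (for multi-step forecasts, autocorrelation-robust) estimate of the variance of \(\bar{d}\):
\[\begin{align} DM = \frac{\bar{d}}{\sqrt{\widehat{V}(\bar{d})}}, \qquad \bar{d} = \frac{1}{T}\sum_{t=1}^{T} d_t \end{align}\]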



R code


For example, let's consider the 3-month-ahead forecasts of 3-year yields from two models. The two forecasts appear similar, but some discrepancies emerge in the early part of the sample period.

The forecasts of the NS2 model seem to marginally outperform those of the DNS model. To substantiate this claim, we need to determine whether the difference is statistically significant, and for this purpose we can employ the Diebold-Mariano test.
I use the DM.test() function from the multDM R package together with the rmse() function from the Metrics R package. More detailed information about the input arguments is provided as comments in the code.


#========================================================#
# Quantitative ALM, Financial Econometrics & Derivatives 
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee 
#
# https://shleeai.blogspot.com
#-----------------------------------------------------------
# RMSE and DM test
#========================================================#
 
library(Metrics) # rmse
library(multDM)  # DM.test
 
graphics.off(); rm(list = ls())
 
#-----------------------------------------------------------
# 3-year yields and their 3-month-ahead forecasts
#-----------------------------------------------------------
v_true <- c(
    0.3785, 0.3215, 0.5347, 0.7144, 0.6514, 0.8292, 0.6743,
    0.6224, 0.6064, 0.8306, 0.7256, 0.7286, 0.9157, 0.8904,
    0.8072, 0.8938, 1.0534, 0.9926, 1.1203, 0.9872, 0.9110,
    1.1185, 0.7884, 1.0344, 0.9060, 0.9406, 0.9746, 1.0441,
    1.0218, 1.0907, 0.9378, 1.0687, 1.2408, 1.3501, 0.9984,
    0.9470, 0.9011, 0.9579, 1.0672, 0.7380, 0.7910, 0.9682,
    0.9247, 1.0455, 1.4340, 1.5014, 1.4965, 1.5201, 1.5407,
    1.4813, 1.4557, 1.5719, 1.5258, 1.4565, 1.6455, 1.7537,
    1.9286, 2.0033, 2.3127, 2.4284, 2.3985, 2.6337, 2.5348,
    2.6134, 2.7537, 2.6817, 2.8666, 2.9158, 2.8218, 2.4624,
    2.4203, 2.5041, 2.2260, 2.2320, 1.8940, 1.7164, 1.8397,
    1.4335, 1.5867, 1.5199, 1.6291, 1.6159, 1.2989, 0.8547,
    0.3230, 0.2712, 0.2095, 0.1916, 0.1365, 0.1686, 0.1665,
    0.2153, 0.2050, 0.1780, 0.2025, 0.3433, 0.3967, 0.3779,
    0.3314, 0.4694, 0.3631, 0.4161, 0.5466, 0.7801, 0.8247,
    0.9860, 1.4037, 1.6326, 2.4844, 2.8933, 2.7241, 2.9872,
    2.8356, 3.4367, 4.2307, 4.4006, 4.1147, 4.2295)
 
v_DNS <- c(
    0.6770, 0.7380, 0.6132, 0.7312, 0.5748, 0.7575, 1.1352,
    1.0734, 1.1282, 1.0689, 0.9695, 1.0026, 1.2458, 1.0561,
    1.1092, 1.1870, 1.3845, 1.1116, 1.1648, 1.3560, 1.3500,
    1.2682, 1.3259, 1.0134, 1.3560, 0.8240, 1.0790, 1.0726,
    1.1802, 1.1893, 1.2133, 1.3103, 1.3977, 1.2397, 1.2725,
    1.2601, 1.5717, 1.1556, 1.2177, 1.1761, 1.3321, 1.2765,
    0.8769, 0.9133, 1.2693, 1.0486, 1.2701, 1.5750, 1.7047,
    1.7023, 1.5662, 1.7258, 1.5856, 1.6701, 1.7045, 1.7832,
    1.6120, 1.8260, 1.9067, 1.9714, 2.1330, 2.3188, 2.4173,
    2.4402, 2.7014, 2.5790, 2.6052, 2.8355, 2.6518, 2.7979,
    2.7974, 2.7862, 2.6074, 2.5824, 2.6511, 2.2867, 2.3288,
    2.1271, 1.8461, 1.9618, 1.6491, 1.7387, 1.5485, 1.6753,
    1.7125, 1.4224, 1.0391, 0.5259, 0.3800, 0.3175, 0.2468,
    0.2944, 0.3266, 0.3446, 0.5191, 0.3892, 0.2278, 0.3722,
    0.5393, 0.7271, 0.6481, 0.5966, 0.6929, 0.5764, 0.5078,
    0.6837, 0.8924, 0.8649, 0.9408, 1.3736, 1.4585, 2.1477,
    2.6380, 2.5471, 2.9316, 2.7443, 3.3847, 3.9966)
 
v_NS2 <- c(
    0.3827, 0.4559, 0.3815, 0.3994, 0.3350, 0.6082, 0.7184,
    0.6658, 0.8100, 0.7102, 0.6549, 0.6234, 0.7941, 0.6910,
    0.6848, 0.7810, 0.7588, 0.6997, 0.7630, 0.8079, 0.7405,
    0.8760, 0.8203, 0.7192, 0.9003, 0.5767, 0.8719, 0.7645,
    0.9866, 1.0692, 1.0624, 1.1129, 1.2572, 1.3188, 1.4228,
    1.5660, 1.6499, 1.1488, 1.2208, 1.1902, 1.2233, 1.3102,
    0.9593, 0.9769, 1.1270, 1.1177, 1.2852, 1.6721, 1.7223,
    1.6823, 1.6731, 1.7723, 1.7262, 1.4828, 1.5965, 1.5551,
    1.5299, 1.8299, 1.9085, 2.0343, 2.0980, 2.2974, 2.4676,
    2.4417, 2.6011, 2.5549, 2.5633, 2.6989, 2.6527, 2.8365,
    2.9090, 2.8723, 2.4777, 2.4154, 2.5153, 2.2231, 2.2399,
    1.9187, 1.7019, 1.8043, 1.4124, 1.5601, 1.4979, 1.6286,
    1.6356, 1.2806, 0.8406, 0.3371, 0.3026, 0.1891, 0.1703,
    0.1167, 0.1327, 0.1374, 0.2050, 0.1995, 0.1532, 0.1787,
    0.3039, 0.3416, 0.3434, 0.3010, 0.4621, 0.3734, 0.4177,
    0.5057, 0.7163, 0.7670, 1.0366, 1.3943, 1.5748, 2.5524,
    2.9818, 2.6853, 3.5169, 2.4736, 3.5101, 4.2882)
 
x11(width=7, height=4.5)
matplot(cbind(v_true, v_DNS, v_NS2), type="l", col=2:4,
        lty=1, lwd=4, ylab = "Yields (Percent)")
legend("topleft", pch=rep(16,3), cex = 0.9, col=2:4
       legend=c("True""DNS""NS2"))
 
#-----------------------------------------------------------
# RMSE and DM Test
#-----------------------------------------------------------
 
#===========================================================
# DM test arguments
#===========================================================
# f1 : vector of the first forecast
# f2 : vector of the second forecast
# y     : vector of the real values of the modeled time-series
#
#-----------------------------------------------------------
# loss.type : method to compute the loss function, 
#             if not specified loss.type="SE" is used
#-----------------------------------------------------------
# loss.type="SE"   : squared errors, 
# loss.type="AE"   : absolute errors, 
# loss.type="SPE"  : squared proportional error 
#                   (useful if errors are heteroskedastic), 
# loss.type="ASE"  : absolute scaled error, 
#
# if loss.type is specified as some numeric value, then 
# a loss function of the form 
#
# exp(loss.type*errors) - 1 - loss.type*errors 
#
# is used (useful when it is more costly 
# to underpredict y than to overpredict)
#
#-----------------------------------------------------------
# h    : forecasts h-steps ahead are evaluated, 
#     if not specified h=1 is used
#-----------------------------------------------------------
# c    : whether the Harvey-Leybourne-Newbold correction 
#     for small samples should be used, 
#     if not specified c=FALSE is used
#-----------------------------------------------------------
# H1 : alternative hypothesis, 
#      if not specified H1="same" is used
#-----------------------------------------------------------
# H1="same" for "both forecasts have different accuracy", 
# H1="more" for "1st fcst is more accurate than 2nd fcst", 
# H1="less" for "1st fcst is less accurate than 2nd fcst", 
#-----------------------------------------------------------
 
# input
v_pred1 <- v_DNS; v_pred2 <- v_NS2
 
# RMSE in bps
rmse1 <- rmse(v_true, v_pred1)*100
rmse2 <- rmse(v_true, v_pred2)*100
 
# DM test
dm <- DM.test(f1 = v_pred1, f2 = v_pred2, y = v_true,
              loss.type="SE", h=3, c=FALSE, H1="same")
 
# output
cbind(RMSE_DNS = round(rmse1,2), 
      RMSE_NS2 = round(rmse2,2), 
      DM_stat  = round(dm$statistic,4),
      DM_pval  = round(dm$p.value,4))
 


From the following DM test result, we can conclude that, according to the Diebold-Mariano test of equal predictive ability, the null hypothesis is rejected in favor of the forecasts of the NS2 model. In other words, the forecasts of the NS2 model are more accurate than the forecasts of the DNS model.

However, we cannot conclude that the NS2 model itself is more accurate than the DNS model, as the authors stress that the DM test was not intended for comparing models.

-----------------------------------------------------
              RMSE_DNS  RMSE_NS2  DM_stat  DM_pval
-----------------------------------------------------
   statistic   47.82     44.17     2.1027   0.0355
-----------------------------------------------------
 
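
Because the test above is two-sided, the direction of the difference comes from comparing the RMSEs. As a cross-check, a one-sided version of the same test can be run with the H1 argument documented in the code comments; the following is a minimal sketch reusing the inputs already defined, where H1="less" states that the first forecast (DNS) is less accurate than the second (NS2).

# one-sided DM test: 1st forecast (DNS) less accurate than 2nd (NS2)
dm_less <- DM.test(f1 = v_pred1, f2 = v_pred2, y = v_true,
                   loss.type="SE", h=3, c=FALSE, H1="less")
round(dm_less$p.value, 4)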


Reference

Diebold, F.X. and R.S. Mariano (1995) Comparing predictive accuracy. Journal of Business and Economic Statistics 13, 253-263. \(\blacksquare\)


