Long and Wide formats
What is the long and wide format? A picture paints a thousand words.
We want to transform the long format data to the wide format data for a panel of time series analysis such as a term structure of interest rates. As financial data is usually extracted from database system, we occasionally encounter the long format data. For example, the data in the above figure is a sample of Euro area yield curve which has the long format. To facilitate an empirical analysis, the wide format is appropriate.
Transforming between the long and wide format can be carried out by using reshape() R function. No further explanation is needed. Let's see the R code below.
R code
The following R code reads sample data and transform the long format to the wide format and vice versa. When using the reshape() function, we need to set the direction argument as "long" or "wide". In particular, we need to add new column name with some delimitator (., _, etc) to the wide format data when we transform it to the long format.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 | #========================================================# # Quantitative Financial Econometrics & Derivatives # ML/DL using R, Python, Tensorflow by Sang-Heon Lee # # https://shleeai.blogspot.com #--------------------------------------------------------# # Long to wide format and vice versa for yield data #========================================================# graphics.off(); rm(list = ls()) # sample data : ECB zero yields str_data <- "term date rate 3M 2021-01-29 -0.625 3M 2021-02-26 -0.612 3M 2021-03-31 -0.636 3M 2021-04-30 -0.628 3M 2021-05-31 -0.632 3M 2021-06-30 -0.650 3M 2021-07-30 -0.663 3M 2021-08-31 -0.676 3M 2021-09-30 -0.712 3M 2021-10-29 -0.736 3M 2021-11-30 -0.895 3M 2021-12-31 -0.731 3Y 2021-01-29 -0.771 3Y 2021-02-26 -0.648 3Y 2021-03-31 -0.711 3Y 2021-04-30 -0.684 3Y 2021-05-31 -0.666 3Y 2021-06-30 -0.672 3Y 2021-07-30 -0.813 3Y 2021-08-31 -0.760 3Y 2021-09-30 -0.677 3Y 2021-10-29 -0.537 3Y 2021-11-30 -0.766 3Y 2021-12-31 -0.620 10Y 2021-01-29 -0.512 10Y 2021-02-26 -0.246 10Y 2021-03-31 -0.279 10Y 2021-04-30 -0.180 10Y 2021-05-31 -0.146 10Y 2021-06-30 -0.203 10Y 2021-07-30 -0.440 10Y 2021-08-31 -0.393 10Y 2021-09-30 -0.170 10Y 2021-10-29 -0.069 10Y 2021-11-30 -0.350 10Y 2021-12-31 -0.188 20Y 2021-01-29 -0.176 20Y 2021-02-26 0.103 20Y 2021-03-31 0.142 20Y 2021-04-30 0.252 20Y 2021-05-31 0.287 20Y 2021-06-30 0.201 20Y 2021-07-30 -0.059 20Y 2021-08-31 -0.033 20Y 2021-09-30 0.195 20Y 2021-10-29 0.103 20Y 2021-11-30 -0.115 20Y 2021-12-31 0.056" #========================================== # Read a sample of ECB zero coupon yields #========================================== df_long <- read.table(text = str_data, header = TRUE) #========================================== # Transform LONG to WIDE format #========================================== # using "wide" option df_wide <- reshape(df_long, direction = "wide", idvar = "date", timevar = "term") df_wide # initialize row names rownames(df_long) <- NULL # delete a unnecessary prefix in column names colnames(df_wide) <- gsub("rate.","", colnames(df_wide)) df_wide #========================================== # Transform WIDE to LONG format #========================================== df_wide2 <- df_wide # need to add new column name as a prefix colnames(df_wide2)[-1] <- paste0("term.", colnames(df_wide)[-1]) # using "long" option df_long2 <- reshape(df_wide2, direction = "long", idvar="date", varying = colnames(df_wide2)[-1], sep = ".") # initialize row names rownames(df_long2) <- NULL df_long2 | cs |
Running the above R code produces the following wide format of the yield curve data.
We can also transform the wide format data to the long format conversely.
No comments:
Post a Comment