R code snippet : Transform from long format to wide format

This post introduces a simple R code snippet for transforming the long format data to the wide format. We occasionally encounter the long format data such as yield curve data since it has two dimensions : maturity and time. For this end, we can use reshape() R built-in function.


Long and Wide formats



What is the long and wide format? A picture paints a thousand words.

R snippet : Transform from long format to wide format

We want to transform the long format data to the wide format data for a panel of time series analysis such as a term structure of interest rates. As financial data is usually extracted from database system, we occasionally encounter the long format data. For example, the data in the above figure is a sample of Euro area yield curve which has the long format. To facilitate an empirical analysis, the wide format is appropriate.

Transforming between the long and wide format can be carried out by using reshape() R function. No further explanation is needed. Let's see the R code below.

R code


The following R code reads sample data and transform the long format to the wide format and vice versa. When using the reshape() function, we need to set the direction argument as "long" or "wide". In particular, we need to add new column name with some delimitator (., _, etc) to the wide format data when we transform it to the long format.


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
#========================================================#
# Quantitative Financial Econometrics & Derivatives 
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee 
#
# https://shleeai.blogspot.com
#--------------------------------------------------------#
# Long to wide format and vice versa for yield data
#========================================================#
 
graphics.off(); rm(list = ls())
 
# sample data : ECB zero yields
str_data <- "term    date    rate
            3M    2021-01-29    -0.625
            3M    2021-02-26    -0.612
            3M    2021-03-31    -0.636
            3M    2021-04-30    -0.628
            3M    2021-05-31    -0.632
            3M    2021-06-30    -0.650
            3M    2021-07-30    -0.663
            3M    2021-08-31    -0.676
            3M    2021-09-30    -0.712
            3M    2021-10-29    -0.736
            3M    2021-11-30    -0.895
            3M    2021-12-31    -0.731
            3Y    2021-01-29    -0.771
            3Y    2021-02-26    -0.648
            3Y    2021-03-31    -0.711
            3Y    2021-04-30    -0.684
            3Y    2021-05-31    -0.666
            3Y    2021-06-30    -0.672
            3Y    2021-07-30    -0.813
            3Y    2021-08-31    -0.760
            3Y    2021-09-30    -0.677
            3Y    2021-10-29    -0.537
            3Y    2021-11-30    -0.766
            3Y    2021-12-31    -0.620
            10Y    2021-01-29    -0.512
            10Y    2021-02-26    -0.246
            10Y    2021-03-31    -0.279
            10Y    2021-04-30    -0.180
            10Y    2021-05-31    -0.146
            10Y    2021-06-30    -0.203
            10Y    2021-07-30    -0.440
            10Y    2021-08-31    -0.393
            10Y    2021-09-30    -0.170
            10Y    2021-10-29    -0.069
            10Y    2021-11-30    -0.350
            10Y    2021-12-31    -0.188
            20Y    2021-01-29    -0.176
            20Y    2021-02-26    0.103
            20Y    2021-03-31    0.142
            20Y    2021-04-30    0.252
            20Y    2021-05-31    0.287
            20Y    2021-06-30    0.201
            20Y    2021-07-30    -0.059
            20Y    2021-08-31    -0.033
            20Y    2021-09-30    0.195
            20Y    2021-10-29    0.103
            20Y    2021-11-30    -0.115
            20Y    2021-12-31    0.056"
 
#==========================================
# Read a sample of ECB zero coupon yields
#==========================================
df_long <- read.table(text = str_data, header = TRUE)
 
#==========================================
# Transform LONG to WIDE format
#==========================================
 
# using "wide" option
df_wide <- reshape(df_long, direction = "wide"
                   idvar = "date"
                   timevar = "term")
df_wide
 
# initialize row names
rownames(df_long) <- NULL
 
# delete a unnecessary prefix in column names
colnames(df_wide) <- gsub("rate.","", colnames(df_wide))
df_wide
 
#==========================================
# Transform WIDE to LONG format
#==========================================
df_wide2 <- df_wide
 
# need to add new column name as a prefix
colnames(df_wide2)[-1<- 
    paste0("term.", colnames(df_wide)[-1])
 
# using "long" option
df_long2 <- reshape(df_wide2, direction = "long"
                    idvar="date"
                    varying = colnames(df_wide2)[-1], 
                    sep = ".")
 
# initialize row names
rownames(df_long2) <- NULL
df_long2
 
cs


Running the above R code produces the following wide format of the yield curve data.

R code snippet : Transform from long format to wide format


We can also transform the wide format data to the long format conversely.

R code snippet : Transform from long format to wide format



No comments:

Post a Comment