Time Series Sequence Samples Dataset
Sequence-based models such as LSTM require a 3D dataset structure: (batch, timesteps, features). Here 'batch' refers to the subset of samples included in a mini-batch during model training.
Timesteps denote the historical sequence of data instances or temporal lags. Incorporating this temporal dimension (timesteps) constructs a three-dimensional dataset, enabling the effective capture of the data's sequential nature.
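As a minimal sketch of this structure, overlapping slices of a 2D matrix can be stacked into a 3D array. The toy series (6 observations, 2 features) and the window length of 3 are assumptions made for illustration only.

```python
import numpy as np

# Toy multivariate series: 6 observations, 2 features (illustrative values only)
series = np.arange(12).reshape(6, 2)

timesteps = 3  # hypothetical window length

# Stack overlapping windows along a new leading (sample) axis
windows = np.stack([series[i:i + timesteps]
                    for i in range(len(series) - timesteps + 1)])

print(windows.shape)  # (4, 3, 2): (samples, timesteps, features)
```

The first axis holds every sample; mini-batches are drawn from it during training.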
Python code
I've made a general function for this purpose:
f_make_seq_data_from_matrix(data, ts_list, fh_list)
'data' is a NumPy matrix containing a multivariate time series, with one row per observation and one column per feature. 'ts_list' is a list of timesteps, which need not be consecutive. Likewise, 'fh_list' is a list of forecasting horizons; it can represent single-step or multi-step forecasts and also need not be consecutive.
Because the function handles distributed lags as well as one-step and multi-step forecasting, I believe it is useful for general purposes.
import numpy as np

def f_make_seq_data_from_matrix(data, ts_list, fh_list):
    # full ranges spanned by the combined, input, and output offsets
    co_list = ts_list + fh_list
    coseq_range = range(min(co_list), max(co_list) + 1)
    tsseq_range = range(min(ts_list), max(ts_list) + 1)
    fhseq_range = range(min(fh_list), max(fh_list) + 1)

    # positions of the selected offsets within each range
    tssel_list = [i - min(ts_list) for i in ts_list]
    fhsel_list = [i - min(fh_list) for i in fh_list]

    is_seq = []
    ot_seq = []
    obs = data.shape[0]

    # slide a window covering all offsets over the observations
    for i in range(obs - len(coseq_range) + 1):
        dal = data[i:i + len(coseq_range)]
        din = dal[:len(tsseq_range)]   # input part (start of the window)
        dot = dal[-len(fhseq_range):]  # output part (end of the window)
        is_seq.append(din[tssel_list])
        ot_seq.append(dot[fhsel_list])

    return np.array(is_seq), np.array(ot_seq)
The elements of both lists are offsets relative to time 't'. For example, -2, -1, 0, 1, 2 correspond to t-2, t-1, t, t+1, t+2, respectively.
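The offsets also determine how many samples can be built, because each sample needs one window covering the smallest through the largest offset. The helper below, window_info, is a hypothetical name introduced here for illustration; it only mirrors the loop bound used in the function above.

```python
def window_info(n_obs, ts_list, fh_list):
    """Return (window_length, n_samples) implied by the two offset lists."""
    offsets = ts_list + fh_list
    window_length = max(offsets) - min(offsets) + 1
    n_samples = n_obs - window_length + 1
    return window_length, n_samples

# With 10 observations, as in the cases of this post
print(window_info(10, [0], [1]))                # (2, 9)
print(window_info(10, [-2, -1, 0], [1, 2, 3]))  # (6, 5)
print(window_info(10, [-2, 0], [3, 5]))         # (8, 3)
```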
Case 1: Forecasting time t+1 utilizing information from time t
This is the typical setup, akin to an AR(1) model. Using ts_list = [-1] and fh_list = [0] gives the same result, because the lag between input and target is unchanged.
# Suppose data has 10 observations with 3 features
data = np.random.rand(10, 3)

# Generate suitable sequences for CNN, RNN, LSTM, and so on
ts_list = [0]  # selected timesteps
fh_list = [1]  # selected forecasting horizons

# generate sequences dataset for Keras
X, Y = f_make_seq_data_from_matrix(data, ts_list, fh_list)

print("\nTimesteps:", ts_list, ", forecast horizons:", fh_list)
print("\nData\n", data, "\n Shape of data:", data.shape)
print("\nX\n", X, "\n Shape of X:", X.shape)
print("\nY\n", Y, "\n Shape of Y:", Y.shape)
Timesteps: [0] , forecast horizons: [1]

Data
 [[0.58708034 0.88707951 0.25878656]
 [0.52696273 0.13857786 0.50993527]
 [0.53533872 0.45365456 0.89658186]
 [0.54978604 0.91198371 0.25040483]
 [0.36520302 0.76098129 0.5341683 ]
 [0.46726791 0.82170191 0.52046577]
 [0.84807446 0.70375552 0.31805087]
 [0.3812772  0.31083093 0.33218005]
 [0.49522332 0.4586895  0.61974004]
 [0.88130502 0.47469752 0.50149153]]
 Shape of data: (10, 3)

X
 [[[0.58708034 0.88707951 0.25878656]]

 [[0.52696273 0.13857786 0.50993527]]

 [[0.53533872 0.45365456 0.89658186]]

 [[0.54978604 0.91198371 0.25040483]]

 [[0.36520302 0.76098129 0.5341683 ]]

 [[0.46726791 0.82170191 0.52046577]]

 [[0.84807446 0.70375552 0.31805087]]

 [[0.3812772  0.31083093 0.33218005]]

 [[0.49522332 0.4586895  0.61974004]]]
 Shape of X: (9, 1, 3)

Y
 [[[0.52696273 0.13857786 0.50993527]]

 [[0.53533872 0.45365456 0.89658186]]

 [[0.54978604 0.91198371 0.25040483]]

 [[0.36520302 0.76098129 0.5341683 ]]

 [[0.46726791 0.82170191 0.52046577]]

 [[0.84807446 0.70375552 0.31805087]]

 [[0.3812772  0.31083093 0.33218005]]

 [[0.49522332 0.4586895  0.61974004]]

 [[0.88130502 0.47469752 0.50149153]]]
 Shape of Y: (9, 1, 3)
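The claim in Case 1 that ts_list = [-1] with fh_list = [0] yields the same dataset can be checked directly. The sketch below repeats the post's function in a slightly condensed form so that it runs on its own; the random seed is an arbitrary choice and not part of the original example.

```python
import numpy as np

def f_make_seq_data_from_matrix(data, ts_list, fh_list):
    # Same logic as the function in this post, condensed for a standalone snippet
    co_list = ts_list + fh_list
    co_len = max(co_list) - min(co_list) + 1
    ts_len = max(ts_list) - min(ts_list) + 1
    fh_len = max(fh_list) - min(fh_list) + 1
    tssel_list = [i - min(ts_list) for i in ts_list]
    fhsel_list = [i - min(fh_list) for i in fh_list]
    is_seq, ot_seq = [], []
    for i in range(data.shape[0] - co_len + 1):
        dal = data[i:i + co_len]
        is_seq.append(dal[:ts_len][tssel_list])
        ot_seq.append(dal[-fh_len:][fhsel_list])
    return np.array(is_seq), np.array(ot_seq)

data = np.random.default_rng(0).random((10, 3))  # seed chosen arbitrarily

X_a, Y_a = f_make_seq_data_from_matrix(data, [0], [1])
X_b, Y_b = f_make_seq_data_from_matrix(data, [-1], [0])

# Both offset choices describe the same one-step lag
print(np.array_equal(X_a, X_b), np.array_equal(Y_a, Y_b))  # True True
```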
Case 2: Forecasting times t+1, t+2, and t+3, utilizing sequential information from times t, t-1, and t-2
This involves a multistep forecasting approach utilizing sequential past information.
# Generate suitable sequences for CNN, RNN, LSTM, and so on
ts_list = [-2, -1, 0]  # selected timesteps
fh_list = [1, 2, 3]    # selected forecasting horizons

# generate sequences dataset for Keras
X, Y = f_make_seq_data_from_matrix(data, ts_list, fh_list)

print("\nTimesteps:", ts_list, ", forecast horizons:", fh_list)
print("\nData\n", data, "\n Shape of data:", data.shape)
print("\nX\n", X, "\n Shape of X:", X.shape)
print("\nY\n", Y, "\n Shape of Y:", Y.shape)
Timesteps: [-2, -1, 0] , forecast horizons: [1, 2, 3]

Data
 [[0.58708034 0.88707951 0.25878656]
 [0.52696273 0.13857786 0.50993527]
 [0.53533872 0.45365456 0.89658186]
 [0.54978604 0.91198371 0.25040483]
 [0.36520302 0.76098129 0.5341683 ]
 [0.46726791 0.82170191 0.52046577]
 [0.84807446 0.70375552 0.31805087]
 [0.3812772  0.31083093 0.33218005]
 [0.49522332 0.4586895  0.61974004]
 [0.88130502 0.47469752 0.50149153]]
 Shape of data: (10, 3)

X
 [[[0.58708034 0.88707951 0.25878656]
  [0.52696273 0.13857786 0.50993527]
  [0.53533872 0.45365456 0.89658186]]

 [[0.52696273 0.13857786 0.50993527]
  [0.53533872 0.45365456 0.89658186]
  [0.54978604 0.91198371 0.25040483]]

 [[0.53533872 0.45365456 0.89658186]
  [0.54978604 0.91198371 0.25040483]
  [0.36520302 0.76098129 0.5341683 ]]

 [[0.54978604 0.91198371 0.25040483]
  [0.36520302 0.76098129 0.5341683 ]
  [0.46726791 0.82170191 0.52046577]]

 [[0.36520302 0.76098129 0.5341683 ]
  [0.46726791 0.82170191 0.52046577]
  [0.84807446 0.70375552 0.31805087]]]
 Shape of X: (5, 3, 3)

Y
 [[[0.54978604 0.91198371 0.25040483]
  [0.36520302 0.76098129 0.5341683 ]
  [0.46726791 0.82170191 0.52046577]]

 [[0.36520302 0.76098129 0.5341683 ]
  [0.46726791 0.82170191 0.52046577]
  [0.84807446 0.70375552 0.31805087]]

 [[0.46726791 0.82170191 0.52046577]
  [0.84807446 0.70375552 0.31805087]
  [0.3812772  0.31083093 0.33218005]]

 [[0.84807446 0.70375552 0.31805087]
  [0.3812772  0.31083093 0.33218005]
  [0.49522332 0.4586895  0.61974004]]

 [[0.3812772  0.31083093 0.33218005]
  [0.49522332 0.4586895  0.61974004]
  [0.88130502 0.47469752 0.50149153]]]
 Shape of Y: (5, 3, 3)
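In Case 2, Y carries all three features at every horizon. When only one variable is to be forecast, Y can be reduced to that feature. The sketch below assumes the target is the first column (index 0), which the post does not specify, and uses a stand-in array with the same shape as Y in Case 2.

```python
import numpy as np

# Stand-in for the Case 2 output: 5 samples, 3 horizons, 3 features
Y = np.arange(45, dtype=float).reshape(5, 3, 3)

target_col = 0  # hypothetical: index of the variable to forecast

# Keep every sample and horizon, but only the target feature
Y_target = Y[:, :, target_col]

print(Y_target.shape)  # (5, 3): one value per forecast horizon
```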
Case 3: Forecasting at times t+3 and t+5 using nonconsecutive multistep forecasting with time t and t-2 as distributed lag information
This exercise isn't realistic; however, it's used to demonstrate the generalized characteristics of the function.
# Generate suitable sequences for CNN, RNN, LSTM, and so on
ts_list = [-2, 0]  # selected timesteps
fh_list = [3, 5]   # selected forecasting horizons

# generate sequences dataset for Keras
X, Y = f_make_seq_data_from_matrix(data, ts_list, fh_list)

print("\nTimesteps:", ts_list, ", forecast horizons:", fh_list)
print("\nData\n", data, "\n Shape of data:", data.shape)
print("\nX\n", X, "\n Shape of X:", X.shape)
print("\nY\n", Y, "\n Shape of Y:", Y.shape)
Timesteps: [-2, 0] , forecast horizons: [3, 5]

Data
 [[0.58708034 0.88707951 0.25878656]
 [0.52696273 0.13857786 0.50993527]
 [0.53533872 0.45365456 0.89658186]
 [0.54978604 0.91198371 0.25040483]
 [0.36520302 0.76098129 0.5341683 ]
 [0.46726791 0.82170191 0.52046577]
 [0.84807446 0.70375552 0.31805087]
 [0.3812772  0.31083093 0.33218005]
 [0.49522332 0.4586895  0.61974004]
 [0.88130502 0.47469752 0.50149153]]
 Shape of data: (10, 3)

X
 [[[0.58708034 0.88707951 0.25878656]
  [0.53533872 0.45365456 0.89658186]]

 [[0.52696273 0.13857786 0.50993527]
  [0.54978604 0.91198371 0.25040483]]

 [[0.53533872 0.45365456 0.89658186]
  [0.36520302 0.76098129 0.5341683 ]]]
 Shape of X: (3, 2, 3)

Y
 [[[0.46726791 0.82170191 0.52046577]
  [0.3812772  0.31083093 0.33218005]]

 [[0.84807446 0.70375552 0.31805087]
  [0.49522332 0.4586895  0.61974004]]

 [[0.3812772  0.31083093 0.33218005]
  [0.88130502 0.47469752 0.50149153]]]
 Shape of Y: (3, 2, 3)
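With random values it is hard to see which rows end up in each sample. One way to trace the selection, sketched here under the assumption of a single-feature matrix whose values are simply the row indices (not part of the original example), is to rerun Case 3 on that matrix. The function is repeated in condensed form so the snippet runs on its own.

```python
import numpy as np

def f_make_seq_data_from_matrix(data, ts_list, fh_list):
    # Same logic as the function in this post, condensed for a standalone snippet
    co_list = ts_list + fh_list
    co_len = max(co_list) - min(co_list) + 1
    ts_len = max(ts_list) - min(ts_list) + 1
    fh_len = max(fh_list) - min(fh_list) + 1
    tssel_list = [i - min(ts_list) for i in ts_list]
    fhsel_list = [i - min(fh_list) for i in fh_list]
    is_seq, ot_seq = [], []
    for i in range(data.shape[0] - co_len + 1):
        dal = data[i:i + co_len]
        is_seq.append(dal[:ts_len][tssel_list])
        ot_seq.append(dal[-fh_len:][fhsel_list])
    return np.array(is_seq), np.array(ot_seq)

# Each row holds its own row index, so the output shows which rows were picked
index_data = np.arange(10).reshape(10, 1)

X_idx, Y_idx = f_make_seq_data_from_matrix(index_data, [-2, 0], [3, 5])

print(X_idx[:, :, 0].tolist())  # [[0, 2], [1, 3], [2, 4]]
print(Y_idx[:, :, 0].tolist())  # [[5, 7], [6, 8], [7, 9]]
```

In the first sample, time t is row 2, so the inputs t-2 and t are rows 0 and 2, and the targets t+3 and t+5 are rows 5 and 7, matching the printed X and Y above.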