Python: TensorFlow Keras Tuner for the optimization of hyperparameters

This post shows how to use Keras Tuner for hyperparameter optimization. It effectively replaces the nested for-loops that are otherwise needed for a grid search over hyperparameters.
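To see what is being replaced, a manual grid search over two hyperparameters typically looks like the nested loops below. This is only a conceptual sketch: build_model, X_train and y_train are hypothetical placeholders standing in for the model-building function and data defined later in this post.

# conceptual sketch of a manual grid search
# (build_model, X_train, y_train are hypothetical placeholders)
best_loss, best_config = float("inf"), None
for dropout_rate in [0.0, 0.2, 0.4]:
    for dense_units in [3, 6, 9]:
        model = build_model(dropout_rate, dense_units)      # build one candidate model
        history = model.fit(X_train, y_train, epochs=100, verbose=0)
        loss = history.history["loss"][-1]                  # final training loss
        if loss < best_loss:
            best_loss, best_config = loss, (dropout_rate, dense_units)
print(best_config)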




Keras Tuner for the optimization of hyperparameters



To run the Keras Tuner with Yahoo Finance data (Tesla stock price), two packages should be installed: keras-tuner and yfinance.

To install these packages, type the following commands in the Jupyter Notebook.

!pip install yfinance
!pip install keras-tuner



Data and some useful functions


I apply a simple Keras deep learning model to Tesla stock returns for illustration purposes. After the data is loaded and transformed into returns, lagged input sequences are built with the f_sequence_to_supervised() function (a toy example of this function is shown after the code below) and an 80:20 train-test split is performed.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf  # !pip install yfinance
%matplotlib inline
 
import tensorflow as tf
import random
import os
 
def seed_everything(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    tf.random.set_seed(seed)
    
# Construction of X and y
def f_sequence_to_supervised(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps
        if end_ix > len(sequence)-1: break
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x); y.append(seq_y)
    return np.array(X), np.array(y)
 
# read stock price
df_stkp = yf.download('TSLA', start='2020-01-01',
                      end='2023-01-31', progress=False)
df_stkp = df_stkp['Close']
 
# convert the stock price to daily stock returns (%) and plot them
df_stkr = df_stkp.pct_change().dropna()*100  # percent
df_stkr.plot(title="TSLA's stock return")
 
n_steps = 12
# stock return (%) with its lags
X, y = f_sequence_to_supervised(df_stkr.tolist(), n_steps)
 
# 80:20 train-test split
index1 = int(round(len(X)*0.8))
X_train, X_test = X[:index1], X[index1:]
y_train, y_test = y[:index1], y[index1:]
 
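To see what f_sequence_to_supervised() returns, here is a small check on a toy sequence (not in the original code above): each row of X holds n_steps consecutive values and y holds the value that follows.

# toy check of f_sequence_to_supervised()
X_toy, y_toy = f_sequence_to_supervised([1, 2, 3, 4, 5], n_steps=2)
print(X_toy)  # [[1 2] [2 3] [3 4]]
print(y_toy)  # [3 4 5]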



Keras model fitting and forecasting with predetermined hyperparameters


As a simple example, the following deep learning model is constructed. Fitting this model is followed by forecasting using the test data.

from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from keras.callbacks import EarlyStopping
from keras import optimizers
 
# define a simple Sequential model
seed_everything(1)  # fix the random seed
model = Sequential()
model.add(LSTM(50, input_shape=(n_steps, 1)))
model.add(Dropout(0.2))
model.add(Dense(6))
model.add(Dense(1))
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['mse'])
 
# fit the model
early_stop = EarlyStopping(monitor='loss',
                           patience=50, verbose=0)
 
model.fit(X_train, y_train, epochs=1000,
          batch_size=32, verbose=2,
          callbacks=[early_stop])
 
# Forecast using test data
y_pred = model.predict(X_test, verbose=0)
 
plt.figure().set_figwidth(12)
plt.plot(np.c_[y_test, y_pred])
plt.legend(('Test data', 'Forecast'))
plt.show()
 


Fitting the model produces the following training log, after which a forecast is made on the test data.

Epoch 1/1000
20/20 - 2s - loss: 25.8200 - mse: 25.8200 - 2s/epoch - 84ms/step
Epoch 2/1000
20/20 - 0s - loss: 25.6928 - mse: 25.6928 - 75ms/epoch - 4ms/step
Epoch 3/1000
20/20 - 0s - loss: 25.6435 - mse: 25.6435 - 83ms/epoch - 4ms/step
Epoch 4/1000
20/20 - 0s - loss: 25.4286 - mse: 25.4286 - 84ms/epoch - 4ms/step
                            ⋮
Epoch 532/1000
20/20 - 0s - loss: 1.8256 - mse: 1.8256 - 103ms/epoch - 5ms/step
Epoch 533/1000
20/20 - 0s - loss: 1.6459 - mse: 1.6459 - 103ms/epoch - 5ms/step
Epoch 534/1000
20/20 - 0s - loss: 1.8238 - mse: 1.8238 - 107ms/epoch - 5ms/step
Epoch 535/1000
20/20 - 0s - loss: 1.7009 - mse: 1.7009 - 114ms/epoch - 6ms/step
Epoch 536/1000
20/20 - 0s - loss: 1.6138 - mse: 1.6138 - 110ms/epoch - 6ms/step
 



Set up and Run Keras Tuner


Keras Tuner is applied in three stages in the next code. First, the hyperparameters of interest and their candidate values are specified with methods such as hp.Int() and hp.Float(). More methods can be found on the Keras website (https://keras.io/api/keras_tuner/hyperparameters/).
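For reference, other specification methods such as hp.Choice(), hp.Boolean() and a log-scaled hp.Float() follow the same pattern. A minimal standalone sketch (these particular hyperparameters are not used in the model below):

import keras_tuner as kt
 
hp = kt.HyperParameters()  # standalone container, only for illustration
act = hp.Choice('activation', values=['relu', 'tanh'])      # categorical choice
use_l2 = hp.Boolean('use_second_layer')                     # on/off switch
lr = hp.Float('learning_rate', min_value=1e-4,
              max_value=1e-2, sampling='log')               # log-scaled range
print(act, use_l2, lr)  # outside a search, each call returns the default value of its range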

Second, a Keras model using these methods is implemented in a user-defined build function (f_build_model() in our case; this name can be changed, of course).

Third, using this information, tuner = kt.RandomSearch() is defined and tuner.search() is called. The latter plays essentially the same role as model.fit() in a standard Keras model. In particular, I use overwrite=True to avoid unintentionally reloading results from a previous run.

import keras.backend as K
import keras_tuner as kt  # !pip install keras-tuner
from keras_tuner.tuners import RandomSearch
from keras_tuner.engine.hyperparameters import HyperParameters
 
# options for hyperparameters "hp" are specified
def f_build_model(hp):
    
    seed_everything(1)  # fix the random seed
    
    model = Sequential()
    model.add(LSTM(50, input_shape=(n_steps, 1)))
    
    # model.add(Dropout(0.2))
    hp_rate = hp.Float('dropout1_rate', min_value=0.0,
                       max_value=0.4, step=0.2)
    model.add(Dropout(hp_rate))
    
    # model.add(Dense(6))
    hp_unit = hp.Int('dense1_units', min_value=3,
                     max_value=9, step=3)
    model.add(Dense(hp_unit))
    
    model.add(Dense(1))
    model.compile(loss='mean_squared_error',
                  optimizer='adam', metrics=['mse'])
    return model
 
# define tuner
tuner = kt.RandomSearch(f_build_model, overwrite=True,
                        objective='mse', max_trials=50,
                        executions_per_trial=1)
 
# instead of fitting the model,
# run the search function on the tuner object
tuner.search(x=X_train, y=y_train, epochs=1000,
             batch_size=32, verbose=1, callbacks=[early_stop])
 


Since this tuner performs a random search, it does not consider all combinations of hyperparameters, which would be time-consuming and often ineffective in practice; some limit on the number of combinations is needed. This is max_trials, the maximum number of hyperparameter combinations to try.

It is standard practice to run model.fit() several times and average the results to reduce the effect of parameter initialization. For this kind of robustness, executions_per_trial is used; it is the number of models that are built and fit for each trial.
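For example, setting executions_per_trial=3 would fit each candidate configuration three times, and the trial is then scored by the objective averaged over the runs (a sketch; the search above uses 1):

# sketch: fit each hyperparameter combination 3 times and average the objective
tuner_avg = kt.RandomSearch(f_build_model, overwrite=True,
                            objective='mse', max_trials=50,
                            executions_per_trial=3)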

Running the above code produces the following output for the best trial.


Trial 9 Complete [00h 01m 05s]
mse: 1.3944592475891113
 
Best mse So Far: 0.004801980219781399
Total elapsed time: 00h 10m 19s
INFO:tensorflow:Oracle triggered exit
 



Predicting with the best model using selected hyperparameters


We can get the final model as well as the optimized hyperparameters. The following code shows how to extract them and how to predict with the best model.

# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
 
# Read selected parameters
print(f"""
The hyperparameter search is complete. 
The optimal dropout rate    : {best_hps.get('dropout1_rate')}.
The optimal number of units : {best_hps.get('dense1_units')}.
""")
 
# summary results for hyperparameter optimization
tuner.results_summary()
 
# Now get the best model
best_model = tuner.get_best_models(num_models=1)[0]
best_model.summary()
 
# Predict using the best model
y_pred_best = best_model.predict(X_test, verbose=0)
 
plt.figure().set_figwidth(12)
plt.plot(np.c_[y_test, y_pred, y_pred_best])
plt.legend(('Test data', 'Forecast', 'Forecast (best)'))
plt.show()
 
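Alternatively, instead of reusing the already-trained model from the search, a fresh model can be rebuilt from the best hyperparameters and refit. A minimal sketch using tuner.hypermodel.build() from the Keras Tuner API (not run in this post):

# sketch: rebuild a fresh model from the best hyperparameters and refit it
model_refit = tuner.hypermodel.build(best_hps)   # same architecture, new initialization
model_refit.fit(X_train, y_train, epochs=1000, batch_size=32,
                verbose=0, callbacks=[early_stop])
y_pred_refit = model_refit.predict(X_test, verbose=0)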


After the hyperparameter search is complete, the optimal dropout rate and the optimal number of units in the fully connected dense layer are 0.0 and 9, respectively.

The hyperparameter search is complete. 
The optimal dropout rate    : 0.0.
The optimal number of units : 9.
 
Results summary
Results in .\untitled_project
Showing 10 best trials
<keras_tuner.engine.objective.Objective object at 0x0000022E49675E50>
Trial summary
Hyperparameters:
dropout1_rate: 0.0
dense1_units: 9
Score: 0.004801980219781399
          ⋮
Trial summary
Hyperparameters:
dropout1_rate: 0.4
dense1_units: 6
Score: 3.4320671558380127
 
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm (LSTM)                 (None, 50)                10400     
 dropout (Dropout)           (None, 50)                0         
 dense (Dense)               (None, 9)                 459       
 dense_1 (Dense)             (None, 1)                 10        
=================================================================
Total params: 10,869
Trainable params: 10,869
Non-trainable params: 0
_________________________________________________________________
 



Concluding Remarks


This post introduced Keras Tuner for hyperparameter optimization. As the search is somewhat time-consuming, I used a very small set of candidate hyperparameters for illustration; a larger set is recommended for real applications. Furthermore, you can use validation data (minimizing the validation loss) to prevent overfitting and improve the model's ability to generalize to new data; a minimal sketch follows.
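As a sketch of that last point, the same tuner can be pointed at a validation loss by setting objective='val_loss' and passing validation data to tuner.search() (assuming the f_build_model() and early-stopping setup above; not run in this post):

# sketch: tune against a validation loss instead of the training loss
tuner_val = kt.RandomSearch(f_build_model, overwrite=True,
                            objective='val_loss', max_trials=50,
                            executions_per_trial=1)
 
early_stop_val = EarlyStopping(monitor='val_loss', patience=50, verbose=0)
 
tuner_val.search(x=X_train, y=y_train, epochs=1000, batch_size=32,
                 validation_split=0.2,   # hold out the last 20% of the training data
                 verbose=1, callbacks=[early_stop_val])

\(\blacksquare\)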

