Fixing the Random Seed for Reproducible Work
Setting a fixed random seed
As an illustration, a simple DNN model is implemented in a Jupyter notebook using a function that fixes the random seed. The seed_everything() function is borrowed from https://dacon.io/codeshare/2363.
Every time we run this code from top to bottom, the predicted value is always 96.562645.
# Import libraries and define functions
# In[1]:

from keras.models import Sequential
from keras.layers import Dense
import tensorflow as tf
import numpy as np
import random
import os

def seed_everything(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    tf.random.set_seed(seed)


# Simple DNN model with a fixed random seed
# In[2]:

x = np.array([[10, 20, 30], [20, 30, 40], [30, 40, 50], [40, 50, 60]])
y = np.array([40, 50, 60, 70])

seed_everything(1)  # fix the random seed

model = Sequential()
model.add(Dense(50, activation='relu', input_dim=3))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=1000, verbose=0)

x_input = np.array([60, 70, 80])
x_input = x_input.reshape((1, 3))
pred = model.predict(x_input, verbose=0)
print(pred)

[[96.562645]]
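As a side note, newer versions of TensorFlow ship convenience utilities that cover most of what seed_everything() does. The following is only a minimal sketch, assuming TensorFlow 2.9 or later; the helper name seed_everything_tf is ours, not from the referenced code, and enabling op determinism can slow training.

# Minimal sketch, assuming TensorFlow >= 2.9.
import tensorflow as tf

def seed_everything_tf(seed: int = 42):
    # Seeds Python's random module, NumPy, and TensorFlow in one call.
    tf.keras.utils.set_random_seed(seed)
    # Request deterministic op implementations; this can slow training,
    # and ops without a deterministic kernel will raise an error.
    tf.config.experimental.enable_op_determinism()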
Robustness checks
However, when we rerun only parts of the above code in the same notebook, we can encounter unexpected results, as shown below.
# rerun model.fit
# In[3]:

model.fit(x, y, epochs=1000, verbose=0)

x_input = np.array([60, 70, 80])
x_input = x_input.reshape((1, 3))
pred = model.predict(x_input, verbose=0)
print(pred)

[[90.000145]]


# rerun model.fit with a fixed random seed
# In[4]:

seed_everything(1)
model.fit(x, y, epochs=1000, verbose=0)

x_input = np.array([60, 70, 80])
x_input = x_input.reshape((1, 3))
pred = model.predict(x_input, verbose=0)
print(pred)

[[90.00001]]


# rerun model.compile & model.fit with a fixed random seed
# In[5]:

seed_everything(1)
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=1000, verbose=0)

x_input = np.array([60, 70, 80])
x_input = x_input.reshape((1, 3))
pred = model.predict(x_input, verbose=0)
print(pred)

[[89.99999]]


# call seed_everything(1) before model = Sequential()
# In[6]:

seed_everything(1)
model = Sequential()
model.add(Dense(50, activation='relu', input_dim=3))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=1000, verbose=0)

x_input = np.array([60, 70, 80])
x_input = x_input.reshape((1, 3))
pred = model.predict(x_input, verbose=0)
print(pred)

[[96.562645]]
The bottom line: rerunning model.fit() alone continues training from the already-trained weights, so calling seed_everything() just before fit() or compile() cannot restore the original result. To reproduce the same result of the same model several times in one notebook, the seed_everything() call needs to come before model = Sequential(), so that the layer weights are reinitialized from the same seeded random state.
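One way to enforce this ordering is to wrap seeding, model construction, and training in a single function, so the seed is always set immediately before the layers (and their random initial weights) are created. This is only a sketch reusing the imports and seed_everything() defined above; the helper name build_and_fit is hypothetical.

def build_and_fit(x, y, seed=1, epochs=1000):
    # Seeding here, right before the model is built, ensures the layer
    # weights are initialized identically on every call.
    seed_everything(seed)
    model = Sequential()
    model.add(Dense(50, activation='relu', input_dim=3))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    model.fit(x, y, epochs=epochs, verbose=0)
    return model

# Every call retrains from the same seeded initial weights, so repeated
# calls in the same notebook should give the same prediction (up to any
# remaining nondeterminism in the backend).
model = build_and_fit(x, y, seed=1)
print(model.predict(np.array([[60, 70, 80]]), verbose=0))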