📙 CIFAR-10 Classifiers: Part 2 - Use Keras Tuner to speed up the hyperparameter search


Part 1: TensorFlow

This notebook is part of the CIFAR-10 Classifiers post. In that post, I cover building classifiers with deep learning frameworks such as TensorFlow, PyTorch, and PyTorch Lightning, and making use of several hyperparameter optimization libraries. You may find something useful there.


CIFAR-10 Classifier: Keras Tuner Edition

In the previous notebook, we manually tuned the hyperparameters to improve the test accuracy. The hyperparameter search space is incredibly large if you consider choices such as these (this is not an exhaustive list):

[figure: examples of tunable hyperparameters]

Imagine enumerating that search space manually 😱. “I can just do a grid search,” you say. That would be perfectly fine for a small search space and dataset, but there are better ways, and one of them is to use a hyperparameter optimization framework.
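To get a rough feel for how quickly a grid blows up, here is a back-of-the-envelope count over a small, purely illustrative grid (these values are made up for the example and are not the exact options tuned below):

# Illustrative only: count the combinations in a small hypothetical grid
from itertools import product

grid = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "dense_units": [32, 64, 128, 256, 512],
    "dropout_rate": [0.0, 0.2, 0.4, 0.6, 0.8],
    "activation": ["relu", "selu", "elu"],
    "use_flip": [True, False],
    "use_rotation": [True, False],
}
print(len(list(product(*grid.values()))))  # 900 full training runs, before augmentation strengths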

In this post, we will use Keras Tuner. We want to find out whether we can beat the hand-tuned model's 68.5% test accuracy.

Author: Katnoria | Created: 17-Aug-2020

1. Imports & Setup

import pickle
from time import time
from datetime import datetime
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPool2D, Dropout
from tensorflow.keras.layers import BatchNormalization, Input, GlobalAveragePooling2D
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras import Model
import IPython
import kerastuner as kt
def version_info(cls):
    print(f"{cls.__name__}: {cls.__version__}")
print("Version Used in this Notebook:")
version_info(tf)
version_info(tfds)
Version Used in this Notebook:
tensorflow: 2.3.0
tensorflow_datasets: 3.2.1
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Num GPUs Available:  1

2. Dataset

TensorFlow Datasets already provides this dataset in a format that we can use out of the box.

# Load the dataset
(ds_train, ds_test), metadata = tfds.load(
    'cifar10', split=['train', 'test'], shuffle_files=True, 
    with_info=True, as_supervised=True
)

IMG_SIZE = 32
NUM_CLASSES = metadata.features["label"].num_classes
print(f"Classes: {NUM_CLASSES}")
Classes: 10
# throwaway pipeline, only used to sanity-check that the data loads
train_ds = ds_train \
    .cache() \
    .batch(1, drop_remainder=True) \
    .prefetch(tf.data.experimental.AUTOTUNE) 
examples = ds_train.take(64)

fig, axs = plt.subplots(5, 5, figsize=(8,8))

for record, ax in zip(examples, axs.flat):
    image, _ = record
    ax.imshow(image)
    ax.axis('off')
plt.show()    

[figure: 5x5 grid of sample CIFAR-10 training images]

# we no longer need it
del train_ds
# Base model: a ResNet50 backbone without the classification head, frozen and used as a fixed feature extractor
base_model = tf.keras.applications.ResNet50(input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False)
base_model.trainable = False

3. Data Augmentation

def transforms(x, hp):
    # Each augmentation is itself a hyperparameter: the tuner decides whether
    # to apply it and, if so, how strong it should be.
    use_rotation = hp.Boolean('use_rotation')
    if use_rotation:
        x = tf.keras.layers.experimental.preprocessing.RandomRotation(
            hp.Float('rotation_factor', min_value=0.05, max_value=0.3)
        )(x)

    use_flip = hp.Boolean('use_flip')
    if use_flip:
        x = tf.keras.layers.experimental.preprocessing.RandomFlip(
          hp.Choice('orientation', values=['vertical', 'horizontal', 'horizontal_and_vertical'])
        )(x)

    use_zoom = hp.Boolean('use_zoom')
    if use_zoom:
        x = tf.keras.layers.experimental.preprocessing.RandomZoom(
            hp.Float('zoom_factor', min_value=0.05, max_value=0.2)
        )(x)
    return x

# Without making any changes to image augmentation
# transforms = tf.keras.Sequential([
#     tf.keras.layers.experimental.preprocessing.RandomFlip('horizontal'),
#     tf.keras.layers.experimental.preprocessing.RandomRotation(0.2),
# ])
# Load the dataset
(ds_train, ds_test), metadata = tfds.load(
    'cifar10', split=['train', 'test'], shuffle_files=True,
    with_info=True, as_supervised=True
)
num_train_examples = len(ds_train)

# create train and test batches
BS = 128
train_ds = ds_train \
    .cache() \
    .shuffle(num_train_examples).batch(BS, drop_remainder=True) \
    .prefetch(tf.data.experimental.AUTOTUNE)

test_ds = ds_test \
    .cache() \
    .batch(BS, drop_remainder=True) \
    .prefetch(tf.data.experimental.AUTOTUNE)

from kerastuner import HyperModel

class BasicResnet50Model(HyperModel):
    def __init__(self, num_classes):
        super().__init__()
        self.num_classes = num_classes
    
    def build(self, hp):
        inputs = Input(shape=(IMG_SIZE, IMG_SIZE, 3))
        x = transforms(inputs, hp)
        x = tf.keras.applications.resnet.preprocess_input(x)
        x = base_model(x, training=False)
        x = Flatten()(x)          
        
        hp_activation = hp.Choice('activation', values=['relu', 'selu', 'elu'])  
        hp_drop_rate = hp.Float('rate', min_value=0.0, max_value=0.8, step=0.2) 
        # hp.Int below would blow up the search space, so we sample the layer
        # width from a handful of choices instead
#         num_dense_units = hp.Int('units', min_value=64, max_value=512, step=32)
        num_dense_units = hp.Choice('units', [32, 64, 128, 256, 512])
        x = Dense(num_dense_units, activation=hp_activation)(x)
        x = Dropout(hp_drop_rate)(x)        
        outputs = Dense(self.num_classes)(x)
        model = tf.keras.Model(inputs, outputs)

        hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
        model.compile(
          optimizer=tf.keras.optimizers.Adam(hp_learning_rate),
          loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
          metrics=['accuracy']
          )

        return model
class ClearTrainingOutput(tf.keras.callbacks.Callback):
    def on_train_end(self, *args, **kwargs):
        IPython.display.clear_output(wait=True)

4. Tuner: Random Search

Next, we give random search a shot. There are other tuners available; you can find the complete list here

# instantiate the model
basic_resnet_model = BasicResnet50Model(10)
# define the tuner
tuner = kt.tuners.RandomSearch(
    basic_resnet_model,
    objective='val_accuracy',
    max_trials=50,
    overwrite=True,
    project_name = 'cifar10-kt-randomsearch',
)

callbacks = [
    # stop a trial if val_accuracy has not improved past the 0.9 baseline within 3 epochs
    tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', mode='max', patience=3, baseline=0.9),
    ClearTrainingOutput()
]
# go search
start = time()
print(f"start: {datetime.fromtimestamp(start)}")
tuner.search(train_ds, epochs=20, callbacks=callbacks, validation_data=test_ds, verbose=0)
stop = time()

Trial complete

Trial summary

|-Trial ID: b9c6b6685964225ccf37400e8a44d5e5

|-Score: 0.49366986751556396

|-Best step: 0

Hyperparameters:

|-activation: relu

|-learning_rate: 0.0001

|-orientation: horizontal

|-rate: 0.4

|-rotation_factor: 0.263610997215446

|-units: 512

|-use_flip: True

|-use_rotation: True

|-use_zoom: True

INFO:tensorflow:Oracle triggered exit
took = stop - start
print(f"Total training time: {took//60 : .0f}m {took%60:.0f}s")
Total training time:  25m 20s

4.1 Evaluate

We will now evaluate the best model on the test set.

best_model = tuner.get_best_models(1)[0]
best_hyperparameters = tuner.get_best_hyperparameters(1)[0]
best_hyperparameters.values
{'use_rotation': False,
 'use_flip': False,
 'use_zoom': False,
 'activation': 'selu',
 'rate': 0.4,
 'units': 512,
 'learning_rate': 0.001,
 'rotation_factor': 0.183175427618116,
 'orientation': 'horizontal_and_vertical'}
# best_model = tuner.get_best_models(1)[0]
best_model.evaluate(test_ds)
78/78 [==============================] - 1s 12ms/step - loss: 0.9520 - accuracy: 0.6673

[0.9520245790481567, 0.6672676205635071]

Test Accuracy: 66.7%

5. Hyperband

Let us try another tuner, Hyperband, which speeds up random search using adaptive resource allocation and early stopping. Ref
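The rough intuition: many configurations are trained for only a few epochs, and only the survivors of each round receive a larger epoch budget. Below is a minimal sketch of that successive-halving idea for our settings; it illustrates the concept and is not Keras Tuner's exact internal bracket math:

# Illustrative successive-halving schedule for max_epochs=50, factor=3
max_epochs, factor = 50, 3
epochs, schedule = float(max_epochs), []
while epochs >= 1:
    schedule.append(round(epochs))
    epochs /= factor
print(sorted(schedule))  # [2, 6, 17, 50]: most trials stop early, a few get the full budget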

# instantiate the model
basic_resnet_model = BasicResnet50Model(10)
# define the tuner
tuner = kt.Hyperband(
    basic_resnet_model, objective='val_accuracy',  
    max_epochs = 50, factor = 3, project_name = 'cifar10-dense2'
    )
# go search
start = time()
print(f"start: {datetime.fromtimestamp(start)}")
tuner.search(
    train_ds, epochs = 25, validation_data = test_ds, verbose=0,
    callbacks = [tf.keras.callbacks.EarlyStopping(patience=3), ClearTrainingOutput()]
    )
end = time()

Trial complete

Trial summary

|-Trial ID: 5da8c8fed707d0b76913715b563e44d3

|-Score: 0.5245392918586731

|-Best step: 0

Hyperparameters:

|-activation: elu

|-learning_rate: 0.01

|-orientation: horizontal

|-rate: 0.0

|-rotation_factor: 0.2706748106096059

|-tuner/bracket: 0

|-tuner/epochs: 50

|-tuner/initial_epoch: 0

|-tuner/round: 0

|-units: 256

|-use_flip: True

|-use_rotation: True

|-use_zoom: False

INFO:tensorflow:Oracle triggered exit
took = end - start
print(f"Total training time: {took//60 : .0f}m {took%60:.0f}s")
Total training time:  61m 12s
best_model = tuner.get_best_models(1)[0]
best_hyperparameters = tuner.get_best_hyperparameters(1)[0]
best_hyperparameters.values
{'use_rotation': False,
 'use_flip': False,
 'use_zoom': False,
 'activation': 'selu',
 'rate': 0.4,
 'units': 256,
 'learning_rate': 0.0001,
 'rotation_factor': 0.18142451208184568,
 'orientation': 'horizontal_and_vertical',
 'tuner/epochs': 50,
 'tuner/initial_epoch': 17,
 'tuner/bracket': 2,
 'tuner/round': 2,
 'tuner/trial_id': 'a016de656b093c99c7f2f29f7ee4c08f'}

5.1 Checkpointing

We do not need to manually save the best model or hyperparameters. Keras Tuner records every trial, and you can load them back using the same commands used above.

Test: restart the notebook, skip the search, and execute the cells below.
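This works because the tuner rebuilds its state from the project directory. In a fresh session, re-running the tuner definition with the same project_name (and overwrite left at its default of False) reloads the recorded trials instead of starting a new search; a minimal sketch, assuming the project directory from the run above is still on disk:

# Re-create the tuner; with the same project_name and overwrite=False (default),
# it reloads the past trials from disk rather than searching again
tuner = kt.Hyperband(
    BasicResnet50Model(10), objective='val_accuracy',
    max_epochs=50, factor=3, project_name='cifar10-dense2'
)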

best_model = tuner.get_best_models(1)[0]
best_hyperparameters = tuner.get_best_hyperparameters(1)[0]
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_1
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_2
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.decay
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.learning_rate
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
best_hyperparameters.values
{'use_rotation': False,
 'use_flip': False,
 'use_zoom': False,
 'activation': 'selu',
 'rate': 0.4,
 'units': 256,
 'learning_rate': 0.0001,
 'rotation_factor': 0.18142451208184568,
 'orientation': 'horizontal_and_vertical',
 'tuner/epochs': 50,
 'tuner/initial_epoch': 17,
 'tuner/bracket': 2,
 'tuner/round': 2,
 'tuner/trial_id': 'a016de656b093c99c7f2f29f7ee4c08f'}
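If you want to retrain the winning configuration from scratch (for example, for more epochs), you can also build a fresh, untrained model from the stored hyperparameters instead of reusing the checkpointed weights. A minimal sketch; the epoch count here is an arbitrary choice:

# Build an untrained model from the best recorded hyperparameters and retrain it
model = tuner.hypermodel.build(best_hyperparameters)
history = model.fit(train_ds, validation_data=test_ds, epochs=25)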

5.2 Evaluate

Let’s evaluate the best model against the test set.

best_model.evaluate(test_ds)
78/78 [==============================] - 1s 12ms/step - loss: 0.9409 - accuracy: 0.6868

[0.9409434199333191, 0.6867988705635071]

The best test accuracy is 68.67%, slightly better than our hand-tuned baseline of 68.5%.

Keras Tuner also supports Bayesian optimization for searching the space (the BayesianOptimization tuner). You could give it a try too.
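A minimal sketch of what that could look like with the same hypermodel (max_trials here is an arbitrary choice):

# Swap the tuner; the hypermodel, callbacks and datasets stay the same
bayes_tuner = kt.BayesianOptimization(
    BasicResnet50Model(10),
    objective='val_accuracy',
    max_trials=25,
    project_name='cifar10-kt-bayesian',
)
bayes_tuner.search(train_ds, epochs=20, validation_data=test_ds, verbose=0)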

6. Conclusion

We saw that the best architecture does not use any image augmentation 😂, and SELU seems to be the activation that keeps showing up.

Here are a few things that we could try:

  • additional image augmentation search
  • search the pooling options (GlobalAveragePooling2D vs GlobalMaxPooling2D); see the sketch below
  • add batchnorm?
  • add more layers?
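
For the pooling idea, here is a hedged sketch of a helper that build() could call in place of the fixed Flatten(); the pooling_head name and the 'pooling' hyperparameter are my additions for illustration, not part of the notebook above:

from tensorflow.keras.layers import Flatten, GlobalAveragePooling2D, GlobalMaxPooling2D

def pooling_head(x, hp):
    # let the tuner pick how to collapse the ResNet feature map
    pooling = hp.Choice('pooling', values=['flatten', 'avg', 'max'])
    if pooling == 'avg':
        return GlobalAveragePooling2D()(x)
    if pooling == 'max':
        return GlobalMaxPooling2D()(x)
    return Flatten()(x)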
