NLP for learners – Changing learning rates and stopping early (ReduceLROnPlateau/EarlyStopping)

When you call model.compile(), you set the learning rate on the optimizer (the optimization algorithm).

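from keras.optimizers import Adam
# lr sets the learning rate; newer Keras releases call this argument learning_rate
optimizer = Adam(lr=0.01)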
model.compile(loss='categorical_crossentropy',
    optimizer=optimizer,
    metrics=['accuracy'])

lr=0.01 is the learning rate.

The learning rate controls how large a step the optimizer takes each time it updates the weights, and so how coarsely or finely the model searches for the correct answer.

This can be compared to finding a specific location on a map. When you try to find Manhattan in a map app, a map that shows only your neighborhood won’t help you. First, display a map of the entire United States and look for New York State. Then zoom in on New York State and look for the small island of Manhattan along the coastline.

The learning rate is similar to the scale of a map. At first, you use a larger learning rate to get roughly close to the correct answer. Then, as you get closer, you reduce the learning rate to pin down its exact location.
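The map analogy can be tried out on a toy problem. The following is a minimal sketch, not Keras code: the function descend, the loss |x - target|, and the values 3.4, 1.0, 0.1 and 0.01 are all assumptions chosen for illustration. With this loss every step has length equal to the learning rate, so the learning rate acts like the map scale: a coarse step gets close quickly but cannot settle, and a fine step is precise but slow.

```python
def descend(start, target, learning_rates):
    """Minimize the toy loss |x - target|, taking one step per learning rate."""
    x = start
    for lr in learning_rates:
        # The gradient of |x - target| is +1 above the target and -1 below it.
        grad = 1.0 if x > target else -1.0
        x -= lr * grad  # step against the gradient, scaled by the learning rate
    return x


TARGET = 3.4  # hypothetical "correct answer"

# 20 steps each: coarse only, fine only, and coarse followed by finer steps.
coarse_only = descend(0.0, TARGET, [1.0] * 20)
fine_only = descend(0.0, TARGET, [0.01] * 20)
coarse_then_fine = descend(0.0, TARGET, [1.0] * 10 + [0.1] * 10)

for name, x in [("coarse only", coarse_only),
                ("fine only", fine_only),
                ("coarse then fine", coarse_then_fine)]:
    print(f"{name}: x = {x:.2f}, error = {abs(x - TARGET):.2f}")
```

In this toy setting the coarse run keeps jumping back and forth across the target, the fine run has barely left the starting point after 20 steps, and the run that starts coarse and then switches to a finer step ends within one fine step of the target.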

ReduceLROnPlateau

from keras.callbacks import ReduceLROnPlateau
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, min_lr=0.0001)

ReduceLROnPlateau reduces the learning rate when the monitored value stops improving.

monitor='val_loss' tells the callback to watch the loss on the validation data.
To make val_loss available, you need to pass validation data to model.fit(). See the previous article for details.

patience=2 means that the learning rate is reduced when val_loss has not improved for 2 epochs.
factor=0.5 is the multiplier applied at each reduction: if the current learning rate is 0.01, the next one is 0.005.

min_lr=0.0001 is the lower limit; the learning rate is never reduced below this value.
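To see how patience, factor and min_lr interact, the rule can be imitated in a few lines of plain Python. This is a simplified sketch and not the Keras implementation: simulate_reduce_lr and the val_loss values are made up for illustration, and options of the real callback such as cooldown and min_delta are ignored.

```python
def simulate_reduce_lr(val_losses, initial_lr=0.01, factor=0.5,
                       patience=2, min_lr=0.0001):
    """Return the learning rate in effect after each epoch (simplified rule)."""
    lr = initial_lr
    best = float("inf")
    wait = 0  # epochs since val_loss last improved
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never go below min_lr
                wait = 0
        history.append(lr)
    return history


# Hypothetical validation losses, one per epoch.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.64, 0.64, 0.65]
lrs = simulate_reduce_lr(val_losses)

for epoch, (loss, lr) in enumerate(zip(val_losses, lrs), start=1):
    print(f"epoch {epoch}: val_loss = {loss:.2f}, lr = {lr}")

# A long plateau: the learning rate keeps halving until it reaches min_lr.
plateau_lrs = simulate_reduce_lr([1.0] * 40)
```

With these made-up losses, the rate is halved after epochs 4 and 5 fail to beat epoch 3, and halved again after epochs 7 and 8 fail to beat epoch 6. On the long plateau it settles at min_lr.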

EarlyStopping

from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, mode='auto')

EarlyStopping stops training when the monitored value stops improving.

monitor='val_loss' tells the callback to watch the loss on the validation data. min_delta is the smallest change that counts as an improvement; a change smaller than this absolute value is treated as no improvement. patience=10 means that training stops if val_loss has not improved for 10 epochs.

mode specifies whether the monitored value is expected to decrease ('min') or increase ('max'). With 'auto', the direction is inferred from the name of the monitored value, so you can usually leave it set to auto.

Finally, pass both callbacks to fit() through the callbacks argument.

model.fit(
 ......, callbacks=[early_stopping, reduce_lr])
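The stopping rule of EarlyStopping, and in particular the effect of min_delta, can be sketched in the same way. This is a simplified imitation in plain Python under stated assumptions, not the Keras source: early_stopping_epoch and the val_loss values are hypothetical, and a patience of 3 is used only to keep the example short.

```python
def early_stopping_epoch(val_losses, min_delta=0.0, patience=10):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:  # must improve by more than min_delta
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # patience never ran out


# Hypothetical validation losses, one per epoch.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.652, 0.649, 0.651]

stop_a = early_stopping_epoch(val_losses, min_delta=0.0, patience=3)
stop_b = early_stopping_epoch(val_losses, min_delta=0.01, patience=3)

print("min_delta=0.0  ->", stop_a)
print("min_delta=0.01 ->", stop_b)
```

With min_delta=0.0 the tiny improvement at epoch 6 resets the counter, so training is not stopped within these 7 epochs. With min_delta=0.01 that improvement is too small to count, and the third epoch without improvement (epoch 6) ends training.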