Learning Rate Finder

In [ ]:

%matplotlib inline
from fastai.gen_doc.nbdoc import *
from fastai.vision import *
from fastai.callbacks import *

The learning rate finder plots the relationship between learning rate and loss for a Learner. The idea is to reduce the guesswork in picking a good starting learning rate.

Overview:

1. Run the finder: learn.lr_find()
2. Plot the loss against the learning rate: learn.recorder.plot()
3. Pick a learning rate from the region before the loss diverges, then start training

Technical Details: (first described by Leslie Smith)

Train the Learner for a few iterations. Start with a very low start_lr and increase it at each mini-batch until it reaches a very high end_lr. The Recorder records the loss at each iteration. Plot those losses against the learning rate to find a good value in the region before the loss diverges.
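
The sweep described above can be sketched in plain Python. This is only an illustration, assuming the learning rate grows by a constant factor per mini-batch (a geometric schedule) between start_lr and end_lr; the helper name lr_schedule is hypothetical and is not part of fastai.

```python
def lr_schedule(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Return num_it learning rates spaced geometrically from start_lr to end_lr.

    Hypothetical helper for illustration; not fastai's implementation.
    """
    if num_it < 2:
        raise ValueError("num_it must be at least 2")
    ratio = end_lr / start_lr
    # Iteration i uses start_lr * ratio**(i / (num_it - 1)), so the first
    # value is start_lr and the last is (up to rounding) end_lr.
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]


lrs = lr_schedule()
print(lrs[0], lrs[-1])
```

Because the values are evenly spaced on a log scale, every order of magnitude between start_lr and end_lr receives the same number of mini-batches.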

Choosing a good learning rate

For a more intuitive explanation, please check out Sylvain Gugger's post.

In [ ]:

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
def simple_learner(): return Learner(data, simple_cnn((3,16,16,2)), metrics=[accuracy])
learn = simple_learner()

First we run this command to launch the search:

In [ ]:

show_doc(Learner.lr_find)

`lr_find`[source]

lr_find(learn:Learner, start_lr:Floats=1e-07, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, wd:float=None)

Explore lr from start_lr to end_lr over num_it iterations in learn. If stop_div, stops when loss diverges.

In [ ]:

learn.lr_find(stop_div=False, num_it=200)

LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.

Then we plot the loss versus the learning rates. We're interested in finding a good order of magnitude of learning rate, so we plot with a log scale.

In [ ]:

learn.recorder.plot()
min_grad_lr = learn.recorder.min_grad_lr

Min numerical gradient: 7.59E-03

Then we choose a value approximately in the middle of the sharpest downward slope. In this case, training with 1e-2 looks like it should work well:

In [ ]:

simple_learner().fit(2, 1e-2)

Total time: 00:06

epoch	train_loss	valid_loss	accuracy
1	0.085923	0.057610	0.978901
2	0.054614	0.030795	0.991168

Don't just pick the learning rate at the minimum loss on the plot! A learning rate that high can prevent the model from learning at all:

In [ ]:

learn = simple_learner()
learn.fit(2, 1e-0)

Total time: 00:06

epoch	train_loss	valid_loss	accuracy
1	1.346225	0.693147	0.495584
2	0.706118	0.693147	0.495584

Picking a value before the downward slope results in slow training:

In [ ]:

learn = simple_learner()
learn.fit(2, 1e-3)

Total time: 00:06

epoch	train_loss	valid_loss	accuracy
1	0.167499	0.142523	0.946025
2	0.128227	0.109466	0.960255

Suggest LR

The red dot on the graph marks the point with the minimum numerical gradient, that is, where the loss is falling fastest. We can use the learning rate at that point as a first guess.

In [ ]:

learn = simple_learner()
learn.fit(2, min_grad_lr)

Total time: 00:06

epoch	train_loss	valid_loss	accuracy
1	0.095497	0.063320	0.978410
2	0.048079	0.041837	0.983317
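
The idea behind the minimum numerical gradient can be approximated in a few lines of plain Python. This is a rough sketch, assuming finite differences taken over the recorded losses in iteration order; numerical_gradient and suggest_lr are hypothetical helpers, and the sample values are made up for illustration rather than taken from the run above.

```python
def numerical_gradient(values):
    """Finite-difference gradient with respect to the index: central
    differences in the interior, one-sided differences at the two ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    grads = [values[1] - values[0]]
    for i in range(1, n - 1):
        grads.append((values[i + 1] - values[i - 1]) / 2)
    grads.append(values[-1] - values[-2])
    return grads


def suggest_lr(lrs, losses):
    """Return the learning rate at which the loss decreases fastest."""
    grads = numerical_gradient(losses)
    steepest = min(range(len(grads)), key=grads.__getitem__)
    return lrs[steepest]


# Made-up sweep: the loss falls, bottoms out, then blows up.
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [0.70, 0.69, 0.60, 0.30, 0.25, 2.00]
print(suggest_lr(lrs, losses))  # prints 0.001
```

Note that the suggested value (1e-3) is well below the learning rate with the lowest loss (1e-1), which matches the advice not to pick the minimum of the curve.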

In [ ]:

show_doc(LRFinder)

`class` `LRFinder`[source]

LRFinder(learn:Learner, start_lr:float=1e-07, end_lr:float=10, num_it:int=100, stop_div:bool=True) :: LearnerCallback

Causes learn to go on a mock training from start_lr to end_lr for num_it iterations.

Callback methods

You don't call these yourself - they're called by fastai's Callback system automatically to enable the class's functionality.

In [ ]:

show_doc(LRFinder.on_train_begin)

`on_train_begin`[source]

on_train_begin(pbar, **kwargs:Any)

Initialize optimizer and learner hyperparameters.

In [ ]:

show_doc(LRFinder.on_batch_end)

`on_batch_end`[source]

on_batch_end(iteration:int, smooth_loss:TensorOrNumber, **kwargs:Any)

Determine if the loss has run away and we should stop.
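
A runaway-loss check of this kind can be sketched as follows. The sketch tracks the best smoothed loss seen so far and signals a stop when the current loss exceeds a multiple of it or becomes NaN. The class name DivergenceMonitor and the default factor of 4 are assumptions made for illustration; this page does not state the threshold the callback uses.

```python
import math


class DivergenceMonitor:
    """Hypothetical helper: flags divergence relative to the best loss so far."""

    def __init__(self, factor=4.0):
        self.factor = factor      # assumed threshold multiplier
        self.best_loss = None     # lowest smoothed loss seen so far

    def update(self, smooth_loss):
        """Record smooth_loss and return True if training should stop."""
        if math.isnan(smooth_loss):
            return True
        if self.best_loss is None or smooth_loss < self.best_loss:
            self.best_loss = smooth_loss
        return smooth_loss > self.factor * self.best_loss


monitor = DivergenceMonitor()
for loss in [0.9, 0.5, 0.4, 0.6, 2.5]:
    if monitor.update(loss):
        print("stop at loss", loss)
        break
```

Comparing against the best loss rather than the previous loss makes the check tolerant of small bumps in the curve while still catching a blow-up.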

In [ ]:

show_doc(LRFinder.on_epoch_end)

`on_epoch_end`[source]

on_epoch_end(**kwargs:Any)

Tell Learner if we need to stop.

In [ ]:

show_doc(LRFinder.on_train_end)

`on_train_end`[source]

on_train_end(**kwargs:Any)

Clean up learn model weights disturbed during LRFind exploration.
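
The cleanup matters because the mock training changes the model's weights, and real training should start from the weights as they were before the sweep. The snapshot-and-restore pattern can be sketched like this, using a plain dict as a stand-in for model weights; preserve_state is a hypothetical helper and does not show how fastai stores or reloads the weights.

```python
import copy
from contextlib import contextmanager


@contextmanager
def preserve_state(state):
    """Snapshot a mutable dict on entry and restore it on exit (hypothetical helper)."""
    snapshot = copy.deepcopy(state)
    try:
        yield state
    finally:
        # Restore in place so existing references to `state` see the old values.
        state.clear()
        state.update(snapshot)


weights = {"w": [0.1, -0.2], "b": 0.0}  # stand-in for model weights
with preserve_state(weights) as w:
    # Simulate the sweep disturbing the weights.
    w["w"][0] = 123.0
    w["b"] = -5.0
print(weights)
```

The restore runs in a finally block, so the original values come back even if the sweep stops early or raises.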
