This notebook tests that peak memory consumption is efficient, i.e. that training and inference don't require more GPU RAM than needed.
The detection is done by reading the per-cell reports of IPyExperimentsPytorch and the per-epoch numbers of the fastai.callbacks.mem.PeakMemMetric metric. In those reports △Consumed is the memory still held once the cell (or epoch) finishes, and △Peaked is the extra transient memory that was used on top of that during execution.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
from fastai.utils.mem import *
from fastai.callbacks.mem import *
from pathlib import Path
import numpy as np
#! pip install ipyexperiments
from ipyexperiments import IPyExperimentsPytorch
from ipyexperiments.utils.mem import *
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
assert str(device) == 'cuda:0', f"we want GPU, got {device}"
from IPython.display import Markdown, display
def alert(string, color='red'):
    display(Markdown(f"<span style='color:{color}'>**{string}**</span>"))
exp1 = IPyExperimentsPytorch()
*** Experiment started with the Pytorch backend
Device: ID 0, GeForce GTX 1070 Ti (8119 RAM)

*** Current state:
RAM:     Used     Free    Total       Util
CPU:     2275    18518    31588 MB   7.20%
GPU:      503     7616     8119 MB   6.19%

・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.000
・ CPU:         0        0     2275 MB  |
・ GPU:         0        0      503 MB  |
path = untar_data(URLs.MNIST)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.003
・ CPU:         0        1     2277 MB  |
・ GPU:         0        0      503 MB  |
# setup
defaults.cmap='binary'
tfms = ([*rand_pad(padding=3, size=28, mode='zeros')], [])
num_workers=0
#bs=512
bs=128
data = (ImageItemList.from_folder(path, convert_mode='L')
.split_by_folder(train='training', valid='testing')
.label_from_folder()
.transform(tfms)
.databunch(bs=bs, num_workers=num_workers)
.normalize(imagenet_stats)
)
data
ImageDataBunch;

Train: LabelList
y: CategoryList (60000 items)
[Category 4, Category 4, Category 4, Category 4, Category 4]...
Path: /home/stas/.fastai/data/mnist_png
x: ImageItemList (60000 items)
[Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28)]...
Path: /home/stas/.fastai/data/mnist_png;

Valid: LabelList
y: CategoryList (10000 items)
[Category 4, Category 4, Category 4, Category 4, Category 4]...
Path: /home/stas/.fastai/data/mnist_png
x: ImageItemList (10000 items)
[Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28)]...
Path: /home/stas/.fastai/data/mnist_png;

Test: None
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.837
・ CPU:        34        3     2359 MB  |
・ GPU:         0        0      503 MB  |
#arch="resnet34"
arch="resnet50"
model = getattr(models, arch) # models.resnetXX
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.000
・ CPU:         0        0     2359 MB  |
・ GPU:         0        0      503 MB  |
learn = create_cnn(data, model, metrics=[accuracy], callback_fns=PeakMemMetric)
#learn.opt_func
#learn.opt_func = partial(optim.SGD, momentum=0.9)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:01.360
・ CPU:         0        0     2518 MB  |
・ GPU:       106        0      609 MB  |
# must leave free at least as much as the 2nd epoch's peak
# with resnet50:
# - with bs=128 it's about 300MB
# - with bs=512 it's about 900MB
x=gpu_mem_leave_free_mbs(300)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:02.562
・ CPU:         0        0     2519 MB  |
・ GPU:      7210        0     7819 MB  |
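gpu_mem_leave_free_mbs() above comes from ipyexperiments.utils.mem and, roughly, grabs a buffer tensor so that only the requested number of MBs remain free on the card. Here is a minimal sketch of that idea in plain pytorch (leave_free_mbs_sketch is a hypothetical helper, not the actual ipyexperiments implementation; torch.cuda.mem_get_info needs a reasonably recent pytorch):
import torch

def leave_free_mbs_sketch(leave_free_mb, device='cuda:0'):
    # hypothetical stand-in for ipyexperiments' gpu_mem_leave_free_mbs():
    # claim a buffer tensor so that only `leave_free_mb` MB remain free on the card
    free_bytes, _total_bytes = torch.cuda.mem_get_info(torch.device(device))
    consume_mb = free_bytes // 2**20 - leave_free_mb
    if consume_mb <= 0:
        return None
    # float32 is 4 bytes/element; keep the returned reference alive, `del` it later to release
    return torch.ones(int(consume_mb) * 2**20 // 4, dtype=torch.float32, device=device)

# buf = leave_free_mbs_sketch(300)  # later: del buf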
# some insights on the peak mem spike here:
# https://discuss.pytorch.org/t/high-gpu-memory-usage-problem/34694/2
learn.fit_one_cycle(2, max_lr=1e-2)
| epoch | train_loss | valid_loss | accuracy | cpu used (MB) | cpu peak (MB) | gpu used (MB) | gpu peak (MB) |
|---|---|---|---|---|---|---|---|
| 1 | 0.128634 | 0.064438 | 0.981300 | 17 | 17 | 46 | 294 |
| 2 | 0.047700 | 0.023343 | 0.991700 | 5 | 6 | 0 | 226 |
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:01:32.669
・ CPU:         0        0     2544 MB  |
・ GPU:        38      256     7857 MB  |
# can free the memory if need be (useful after an OOM, so the kernel doesn't need to be restarted)
del x
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.000
・ CPU:         0        0     2544 MB  |
・ GPU:     -7210     7210      647 MB  |
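if del alone doesn't bring the reported usage down (e.g. after an OOM mid-training), the usual follow-up is to force a garbage collection and have pytorch return its cached blocks to the driver. A minimal sketch:
import gc
import torch

# after deleting the offending references (here `x`), reclaim what python still holds
gc.collect()
# release cached, no-longer-used blocks back to the CUDA driver so nvidia-smi reflects the drop
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated() // 2**20, "MB still allocated by live tensors")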
learn.save(f'reload1')
_=learn.load(f'reload1')
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.287
・ CPU:         0        0     2544 MB  |
・ GPU:         2      128      649 MB  |
learn.export()
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.181
・ CPU:         0        0     2477 MB  |
・ GPU:         0        0      649 MB  |
learn = load_learner(path, test=ImageItemList.from_folder(path/'testing'))
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.131
・ CPU:         3        0     2480 MB  |
・ GPU:        94        0      743 MB  |
learn.data.test_ds
len(learn.data.test_ds)
LabelList
y: EmptyLabelList (10000 items)
[EmptyLabel , EmptyLabel , EmptyLabel , EmptyLabel , EmptyLabel ]...
Path: .
x: ImageItemList (10000 items)
[Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28)]...
Path: /home/stas/.fastai/data/mnist_png
10000
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.004
・ CPU:         0        0     2480 MB  |
・ GPU:         0        0      743 MB  |
the inference peak happens only the first time it's run, same as with fit(); and, as with fit(), pytorch gracefully frees that overhead if there is little memory left, so all is good here too.
# same as with fit(): if there is extra memory the peak will be much larger, but if there isn't, it still works
x=gpu_mem_leave_free_mbs(200)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:02.543
・ CPU:         0        0     2480 MB  |
・ GPU:      7176        0     7919 MB  |
predictions = learn.get_preds(ds_type=DatasetType.Test)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:02.487
・ CPU:         0        0     2480 MB  |
・ GPU:         0      190     7919 MB  |
# re-run to check peak consumption
predictions = learn.get_preds(ds_type=DatasetType.Test)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:02.323
・ CPU:         0        0     2480 MB  |
・ GPU:         0       30     7919 MB  |
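the same first-run vs. re-run difference can also be read straight from pytorch's own peak counter, without ipyexperiments. A minimal sketch, reusing learn and DatasetType.Test from above (reset_peak_memory_stats() needs a reasonably recent pytorch, and it only reports what the caching allocator sees, i.e. tensor allocations):
import torch

torch.cuda.reset_peak_memory_stats()               # zero the peak counter
_ = learn.get_preds(ds_type=DatasetType.Test)      # run inference; its peak is recorded
print("peak during inference:", torch.cuda.max_memory_allocated() // 2**20, "MB")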
del x
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.000
・ CPU:         0        0     2480 MB  |
・ GPU:     -7176        0      743 MB  |