This notebook tests that peak memory consumption is efficient, i.e. that training and inference don't require more GPU RAM than needed.
The detection is done by reading the per-cell reports of IPyExperimentsPytorch and the per-epoch numbers of the fastai.callbacks.mem.PeakMemMetric metric. In those reports △Consumed is the memory still held once the cell (or epoch) finishes, and △Peaked is the extra transient memory that was used on top of that during execution.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
from fastai.utils.mem import *
from fastai.callbacks.mem import *
from pathlib import Path
import numpy as np
#! pip install ipyexperiments
from ipyexperiments import IPyExperimentsPytorch
from ipyexperiments.utils.mem import *
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
assert str(device) == 'cuda:0', f"we want GPU, got {device}"
from IPython.display import Markdown, display
def alert(string, color='red'):
    display(Markdown(f"<span style='color:{color}'>**{string}**</span>"))
exp1 = IPyExperimentsPytorch()
*** Experiment started with the Pytorch backend
Device: ID 0, GeForce GTX 1070 Ti (8119 RAM)

*** Current state:
RAM:     Used     Free    Total       Util
CPU:     2275    18518    31588 MB   7.20%
GPU:      503     7616     8119 MB   6.19%

・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.000
・ CPU:         0        0     2275 MB  |
・ GPU:         0        0      503 MB  |
path = untar_data(URLs.MNIST)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.003
・ CPU:         0        1     2277 MB  |
・ GPU:         0        0      503 MB  |
# setup
defaults.cmap='binary'
tfms = ([*rand_pad(padding=3, size=28, mode='zeros')], [])
num_workers=0
#bs=512
bs=128
data = (ImageItemList.from_folder(path, convert_mode='L')
.split_by_folder(train='training', valid='testing')
.label_from_folder()
.transform(tfms)
.databunch(bs=bs, num_workers=num_workers)
.normalize(imagenet_stats)
)
data
ImageDataBunch;

Train: LabelList
y: CategoryList (60000 items)
[Category 4, Category 4, Category 4, Category 4, Category 4]...
Path: /home/stas/.fastai/data/mnist_png
x: ImageItemList (60000 items)
[Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28)]...
Path: /home/stas/.fastai/data/mnist_png;

Valid: LabelList
y: CategoryList (10000 items)
[Category 4, Category 4, Category 4, Category 4, Category 4]...
Path: /home/stas/.fastai/data/mnist_png
x: ImageItemList (10000 items)
[Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28), Image (1, 28, 28)]...
Path: /home/stas/.fastai/data/mnist_png;

Test: None
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.837
・ CPU:        34        3     2359 MB  |
・ GPU:         0        0      503 MB  |
#arch="resnet34"
arch="resnet50"
model = getattr(models, arch) # models.resnetXX
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.000
・ CPU:         0        0     2359 MB  |
・ GPU:         0        0      503 MB  |
learn = create_cnn(data, model, metrics=[accuracy], callback_fns=PeakMemMetric)
#learn.opt_func
#learn.opt_func = partial(optim.SGD, momentum=0.9)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:01.360
・ CPU:         0        0     2518 MB  |
・ GPU:       106        0      609 MB  |
# must leave free at least as much as the 2nd epoch's peak
# with resnet50:
# - with bs=128 it's about 300MB
# - with bs=512 it's about 900MB
x=gpu_mem_leave_free_mbs(300)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:02.562
・ CPU:         0        0     2519 MB  |
・ GPU:      7210        0     7819 MB  |
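gpu_mem_leave_free_mbs() above comes from ipyexperiments.utils.mem and, roughly, grabs a buffer tensor so that only the requested number of MBs remain free on the card. Here is a minimal sketch of that idea in plain pytorch (leave_free_mbs_sketch is a hypothetical helper, not the actual ipyexperiments implementation; torch.cuda.mem_get_info needs a reasonably recent pytorch):
import torch

def leave_free_mbs_sketch(leave_free_mb, device='cuda:0'):
    # hypothetical stand-in for ipyexperiments' gpu_mem_leave_free_mbs():
    # claim a buffer tensor so that only `leave_free_mb` MB remain free on the card
    free_bytes, _total_bytes = torch.cuda.mem_get_info(torch.device(device))
    consume_mb = free_bytes // 2**20 - leave_free_mb
    if consume_mb <= 0:
        return None
    # float32 is 4 bytes/element; keep the returned reference alive, `del` it later to release
    return torch.ones(int(consume_mb) * 2**20 // 4, dtype=torch.float32, device=device)

# buf = leave_free_mbs_sketch(300)  # later: del buf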
# some insights on the peak mem spike here:
# https://discuss.pytorch.org/t/high-gpu-memory-usage-problem/34694/2
learn.fit_one_cycle(2, max_lr=1e-2)
| epoch | train_loss | valid_loss | accuracy | cpu used (MB) | cpu peak (MB) | gpu used (MB) | gpu peak (MB) |
|---|---|---|---|---|---|---|---|
| 1 | 0.128634 | 0.064438 | 0.981300 | 17 | 17 | 46 | 294 |
| 2 | 0.047700 | 0.023343 | 0.991700 | 5 | 6 | 0 | 226 |
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:01:32.669
・ CPU:         0        0     2544 MB  |
・ GPU:        38      256     7857 MB  |
# can free the memory if need be (useful after an OOM, so the kernel doesn't need to be restarted)
del x
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.000
・ CPU:         0        0     2544 MB  |
・ GPU:     -7210     7210      647 MB  |
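if del alone doesn't bring the reported usage down (e.g. after an OOM mid-training), the usual follow-up is to force a garbage collection and have pytorch return its cached blocks to the driver. A minimal sketch:
import gc
import torch

# after deleting the offending references (here `x`), reclaim what python still holds
gc.collect()
# release cached, no-longer-used blocks back to the CUDA driver so nvidia-smi reflects the drop
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated() // 2**20, "MB still allocated by live tensors")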
learn.save(f'reload1')
_=learn.load(f'reload1')
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.287
・ CPU:         0        0     2544 MB  |
・ GPU:         2      128      649 MB  |
learn.export()
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.181
・ CPU:         0        0     2477 MB  |
・ GPU:         0        0      649 MB  |
learn = load_learner(path, test=ImageItemList.from_folder(path/'testing'))
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.131
・ CPU:         3        0     2480 MB  |
・ GPU:        94        0      743 MB  |
learn.data.test_ds
len(learn.data.test_ds)
LabelList
y: EmptyLabelList (10000 items)
[EmptyLabel , EmptyLabel , EmptyLabel , EmptyLabel , EmptyLabel ]...
Path: .
x: ImageItemList (10000 items)
[Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28), Image (3, 28, 28)]...
Path: /home/stas/.fastai/data/mnist_png
10000
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.004
・ CPU:         0        0     2480 MB  |
・ GPU:         0        0      743 MB  |
the inference peak happens only the first time it's run, same as with fit(); and, as with fit(), pytorch gracefully frees that overhead if there is little memory left, so all is good here too.
# same as with fit(): if there is extra memory the peak will be much larger, but if there isn't, it still works
x=gpu_mem_leave_free_mbs(200)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:02.543
・ CPU:         0        0     2480 MB  |
・ GPU:      7176        0     7919 MB  |
predictions = learn.get_preds(ds_type=DatasetType.Test)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:02.487
・ CPU:         0        0     2480 MB  |
・ GPU:         0      190     7919 MB  |
# re-run to check peak consumption
predictions = learn.get_preds(ds_type=DatasetType.Test)
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:02.323
・ CPU:         0        0     2480 MB  |
・ GPU:         0       30     7919 MB  |
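the same first-run vs. re-run difference can also be read straight from pytorch's own peak counter, without ipyexperiments. A minimal sketch, reusing learn and DatasetType.Test from above (reset_peak_memory_stats() needs a reasonably recent pytorch, and it only reports what the caching allocator sees, i.e. tensor allocations):
import torch

torch.cuda.reset_peak_memory_stats()               # zero the peak counter
_ = learn.get_preds(ds_type=DatasetType.Test)      # run inference; its peak is recorded
print("peak during inference:", torch.cuda.max_memory_allocated() // 2**20, "MB")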
del x
・ RAM: △Consumed △Peaked    Used Total | Exec time 0:00:00.000
・ CPU:         0        0     2480 MB  |
・ GPU:     -7176        0      743 MB  |