%reload_ext autoreload
%autoreload 2
#export
from nb_003 import *
import nb_002
import operator
from random import sample
from torch.utils.data.sampler import Sampler
DATA_PATH = Path('data')
PATH = DATA_PATH/'caltech101' # http://www.vision.caltech.edu/Image_Datasets/Caltech101/
The first step is to create a dataset from our files. We need to set aside a portion of the files to use as our validation set; we do this randomly by holding out a fixed percentage, in this case 0.2.
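For intuition, here is a minimal sketch of such a random split over a flat list of file paths (the helper name and structure are illustrative; ImageDataset.from_folder below does this for us):

import numpy as np

def random_split(files, valid_pct=0.2, seed=42):
    # Shuffle the indices once, then hold out the first valid_pct of them for validation
    np.random.seed(seed)
    idx = np.random.permutation(len(files))
    cut = int(valid_pct * len(files))
    return [files[i] for i in idx[cut:]], [files[i] for i in idx[:cut]]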
classes = ["airplanes", "Motorbikes", "BACKGROUND_Google", "Faces", "watch", "Leopards", "bonsai",
"car_side", "ketch", "chandelier", "hawksbill", "grand_piano", "brain", "butterfly", "helicopter", "menorah",
"trilobite", "starfish", "kangaroo", "sunflower", "ewer", "buddha", "scorpion", "revolver", "laptop", "ibis", "llama",
"minaret", "umbrella", "electric_guitar", "crab", "crayfish",]
np.random.seed(42)
train_ds,valid_ds = ImageDataset.from_folder(PATH, test_pct=0.2)
x = train_ds[1114][0]
def xi(): return Image(train_ds[1114][0])
classes = train_ds.classes
c = len(classes)
len(train_ds),len(valid_ds),c
show_image(x, figsize=(6,3), hide_axis=False)
print(x.shape)
rot_m = np.array(rotate.func(40.)); rot_m
rotate(xi(), 40.).show(figsize=(6,3))
#export
def affine_mult(c,m):
    if m is None: return c
    size = c.size()
    _,h,w,_ = size
    # Compensate for non-square images: grid coordinates are normalized to [-1,1] in both dims
    m[0,1] *= h/w
    m[1,0] *= w/h
    c = c.view(-1,2)
    # Apply the (adjusted) affine matrix to every coordinate of the flow
    c = torch.addmm(m[:2,2], c, m[:2,:2].t())
    return c.view(size)
nb_002.affine_mult = affine_mult
rotate(xi(), 40.).show(figsize=(6,3))
Now we are going to pad or crop our images automatically to reach a desired final size. The best way to do this is to integrate both transforms into one function.
We will pad or crop as needed to produce a size x size (square) image. If size is greater than the height or width of our image, we need to pad that dimension; if it is smaller, we need to crop. We might have to do one, the other, both or neither. In this example we only add padding, since both the height and the width of our image are smaller than 300, our desired new height and width.
As with our original function, we can pass row_pct or col_pct to our transform to focus on different parts of the image instead of the center, which is the default.
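To make the pad amounts concrete, here is the arithmetic the transform below performs, on an assumed 200 x 260 input and a 300 x 300 target (the input size is illustrative):

rows, cols = 300, 300                  # desired output size
h, w = 200, 260                        # assumed input size (illustrative)
row_pad = max((rows - h + 1)//2, 0)    # 50 -> pad 50 px on top and on bottom
col_pad = max((cols - w + 1)//2, 0)    # 20 -> pad 20 px on left and on right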
Crop_pad
crop_pad crops and/or pads our image to produce an output image of a given target size.
Parameters
Size: the target size of each side in pixels. If a single number s is given, the image is made square with dimensions s * s.
Domain: Positive integers.
Padding_mode: the type of padding used in the transform.
Domain: 'reflect', 'zeros', 'border'
Row_pct: determines where the crop window sits vertically (which rows are left out). If <0.5, the window sits closer to the top, so more rows are cut from the bottom than from the top; if >0.5, the opposite (it varies linearly). See the numeric example below.
Domain: Real numbers between 0 and 1.
Col_pct: determines where the crop window sits horizontally (which columns are left out). If <0.5, the window sits closer to the left, so more columns are cut from the right than from the left; if >0.5, the opposite (it varies linearly).
Domain: Real numbers between 0 and 1.
Note: while experimenting, keep in mind that the original of this example image contains a thin black border. This affects our transforms and is visible when we use reflect padding.
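As a quick numeric check of row_pct (using the same formula as the code below), take an image with 400 rows cropped to 150 rows:

h, rows, row_pct = 400, 150, 0.25      # illustrative numbers
row = int((h - rows + 1)*row_pct)      # 62: the window keeps rows 62..211,
                                       # cutting 62 rows from the top and 188 from the bottom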
class TfmCrop(TfmPixel): order=99
@TfmCrop
def crop_pad(x, size, padding_mode='reflect',
             row_pct:uniform = 0.5, col_pct:uniform = 0.5):
    size = listify(size,2)
    rows,cols = size
    # Pad if the image is smaller than the target in either dimension
    if x.size(1)<rows or x.size(2)<cols:
        row_pad = max((rows-x.size(1)+1)//2, 0)
        col_pad = max((cols-x.size(2)+1)//2, 0)
        x = F.pad(x[None], (col_pad,col_pad,row_pad,row_pad), mode=padding_mode)[0]
    # Then crop a rows x cols window whose position is controlled by row_pct/col_pct
    row = int((x.size(1)-rows+1)*row_pct)
    col = int((x.size(2)-cols+1)*col_pct)
    x = x[:, row:row+rows, col:col+cols]
    return x.contiguous() # without this, get NaN later - don't know why
crop_pad(xi(), 300, row_pct=0.,col_pct=0., padding_mode='constant').show()
crop_pad(xi(), 150).show()
crop_pad(xi(), 150, row_pct=0.,col_pct=0.98, padding_mode='constant').show()
tfm = crop_pad(size=100, row_pct=(0,1.), col_pct=(0,1.))
_,axes = plt.subplots(1,4, figsize=(12,3))
for ax in axes.flat:
    tfm.resolve()
    tfm(xi()).show(ax)
Next, we are going to combine our cropping and padding with a resize operation. In other words, we will take a picture and crop/pad it in such a way that we get our desired size. It is similar to our previous transform, only this time the final dimensions don't have to be square. This gives us more flexibility, since our network architecture might take rectangular pictures as input.
First, we will get the target dimensions. For this we have built get_crop_target. This function takes three arguments: target_px, target_aspect and mult. target_px is our base dimension, target_aspect is the ratio between width and height, and mult is the number that both dimensions must be a multiple of.
To understand this better, let's take our example where target_px=220, target_aspect=2., mult=32 (the default). In plain words we are telling our function: return dimensions with roughly the area of a 220*220 image, with a width twice the height, and with height and width both multiples of 32.
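To make this concrete, here is the arithmetic get_crop_target (defined in the next cell) performs for target_px=220, target_aspect=2., mult=32:

import math
target_r = math.sqrt(220*220/2.)                        # ≈ 155.6
target_c = target_r*2.                                  # ≈ 311.1
int(target_r/32 + 0.5)*32, int(target_c/32 + 0.5)*32    # (160, 320): rounded to multiples of 32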
We are now going to transform our image to our desired dimensions by cropping or padding. Before we crop or pad, we will make an intermediate transform that will allow us to later get our output image with the desired dimensions. Let's call our initial dimensions h_i, w_i, our intermediate dimensions h_m, w_m and our output dimensions h_o, w_o. Our objective will be to get our output image by either cropping or padding, but not both.
We will first enlarge or reduce our original image. get_resize_target will enlarge or reduce our input image (keeping the aspect ratio h_i/w_i constant) until one of the dimensions equals the corresponding final output dimension (i.e. h_m=h_o or w_m=w_o). But how does it know which dimension to equate? Let's think about this in detail.
If we intend to crop, our intermediate image's area has to be larger than our output image's (since we are going to crop out some pixels), and if we intend to pad, our intermediate image's area has to be smaller than our output image's (since we will add some pixels). This means that the dimension we choose to equate depends on the relationship between the ratios h_i/h_o and w_i/w_o. If we want to crop, we equate the dimension with the smallest ratio, since that means (h_m, w_m) >= (h_o, w_o), which is exactly what we want (a larger area). Conversely, if we want to pad, we equate the dimension with the largest ratio, since that guarantees (h_m, w_m) <= (h_o, w_o) (a smaller area).
As an example, say our image has dimensions h_i = 192 and w_i = 128 and our target dimensions are h_o=160, w_o=320. That is, we have to turn a vertical rectangle into a horizontal rectangle. We can do this in two ways:
If we intend to crop, our intermediate dimensions will be (h_m, w_m) = (480, 320). If we intend to pad, (h_m, w_m) = (160, 107). Note that 480/320 ≈ 160/107 ≈ 192/128; that is, our intermediate image's aspect ratio always equals our input image's aspect ratio.
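We can check those numbers directly; the same computation appears in get_resize_target further down:

h_i, w_i = 192, 128
h_o, w_o = 160, 320
r_ratio, c_ratio = h_i/h_o, w_i/w_o                                          # 1.2, 0.4
crop = round(h_i/min(r_ratio,c_ratio)), round(w_i/min(r_ratio,c_ratio))      # (480, 320)
pad  = round(h_i/max(r_ratio,c_ratio)), round(w_i/max(r_ratio,c_ratio))      # (160, 107)
crop, pad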
#export
def round_multiple(x, mult): return (int(x/mult+0.5)*mult)

def get_crop_target(target_px, target_aspect=None, mult=32):
    target_px = listify(target_px, 2)
    target_r,target_c = target_px
    if target_aspect:
        # Keep roughly the same area but set the rows/cols to the requested aspect ratio
        target_r = math.sqrt(target_r*target_c/target_aspect)
        target_c = target_r*target_aspect
    return round_multiple(target_r,mult),round_multiple(target_c,mult)
get_crop_target(220)
get_crop_target((220,110))
crop_target = get_crop_target(220, 2.);
target_r,target_c = crop_target
crop_target, target_r*target_c
_,r,c = x.shape; x.shape
#export
@partial(Transform, order=99)
def crop_pad(img, size=None, mult=32, padding_mode=None,
             row_pct:uniform = 0.5, col_pct:uniform = 0.5):
    aspect = img.aspect if hasattr(img, 'aspect') else 1.
    if not size and hasattr(img, 'size'): size = img.size
    if not padding_mode:
        # Fall back to the padding mode stored on the image (if any), else reflection
        if hasattr(img, 'sample_kwargs') and ('padding_mode' in img.sample_kwargs):
            padding_mode = img.sample_kwargs['padding_mode']
        else: padding_mode='reflect'
    if padding_mode=='zeros': padding_mode='constant'
    rows,cols = get_crop_target(size, aspect, mult=mult)
    x = img.px
    # Pad if the image is smaller than the target, then crop to a rows x cols window
    if x.size(1)<rows or x.size(2)<cols:
        row_pad = max((rows-x.size(1)+1)//2, 0)
        col_pad = max((cols-x.size(2)+1)//2, 0)
        x = F.pad(x[None], (col_pad,col_pad,row_pad,row_pad), mode=padding_mode)[0]
    row = int((x.size(1)-rows+1)*row_pct)
    col = int((x.size(2)-cols+1)*col_pct)
    x = x[:, row:row+rows, col:col+cols]
    img.px = x.contiguous() # without this, get NaN later - don't know why
    return img
img = xi()
img.aspect = 2
img = crop_pad(img, 220)
img.show(figsize=(9,3))
img.shape
To pick the intermediate size for our actual image, we compute the two ratios r/target_r and c/target_c and take the maximum for padding or the minimum for cropping, exactly as described above:
r_ratio = r/target_r
c_ratio = c/target_c
# min -> crop; max -> pad
ratio = max(r_ratio,c_ratio)
r_ratio,c_ratio,ratio
r2,c2 = round(r/ratio),round(c/ratio); r2,c2
#export
def get_resize_target(img, crop_target, do_crop=False):
    if crop_target is None: return None
    ch,r,c = img.shape
    target_r,target_c = crop_target
    # min ratio -> intermediate at least as large as the target (crop); max ratio -> at most as large (pad)
    ratio = (min if do_crop else max)(r/target_r, c/target_c)
    return ch,round(r/ratio),round(c/ratio)
get_resize_target(x, crop_target, False)
get_resize_target(x, crop_target, True)
#export
@partial(Transform, order=TfmAffine.order-2)
def resize_image(x, *args, **kwargs): return x.resize(*args, **kwargs)

def _resize(self, size=None, do_crop=False, mult=32):
    assert self._flow is None
    if not size and hasattr(self, 'size'): size = self.size
    aspect = self.aspect if hasattr(self, 'aspect') else None
    crop_target = get_crop_target(size, aspect, mult=mult)
    target = get_resize_target(self, crop_target, do_crop)
    # Build the sampling grid at the intermediate size; resampling happens when the flow is applied
    self.flow = affine_grid(target)
    return self

Image.resize=_resize
img = xi()
img.aspect = 2
img.resize(220)
img.show(figsize=(9,3))
img.shape
img = xi()
img.aspect = 2
img.resize(220, do_crop=True)
img.show(figsize=(9,3))
img.shape
#export
def is_listy(x)->bool: return isinstance(x, (tuple,list))
def apply_tfms(tfms, x, do_resolve=True, xtra=None, aspect=None, size=None,
               padding_mode='reflect', **kwargs):
    if not tfms: return x
    if not xtra: xtra={}
    # Sort transforms by their order attribute and resolve any random parameters
    tfms = sorted(listify(tfms), key=lambda o: o.tfm.order)
    if do_resolve: resolve_tfms(tfms)
    x = Image(x.clone())
    x.set_sample(padding_mode=padding_mode, **kwargs)
    x.aspect = aspect
    x.size = size
    for tfm in tfms:
        # Pass any extra per-transform kwargs if provided, otherwise apply the transform as-is
        if tfm.tfm in xtra: x = tfm(x, **xtra[tfm.tfm])
        else: x = tfm(x)
    return x.px
nb_002.apply_tfms = apply_tfms
import nb_002b
nb_002b.apply_tfms = apply_tfms
tfms = [resize_image(size=crop_target),
rotate(degrees=(40.,40.))]
img = apply_tfms(tfms, x)
show_image(img, figsize=(6,3))
crop_target,img.shape
tfms = [resize_image(size=crop_target, do_crop=True),
rotate(degrees=(40.,40.))]
img = apply_tfms(tfms, x, aspect=2)
show_image(img, figsize=(6,3))
img.shape
tfms = [resize_image(size=220),
rotate(degrees=(40.,40.))]
img = apply_tfms(tfms, x, aspect=2)
show_image(img, figsize=(6,3))
get_crop_target(220, 2),img.shape
tfms = [rotate(degrees=(40.,40.)), crop_pad(size=220)]
img = apply_tfms(tfms, x, aspect=2)
show_image(img, figsize=(6,3))
img.shape
tfms = [rotate(degrees=(40.,40.)),
resize_image(),
crop_pad()]
img = apply_tfms(tfms, x, aspect=2, size=220)
show_image(img, figsize=(6,3))
get_crop_target(220,2), img.shape
def resize_crop(size=None, do_crop=False, mult=32, rand_crop=False):
    # Resize to the intermediate size, then crop/pad to the final target size
    crop_kw = {'row_pct':(0,1.),'col_pct':(0,1.)} if rand_crop else {}
    return [resize_image(size=size, do_crop=do_crop, mult=mult),
            crop_pad(size=size, mult=mult, **crop_kw)]
tfms = [rotate(degrees=(40.,40.)), *resize_crop()]
img = apply_tfms(tfms, x, aspect=2, size=220)
show_image(img, figsize=(6,3))
get_crop_target(220,2), img.shape
tfms = [rotate(degrees=(40.,40.)), *resize_crop(do_crop=True)]
img = apply_tfms(tfms, x, size=220, aspect=2)
show_image(img, figsize=(6,3))
img.shape
tfms = [rotate(degrees=(40.,40.)), *resize_crop(do_crop=False)]
img = apply_tfms(tfms, x, size=220, aspect=2, padding_mode='zeros')
show_image(img, figsize=(6,3))
img.shape
Let's see how our transforms look for different values of zoom, rotate and crop_pad.
#export
def rand_zoom(*args, **kwargs): return zoom(*args, row_pct=(0,1), col_pct=(0,1), **kwargs)
def rand_crop(*args, **kwargs): return crop_pad(*args, row_pct=(0,1), col_pct=(0,1), **kwargs)

def zoom_crop(scale, do_rand=False, p=1.0):
    # Zoom (at a random center if do_rand) then crop/pad back to the original size
    zoom_fn = rand_zoom if do_rand else zoom
    crop_fn = rand_crop if do_rand else crop_pad
    return [zoom_fn(scale=scale, p=p), crop_fn()]
tfms = [
rotate(degrees=(-20,20.)),
rand_zoom(scale=(1.,1.95)),
*resize_crop(size=100, rand_crop=True, do_crop=False)
]
_,axes = plt.subplots(1,4, figsize=(12,3))
for ax in axes.flat:
    show_image(apply_tfms(tfms, x, padding_mode='zeros'), ax)
tfms = [
rotate(degrees=(-20,20.)),
rand_zoom(scale=(1.,1.95)),
*resize_crop(size=100, rand_crop=True, do_crop=True)
]
_,axes = plt.subplots(1,4, figsize=(12,3))
for ax in axes.flat:
    show_image(apply_tfms(tfms, x), ax)
Finally, with our choice of transforms and parameters we are going to fit our Darknet model and check our results. To fit the model we need to resize our images to a common size so we can feed them to the model in batches. We face the same decisions as before.
In this case we chose to crop our images by passing do_crop=True to resize_crop; if we wanted to pad instead, we could leave do_crop at its default of False.
We also decided to make our images square, with dimensions size x size. If we wanted a rectangle with a width-to-height ratio a, we could pass aspect=a when the transforms are applied, as we did with apply_tfms above.
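For illustration only, a minimal sketch of what that could look like, assuming DatasetTfm forwards extra keyword arguments (such as aspect) to apply_tfms the same way it forwards padding_mode in the cells below; the aspect keyword here is an assumption, not something the cells below rely on:

# Hypothetical: pad to a 2:1 rectangle instead of cropping to a square
train_tfms_rect = [rotate(degrees=(-20,20.)),
                   *resize_crop(size=150, rand_crop=True, do_crop=False)]
train_tds_rect = DatasetTfm(train_ds, train_tfms_rect, padding_mode='zeros', aspect=2.)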
[PIL.Image.open(fn).size for fn in np.random.choice(train_ds.x, 5)]
size = 150
train_tfms = [
rotate(degrees=(-20,20.)),
rand_zoom(scale=(1.,1.5)),
*resize_crop(size=size, rand_crop=True, do_crop=True)
]
valid_tfms = [
*resize_crop(size=size, rand_crop=False, do_crop=True)
]
_,axes = plt.subplots(1,4, figsize=(10,5))
for ax in axes.flat: show_image(apply_tfms(train_tfms, x), ax)
show_image(apply_tfms(valid_tfms, x, size=size))
bs = 128
valid_tds = DatasetTfm(valid_ds, valid_tfms, padding_mode='zeros')
data = DataBunch(valid_tds, valid_tds, bs=bs, num_workers=0)
xb,yb = next(iter(data.train_dl))
b = xb.transpose(1,0).reshape(3,-1)
data_mean=b.mean(1).cpu()
data_std=b.std(1).cpu()
data_mean,data_std
show_image_batch(data.train_dl, train_ds.classes, 4)
valid_tds = DatasetTfm(valid_ds, valid_tfms, padding_mode='zeros')
train_tds = DatasetTfm(train_ds, train_tfms, padding_mode='zeros')
norm,denorm = normalize_funcs(data_mean,data_std)
data = DataBunch(train_tds, valid_tds, bs=bs, num_workers=12, tfms=norm)
len(data.train_dl),len(data.valid_dl)
model = Darknet([1, 2, 4, 4, 2], num_classes=c, nf=16)
learn = Learner(data, model)
opt_fn = partial(optim.SGD, momentum=0.9)
learn.fit(1, 0.1, opt_fn=opt_fn)
learn.fit(1, 0.2, opt_fn=opt_fn)
learn.fit(5, 0.4, opt_fn=opt_fn)
learn.fit(5, 0.1, opt_fn=opt_fn)
learn.fit(5, 0.01, opt_fn=opt_fn)