from fastai import *
from fastai.gen_doc.nbdoc import *
In this tutorial, we'll see how the same API allows you to get a look at the inputs and outputs of your model, whether in a vision, text or tabular application. We'll go over a lot of different tasks; each time we'll grab some data in a DataBunch with the data block API, have a look at a few inputs with the show_batch method, train an appropriate Learner, then use the show_results method to see what the outputs of our model actually look like.
jekyll_note("""As usual, this page is generated from a notebook that you can find in the docs_srs folder of the
[fastai repo](https://github.com/fastai/fastai). The examples are all designed to run fast, which is why we use
samples of the dataset, a resnet18 as a backbone and don't train for very long. You can change all of those parameters
to run your own experiments!
""")
To quickly get access to all the vision functions inside fastai, we use the usual import statements.
from fastai import *
from fastai.vision import *
Let's begin with our sample of the MNIST dataset.
mnist = untar_data(URLs.MNIST_TINY)
tfms = get_transforms(do_flip=False)
It's set up with an imagenet structure, so we use it to split our data into training and validation sets, then label it from the folder names.
data = (ImageItemList.from_folder(mnist)
.split_by_folder()
.label_from_folder()
.transform(tfms, size=32)
.databunch()
.normalize(imagenet_stats))
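Before plotting anything, it can be useful to sanity-check what the data block API produced. A quick sketch (these attributes are standard on a fastai v1 classification DataBunch):
print(data.classes)                            # class labels inferred from the folder names
print(len(data.train_ds), len(data.valid_ds))  # number of items in each split
print(data.c)                                  # number of classes the model will output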
Once your data is properly set up in a DataBunch, we can call data.show_batch() to see what a sample of a batch looks like.
data.show_batch()
Note that the images were automatically de-normalized before being shown with their labels (inferred from the folder names). We can specify a number of rows if the default of 5 is too big, and we can also limit the size of the figure.
data.show_batch(rows=3, figsize=(4,4))
Now let's create a Learner object to train a classifier.
learn = create_cnn(data, models.resnet18, metrics=accuracy)
learn.fit_one_cycle(1,1e-2)
learn.save('mini_train')
Total time: 00:02

| epoch | train_loss | valid_loss | accuracy | time |
|---|---|---|---|---|
| 1 | 0.530551 | 0.120395 | 0.961373 | 00:02 |
Our model has quickly reached around 96% accuracy; now let's see its predictions on a sample of the validation set. For this, we use the show_results method.
learn.show_results()
Since the validation set is usually sorted, we get only images belonging to the same class. We can again specify a number of rows and a figure size, but also the dataset on which we want to make predictions.
learn.show_results(ds_type=DatasetType.Train, rows=4, figsize=(4,8))
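If you want the underlying predictions rather than a plot, the same Learner exposes them directly. A minimal sketch using the fastai v1 API (exact return types can vary slightly between versions):
preds, targets = learn.get_preds(ds_type=DatasetType.Valid)  # probabilities and ground truth for the whole set
img = learn.data.valid_ds[0][0]                              # grab one validation image
pred_class, pred_idx, probs = learn.predict(img)             # predict a single item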
Now let's try these on the planet dataset, which is a little bit different in the sense that each image can have multiple tags (and not just one label).
planet = untar_data(URLs.PLANET_TINY)
planet_tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
Here each image is labelled in a file named 'labels.csv'. We have to add 'train' as a prefix to the filenames and '.jpg' as a suffix, and the labels are separated by spaces.
data = (ImageItemList.from_csv(planet, 'labels.csv', folder='train', suffix='.jpg')
.random_split_by_pct()
.label_from_df(sep=' ')
.transform(planet_tfms, size=128)
.databunch()
.normalize(imagenet_stats))
And we can have a look at our data with data.show_batch.
data.show_batch(rows=2, figsize=(9,7))
Then we can create a Learner object pretty easily and train it for a little bit.
learn = create_cnn(data, models.resnet18)
learn.fit_one_cycle(5,1e-2)
learn.save('mini_train')
Total time: 00:06

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 1 | 0.801522 | 0.747786 | 00:02 |
| 2 | 0.734977 | 0.702085 | 00:00 |
| 3 | 0.665771 | 0.608365 | 00:00 |
| 4 | 0.604124 | 0.462342 | 00:00 |
| 5 | 0.553832 | 0.399360 | 00:00 |
And to see actual predictions, we just have to run learn.show_results().
learn.show_results(figsize=(12,15))
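Note that we didn't pass any metrics when creating this Learner: plain accuracy isn't defined for multi-label targets, so a thresholded version is needed if you want to monitor one. A sketch using accuracy_thresh and fbeta from fastai (the 0.2 threshold is just an illustrative choice):
acc_02 = partial(accuracy_thresh, thresh=0.2)  # a tag counts as predicted if its probability exceeds 0.2
f_score = partial(fbeta, thresh=0.2)           # F-beta score with the same threshold
learn = create_cnn(data, models.resnet18, metrics=[acc_02, f_score])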
For the next example, we are going to use the BIWI head pose dataset. Given pictures of people, we have to find the center of their faces. For the fastai docs, we have built a small subsample of the dataset (200 images) and prepared a dictionary mapping filename to center.
biwi = untar_data(URLs.BIWI_SAMPLE)
with open(biwi/'centers.pkl', 'rb') as f:
    fn2ctr = pickle.load(f)
To grab our data, we use this dictionary to label our items. We also use the PointsItemList class to have the targets be of type ImagePoints (which will make sure the data augmentation is properly applied to them). When calling transform we make sure to set tfm_y=True.
data = (ImageItemList.from_folder(biwi)
.random_split_by_pct()
.label_from_func(lambda o:fn2ctr[o.name], label_cls=PointsItemList)
.transform(get_transforms(), tfm_y=True, size=(120,160))
.databunch()
.normalize(imagenet_stats))
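A quick way to check the labelling worked is to pull one example out of the underlying dataset; the target should be an ImagePoints object (a minimal sketch):
x, y = data.train_ds[0]  # an (Image, ImagePoints) pair
print(type(y))           # fastai.vision.image.ImagePoints
print(y.data)            # the point coordinates, scaled to [-1, 1]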
Then we can have a first look at our data with data.show_batch().
data.show_batch(rows=3, figsize=(9,6))
We train our model for a little bit before using learn.show_results().
learn = create_cnn(data, models.resnet18)
learn.fit_one_cycle(5, 3e-3)
learn.save('mini_train')
Total time: 00:07

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 1 | 1.250838 | 0.975605 | 00:03 |
| 2 | 1.157905 | 1.041689 | 00:01 |
| 3 | 1.282076 | 0.414419 | 00:01 |
| 4 | 1.265881 | 0.329879 | 00:01 |
| 5 | 1.235147 | 0.298493 | 00:01 |
learn.show_results(rows=3)
Now we are going to look at the camvid dataset (at least a small sample of it), where we have to predict the class of each pixel in an image. Each image in the 'images' subfolder has an equivalent in 'labels' that is its segmentation mask.
camvid = untar_data(URLs.CAMVID_TINY)
path_lbl = camvid/'labels'
path_img = camvid/'images'
We read the classes in 'codes.txt' and define a function that maps each image filename to its corresponding mask filename.
codes = np.loadtxt(camvid/'codes.txt', dtype=str)
get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
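To make sure the mapping works, we can open one mask and display it; get_image_files and open_mask are both part of fastai.vision (a quick sketch):
img_f = get_image_files(path_img)[0]  # grab one image filename
mask = open_mask(get_y_fn(img_f))     # load the matching mask as an ImageSegment
mask.show(figsize=(5,5), alpha=1)     # each pixel value is a class index from codes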
The data block API allows us to quickly get everything into a DataBunch, and then we can have a look with show_batch.
data = (SegmentationItemList.from_folder(path_img)
.random_split_by_pct()
.label_from_func(get_y_fn, classes=codes)
.transform(get_transforms(), tfm_y=True, size=128)
.databunch(bs=16, path=camvid)
.normalize(imagenet_stats))
data.show_batch(rows=2, figsize=(7,5))
Then we train a Unet for a few epochs.
jekyll_warn("This training is fairly unstable, you should use more epochs and the full dataset to get better results.")
learn = Learner.create_unet(data, models.resnet18)
learn.fit_one_cycle(3,1e-2)
learn.save('mini_train')
Total time: 00:25

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 1 | 2.746014 | nan | 00:19 |
| 2 | 2.102728 | nan | 00:02 |
| 3 | 1.766212 | nan | 00:02 |
learn.show_results()
The next application is text, so let's start by importing everything we'll need.
from fastai import *
from fastai.text import *
First we'll fine-tune a pretrained language model on our subset of imdb.
imdb = untar_data(URLs.IMDB_SAMPLE)
data_lm = (TextList.from_csv(imdb, 'texts.csv', cols='text')
.random_split_by_pct()
.label_for_lm()
.databunch())
data_lm.save()
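The call to data_lm.save() stores the processed (tokenized and numericalized) data so it can be reloaded later without redoing that work. A sketch of the reload, assuming the early v1 API used in this notebook (newer v1 releases replaced it with load_data):
data_lm = TextLMDataBunch.load(imdb)  # reload the saved DataBunch from the same path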
data.show_batch() will work here as well. For a language model, it shows us the beginning of each sequence of text along the batch dimension (the target being to guess the next word).
data_lm.show_batch()
| idx | text |
|---|---|
| 0 | xxbos \n\n i saw this on the xxmaj sci - xxmaj fi channel . xxmaj it came on right after the first one . xxmaj for some reason this movie kept me interested . i do n't know why , stop asking . \n\n xxup xxunk xxmaj okay ... xxmaj it was cheesy how this guy got involved with the making of the movie . xxmaj in the first movie |
| 1 | xxmaj johnson starts xxunk - singing while she does high steps and xxunk her xxunk in her attempt to teach this dance to the xxmaj african women . xxmaj meanwhile , they just stand there staring at her , apparently wondering what this crazy white woman is trying to accomplish . xxmaj it 's a very funny scene , but it has unpleasant undertones . xxmaj osa xxmaj johnson is |
| 2 | pretty cool guy ! i know i would n't have the xxunk to go even 5 feet away from a croc . \n\n xxmaj but , everything in this movie is bad . xxmaj xxunk jokes , people getting xxunk , and the skit about the xxmaj president all make the movie one of the worst of all time . \n\n xxmaj it 's a really bad film that you |
| 3 | one has commented that this movie is just a cheap knock off of xxup re . xxmaj first , a " special " commando force is the unique defense for a facility with a computer matrix that has an xxup ai and xxunk xxunk . xxmaj and this " xxmaj xxunk " rip - off has a series of xxunk that inevitably kill off one member of the xxunk at |
| 4 | what laughs we made were from the stupidity of the plot than at anything amusing . xxmaj even the xxunk during the credits were n't very funny . xxmaj ultimately i was left with nothing except a desire to warn people away from this movie . \n\n xxmaj rating : 3 xxbos xxmaj after high - school xxunk , best friends xxmaj alice and xxmaj xxunk , decide to take |
Now let's define a language model learner.
learn = language_model_learner(data_lm, pretrained_model=URLs.WT103_1)
learn.fit_one_cycle(2, 1e-2)
learn.save('mini_train_lm')
learn.save_encoder('mini_train_encoder')
Total time: 00:40

| epoch | train_loss | valid_loss | accuracy | time |
|---|---|---|---|---|
| 1 | 4.681728 | 3.901660 | 0.286583 | 00:20 |
| 2 | 4.420851 | 3.837718 | 0.290115 | 00:20 |
Then we can have a look at the results. It shows a certain number of words (20 by default), then the next 20 target words and the ones that were predicted.
learn.show_results()
| text | target | pred |
|---|---|---|
| xxbos xxmaj shot into car from through the xxunk , someone is playing someone else their latest song , someone | did n't react , according to the voice - over . i just wonder how that came to be made | 's . know . and to the story of over . xxmaj 'm thought why much was . be . |
| on ! xxmaj did you know that , on the set of " xxmaj xxunk xxmaj xxunk " , he | to xxunk poor xxmaj xxunk xxmaj xxunk , by telling her : " xxmaj oh , xxmaj xxunk , how | to be the xxunk xxunk xxmaj xxunk , and the him that " xxmaj the , xxmaj xxunk ! xxmaj |
| underground at the middle of the night , but she 's just stupid like that . xxmaj so the xxunk | the next , and last , train will come in 7 minutes . xxmaj now xxmaj kate , dumb party | that xxunk day " the , xxunk is be to . minutes . xxmaj the , xxunk xxmaj who , |
| a " xxunk " xxunk xxunk ) and the casting is xxunk , with special xxunk to " xxmaj doc | , who xxunk in a " xxmaj mac xxmaj fly " character , and to xxmaj xxunk , who seems | and " was the the xxunk xxunk xxunk " xxunk " , . and xxmaj " xxunk xxmaj who is |
| of the concept . xxmaj the play , " xxmaj sister xxmaj mary xxmaj xxunk xxmaj explains xxmaj it xxmaj | xxmaj for xxmaj you , " was presented -- at least in xxmaj hollywood -- in precisely the same tone | " about xxmaj the " " is written as the the until the xxunk 's as the the same way |
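We can also use the fine-tuned language model to generate text: learn.predict takes a prompt and a number of words to generate. A minimal sketch, assuming a fastai version where LanguageLearner.predict accepts these arguments (the output will vary from run to run):
print(learn.predict("This movie is", n_words=20))  # continue the prompt for 20 more words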
Now let's see a classification example. We have to use the same vocabulary as for the language model if we want to be able to use the encoder we saved.
data_clas = (TextList.from_csv(imdb, 'texts.csv', cols='text', vocab=data_lm.vocab)
.split_from_df(col='is_valid')
.label_from_df(cols='label')
.databunch(bs=42))
Here show_batch shows the beginning of each review with its target.
data_clas.show_batch()
| text | target |
|---|---|
| xxbos xxmaj raising xxmaj victor xxmaj vargas : a xxmaj review \n\n xxmaj you know , xxmaj raising xxmaj victor xxmaj vargas is like sticking your hands into a big , xxunk bowl of xxunk . xxmaj it 's warm and gooey , but you 're not sure if it feels right . xxmaj try as i might , no matter how warm and gooey xxmaj raising xxmaj victor xxmaj | negative |
| xxbos xxup the xxup shop xxup around xxup the xxup corner is one of the xxunk and most feel - good romantic comedies ever made . xxmaj there 's just no getting around that , and it 's hard to actually put one 's feeling for this film into words . xxmaj it 's not one of those films that tries too hard , nor does it come up with | positive |
| xxbos xxmaj now that xxmaj che(2008 ) has finished its relatively short xxmaj australian cinema run ( extremely limited xxunk screen in xxmaj xxunk , after xxunk ) , i can xxunk join both xxunk of " xxmaj at xxmaj the xxmaj movies " in taking xxmaj steven xxmaj soderbergh to task . \n\n xxmaj it 's usually satisfying to watch a film director change his style / subject , | negative |
| xxbos xxmaj this film sat on my xxmaj xxunk for weeks before i watched it . i xxunk a self - xxunk xxunk flick about relationships gone bad . i was wrong ; this was an xxunk xxunk into the xxunk - up xxunk of xxmaj new xxmaj xxunk . \n\n xxmaj the xxunk is the same as xxmaj xxunk xxmaj xxunk ' " xxmaj la xxmaj xxunk , " | positive |
| xxbos xxmaj many xxunk that this is n't just a classic due to the fact that it 's the first xxup 3d game , or even the first xxunk - up . xxmaj it 's also one of the first xxunk games , one of the xxunk definitely the first ) truly claustrophobic games , and just a pretty well - xxunk gaming experience in general . xxmaj with graphics | positive |
And we can train a classifier that uses our previous encoder.
learn = text_classifier_learner(data_clas)
learn.load_encoder('mini_train_encoder')
learn.fit_one_cycle(2, slice(1e-3,1e-2))
learn.save('mini_train_clas')
Total time: 00:54

| epoch | train_loss | valid_loss | accuracy | time |
|---|---|---|---|---|
| 1 | 0.675724 | 0.773513 | 0.600000 | 00:28 |
| 2 | 0.660357 | 0.771560 | 0.525000 | 00:25 |
learn.show_results()
| text | target | prediction |
|---|---|---|
| xxbos \n\n i 'm sure things did n't exactly go the same way in the real life of xxmaj homer xxmaj hickam as they did in the film adaptation of his book , xxmaj rocket xxmaj boys , but the movie " xxmaj october xxmaj sky " ( an xxunk of the book 's title ) is good enough to stand alone . i have not read xxmaj hickam 's | positive | positive |
| xxbos xxmaj to review this movie , i without any doubt would have to quote that memorable scene in xxmaj tarantino 's " xxmaj pulp xxmaj fiction " ( xxunk ) when xxmaj jules and xxmaj vincent are talking about xxmaj mia xxmaj wallace and what she does for a living . xxmaj jules tells xxmaj vincent that the " xxmaj only thing she did worthwhile was pilot " . | negative | negative |
| xxbos xxmaj how viewers react to this new " adaption " of xxmaj shirley xxmaj jackson 's book , which was xxunk as xxup not being a remake of the original 1963 movie ( true enough ) , will be based , i suspect , on the following : those who were big fans of either the book or original movie are not going to think much of this one | negative | negative |
| xxbos xxmaj the trouble with the book , " xxmaj memoirs of a xxmaj geisha " is that it had xxmaj japanese xxunk but underneath the xxunk it was all an xxmaj american man 's way of thinking . xxmaj reading the book is like watching a magnificent ballet with great music , sets , and costumes yet performed by xxunk animals dressed in those xxunk far from xxmaj japanese | negative | negative |
| xxbos xxmaj bonanza had a great cast of wonderful actors . xxmaj xxunk xxmaj xxunk , xxmaj pernell xxmaj whitaker , xxmaj michael xxmaj xxunk , xxmaj dan xxmaj blocker , and even xxmaj guy xxmaj williams ( as the cousin who was brought in for several episodes during 1964 to replace xxmaj adam when he was leaving the series ) . xxmaj the cast had chemistry , and they | positive | positive |
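As in the vision examples, we can also get a prediction on a single raw item; for text, learn.predict handles the tokenization and numericalization itself (a quick sketch, assuming a fastai version where predict accepts a raw string):
pred_class, pred_idx, probs = learn.predict("I really loved this movie!")
print(pred_class)  # 'positive' or 'negative', with probs holding the class probabilities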
The last application brings us to tabular data. First let's import everything we'll need.
from fastai import *
from fastai.tabular import *
We'll use a sample of the adult dataset here. Once we read the csv file, we'll need to specify the dependent variable, the categorical variables, the continuous variables and the processors we want to use.
adult = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(adult/'adult.csv')
dep_var = '>=50k'
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']
cont_names = ['education-num', 'hours-per-week', 'age', 'capital-loss', 'fnlwgt', 'capital-gain']
procs = [FillMissing, Categorify, Normalize]
Then we can use the data block API to grab everything together before using data.show_batch(). Note the education-num_na column in the batch below: it's the boolean flag the FillMissing processor adds to record where it filled a missing value.
data = (TabularList.from_df(df, path=adult, cat_names=cat_names, cont_names=cont_names, procs=procs)
.split_by_idx(valid_idx=range(800,1000))
.label_from_df(cols=dep_var)
.databunch())
data.show_batch()
| workclass | education | marital-status | occupation | relationship | race | sex | native-country | education-num_na | education-num | hours-per-week | age | capital-loss | fnlwgt | capital-gain | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Private | Some-college | Never-married | Sales | Own-child | White | Male | United-States | False | -0.0312 | -1.6556 | -1.4357 | -0.2164 | 1.0896 | -0.1459 | 0 |
| Private | Some-college | Married-civ-spouse | Sales | Husband | White | Male | United-States | False | -0.0312 | 0.7743 | 1.3496 | -0.2164 | 1.7284 | 0.5311 | 0 |
| Self-emp-not-inc | Bachelors | Never-married | Sales | Not-in-family | White | Male | United-States | False | 1.1422 | 0.7743 | -0.7760 | -0.2164 | 0.0064 | -0.1459 | 0 |
| Private | 11th | Separated | Other-service | Not-in-family | Black | Female | United-States | False | -1.2046 | 0.1264 | -0.5561 | -0.2164 | -0.7904 | -0.1459 | 0 |
| Self-emp-not-inc | 10th | Never-married | Prof-specialty | Not-in-family | White | Female | United-States | False | -1.5958 | -0.6836 | -0.9226 | -0.2164 | 0.4192 | -0.1459 | 0 |
Here we grab a tabular_learner that we train for a little bit.
learn = tabular_learner(data, layers=[200,100], metrics=accuracy)
learn.fit(1, 1e-2)
learn.save('mini_train')
Total time: 00:04

| epoch | train_loss | valid_loss | accuracy | time |
|---|---|---|---|---|
| 1 | 0.342632 | 0.347381 | 0.850000 | 00:04 |
And we can use learn.show_results().
learn.show_results()
| workclass | education | marital-status | occupation | relationship | race | sex | native-country | education-num_na | education-num | hours-per-week | age | capital-loss | fnlwgt | capital-gain | target | prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Private | Some-college | Divorced | Handlers-cleaners | Unmarried | White | Female | United-States | True | -0.0312 | -0.0356 | 0.4701 | -0.2164 | -0.8793 | -0.1459 | 0 | 0 |
| Self-emp-inc | Prof-school | Married-civ-spouse | Prof-specialty | Husband | White | Male | United-States | True | -0.0312 | 1.5843 | 0.5434 | -0.2164 | 0.0290 | 1.8829 | 1 | 1 |
| Private | Assoc-voc | Divorced | #na# | Not-in-family | White | Male | United-States | True | -0.0312 | -0.1976 | -0.1896 | -0.2164 | 1.7704 | -0.1459 | 0 | 0 |
| Federal-gov | Bachelors | Never-married | Tech-support | Not-in-family | White | Male | United-States | True | -0.0312 | 0.3694 | -0.9959 | -0.2164 | -1.3242 | -0.1459 | 0 | 0 |
| Private | Bachelors | Married-civ-spouse | #na# | Husband | White | Male | United-States | True | -0.0312 | -0.0356 | -0.1163 | -0.2164 | -0.2389 | -0.1459 | 0 | 0 |
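Finally, predictions on a single item work here too: we can pass a row of the original DataFrame straight to learn.predict, and the processors are applied automatically. A minimal sketch using the fastai v1 tabular API:
row = df.iloc[0]                                  # one raw, unprocessed row
pred_class, pred_idx, probs = learn.predict(row)  # FillMissing/Categorify/Normalize applied under the hood
print(pred_class, probs)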