#!/usr/bin/env python
# coding: utf-8

# # Get your data ready for training

# This module defines the basic [`DataBunch`](/basic_data.html#DataBunch) object that is used inside [`Learner`](/basic_train.html#Learner) to train a model. This is the generic class that can take any kind of fastai [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) or [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). You'll find helpful functions in the data module of every application to directly create this [`DataBunch`](/basic_data.html#DataBunch) for you.

# In[1]:

from fastai.gen_doc.nbdoc import *
from fastai.basic_data import *

# In[2]:

show_doc(DataBunch, doc_string=False)

# Bind together a `train_dl`, a `valid_dl` and optionally a `test_dl`, ensure they are on `device`, and apply `tfms` to them as batches are drawn. `path` is used internally to store temporary files, and `collate_fn` is passed to the pytorch `DataLoader` (replacing the default one) to explain how to collate the samples picked into a batch. By default, it grabs the `data` attribute of the objects sent (see in [`vision.image`](/vision.image.html#vision.image) why this can be important).
#
# An example of `tfms` to pass is normalization. `train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader).

# In[3]:

show_doc(DataBunch.create, doc_string=False)

# Create a [`DataBunch`](/basic_data.html#DataBunch) from `train_ds`, `valid_ds` and optionally `test_ds`, with batch size `bs`, using `num_workers` processes. `tfms` and `device` are passed to the init method.

# In[4]:

show_doc(DataBunch.dl)

# In[5]:

show_doc(DataBunch.add_tfm)

# Add a transform to all the dataloaders.

# In[6]:

show_doc(DeviceDataLoader, doc_string=False)

# Put the batches of `dl` on `device` after applying an optional list of `tfms`. `collate_fn` will replace the one of `dl`. All the dataloaders of a [`DataBunch`](/basic_data.html#DataBunch) are of this type.

# ### Factory method

# In[7]:

show_doc(DeviceDataLoader.create, doc_string=False)

# Create a [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader) on `device` from a `dataset` with batch size `bs`, `num_workers` processes and a given `collate_fn`. The dataloader will `shuffle` the data if that flag is set to True, and `tfms` are passed to the init method. All `kwargs` are passed to the pytorch [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class initialization.

# ### Methods

# In[8]:

show_doc(DeviceDataLoader.one_batch)

# In[9]:

show_doc(DeviceDataLoader.add_tfm)

# Add a transform (i.e. same as `self.tfms.append(tfm)`).

# In[10]:

show_doc(DeviceDataLoader.remove_tfm)

# Remove a transform.

# ## Generic classes

# The first two classes are just empty shells to be subclassed by one of the applications; the last one is there to create an empty [`DataBunch`](/basic_data.html#DataBunch) (useful when we want a [`Learner`](/basic_train.html#Learner) in inference mode).

# In[11]:

show_doc(DatasetBase, title_level=3)

# In[12]:

show_doc(LabelDataset, title_level=3)

# In[13]:

show_doc(SingleClassificationDataset, title_level=3)

# ## Undocumented Methods - Methods moved below this line will intentionally be hidden

# In[14]:

show_doc(DeviceDataLoader.proc_batch)

# In[15]:

show_doc(DeviceDataLoader.collate_fn)

# ## New Methods - Please document or move to the undocumented section

# In[ ]:
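# As a quick end-to-end sketch of the classes documented above, the cell below builds a [`DataBunch`](/basic_data.html#DataBunch) from two toy PyTorch datasets with [`DataBunch.create`](/basic_data.html#DataBunch.create), adds a batch transform with [`DataBunch.add_tfm`](/basic_data.html#DataBunch.add_tfm), and pulls one batch from the wrapped [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader). The toy `TensorDataset`s, the batch size and the `to_float` transform are purely illustrative, and the sketch assumes a batch transform receives and returns the collated `(x, y)` pair.

# In[ ]:

# Minimal illustrative sketch (not part of the documented API); the datasets,
# batch size and transform below are made up for demonstration purposes.
import torch
from torch.utils.data import TensorDataset

from fastai.basic_data import DataBunch

# Two toy datasets: 100 training and 20 validation samples with 10 features each.
train_ds = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
valid_ds = TensorDataset(torch.randn(20, 10),  torch.randint(0, 2, (20,)))

# Bundle them: each dataloader is wrapped in a DeviceDataLoader on the default device.
data = DataBunch.create(train_ds, valid_ds, bs=16, num_workers=0)

# Batch transform: assumed to receive the collated (x, y) batch and return it unchanged in structure.
def to_float(batch):
    x, y = batch
    return x.float(), y

data.add_tfm(to_float)

# Grab one transformed batch from the training DeviceDataLoader.
x, y = data.train_dl.one_batch()
print(x.shape, y.shape)  # e.g. torch.Size([16, 10]) torch.Size([16])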