#!/usr/bin/env python
# coding: utf-8

# # Get your data ready for training

# This module defines the basic [`DataBunch`](/basic_data.html#DataBunch) object that is used inside [`Learner`](/basic_train.html#Learner) to train a model. This is the generic class that can take any kind of fastai [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) or [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). You'll find helpful functions in the data module of every application to directly create this [`DataBunch`](/basic_data.html#DataBunch) for you.

# In[1]:

from fastai.gen_doc.nbdoc import *
from fastai.basic_data import *

# In[2]:

show_doc(DataBunch, doc_string=False)

# Bind together a `train_dl`, a `valid_dl` and optionally a `test_dl`, ensure they are on `device`, and apply `tfms` to them as batches are drawn. `path` is used internally to store temporary files, and `collate_fn` is passed to the pytorch `DataLoader` (replacing the default one) to explain how to collate the samples picked into a batch. By default, it grabs the `data` attribute of the objects sent (see in [`vision.image`](/vision.image.html#vision.image) why this can be important).
#
# An example of `tfms` to pass is normalization. `train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader).

# In[3]:

show_doc(DataBunch.create, doc_string=False)

# Create a [`DataBunch`](/basic_data.html#DataBunch) from `train_ds`, `valid_ds` and optionally `test_ds`, with batch size `bs`, using `num_workers` processes. `tfms` and `device` are passed to the init method.

# In[4]:

show_doc(DataBunch.dl)

# In[5]:

show_doc(DataBunch.add_tfm)

# Add a transform to all the dataloaders.

# In[6]:

show_doc(DeviceDataLoader, doc_string=False)

# Put the batches of `dl` on `device` after applying an optional list of `tfms`. `collate_fn` will replace the one of `dl`. All the dataloaders of a [`DataBunch`](/basic_data.html#DataBunch) are of this type.

# ### Factory method

# In[7]:

show_doc(DeviceDataLoader.create, doc_string=False)

# Create a [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader) on `device` from a `dataset` with batch size `bs`, `num_workers` processes and a given `collate_fn`. The dataloader will `shuffle` the data if that flag is set to True, and `tfms` are passed to the init method. All `kwargs` are passed to the pytorch [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class initialization.

# ### Methods

# In[8]:

show_doc(DeviceDataLoader.one_batch)

# In[9]:

show_doc(DeviceDataLoader.add_tfm)

# Add a transform (i.e. same as `self.tfms.append(tfm)`).

# In[10]:

show_doc(DeviceDataLoader.remove_tfm)

# Remove a transform.

# ## Generic classes

# The first two classes are just empty shells to be subclassed by one of the applications; the last one is there to create an empty [`DataBunch`](/basic_data.html#DataBunch) (useful when we want a [`Learner`](/basic_train.html#Learner) in inference mode).

# In[11]:

show_doc(DatasetBase, title_level=3)

# In[12]:

show_doc(LabelDataset, title_level=3)

# In[13]:

show_doc(SingleClassificationDataset, title_level=3)

# ## Undocumented Methods - Methods moved below this line will intentionally be hidden

# In[14]:

show_doc(DeviceDataLoader.proc_batch)

# In[15]:

show_doc(DeviceDataLoader.collate_fn)

# ## New Methods - Please document or move to the undocumented section

# In[ ]:
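# As a quick end-to-end sketch of the classes documented above, the cell below builds a [`DataBunch`](/basic_data.html#DataBunch) from two toy PyTorch datasets with [`DataBunch.create`](/basic_data.html#DataBunch.create), adds a batch transform with [`DataBunch.add_tfm`](/basic_data.html#DataBunch.add_tfm), and pulls one batch from the wrapped [`DeviceDataLoader`](/basic_data.html#DeviceDataLoader). The toy `TensorDataset`s, the batch size and the `to_float` transform are purely illustrative, and the sketch assumes a batch transform receives and returns the collated `(x, y)` pair.

# In[ ]:

# Minimal illustrative sketch (not part of the documented API); the datasets,
# batch size and transform below are made up for demonstration purposes.
import torch
from torch.utils.data import TensorDataset

from fastai.basic_data import DataBunch

# Two toy datasets: 100 training and 20 validation samples with 10 features each.
train_ds = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
valid_ds = TensorDataset(torch.randn(20, 10),  torch.randint(0, 2, (20,)))

# Bundle them: each dataloader is wrapped in a DeviceDataLoader on the default device.
data = DataBunch.create(train_ds, valid_ds, bs=16, num_workers=0)

# Batch transform: assumed to receive the collated (x, y) batch and return it unchanged in structure.
def to_float(batch):
    x, y = batch
    return x.float(), y

data.add_tfm(to_float)

# Grab one transformed batch from the training DeviceDataLoader.
x, y = data.train_dl.one_batch()
print(x.shape, y.shape)  # e.g. torch.Size([16, 10]) torch.Size([16])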