from fastai.gen_doc.nbdoc import *
from fastai.text.models import *
from fastai import *
This module fully implements the AWD-LSTM from Stephen Merity et al. The main idea of the article is to use an RNN with dropout everywhere, but in an intelligent way. This differs from the usual dropout, which is why you'll see an RNNDropout module: we zero elements as in regular dropout, but we always zero the same positions along the sequence dimension (which is the first dimension in PyTorch). This ensures consistency when updating the hidden state through a whole sentence/article.
With that said, there are five different dropouts in the AWD-LSTM:
- the embedding dropout (embed_p), which zeroes whole rows of the embedding matrix (see EmbeddingDropout below);
- the input dropout (input_p), applied to the embedded input, consistently over the sequence dimension (see RNNDropout);
- the weight dropout (weight_p), applied to the hidden-to-hidden weights of the LSTMs (see WeightDropout);
- the hidden dropout (hidden_p), applied to the outputs passed between the inner LSTM layers;
- the output dropout (output_p), applied to the output of the last layer.
show_doc(get_language_model, doc_string=False)
Creates an AWD-LSTM with a first embedding layer of vocab_sz by emb_sz, a hidden size of n_hid, and n_layers RNN layers that can be bidirectional if bidir is True. The last RNN has an output size of emb_sz so that we can use the same decoder as the encoder if tie_weights is True. The decoder is a Linear layer with or without bias. If qrnn is set to True, we use QRNN cells instead of LSTMs. pad_token is the token used for padding.
embed_p is used for the embedding dropout, input_p is used for the input dropout, weight_p is used for the weight dropout, hidden_p is used for the hidden dropout and output_p is used for the output dropout.
Note that the model returns a list of three things, the actual output being the first, the two others being the intermediate hidden states before and after dropout (used by the RNNTrainer). Most loss functions expect one output, so you should use a Callback to remove the other two if you're not using RNNTrainer.
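For instance, a language model over a hypothetical vocabulary of 1,000 tokens could be created as follows; every size and dropout value below is purely illustrative, and the keyword names are the parameters described above.
# illustrative sizes; the five dropout probabilities map to the five dropouts listed earlier
lm = get_language_model(vocab_sz=1000, emb_sz=400, n_hid=1150, n_layers=3, pad_token=1,
                        embed_p=0.1, input_p=0.6, weight_p=0.5, hidden_p=0.2, output_p=0.4)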
show_doc(get_rnn_classifier, doc_string=False)
get_rnn_classifier[source]
get_rnn_classifier(bptt:int, max_seq:int, n_class:int, vocab_sz:int, emb_sz:int, n_hid:int, n_layers:int, pad_token:int, layers:Collection[int], drops:Collection[float], bidir:bool=False, qrnn:bool=False, hidden_p:float=0.2, input_p:float=0.6, embed_p:float=0.1, weight_p:float=0.5) → Module
Creates an RNN classifier with an encoder taken from an AWD-LSTM with arguments vocab_sz, emb_sz, n_hid, n_layers, bidir, qrnn, pad_token and the dropout parameters. This encoder is fed the sequence in successive chunks of size bptt, and we only keep the last max_seq outputs for the pooling layers.
The decoder uses a concatenation of the last output, a MaxPooling of all the outputs and an AveragePooling of all the outputs. It then uses a list of BatchNorm, Dropout, Linear, ReLU blocks (with no ReLU in the last one), using a first layer size of 3*emb_sz, then following the numbers in layers, to stop at n_class. The dropout probabilities are read from drops.
As with the language model, this model returns a list of three things, the actual output being the first, the two others being the intermediate hidden states before and after dropout (used by the RNNTrainer). Again, use a Callback to remove the other two if you're not using RNNTrainer.
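Similarly, a classifier for a hypothetical two-class problem over the same vocabulary could be created like this; all values are illustrative, with the keyword names taken from the signature above and layers running from 3*emb_sz down to n_class.
# illustrative sizes; drops gives the dropout of each block of the classifier head
clas = get_rnn_classifier(bptt=70, max_seq=1400, n_class=2, vocab_sz=1000, emb_sz=400, n_hid=1150, n_layers=3,
                          pad_token=1, layers=[3*400, 50, 2], drops=[0.4, 0.1])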
On top of the PyTorch and fastai layers, the language models use some custom layers specific to NLP.
show_doc(EmbeddingDropout, doc_string=False, title_level=3)
Applies a dropout with probability embed_p to an embedding layer emb in training mode. Each row of the embedding matrix has a probability embed_p of being replaced by zeros while the others are rescaled accordingly.
enc = nn.Embedding(100, 7, padding_idx=1)
enc_dp = EmbeddingDropout(enc, 0.5)
tst_input = torch.randint(0,100,(8,))
enc_dp(tst_input)
tensor([[ 0.0000, -0.0000, -0.0000, 0.0000, 0.0000, 0.0000, -0.0000],
[ 0.0000, 0.0000, -0.0000, 0.0000, 0.0000, -0.0000, -0.0000],
[-0.0000, -0.0000, 0.0000, -0.0000, -0.0000, -0.0000, -0.0000],
[ 0.0000, -0.0000, -0.0000, -0.0000, 0.0000, 0.0000, 0.0000],
[ 0.2932, 2.0022, 2.1872, -0.3247, 0.1347, -0.3324, -1.3978],
[ 1.4960, -2.5978, 1.5589, 0.9840, -1.5260, -2.4613, 0.4806],
[-0.0000, 0.0000, -0.0000, 0.0000, 0.0000, 0.0000, -0.0000],
[ 0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000]],
grad_fn=<EmbeddingBackward>)
show_doc(RNNDropout, doc_string=False, title_level=3)
Applies a dropout with probability p consistently over the first dimension in training mode.
dp = RNNDropout(0.3)
tst_input = torch.randn(3,3,7)
tst_input, dp(tst_input)
(tensor([[[ 1.2319, 1.1261, 1.2774, 0.1549, -1.1483, 1.0135, -0.5733],
[ 0.3503, 1.6554, -0.3416, 0.1143, -1.6186, 0.1263, 0.6576],
[-0.1282, -1.4898, 1.3864, 0.8228, -1.3303, 2.0144, 0.1165]],
[[-0.7594, 0.3570, 0.2195, 0.0835, 0.4086, -0.2475, 0.5885],
[ 0.0940, 0.1063, 0.4301, 0.4235, 0.3187, 0.2077, 1.3733],
[ 1.1039, 1.0182, 0.2202, 0.6540, -1.0580, -0.1514, 1.1673]],
[[ 0.7464, -1.1539, -0.1214, -0.0774, 0.1987, -0.4181, 0.0653],
[ 1.0115, 2.2871, -0.6750, 0.6190, 0.5913, 0.6784, -0.2695],
[ 0.7146, 0.4232, -1.9684, -0.2852, -0.1162, 0.2386, 0.7550]]]),
tensor([[[ 1.7598, 0.0000, 0.0000, 0.2213, -1.6404, 1.4479, -0.8190],
[ 0.5004, 2.3649, -0.4880, 0.1633, -0.0000, 0.1805, 0.0000],
[-0.1832, -0.0000, 0.0000, 1.1754, -1.9005, 2.8777, 0.1665]],
[[-1.0849, 0.0000, 0.0000, 0.1192, 0.5837, -0.3536, 0.8407],
[ 0.1342, 0.1519, 0.6144, 0.6050, 0.0000, 0.2967, 0.0000],
[ 1.5770, 0.0000, 0.0000, 0.9343, -1.5114, -0.2163, 1.6675]],
[[ 1.0663, -0.0000, -0.0000, -0.1106, 0.2839, -0.5973, 0.0933],
[ 1.4450, 3.2672, -0.9642, 0.8842, 0.0000, 0.9691, -0.0000],
[ 1.0208, 0.0000, -0.0000, -0.4074, -0.1660, 0.3408, 1.0786]]]))
show_doc(WeightDropout, doc_string=False, title_level=3)
Applies dropout of probability weight_p to the layers in layer_names of module in training mode. A copy of those weights is kept so that the dropout mask can change at every batch.
module = nn.LSTM(5, 2)
dp_module = WeightDropout(module, 0.4)
getattr(dp_module.module, 'weight_hh_l0')
Parameter containing:
tensor([[-0.6580, -0.1605],
[ 0.3274, -0.1130],
[-0.4807, -0.4852],
[ 0.2366, -0.4500],
[ 0.0782, 0.1738],
[ 0.1071, -0.2037],
[-0.5886, 0.5423],
[ 0.6924, -0.6779]], requires_grad=True)
The dropout is applied to the weights at the beginning of each forward pass.
tst_input = torch.randn(4,20,5)
h = (torch.zeros(1,20,2), torch.zeros(1,20,2))
x,h = dp_module(tst_input,h)
getattr(dp_module.module, 'weight_hh_l0')
tensor([[-1.0966, -0.0000],
[ 0.5457, -0.0000],
[-0.0000, -0.8087],
[ 0.3944, -0.0000],
[ 0.1303, 0.2897],
[ 0.1785, -0.0000],
[-0.0000, 0.0000],
[ 1.1541, -1.1298]], grad_fn=<MulBackward0>)
show_doc(SequentialRNN, doc_string=False, title_level=3)
class SequentialRNN[source]
SequentialRNN(*args) :: Sequential
Create a Sequential module with args that has a reset function.
show_doc(SequentialRNN.reset)
reset[source]
reset()
Call the reset function of self.children (if they have one).
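For instance, wrapping the WeightDropout module defined above in a SequentialRNN propagates the call (a contrived but self-contained illustration):
tst_model = SequentialRNN(dp_module)
tst_model.reset()  # calls dp_module.reset(), since WeightDropout defines one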
show_doc(dropout_mask, doc_string=False)
dropout_mask[source]
dropout_mask(x:Tensor,sz:Collection[int],p:float)
Create a dropout mask of size sz, the same type as x and probability p.
tst_input = torch.randn(3,3,7)
dropout_mask(tst_input, (3,7), 0.3)
tensor([[0.0000, 1.4286, 1.4286, 1.4286, 1.4286, 1.4286, 0.0000],
[0.0000, 1.4286, 1.4286, 1.4286, 1.4286, 0.0000, 1.4286],
[1.4286, 1.4286, 0.0000, 1.4286, 1.4286, 0.0000, 0.0000]])
Such a mask is then expanded along the sequence dimension and multiplied by the input to perform an RNNDropout.
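Continuing the example above, applying the mask is a simple broadcasted multiplication: the (3,7) mask is expanded over the first (sequence) dimension, which is essentially what RNNDropout does internally.
mask = dropout_mask(tst_input, (3,7), 0.3)
tst_input * mask  # the same positions are zeroed at every step along the first dimension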
show_doc(RNNCore, doc_string=False, title_level=3)
Create an AWD-LSTM encoder with an embedding layer of vocab_sz by emb_sz, a hidden size of n_hid, n_layers layers. pad_token is passed to the Embedding, if bidir is True, the model is bidirectional. If qrnn is True, we use QRNN cells instead of LSTMs. Dropouts are embed_p, input_p, weight_p and hidden_p.
show_doc(RNNCore.reset)
show_doc(LinearDecoder, doc_string=False, title_level=3)
Create the decoder to go on top of an RNNCore encoder and create a language model. n_hid is the dimension of the last hidden state of the encoder, n_out the size of the output. Dropout of output_p is applied. If a tie_encoder is passed, it is used for the weights of the linear layer, which may or may not have a bias depending on bias.
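As a sketch of how these pieces fit together (presumably what get_language_model builds under the hood; all sizes are illustrative):
enc = RNNCore(vocab_sz=100, emb_sz=20, n_hid=40, n_layers=2, pad_token=1)
dec = LinearDecoder(n_out=100, n_hid=20, output_p=0.4)  # n_hid matches emb_sz, the size of the encoder's last output
lm = SequentialRNN(enc, dec)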
show_doc(MultiBatchRNNCore, doc_string=False, title_level=3)
show_doc(MultiBatchRNNCore.concat)
concat[source]
concat(arrs:Collection[Tensor]) →Tensor
Concatenate the arrs along the batch dimension.
show_doc(PoolingLinearClassifier, doc_string=False, title_level=3)
Create a linear classifier that sits on an RNNCore encoder. The last output, MaxPooling of all the outputs and AvgPooling of all the outputs are concatenated, then blocks of bn_drop_lin are stacked, according to the values in layers and drops.
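The model created by get_rnn_classifier is presumably these two modules wrapped in a SequentialRNN; here is an illustrative sketch, assuming MultiBatchRNNCore takes the same arguments as RNNCore plus the bptt and max_seq described above (all sizes hypothetical).
enc = MultiBatchRNNCore(bptt=10, max_seq=100, vocab_sz=100, emb_sz=20, n_hid=40, n_layers=2, pad_token=1)
head = PoolingLinearClassifier(layers=[3*20, 50, 2], drops=[0.4, 0.1])  # from 3*emb_sz down to 2 classes
clf = SequentialRNN(enc, head)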
show_doc(PoolingLinearClassifier.pool, doc_string=False)
pool[source]
pool(x:Tensor,bs:int,is_max:bool)
Pool x (of batch size bs) along the sequence dimension. is_max decides whether we use a MaxPooling or an AvgPooling.
show_doc(WeightDropout.forward)
forward[source]
forward(args:ArgStar)
Defines the computation performed at every call. Should be overridden by all subclasses.
show_doc(RNNCore.forward)
forward[source]
forward(input:LongTensor) →Tuple[Tensor,Tensor]
Defines the computation performed at every call. Should be overridden by all subclasses.
show_doc(EmbeddingDropout.forward)
forward[source]
forward(words:LongTensor,scale:Optional[float]=None) →Tensor
Defines the computation performed at every call. Should be overridden by all subclasses.
show_doc(RNNDropout.forward)
forward[source]
forward(x:Tensor) →Tensor
Defines the computation performed at every call. Should be overridden by all subclasses.
show_doc(PoolingLinearClassifier.forward)
forward[source]
forward(input:Tuple[Tensor,Tensor]) →Tuple[Tensor,Tensor,Tensor]
Defines the computation performed at every call. Should be overridden by all subclasses.
show_doc(MultiBatchRNNCore.forward)
forward[source]
forward(input:LongTensor) →Tuple[Tensor,Tensor]
Defines the computation performed at every call. Should be overridden by all subclasses.
show_doc(WeightDropout.reset)
reset[source]
reset()
show_doc(LinearDecoder.forward)
forward[source]
forward(input:Tuple[Tensor,Tensor]) →Tuple[Tensor,Tensor,Tensor]
Defines the computation performed at every call. Should be overridden by all subclasses.