Thursday, October 24, 2019

Recurrent Neural Networks (RNN)


RNN Matrix Dimensions

[Figure: shapes of the weight matrices and vectors in a vanilla RNN cell]
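
To make the shapes concrete, here is a minimal NumPy sketch of one vanilla RNN step; the sizes (input_size = 3, hidden_size = 4) are arbitrary choices for illustration:

import numpy as np

input_size, hidden_size = 3, 4                   # arbitrary example sizes

W_x = np.random.randn(hidden_size, input_size)   # feedforward weights, shape (4, 3)
W_h = np.random.randn(hidden_size, hidden_size)  # recurrent weights, shape (4, 4)
b = np.zeros(hidden_size)                        # bias, shape (4,)

x_t = np.random.randn(input_size)                # current input, shape (3,)
h_prev = np.zeros(hidden_size)                   # previous activations, shape (4,)

# One feedforward step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
h_t = np.tanh(W_x @ x_t + W_h @ h_prev + b)
print(h_t.shape)                                 # (4,)

# Equivalently, the two matrices can be merged into a single matrix W of
# shape (4, 7) acting on the concatenated vector [h_{t-1}, x_t]:
W = np.hstack([W_h, W_x])
assert np.allclose(h_t, np.tanh(W @ np.concatenate([h_prev, x_t]) + b))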

Many-to-Many RNN Architecture

[Figure: an unrolled many-to-many RNN, producing one output y_t for every input x_t]
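
In the many-to-many setup, the same cell (with the same weights) is applied at every time step, and an output is produced at each step. A minimal sketch, where the output projection W_y and all sizes are assumptions made for illustration:

import numpy as np

input_size, hidden_size, output_size, T = 3, 4, 2, 5  # arbitrary example sizes

W_x = np.random.randn(hidden_size, input_size)   # feedforward weights
W_h = np.random.randn(hidden_size, hidden_size)  # recurrent weights
W_y = np.random.randn(output_size, hidden_size)  # output projection (assumed)
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

xs = np.random.randn(T, input_size)              # one input vector per time step
h = np.zeros(hidden_size)                        # initial activations
ys = []
for x_t in xs:                                   # unroll over time
    h = np.tanh(W_x @ x_t + W_h @ h + b_h)       # same weights at every step
    ys.append(W_y @ h + b_y)                     # one output per input time step
print(len(ys), ys[0].shape)                      # 5 outputs, each of shape (2,)
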
LSTM Cell Architecture (Long Short-Term Memory)

Key Components/Traits 

  • Cell State - a mechanism for memory
  • Gating - a way to modify the cell state in controlled ways. The main idea is to regulate which information the network stores (and passes on to the next time step) and which it forgets.
  • Constant Error Carousel - a mechanism that lets the error flow back through the cell state uninterrupted, which helps counter vanishing gradients (see the numerical sketch after this list).
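
To see how gating preserves memory, here is a small numerical sketch of the element-wise cell state update (notation as in the feedforward equations further below; all values chosen arbitrarily): with the forget gate near 1 and the input gate near 0, the old cell state passes through almost unchanged.

import numpy as np

c_prev = np.array([0.8, -0.5])   # previous cell state
c_cand = np.array([0.3, 0.9])    # candidate values proposed by tanh(...)

# Gates come out of a sigmoid, so every entry lies between 0 and 1.
f = np.array([0.99, 0.99])       # forget gate ~1: keep the old state
i = np.array([0.01, 0.01])       # input gate ~0: admit almost nothing new

c_t = f * c_prev + i * c_cand    # element-wise cell state update
print(c_t)                       # ~[0.795, -0.486]: memory preserved

# The update is additive, so the local gradient dc_t/dc_{t-1} is just f.
# As long as the forget gate stays near 1, error can flow back through
# many steps without vanishing -- the constant error carousel.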

[Figure: LSTM cell, showing the forget, input, and output gates acting on the cell state]

In the feedforward pass, the previous activations h_{t-1} and the current input x_t first get concatenated (shown by the dot operator in the figure). The concatenated vector goes into each of the three gates. The 'x' denotes element-wise multiplication, while the '+' denotes element-wise addition of two vectors/matrices. Note that the tanh applied after the output gate is not itself a gate (there are no weights involved in that operation, as shown in the figure).
The feedforward equations of an LSTM are as follows:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)        (forget gate)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)        (input gate)
c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)     (candidate cell state)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t            (cell state update)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)        (output gate)
h_t = o_t ⊙ tanh(c_t)                       (new activations)
In the RNN cell, you had exactly one weight matrix W (a concatenation of the feedforward and the recurrent matrices). In the case of an LSTM cell, you have four weight matrices: W_f, W_i, W_c and W_o. Each of these is a concatenation of the feedforward and recurrent weight matrices. Thus, you can write the weights of an LSTM as:

W_LSTM = [W_f, W_i, W_c, W_o]
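
Putting the pieces together, here is a minimal NumPy sketch of one LSTM feedforward step following the equations above; the sizes and the random initialisation are arbitrary, for illustration only:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 3, 4           # arbitrary example sizes
concat = hidden_size + input_size        # length of [h_{t-1}, x_t]

# Four weight matrices, each acting on the concatenated vector, i.e. each is
# a concatenation of the recurrent and feedforward weight matrices.
W_f, W_i, W_c, W_o = (np.random.randn(hidden_size, concat) for _ in range(4))
b_f, b_i, b_c, b_o = (np.zeros(hidden_size) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])    # concatenate activations and input
    f = sigmoid(W_f @ z + b_f)           # forget gate
    i = sigmoid(W_i @ z + b_i)           # input gate
    c_cand = np.tanh(W_c @ z + b_c)      # candidate cell state
    c = f * c_prev + i * c_cand          # element-wise cell state update
    o = sigmoid(W_o @ z + b_o)           # output gate
    h = o * np.tanh(c)                   # final tanh has no weights: not a gate
    return h, c

h_t, c_t = lstm_step(np.random.randn(input_size),
                     np.zeros(hidden_size), np.zeros(hidden_size))
print(h_t.shape, c_t.shape)              # (4,) (4,)
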
-----------


