This post is a summary of what I learned while reading Dive into Deep Learning.
🏉 Language Model
Given a text sequence of length $T$ consisting of tokens $x_1, x_2, \dots, x_T$, the goal of a language model is to estimate the joint probability of the sequence, $P(x_1, x_2, \dots, x_T)$. In other words, we need a way to model a whole document, or even just a sequence of tokens.
🏉 Learning a Language Model
Let us start by applying basic probability rules:
$$P(x_1, x_2, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1})$$
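As a concrete illustration, a four-token sequence such as "deep", "learning", "is", "fun" factorizes as

$$P(\text{deep}, \text{learning}, \text{is}, \text{fun}) = P(\text{deep}) \, P(\text{learning} \mid \text{deep}) \, P(\text{is} \mid \text{deep}, \text{learning}) \, P(\text{fun} \mid \text{deep}, \text{learning}, \text{is})$$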
There are several approaches that try to estimate $P(x_t \mid x_1, \dots, x_{t-1})$, such as:
- Laplace smoothing
- Markov Models and n-grams
- Natural Language Statistics
However, as the sequence length grows, these approaches become impractical: the number of probabilities to estimate explodes, and most long contexts never appear in the training data. The Recurrent Neural Network (RNN) emerged as a solution to this problem.
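To make the n-gram idea concrete, here is a minimal sketch of a Laplace-smoothed bigram model estimated by counting. It is my own illustration, not code from the book; the toy corpus and whitespace tokenization are made up.

```python
from collections import Counter

# Toy corpus (hypothetical) and whitespace tokenization.
corpus = "deep learning is fun and deep learning is powerful"
tokens = corpus.split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens[:-1], tokens[1:]))
vocab_size = len(unigram_counts)

def bigram_prob(word, prev, alpha=1.0):
    """Laplace-smoothed estimate of P(word | prev)."""
    return (bigram_counts[(prev, word)] + alpha) / (unigram_counts[prev] + alpha * vocab_size)

print(bigram_prob("learning", "deep"))  # relatively high: "deep learning" occurs twice
print(bigram_prob("fun", "deep"))       # low: "fun" never follows "deep" in the corpus
```

The sparsity problem is already visible here: any bigram that never occurs in the corpus gets only the smoothing mass, and the table of counts grows rapidly as the context gets longer.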
🏉 Recurrent Neural Network
Rather than modelling $P(x_t \mid x_1, \dots, x_{t-1})$ directly, it is preferable to use a latent variable model:

$$P(x_t \mid x_1, \dots, x_{t-1}) \approx P(x_t \mid h_{t-1})$$

where $h_{t-1}$ is a hidden state that stores the sequence information from time step $1$ to $t-1$. The hidden state itself is updated as $h_t = f(x_t, h_{t-1})$, where $f$ is a function that outputs the new hidden state.

Likewise, a neural network that uses recurrent computation for hidden states is called a Recurrent Neural Network (RNN). The hidden state $\mathbf{H}_t$ can be expressed as

$$\mathbf{H}_t = \phi(\mathbf{X}_t \mathbf{W}_{xh} + \mathbf{H}_{t-1} \mathbf{W}_{hh} + \mathbf{b}_h)$$

where $\mathbf{X}_t \in \mathbb{R}^{n \times d}$ is the minibatch of inputs at time step $t$, $\mathbf{H}_t \in \mathbb{R}^{n \times h}$ is the hidden variable at time step $t$, and $\phi$ is the activation function. The parameters of the RNN are the weights $\mathbf{W}_{xh} \in \mathbb{R}^{d \times h}$, $\mathbf{W}_{hh} \in \mathbb{R}^{h \times h}$ and the bias $\mathbf{b}_h \in \mathbb{R}^{1 \times h}$.
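A minimal NumPy sketch of this single-step update (my own illustration; the shapes and random initialization are arbitrary, and $\phi = \tanh$ is assumed):

```python
import numpy as np

n, d, h = 3, 5, 4  # batch size, input dimension, hidden dimension (arbitrary)

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.01, size=(d, h))  # input-to-hidden weights
W_hh = rng.normal(scale=0.01, size=(h, h))  # hidden-to-hidden weights
b_h = np.zeros((1, h))                      # hidden bias

def rnn_step(X_t, H_prev):
    """One recurrent update: H_t = tanh(X_t W_xh + H_{t-1} W_hh + b_h)."""
    return np.tanh(X_t @ W_xh + H_prev @ W_hh + b_h)

X_t = rng.normal(size=(n, d))  # minibatch of inputs at time step t
H_prev = np.zeros((n, h))      # previous (here: initial) hidden state
H_t = rnn_step(X_t, H_prev)
print(H_t.shape)               # (3, 4)
```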
🏉 BPTT
Let's look at the RNN's basic structure, with the parameters written out explicitly:

$$h_t = f(x_t, h_{t-1}, w_h)$$
$$o_t = g(h_t, w_o)$$

The objective averages the loss over all $T$ time steps, and backpropagation through time (BPTT) accumulates the gradients via the chain rule:

$$L(x_1, \dots, x_T, y_1, \dots, y_T, w_h, w_o) = \frac{1}{T} \sum_{t=1}^{T} l(y_t, o_t)$$
$$\frac{\partial L}{\partial w_h} = \frac{1}{T} \sum_{t=1}^{T} \frac{\partial l(y_t, o_t)}{\partial w_h}$$

Since $h_t$ depends on $h_{t-1}$, which in turn depends on $w_h$, each term $\partial h_t / \partial w_h$ unrolls recursively through all earlier time steps; this is what makes the full gradient expensive to compute and prone to exploding or vanishing.
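A minimal PyTorch sketch of how autograd performs this accumulation when the RNN is unrolled over time (my own illustration; the shapes, toy data, and simple RNN cell are made up for the example):

```python
import torch

n, d, h_dim, T = 3, 5, 4, 10  # batch, input dim, hidden dim, sequence length (arbitrary)

torch.manual_seed(0)
W_xh = torch.nn.Parameter(0.01 * torch.randn(d, h_dim))
W_hh = torch.nn.Parameter(0.01 * torch.randn(h_dim, h_dim))
b_h = torch.nn.Parameter(torch.zeros(h_dim))
W_ho = torch.nn.Parameter(0.01 * torch.randn(h_dim, 1))

X = torch.randn(T, n, d)  # toy input sequence
Y = torch.randn(T, n, 1)  # toy targets

H = torch.zeros(n, h_dim)  # initial hidden state
losses = []
for t in range(T):
    H = torch.tanh(X[t] @ W_xh + H @ W_hh + b_h)  # h_t = f(x_t, h_{t-1}, w_h)
    O = H @ W_ho                                  # o_t = g(h_t, w_o)
    losses.append(((O - Y[t]) ** 2).mean())       # l(y_t, o_t)

L = torch.stack(losses).mean()  # L = (1/T) * sum_t l(y_t, o_t)
L.backward()                    # BPTT: gradients accumulate through the unrolled graph
print(W_hh.grad.shape)          # torch.Size([4, 4])
```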
🏉 Truncated BPTT
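In practice, unrolling through the full sequence for every update is expensive, so BPTT is usually truncated: the hidden state is detached every $k$ steps so that gradients only flow through a fixed-length window. A minimal sketch of this idea (my own illustration, with toy shapes and a placeholder loss):

```python
import torch

n, d, h_dim, T, k = 3, 5, 4, 20, 5  # batch, input dim, hidden dim, sequence length, window

torch.manual_seed(0)
W_xh = torch.nn.Parameter(0.01 * torch.randn(d, h_dim))
W_hh = torch.nn.Parameter(0.01 * torch.randn(h_dim, h_dim))
b_h = torch.nn.Parameter(torch.zeros(h_dim))

X = torch.randn(T, n, d)   # toy input sequence
H = torch.zeros(n, h_dim)  # initial hidden state

loss = torch.tensor(0.0)
for t in range(T):
    if t % k == 0:
        H = H.detach()  # truncation: cut the computation graph every k steps
    H = torch.tanh(X[t] @ W_xh + H @ W_hh + b_h)
    loss = loss + H.pow(2).mean()  # placeholder loss just for illustration

loss.backward()  # gradients flow back at most k time steps from each term
print(W_hh.grad.shape)  # torch.Size([4, 4])
```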
🏉 Gradient Clipping / Vanishing Gradient Problem
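Because the gradient of $h_t$ with respect to earlier hidden states multiplies many Jacobians together, gradients through long sequences tend to explode or vanish. A common remedy for the exploding case is to clip the gradient norm before each optimizer step; here is a minimal sketch (the tiny `torch.nn.RNN` model, toy data, and clipping threshold are assumptions for illustration):

```python
import torch

model = torch.nn.RNN(input_size=5, hidden_size=4, batch_first=True)  # hypothetical tiny model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(3, 10, 5)  # (batch, time steps, features), toy data
out, _ = model(x)
loss = out.pow(2).mean()   # placeholder loss just for illustration

optimizer.zero_grad()
loss.backward()
# Rescale all gradients so that their global L2 norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```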