
[CS330] 02. Multi-Task Learning & Transfer Learning Basics

What is "Task"?

More formally, a task can be defined as $\mathcal{T}_i \triangleq \{p_i(\mathbf{x}), p_i(\mathbf{y}|\mathbf{x}), \mathcal{L}_i\}$, based on the data-generating distribution: $p_i(\mathbf{x})$ over inputs, $p_i(\mathbf{y}|\mathbf{x})$ over labels given inputs, and a loss function $\mathcal{L}_i$.

  • Multi-task learning: Learn $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_T$ at once
    • Transfer learning: Solve target task $\mathcal{T}_b$ after solving source task $\mathcal{T}_a$ by transferring knowledge learned from $\mathcal{T}_a$ ($\mathcal{T}_a$ may itself include multiple tasks.)
      • Key assumption: Cannot access data $\mathcal{D}_a$ during transfer.
      • Fine-tuning is a valid solution to transfer learning: $\phi \leftarrow \theta - \alpha \nabla_\theta \mathcal{L}(\theta, \mathcal{D}^{tr})$
  • Meta-learning: Given data on previous tasks, learn a new task more quickly and/or more proficiently

Vanilla Multi-Task Learning 

We have to define the model, the objective, and the optimization procedure.

  1. We use a task descriptor $z_i$: $f_\theta(\mathbf{y}|\mathbf{x}) \rightarrow f_\theta(\mathbf{y}|\mathbf{x}, z_i)$
    • How should the multi-task architecture condition on $z_i$? Options include concatenation-based conditioning, additive conditioning, multi-head architectures, and multiplicative conditioning (see the sketch after this list).
  2. Objective: $\min_\theta \sum_{i=1}^{T} \mathcal{L}_i(\theta, \mathcal{D}_i)$
    • We can weight tasks differently: $\min_\theta \sum_{i=1}^{T} w_i \mathcal{L}_i(\theta, \mathcal{D}_i)$
      • The weights $w_i$ can be set by (a) heuristics, (b) task uncertainty, (c) encouraging monotonic improvement toward Pareto-optimal solutions, or (d) optimizing for the worst-case task loss.
  3. How should the objective be optimized? Sample a mini-batch of tasks, sample a mini-batch of datapoints for each task, compute the loss on the mini-batch, backpropagate, and apply an optimizer step.
  4. Challenges: negative transfer, overfitting, handling a large number of tasks, etc.
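
Below is a minimal PyTorch sketch of steps 1–3: concatenation-based conditioning on a one-hot task descriptor $z_i$, a multi-head output, and a weighted mini-batch training loop. The network sizes, uniform weights $w_i$, and random regression data are illustrative assumptions, not from the lecture.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Concatenation-based conditioning plus a multi-head output.

    The task descriptor z_i (here a one-hot task index, an assumption)
    is concatenated with the input x; each task gets its own head.
    """
    def __init__(self, x_dim, num_tasks, hidden_dim=64, y_dim=1):
        super().__init__()
        self.num_tasks = num_tasks
        self.body = nn.Sequential(
            nn.Linear(x_dim + num_tasks, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, y_dim) for _ in range(num_tasks)]
        )

    def forward(self, x, task_id):
        # Concatenation-based conditioning on the task descriptor z_i.
        z = torch.zeros(x.shape[0], self.num_tasks)
        z[:, task_id] = 1.0
        h = self.body(torch.cat([x, z], dim=-1))
        # Multi-head architecture: task-specific output layer.
        return self.heads[task_id](h)

# Toy per-task datasets D_i (random regression data, illustration only).
num_tasks = 3
datasets = [(torch.randn(128, 8), torch.randn(128, 1)) for _ in range(num_tasks)]

model = MultiTaskNet(x_dim=8, num_tasks=num_tasks)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
w = [1.0] * num_tasks  # per-task weights w_i (uniform heuristic here)

for step in range(1000):
    task_id = torch.randint(0, num_tasks, ()).item()  # sample a task
    x, y = datasets[task_id]
    idx = torch.randint(0, x.shape[0], (32,))         # sample a mini-batch
    loss = w[task_id] * loss_fn(model(x[idx], task_id), y[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Additive or multiplicative conditioning would instead add (or multiply) a learned embedding of $z_i$ into the hidden activations rather than concatenating it with the input.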

Multi-task learning → Transfer learning

  • What are some problems/applications where transfer learning might make sense?
    • When $\mathcal{D}_a$ is very large (you don't want to retain $\mathcal{D}_a$ & retrain on it)
    • When you don't care about solving $\mathcal{T}_a$ and $\mathcal{T}_b$ simultaneously
    • Another answer: personalization (adapting a shared model to an individual user)

Transfer learning → Fine-tuning

Transfer learning: solving the target task $\mathcal{T}_b$ after solving the source task $\mathcal{T}_a$ by transferring knowledge learned from $\mathcal{T}_a$.

We can fine-tune starting from $\theta$, the parameters pre-trained on $\mathcal{D}_a$, using the training data $\mathcal{D}^{tr}_b$ for the new task $\mathcal{T}_b$. We take many gradient steps of the form:

$\phi \leftarrow \theta - \alpha \nabla_\theta \mathcal{L}(\theta, \mathcal{D}^{tr}_b)$
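
As a minimal sketch of this procedure, assuming an ImageNet-pre-trained ResNet-18 for $\theta$ (torchvision ≥ 0.13 weights API) and a random stand-in for $\mathcal{D}^{tr}_b$; the class count and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# theta: parameters pre-trained on D_a (here, ImageNet weights).
model = models.resnet18(weights="IMAGENET1K_V1")

# Replace the output head for the new task T_b (assumed 10 classes).
num_classes_b = 10
model.fc = nn.Linear(model.fc.in_features, num_classes_b)

# Toy stand-in for D_b^tr (random images/labels, illustration only).
xb = torch.randn(64, 3, 224, 224)
yb = torch.randint(0, num_classes_b, (64,))
loader_b = DataLoader(TensorDataset(xb, yb), batch_size=16)

# phi <- theta - alpha * grad_theta L(theta, D_b^tr), for many steps.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # alpha = 1e-3
loss_fn = nn.CrossEntropyLoss()
model.train()
for epoch in range(5):
    for x, y in loader_b:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```

In practice, common fine-tuning heuristics include using a smaller learning rate for the pre-trained layers than for the new head, and freezing earlier layers initially.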

However, fine-tuning does not work well with very small target task datasets. This is where meta-learning can help.