
[CS330] 02. Multi-Task Learning & Transfer Learning Basics

What is "Task"?

More formally, a task can be defined as $\mathcal{T}_i \triangleq \{p_i(\mathbf{x}), p_i(\mathbf{y}|\mathbf{x}), \mathcal{L}_i\}$, based on the data-generating distribution: $p_i(\mathbf{x})$ over inputs, $p_i(\mathbf{y}|\mathbf{x})$ over labels given inputs, and a loss function $\mathcal{L}_i$.

  • Multi-task learning: Learn $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_T$ at once
    • Transfer learning: Solve target task $\mathcal{T}_b$ after solving source task $\mathcal{T}_a$ by transferring knowledge learned from $\mathcal{T}_a$ ($\mathcal{T}_a$ may itself include multiple tasks.)
      • Key assumption: Cannot access data $\mathcal{D}_a$ during transfer.
      • Fine-tuning is a valid solution to transfer learning: $\phi \leftarrow \theta - \alpha \nabla_\theta \mathcal{L}(\theta, \mathcal{D}^{tr})$
  • Meta-learning: Given data on previous tasks, learn a new task more quickly and/or more proficiently

Vanilla Multi-Task Learning 

We have to define the model, the objective, and the optimization procedure.

  1. We use a task descriptor $z_i$: $f_\theta(\mathbf{y}|\mathbf{x}) \rightarrow f_\theta(\mathbf{y}|\mathbf{x}, z_i)$
    • How should the multi-task architecture condition on $z_i$? Options include concatenation-based conditioning, additive conditioning, multi-head architectures, and multiplicative conditioning (see the sketch after this list).
  2. Objective: $\min_\theta \sum_{i=1}^{T} \mathcal{L}_i(\theta, \mathcal{D}_i)$
    • We can weight tasks differently: $\min_\theta \sum_{i=1}^{T} w_i \mathcal{L}_i(\theta, \mathcal{D}_i)$
      • The weights $w_i$ can be set by (a) heuristics, (b) task uncertainty, (c) encouraging monotonic improvement toward Pareto-optimal solutions, or (d) optimizing for the worst-case task loss.
  3. How should the objective be optimized? Sample a mini-batch of tasks, sample a mini-batch of datapoints for each task, compute the loss on the mini-batch, backpropagate, and apply an optimizer step.
  4. Challenges: negative transfer, overfitting, handling a large number of tasks, etc.
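
Below is a minimal PyTorch sketch of steps 1–3: concatenation-based conditioning on a one-hot task descriptor $z_i$, a multi-head output, and a weighted mini-batch training loop. The network sizes, uniform weights $w_i$, and random regression data are illustrative assumptions, not from the lecture.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Concatenation-based conditioning plus a multi-head output.

    The task descriptor z_i (here a one-hot task index, an assumption)
    is concatenated with the input x; each task gets its own head.
    """
    def __init__(self, x_dim, num_tasks, hidden_dim=64, y_dim=1):
        super().__init__()
        self.num_tasks = num_tasks
        self.body = nn.Sequential(
            nn.Linear(x_dim + num_tasks, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, y_dim) for _ in range(num_tasks)]
        )

    def forward(self, x, task_id):
        # Concatenation-based conditioning on the task descriptor z_i.
        z = torch.zeros(x.shape[0], self.num_tasks)
        z[:, task_id] = 1.0
        h = self.body(torch.cat([x, z], dim=-1))
        # Multi-head architecture: task-specific output layer.
        return self.heads[task_id](h)

# Toy per-task datasets D_i (random regression data, illustration only).
num_tasks = 3
datasets = [(torch.randn(128, 8), torch.randn(128, 1)) for _ in range(num_tasks)]

model = MultiTaskNet(x_dim=8, num_tasks=num_tasks)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
w = [1.0] * num_tasks  # per-task weights w_i (uniform heuristic here)

for step in range(1000):
    task_id = torch.randint(0, num_tasks, ()).item()  # sample a task
    x, y = datasets[task_id]
    idx = torch.randint(0, x.shape[0], (32,))         # sample a mini-batch
    loss = w[task_id] * loss_fn(model(x[idx], task_id), y[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Additive or multiplicative conditioning would instead add (or multiply) a learned embedding of $z_i$ into the hidden activations rather than concatenating it with the input.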

Multi-task learning → Transfer learning

  • What are some problems/applications where transfer learning might make sense?
    • When $\mathcal{D}_a$ is very large (you don't want to retain $\mathcal{D}_a$ & retrain on it)
    • When you don't care about solving $\mathcal{T}_a$ and $\mathcal{T}_b$ simultaneously
    • Another answer: personalization (adapting a shared model to an individual user)

Transfer learning → Fine-tuning

Transfer learning: solving the target task $\mathcal{T}_b$ after solving the source task $\mathcal{T}_a$ by transferring knowledge learned from $\mathcal{T}_a$.

We can fine-tune starting from $\theta$, the parameters pre-trained on $\mathcal{D}_a$, using the training data $\mathcal{D}^{tr}_b$ for the new task $\mathcal{T}_b$. We take many gradient steps of the form:

$\phi \leftarrow \theta - \alpha \nabla_\theta \mathcal{L}(\theta, \mathcal{D}^{tr}_b)$
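
As a minimal sketch of this procedure, assuming an ImageNet-pre-trained ResNet-18 for $\theta$ (torchvision ≥ 0.13 weights API) and a random stand-in for $\mathcal{D}^{tr}_b$; the class count and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# theta: parameters pre-trained on D_a (here, ImageNet weights).
model = models.resnet18(weights="IMAGENET1K_V1")

# Replace the output head for the new task T_b (assumed 10 classes).
num_classes_b = 10
model.fc = nn.Linear(model.fc.in_features, num_classes_b)

# Toy stand-in for D_b^tr (random images/labels, illustration only).
xb = torch.randn(64, 3, 224, 224)
yb = torch.randint(0, num_classes_b, (64,))
loader_b = DataLoader(TensorDataset(xb, yb), batch_size=16)

# phi <- theta - alpha * grad_theta L(theta, D_b^tr), for many steps.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # alpha = 1e-3
loss_fn = nn.CrossEntropyLoss()
model.train()
for epoch in range(5):
    for x, y in loader_b:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```

In practice, common fine-tuning heuristics include using a smaller learning rate for the pre-trained layers than for the new head, and freezing earlier layers initially.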

However, fine-tuning does not work well with very small target task datasets. This is where meta-learning can help.