Everybody Dance Now

#MotionTransfer #videotovideotranslation #GAN

Dataset: a variety of videos, enabling untrained amateurs to spin and twirl like ballerinas, perform martial arts kicks, or dance as vibrantly as pop stars
Transfer motion between two video subjects in a frame-by-frame manner > learn a mapping between images of the two individuals > discover an image-to-image translation between the source and target sets
“Do as I do” motion transfer: transfer a source performance to a novel target < video-to-video translation using pose as an intermediate representation

Extract pose from [the source subject]
Apply the learned pose-to-appearance mapping to generate [the target subject]
Predict two consecutive frames for temporally coherent video results, and use a separate pipeline for realistic face synthesis
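
The steps above can be read as a per-frame pipeline. The sketch below is only illustrative: `transfer_motion` and its three callables (`detect_pose`, `normalize`, `generate`) are hypothetical names standing in for the pose detector, the global pose normalization, and the learned pose-to-appearance generator.

```python
def transfer_motion(source_frames, detect_pose, normalize, generate):
    """Run the pose-based transfer frame by frame (hypothetical interface)."""
    outputs = []
    for frame in source_frames:
        pose = detect_pose(frame)        # extract pose from the source subject
        pose = normalize(pose)           # align pose to the target's body shape/location
        outputs.append(generate(pose))   # pose-to-appearance mapping for the target
    return outputs
```

Any callables with matching shapes can be plugged in, so each stage can be swapped or tested on its own.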

Extract motion

Problem: the two subjects are unlikely to have frames showing exactly the same pose (body shape and motion style are unique to each subject)
Solution: keypoint-based pose > use pose stick figures from a pose detector such as OpenPose or DensePose
Input to the trained model: pose stick figures from the source
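
To make "pose stick figure" concrete, here is a minimal sketch that rasterizes detected keypoints into a binary image. The 5-joint `LIMBS` skeleton and the function name are assumptions for illustration only; real detectors such as OpenPose return many more keypoints and the rendering is usually colored per limb.

```python
# Hypothetical, simplified skeleton: pairs of joint names connected by a limb.
LIMBS = [("head", "neck"), ("neck", "hip"), ("hip", "l_ankle"), ("hip", "r_ankle")]

def draw_stick_figure(keypoints, width, height):
    """Rasterize limbs as straight lines on a height x width grid of 0/1 values."""
    canvas = [[0] * width for _ in range(height)]
    for a, b in LIMBS:
        if a not in keypoints or b not in keypoints:
            continue  # skip limbs whose joints were not detected
        (x0, y0), (x1, y1) = keypoints[a], keypoints[b]
        # One sample per pixel along the longer axis of the segment.
        steps = max(int(abs(x1 - x0)), int(abs(y1 - y0)), 1)
        for i in range(steps + 1):
            t = i / steps
            x = round(x0 + t * (x1 - x0))
            y = round(y0 + t * (y1 - y0))
            if 0 <= x < width and 0 <= y < height:
                canvas[y][x] = 1
    return canvas
```

Because the figure contains only joint positions, it carries the motion while dropping the source subject's appearance.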

Image-to-image translation

Disentangle motion from appearance and Synthesize video

MoCoGAN: uses unsupervised adversarial training to learn this separation and generates videos of subjects performing novel motions or facial expressions
- Dynamics Transfer GAN (transfers facial expressions from a source subject in a video onto a target person)
Other models
- Pix2pix, CoGAN, UNIT, CycleGAN, DiscoGAN, Cascaded Refinement Networks, pix2pixHD

Encoding body poses
Use a pre-trained pose detector
Global pose normalization
Transform the pose keypoints of the source person so that they match the target person’s body shape and location, as in the Transfer section
: analyze the heights and ankle positions across each subject's poses and use a linear mapping between the closest and farthest ankle positions
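
The linear mapping between the closest and farthest ankle positions can be sketched as follows. This is an assumed formulation, not the paper's exact one: the dictionary keys (`ankle_far`, `ankle_close`, `height_far`, `height_close`), the function name, and the choice to scale around the ankle point are all hypothetical, and the code assumes the far and close ankle positions differ.

```python
def normalize_pose(keypoints, anchor, src, tgt):
    """Map source keypoints into the target's frame.

    anchor: (x, y) ankle position of the source pose in this frame.
    src/tgt: per-subject statistics with the farthest/closest ankle y-positions
             and the subject heights measured at those positions.
    """
    ankle_x, ankle_y = anchor
    # Where the source ankle sits between its farthest (0) and closest (1) position.
    ratio = (ankle_y - src["ankle_far"]) / (src["ankle_close"] - src["ankle_far"])
    # Same relative position in the target's ankle range.
    target_ankle_y = tgt["ankle_far"] + ratio * (tgt["ankle_close"] - tgt["ankle_far"])
    # Interpolate the height ratio between the far and close extremes.
    scale_far = tgt["height_far"] / src["height_far"]
    scale_close = tgt["height_close"] / src["height_close"]
    scale = scale_far + ratio * (scale_close - scale_far)
    # Scale around the ankle, then move the ankle to its target position.
    return {
        name: (ankle_x + (x - ankle_x) * scale, target_ankle_y + (y - ankle_y) * scale)
        for name, (x, y) in keypoints.items()
    }
```

For example, a source pose whose ankle is halfway between its far and close positions is placed halfway along the target's ankle range and resized by the interpolated height ratio.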

Pose to Video Translation

Simple GAN Model (Create video sequence)
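: predict two consecutive frames to enforce temporal coherence
- training
  In short, pose stick figures x are obtained with the pose detector P, and the discriminator D learns to distinguish the fake sequence (G(x_t), G(x_{t+1})) from the real sequence (y_t, y_{t+1}), which enforces temporal coherence: $L_{smooth}(G,D)=\mathbb{E}_{(x,y)}[\log D(x_t, x_{t+1}, y_t, y_{t+1})]+\mathbb{E}_{x}[\log(1-D(x_t, x_{t+1}, G(x_t), G(x_{t+1})))]$
- transfer
  When a new source image comes in, apply the same procedure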
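
As a small numeric illustration of the objective above, the sketch below evaluates $L_{smooth}$ from discriminator outputs. It assumes D returns a probability in (0, 1) for each two-frame tuple; the function name and the plain-list inputs are hypothetical, and no network is involved.

```python
import math

def smooth_gan_objective(d_real, d_fake):
    """Monte Carlo estimate of L_smooth.

    d_real: D(x_t, x_{t+1}, y_t, y_{t+1}) for real frame pairs.
    d_fake: D(x_t, x_{t+1}, G(x_t), G(x_{t+1})) for generated frame pairs.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term
```

D is trained to increase this value and G to decrease it, so a discriminator that separates real and generated pairs well scores higher than one that outputs 0.5 everywhere.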
FaceGAN
: to add more detail and realism to the face region
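Takes only the face region (x_F) and feeds it to a separate GAN, following the pix2pix approach. $L_{face}(G_f,D_f)=\mathbb{E}_{(x_F,y_F)}[\log D_f(x_F, y_F)]+\mathbb{E}_{x_F}[\log(1-D_f(x_F, G(x)_F+r))]$, where $r$ is the residual predicted by $G_f$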
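
The term $G(x)_F + r$ means the face crop of the generated frame is corrected by an additive residual. A minimal sketch of that step on nested lists, assuming a rectangular face box given by a hypothetical `top`/`left` offset and a residual of the same size as the crop:

```python
def crop(image, top, left, height, width):
    """Return the face region as a new nested list."""
    return [row[left:left + width] for row in image[top:top + height]]

def refine_face(full_image, residual, top, left):
    """Add residual r to the face crop G(x)_F and paste it back.

    Returns a new image; the input image is left unchanged.
    """
    out = [row[:] for row in full_image]
    for i, res_row in enumerate(residual):
        for j, r in enumerate(res_row):
            out[top + i][left + j] += r
    return out
```

Predicting a residual rather than a whole new face keeps the rest of the generated frame untouched and limits the face generator to adding detail.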

License: Attribution, NonCommercial, ShareAlike
