Constructing Future

[Diffusion Model] DDPM, DDIM

→2023.01.28

Training Tips for the Transformer Model

This article summarizes Training Tips for the Transformer Model and Advanced Techniques for Fine-Tuning Transformers. Training data preprocessing: A higher batch size may be beneficial for training, and the batch size can be higher when training sentences longer than a given threshold are excluded. It may be a good idea to exclude overly long sentences. Training data size: Comparing different..

→2022.11.05
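
As a rough illustration of the preprocessing tip in the entry above, the sketch below drops sentences longer than a threshold and shows how a fixed token budget then fits more sentences per batch. The whitespace tokenization, the 1000-token budget, and the names `filter_by_length` and `max_batch_size` are assumptions made for this example, not something taken from the summarized articles.

```python
def filter_by_length(sentences, max_tokens):
    """Keep only sentences whose whitespace token count is <= max_tokens."""
    return [s for s in sentences if len(s.split()) <= max_tokens]


def max_batch_size(token_budget, max_tokens):
    """Number of sentences that fit in a token budget when each is padded to max_tokens."""
    return token_budget // max_tokens


# Toy corpus (hypothetical): two short sentences and one long outlier.
corpus = [
    "a short sentence",
    "another fairly short training sentence",
    "one " * 50 + "very long outlier sentence",
]

longest_before = max(len(s.split()) for s in corpus)
kept = filter_by_length(corpus, max_tokens=10)
longest_after = max(len(s.split()) for s in kept)

# Padding to a shorter maximum length leaves room for more sentences per batch.
print(max_batch_size(1000, longest_before), max_batch_size(1000, longest_after))
```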

Training on GPU, CPU

Be sure to read this site. When training, the key thing to watch for is whether there is a bottleneck; that is, check that the workload is well balanced between the GPU and the CPU. Both the CPU and the GPU have their own RAM. CPU (RAM): mainly handles serial computation. GPU (DRAM, VRAM): performs matrix multiplication. GPU bottleneck (hardware issue): not enough hardware threads, i.e. not enough cores; too few Tensor Cores to run the threads. Memory transfer: hardware -> RAM, RAM -> GPU memory, GPU memory -> shared memory. There are broadly three stages, and sending to the upper stage..

→2022.11.05

(In progress) [AI602] 3. Bayesian Deep Learning

1. Bayesian Machine Learning Posterior inference 2. Bayesian Deep Learning How: Variational inference Markov Chain Monte Carlo (MCMC) Sampling Bayes by Backprop 3. Bayesian Approximation 4. Application) Meta-learning: Neural Processes

→2022.10.01

[AI602] 2. Self-supervised Learning

Self-supervised learning is an unsupervised learning strategy that learns transferable representations by solving data-generated tasks (pretext tasks). It learns a good representation without supervision, which is then applied to other tasks. The model is evaluated by stacking a task-specific layer on top. In the end, to solve a given objective, SSL 1. for the pretext task, creates opposing representations from the existing dataset and applies them. Afterwards, ..

→2022.10.01

[AI602] 1. Vision Transformer

Transformer: an all-attention encoder-decoder model without any recurrence or convolution. Attention Self-attention (connections within a sentence) Scaled dot-product attention (with keys, values, and queries). Self-attention learns to encode a word at a certain position by learning which other words to focus on to better understand it. Done via matrix computations, which are fast to compute using G..

→2022.09.11
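
The scaled dot-product attention mentioned in the entry above can be sketched in plain Python as follows. This is a minimal illustration rather than the course's implementation: it assumes the toy token vectors are used directly as queries, keys, and values, and omits the learned projection matrices a real Transformer would apply. The function names are chosen for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return the softmax(q.k / sqrt(d_k))-weighted sum of the values."""
    d_k = len(keys[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs


# Self-attention: queries, keys, and values all come from the same toy token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(tokens, tokens, tokens)
print(attended)
```

Each output row is a convex combination of the value vectors, so every token's new encoding mixes in the tokens it attends to most.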

[AI602] AdvancedML Introduction

Vision Transformer Self-supervised Learning Bayesian Deep Learning

→2022.09.11

[Thought Notes] GPT-3 / Chain of thoughts: InstructGPT

This post is part of the [Thought Notes] series, in which I write down what I have picked up bit by bit while working on NLP. Please note that the content may not be accurate. Transformer --> GPT-3 GPT-3 Dataset: CommonCrawl, WebText2 (from web pages), Wikipedia, Books1, Books2 (a large collection of free novels) Chain of thoughts ex. InstructGPT, Step by step. There are zero-shot and few-shot variants.

→2022.07.29

[Logger] How to use wandb

This post was written to aid my own understanding and walks through the steps of running TensorboardX in PyTorch. Reference 1 wandb: like TensorBoard, a useful tool for logging model parameters, accuracy, and loss On terminal) wandb login, init pip install wandb wandb login # then enter your API key from this link: https://app.wandb.ai/authorize wandb init # set the configuration information ex. entity, project name wandb.init (configuration for model) wandb..

→2022.05.05

AI, Deep Learning Basics/Computer Vision

[CS182 Sergey Levine] Deep Learning - NLP Basics

Goal: Check how sequential processing might be possible for robots Contents 4h 43m Lecture 10. Recurrent Neural Networks 1h 25m Lecture 11. Sequence to Sequence 1h 11m Lecture 12. Transformers 1h 4m Lecture 13. NLP 1h 3m

→2022.04.19

Instance segmentation COCO Evaluation metric

This post explains the COCO evaluation metric, the evaluation metric for the instance segmentation task. The COCO dataset can be found here. For the instance segmentation task, the evaluation metric relies on the intersection over union (IoU). IoU measures the similarity between finite sets and is defined as the size of the intersection divided by the size of the union of the sets. Based on the ..

→2022.04.04
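
The IoU definition in the entry above (size of the intersection divided by size of the union) can be sketched directly on Python sets. The pixel-coordinate masks are hypothetical, and returning 0.0 for two empty sets is a convention chosen for this example, not something the COCO metric prescribes.

```python
def iou(a, b):
    """Intersection over union of two finite sets; 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical masks given as sets of (row, col) pixel coordinates.
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
ground_truth = {(0, 1), (1, 1), (0, 2), (1, 2)}

# 2 shared pixels out of 6 distinct pixels in total.
print(iou(predicted, ground_truth))
```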

[Generative Model] Latent Variable model

This post was written to deepen my understanding of generative models. The reference is Reference 1. A latent variable model (LVM) defines a distribution over an observation $x$ by using a (vector) latent variable $z$ (as an explanation for the observation) and specifying: The prior distribution $p(z)$ for the latent variable The likelihood $p(x|z)$ that connects the latent variable to the observation The joint distribution $p(x, z) = p..

→2022.03.19
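
For reference, the standard way a prior $p(z)$ and a likelihood $p(x|z)$ combine in a latent variable model is written out below. This is the textbook factorization, stated here on the assumption of a continuous $z$ (the integral becomes a sum for a discrete $z$); it is not copied from the post itself.

$$p(x, z) = p(x|z)\, p(z), \qquad p(x) = \int p(x|z)\, p(z)\, dz$$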

Methodology skeletons

AI skeletons Supervised model Self-supervised model Unsupervised model Generative models Autoregressive models RNN & Transformer language models, NADE, PixelCNN, WaveNet Latent variable models Tractable: e.g. invertible / flow-based models (RealNVP, Glow, etc.) Intractable: e.g. Markov Chain Monte Carlo, Variational Autoencoders series Implicit models Generative Adversarial Networks (GANs) and v..

→2022.03.19

(In progress) [Generative Model] Generative Adversarial Networks (GAN)

This post was written to deepen my understanding of GANs. The reference is Reference 1. Concept of Generative Adversarial Networks (GAN): field of (Generative model - latent variable model). A neural net that maps noise vectors to observations. Training: use the learning signal from a classifier trained to discriminate between samples from the model and the training data. Pros: Can generate very realistic images. Conceptually simple imple..

→2022.03.17

[Basic] Probabilistic model

Data is treated as a random variable 🌰 Deterministic Neural Network Model weights are assumed to have a true value that is just unknown. All weights have a single fixed value, as is the norm. Without a statistical treatment, such an analysis is prone to overfitting on selected examples and, in general, makes it hard to draw confident conclusions. Softmax Model can be uncertai..

→2022.03.11

Uncertainty

🐶 Terminology: Prediction, Confidence, Probability 🐶 Why is uncertainty important? Status Before: using the prediction Now: using prediction, uncertainty Purpose Uncertainty inherent in inductive inference Incorrect model assumptions Noisy or imprecise data “... a weather forecaster can be very certain that the chance of rain is 50 %; or her best estimate at 20 % might be very uncertain due to lack of d..

→2022.02.20

[Logger] Using TensorboardX

This post was written to aid my own understanding and walks through the steps of running TensorboardX in PyTorch. TensorboardX: a useful tool for logging model parameters, accuracy, and loss. Process: Logging the parameters of a PyTorch model: you can call a writer directly, as in the code source below, but in existing model implementations it is usually handled through a callback, so that simply adding the parameter is enough. x = torch.arange(-5, 5, 0.1).view(-1, 1) y = -5 * x + 0.1 * torch.randn(x.size()) model = torch.nn.Linear(1, 1) criterion = torch.nn...

→2022.02.19

Training tips summary

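Start by sorting out the learning rate. Why does training take so long / so little time? nvidia-smi (real-time GPU monitoring), Profiler (summary of GPU usage history). What should be considered? Multi-processing: the data loader's num_worker; multi-threading. How do you check that training is working? Check that the loss is going down, in particular for each individual loss term. Check the mean and variance of parameters and buffers. Etc. When checking results, evaluate on the whole dataset rather than looking at the results of whichever batch comes out. Considerations during training: GPU resource, Dataset, Training time, Etc. Multi-processing..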

→2022.02.16

[Basic] Activation Function/Loss Function/Evaluation metric

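This post was written to aid my own understanding. Reference link: Link 1 🐤 Difference between a loss function and an evaluation metric: Put simply, a loss function is the quantity that must be minimized/maximized to improve performance while training a deep learning model, whereas an evaluation metric is the quantity used to identify which of several deep learning models performs well. For example, in a classification problem the model's loss function is typically crossentropyloss, while the evaluation metric used to compare models would be accuracy. Among the parameter estimation methods for deep learning models ..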

→2022.02.12
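
To make the loss-versus-metric distinction in the entry above concrete, here is a small sketch that computes cross-entropy (the quantity minimized during training) and accuracy (the quantity used to compare models) on the same predictions. The two "models" are hypothetical probability tables made up for this example.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class (the training loss)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def accuracy(probs, labels):
    """Fraction of examples whose highest-probability class is the true class."""
    hits = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) == y)
    return hits / len(labels)


labels = [0, 1, 1]
# Hypothetical predicted class probabilities from two models on three examples.
model_a = [[0.6, 0.4], [0.4, 0.6], [0.4, 0.6]]
model_b = [[0.9, 0.1], [0.1, 0.9], [0.45, 0.55]]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, round(cross_entropy(probs, labels), 4), accuracy(probs, labels))
```

Both models classify every example correctly, so the metric cannot separate them, while the loss still rewards model B for being more confident on most examples.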

[NLP] 4. Modern Recurrent Neural Networks: Seq2Seq

🐶 Encoder-Decoder structure 🐶 Sequence to Sequence

→2022.02.12

[NLP] 3. Modern Recurrent Neural Networks: GRU, LSTM

🎲 Gated Recurrent Units (GRU) 🎲 Long Short Term Memory (LSTM)

→2022.02.05

[NLP] 2. RNN Basics: Language Model

This post summarizes my reading of Dive into Deep Learning. 🏉 Language Model Given a text sequence of length $T$ consisting of tokens $(x_1, x_2, \cdots, x_T)$, the goal of a language model is to estimate the joint probability of the sequence $P(x_1, x_2, \cdots, x_T)$. We should know how to model a document or even a sequence of tokens. 🏉 Learning a Language Model Let us start by applying..

→2022.02.01
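
The joint probability in the entry above is usually made tractable through the chain rule of probability, which holds for any sequence of random variables. It is written out here as a reminder, on the assumption that this is the step the truncated excerpt goes on to apply:

$$P(x_1, x_2, \cdots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, \cdots, x_{t-1})$$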

[NLP] 1. Introduction of NLP, Word2vec

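🏓 NLP: Natural Language Processing. The field of processing natural language, that is, the field concerned with making computers understand our language. Natural language is a living language, and it has a certain 'softness' to it. 🏓 Representations that capture 'the meaning of words' well. Using a thesaurus (dictionary of synonyms): this approach uses a word network (a hand-built thesaurus). It also defines finer-grained relations between words, such as 'hypernym and hyponym' or 'whole and part'. ex. Car = auto, automobile, machine, motorcar. A representative thesaurus is WordNet (NLTK module). Cons: the burden of manual labeling by people / hard to keep up with changes over time / expressing the subtle differences between words..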

→2022.01.22

[Basics] Basic image classification models: VGG, GoogLeNet, ResNet

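This post summarizes the parts of "밑바닥부터 시작하는 딥러닝 1" (Deep Learning from Scratch 1) that I found confusing or worth revisiting. Points I found confusing: CNN: passes through (Conv - ReLU) - Pooling. A pooling layer only reduces the size; Conv layers are essential for making the network deeper. 1x1 Conv vs. 3x3 Conv: (1x1 Conv) mainly used to reduce the number of channels; (3x3 Conv, stride = 1). 1. VGG: a basic CNN composed of convolution layers and pooling layers. Argues, in terms of the receptive field, that two (3x3) Convs are more efficient than one (5x5) Conv. (Max pooling) - N 3x3 convolutions, followed by FC. Max pooling..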

→2022.01.16
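
The VGG point in the entry above, that two stacked 3x3 convolutions are more efficient than one 5x5 convolution, can be checked with a little arithmetic. The sketch assumes stride-1 convolutions, ignores biases, and uses a hypothetical channel width of 64 for both input and output.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions: 1 + sum of (k - 1)."""
    return 1 + sum(k - 1 for k in kernel_sizes)


def conv_weights(kernel_size, in_channels, out_channels):
    """Weight count of one square convolution layer, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels


channels = 64  # hypothetical channel width, same for input and output
two_3x3 = 2 * conv_weights(3, channels, channels)
one_5x5 = conv_weights(5, channels, channels)

# Same 5x5 receptive field, but fewer weights for the stacked 3x3 layers.
print(stacked_receptive_field([3, 3]), stacked_receptive_field([5]))
print(two_3x3, one_5x5)
```

The stacked version also inserts an extra nonlinearity between the two layers, which is commonly cited as a further benefit.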