Constructing Future
[CS391R] Overview of Robot perception
Robot perception: seeing and understanding the physical world with multimodal robot sensors. Robot vision vs. computer vision: robot vision is embodied, active, and environmentally situated. Embodied: robots have physical bodies and experience the world directly; their actions are part of a dynamic with the world and give immediate feedback on their own sensation. Active: robots are active perceivers..
Proximal Policy Optimization Algorithms (PPO) Hyper-parameters
🔖 Questions: What is the difference between the advantage function and the reward and value functions? PPO-clip supposedly does not use a KL divergence term, so why is approx_kl reported? 🔖 Notions to understand: Reward; Loss; entropy loss: entropy bonus that ensures sufficient exploration; value loss $L_t^{VF} (\theta) = {(V_\theta(s_t)-V_t^{targ})}^2$; policy gradient loss $L_t^{CLIP}$. Procedure: Epoch: one full pass over the dataset; Mini batch/one batch: one mini bat..
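A minimal PyTorch-style sketch of how the loss terms named above are usually combined in PPO; the tensor names and the coefficients c1, c2 are illustrative assumptions, not taken from the post:

```python
import torch

def ppo_loss(logp_new, logp_old, advantage, v_pred, v_targ, entropy,
             clip_eps=0.2, c1=0.5, c2=0.01):
    """Minimal sketch of the combined PPO objective: clipped policy loss
    L^CLIP, value loss L^VF, and an entropy bonus. All arguments are 1-D
    tensors over a mini-batch of timesteps; c1, c2 are the usual loss
    coefficients (values here are illustrative)."""
    ratio = torch.exp(logp_new - logp_old)                      # pi_theta(a|s) / pi_theta_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    policy_loss = -torch.min(unclipped, clipped).mean()         # maximize L^CLIP
    value_loss = ((v_pred - v_targ) ** 2).mean()                # L_t^VF = (V_theta(s_t) - V_t^targ)^2
    return policy_loss + c1 * value_loss - c2 * entropy.mean()  # entropy bonus encourages exploration
```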
[CS391R] Introduction to Robot Learning
Course website. Types of robot automation: custom-built robots -> human expert programming -> special-purpose behaviors; general-purpose robots -> robot learning -> general-purpose behaviors. What is robot learning? The study of methods and principles that make robots learn from data -> Learning is critical for taking robots to the real world. Robot perception: seeing and understanding the physical ..
The Craft of Research
"The Craft of Research" is the second book I have started reading. Why I started this book: Writing for Computer Science is of course well written, but The Craft of Research seems to show how the detailed methodologies described in Writing for CS can actually be implemented, so I picked it up. Once I finish this book, I expect my own sense of what the act of "research" is to be reasonably well grounded. After that I plan to move on to How to Solve It and build a mental framework for how to actually do it. > For this, I should first re-read ch 2-5 of Writing for CS and reset my mental direction on research before going in.
[Policy Gradient] Vanilla Policy Gradient, Trust region policy optimization (TRPO), Proximal Policy Optimization Algorithms (PPO)
This post organizes the equations from the papers (no derivations); I referred to the documentation. 🔖 Simplest Policy Gradient We consider a stochastic, parameterized policy $\pi_\theta$. We aim to maximize the expected return $J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta} [R(\tau)]$. For this, we optimize the policy by gradient ascent. $\theta_{k+1} = \theta_k + \alpha \nabla_\theta J(\pi_\theta)|_{\theta_k}$ The gradient ..
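A minimal sketch of this gradient-ascent step for a tabular softmax policy, using the standard REINFORCE estimator $\nabla_\theta J \approx \frac{1}{N}\sum_\tau \sum_t \nabla_\theta \log\pi_\theta(a_t|s_t)\,R(\tau)$; the function names and trajectory format are assumptions for illustration, not from the post:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(theta, trajectories, alpha=0.01):
    """One gradient-ascent step: theta <- theta + alpha * grad J(pi_theta),
    with grad J estimated as the average over trajectories of
    sum_t grad_theta log pi_theta(a_t | s_t) * R(tau).
    theta: (n_states, n_actions) logits of a tabular softmax policy.
    trajectories: list of (states, actions, total_return) tuples."""
    grad = np.zeros_like(theta)
    for states, actions, total_return in trajectories:
        for s, a in zip(states, actions):
            probs = softmax(theta[s])
            dlogp = -probs           # gradient of log softmax w.r.t. the logits of state s ...
            dlogp[a] += 1.0          # ... plus the indicator of the action actually taken
            grad[s] += dlogp * total_return
    grad /= len(trajectories)
    return theta + alpha * grad      # ascent, since we maximize expected return
```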
[Writing for CS] 10. Algorithms & 11. Graphs, Figures, and Tables
10. Algorithms. Formalisms: literate code style (author recommended if an algorithm explanation is not given); prosecode style: text with embedded code, rather than code with textual annotations; prosecode presentation is only effective when the concepts underlying the algorithm have been discussed before the algorithm is given; pseudocode style; list style. Level of detail: don't provide to..
[Logger] wandb 사용법
This post was written to aid my own understanding and records the steps for running wandb with Pytorch. Reference 1. wandb: like tensorboard, a useful tool for logging model parameters, accuracy, and loss. On terminal) wandb login, init: pip install wandb; wandb login # then put your api key by accessing this link: https://app.wandb.ai/authorize; wandb init # set the configuration information e.g. entity, project name. wandb.init (configuration for model) wandb..
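A minimal Python sketch of the wandb calls mentioned above; the project/entity names, config values, and logged metrics are placeholders, not taken from the post:

```python
import wandb

# Initialize a run; entity/project names here are placeholders.
run = wandb.init(
    project="my-project",
    entity="my-team",
    config={"learning_rate": 1e-3, "batch_size": 64},  # hyperparameters to track
)

for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)        # dummy metric for illustration
    accuracy = 1.0 - train_loss / 2.0     # dummy metric for illustration
    wandb.log({"loss": train_loss, "accuracy": accuracy, "epoch": epoch})

run.finish()
```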
[Writing for CS] 9. Mathematics
Clarity "Any" sometimes means "all" and sometimes means "some" An algorithm or problem is "intractable" only if it is NP-hard, that is, the asymptotic cost (or computational complexity) is believed to be worse than polynomial. "Formula" and "expression" are not necessarily an "equation"; that latter involves an equality Two things are "equivalent" if they are "distinguishable" with regard to som..
[Writing for CS] 8. Punctuation
Comma: commas can be used to give the reader time to breathe. Cutting this into several sentences would undoubtedly improve it further. Minimal-commas rule: use the minimum number of commas needed to avoid ambiguity. Sentences with many commas often have strangulated syntax; if the commas seem necessary, consider breaking the sentence into shorter ones or rewriting it altogether. Another exceptio..
[Writing for CS] 7. Style Specifics
Titles and headings: should be concise and informative; accuracy is more important than catchiness. The opening paragraph: take care to distinguish the description of existing knowledge from the description of the paper's contribution; don't say too much at once. Tense: present tense is used for (1) eternal truths and (2) statements about the text itself. Past tense is used for describing work and outco..
[Writing for CS] 6. Good Style
Text should be taut. Every sentence should be necessary. Tone: one topic per section; have one idea per sentence or paragraph; use short sentences with simple structure. Use an example whenever it adds clarification. Each example should be an illustration of one concept; if you don't know what an example is illustrating, change it. A common error is to include material such as definitions or theore..
[CS294 Pieter Abbeel] 5. Implicit Models - GANs
This post is my summary of Pieter Abbeel's Deep Unsupervised Learning 2020 course. 🗿Implicit Models? 🗿Original GAN 🗿GAN Progression
[CS294 Pieter Abbeel] 4. Latent Variable Models - Variational AutoEncoder (VAE)
This post is my summary of Pieter Abbeel's Deep Unsupervised Learning 2020 course. 🖲️ Training Latent Variable Models 🖲️ Variations of VAE 🖲️ Related Ideas
[CS294 Pieter Abbeel] 3. Likelihood Models: Flow Models
This post is my summary of Pieter Abbeel's Deep Unsupervised Learning 2020 course. This lecture deals with using a latent representation to obtain a density model $p_\theta(x)$. 🪸 Foundations of Flows (1-D) 🪸 2-D Flows 🪸 N-D Flows 🪸 Dequantization
[CS294 Pieter Abbeel] 2. Likelihood Models: Autoregressive Models
This post is my summary of Pieter Abbeel's Deep Unsupervised Learning 2020 course. This lecture is about how to model the data distribution, which is the most primitive form of a generative model. Generative models started with histograms, the most basic likelihood-based model, and the neural approach then uses autoregressive models. 🫠 Likelihood-based models 🫠 Sampling-based: Hist..
[Writing for CS] Contents
This post is my summary of Justin Zobel's Writing for Computer Science. I read it to improve my writing skills and collected the parts I am weak at and the points I must check. Contents: Chapter 6. Good Style; Chapter 7. Style Specifics; Chapter 8. Punctuation; Chapter 9. Mathematics; Chapter 10. Algorithms; Chapter 11. Graphs, Figures, and Tables
[CS330 Chelsea Finn] Deep Multi-task learning and Meta-learning Contents
Goal: check a higher-level view of perception for robotics. Contents: Course introduction & start of multi-task learning (43m); Supervised multi-task learning, transfer learning (1h 19m); Meta-learning problem statement, black-box meta-learning (1h 18m); Optimization-based meta-learning (1h 18m); Few-shot learning via metric learning (1h 25m); Advanced meta-learning topics (1h 28m); Bayesian meta-learning (1h 27m); R..
[CS182 Sergey Levine] Deep Learning - NLP Basics
Goal: check how sequential processing might be possible for robots. Contents (4h 43m): Lecture 10. Recurrent Neural Networks (1h 25m); Lecture 11. Sequence to Sequence (1h 11m); Lecture 12. Transformers (1h 4m); Lecture 13. NLP (1h 3m)
[CS294 Pieter Abbeel] 1. Intro
This post is my summary of Pieter Abbeel's Deep Unsupervised Learning 2020 course. This lecture shares the goal and pursuit of deep unsupervised learning. Deep unsupervised learning: capture rich patterns in raw data with deep networks in a label-free way → But how? Recreate the raw data distribution → generative models; "puzzle" tasks that require semantic understanding → self-supervised learning. With Pu..
[CS294 Pieter Abbeel] Deep Unsupervised Learning Contents
This post is my summary of Pieter Abbeel's Deep Unsupervised Learning 2020 course. Goal: think about the intuition of perception for robots; how will I lead my research from this perspective? Contents: Intro; Autoregressive Models (2h 28m); Flow Models (1h 57m); Latent Variable Models (2h 20m); Implicit Models / Generative Adversarial Networks (2h 33m); Self-Supervised Learning / Non-Generative Representation Learning (2h 10m); Str..
[David Silver] 7. Policy Gradient: REINFORCE, Actor-Critic, NPG
This post is my summary of David Silver's Reinforcement Learning course. In the last lecture, we approximated the value or action-value function using parameters $\theta$, and the policy was generated directly from the value function. In this lecture, we directly parameterise the policy as a stochastic policy $\pi_\theta(s,a) = \mathbb{P} [a|s, \theta]$. This taxonomy explains value-based and policy-based RL well. Value-base..
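Two standard concrete instances of such a parameterisation, given here as a short worked example; the feature map $\phi(s,a)$ and the fixed variance $\sigma^2$ are assumptions of the sketch:

```latex
% Softmax policy over discrete actions, with features \phi(s,a) and weights \theta:
\pi_\theta(s,a) = \frac{\exp\!\big(\phi(s,a)^\top \theta\big)}{\sum_{b} \exp\!\big(\phi(s,b)^\top \theta\big)}
% Gaussian policy for continuous actions, with mean \mu(s) = \phi(s)^\top \theta and fixed variance \sigma^2:
a \sim \mathcal{N}\big(\mu(s), \sigma^2\big)
```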
[David Silver] 6. Value Function Approximation: Experience Replay, Deep Q-Network (DQN)
This post is my summary of David Silver's Reinforcement Learning course. This lecture suggests a solution for large MDPs using function approximation. We have to scale up the model-free methods for prediction and control, so in lectures 6 and 7 we will learn how we can scale up the model-free methods. How have we dealt with small (not large) MDPs so far? We have represented the value function by a looku..
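A minimal sketch of replacing the lookup table with a linear value function, using semi-gradient TD(0) prediction; the function name, feature map, and transition format are assumptions for illustration, not from the post:

```python
import numpy as np

def semi_gradient_td0(feature_map, transitions, alpha=0.1, gamma=0.99):
    """Minimal sketch of TD(0) prediction with a linear value function
    v_hat(s, w) = x(s)^T w instead of a lookup table.
    feature_map: dict mapping state -> feature vector x(s) (numpy array)
    transitions: iterable of (s, r, s_next, done) tuples from a fixed policy."""
    w = np.zeros_like(next(iter(feature_map.values())), dtype=float)
    for s, r, s_next, done in transitions:
        v_s = feature_map[s] @ w
        v_next = 0.0 if done else feature_map[s_next] @ w
        td_error = r + gamma * v_next - v_s
        w = w + alpha * td_error * feature_map[s]   # semi-gradient update toward the TD target
    return w
```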
Presentation/Writing skill (220409 Update)
220409 Update: Weaknesses & Improvements. Presentation skills are getting better. English interpretation skills are getting better (no subtitles on Netflix :)). Bad at documentation skills: have to put in sufficient information (parenthetical writing). Solution: while writing, check whether an undefined notion has come up. Bad at communication skills. Cause 1. Afraid of errors: No. Cause 2. Afraid of English communication..
[David Silver] 5. Model-Free Control: On-policy (GLIE, SARSA), Off-policy (Importance Sampling, Q-Learning)
This post is my summary of David Silver's Reinforcement Learning course. In the previous post, to solve an unknown MDP we had to (1) estimate the value function (model-free prediction) and (2) optimize the value function (model-free control). In this lecture, we are going to learn (2) how to optimize the value function based on the methodologies from (1), which are MC and TD. So the goal we have to achiev..
[David Silver] 4. Model-Free Prediction: Monte-Carlo, Temporal-Difference
This post is my summary of David Silver's Reinforcement Learning course. The last lecture was about planning by dynamic programming, which solves a known MDP. Now we are going to check how we can solve an unknown MDP (i.e. model-free RL). To solve an unknown MDP we have to (1) estimate the value function of the unknown MDP; we usually call this model-free prediction (policy evaluation). After that, we will ..
Instance segmentation COCO Evaluation metric
This post explains the COCO evaluation metric, the evaluation metric for the instance segmentation task. The COCO dataset can be found here. On the instance segmentation task, the evaluation metric addresses this by using the intersection over union (IoU). IoU measures the similarity between finite sets and is defined as the size of the intersection divided by the size of the union of the sets. Based on the ..
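A minimal Python sketch of the IoU definition above for two binary instance masks; this is only an illustration of the formula, not the official COCO evaluation code:

```python
import numpy as np

def mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """IoU between two boolean instance masks of the same shape:
    |A ∩ B| / |A ∪ B|."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection) / float(union) if union > 0 else 0.0
```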
[Modern Robotics] Contents
Goal: get the robotics fundamentals needed to be confident about the robot grasping and manipulation parts. Ch 2. Configuration Space (3-9, 1h 10m) Ch 3. Rigid-Body Motions (10-20, 1h 50m) Ch 4. Forward Kinematics (21-23, 30m) Ch 5. Velocity Kinematics and Statics (24-29, 1h) Ch 6. Inverse Kinematics (30-32, 30m) Ch 7. Kinematics of Closed Chains (33, 10m) Ch 8. Dynamics of Open Chains (34-43, 1h 40m) Ch 9. Trajectory G..
[David Silver] 3. Planning by Dynamic Programming
This post is my summary of David Silver's Reinforcement Learning course. (2023.09.12) I additionally filled in the parts I did not fully understand after taking Professor 임재환's AI611 graduate course - additions are in purple. This lecture is about the solution to a known MDP, which is dynamic programming. We will talk about what dynamic programming is and show that an MDP is solvable. 🥭 Dynamic Programming Dynamic programming is a method for solving complex problems by breaking them down in..
[David Silver] 2. Markov Decision Processes
This post is my summary of David Silver's Reinforcement Learning course. (2023.09.12) I additionally filled in the parts I did not fully understand after taking Professor 임재환's AI611 graduate course - additions are in purple. A Markov decision process formally describes a fully observable environment for reinforcement learning. 🥭 Markov Processes Based on the Markov property, a Markov process is a random process, i.e. a sequence of random states $S_1, S_2, \cdots$ with the M..
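For reference, the Markov property underlying this definition, stated in the standard form used in this lecture series ("the future is independent of the past given the present"):

```latex
% A state S_t is Markov if and only if
\mathbb{P}\big[S_{t+1} \mid S_t\big] = \mathbb{P}\big[S_{t+1} \mid S_1, \dots, S_t\big]
```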
[David Silver] 1. Introduction to Reinforcement learning
This post is my summary of David Silver's Reinforcement Learning course. 🧵 Sequential decision making Goal: select actions to maximize total future reward. To maximize total future reward, there may be tasks where long-term reward matters, so it may be better to sacrifice immediate reward to gain more long-term reward; reward may be delayed. Solutions to sequential decision making: Reinforcement Learning, Pl..