Robotics & Perception

    [Seminar Series on Artificial General Intelligence (AGI) #1] Dr. Jim Fan: Generalist Agents in Open-Ended Worlds

    This is my own summary of the AGI seminar hosted by Professor Sungjin Ahn at KAIST, which I attended on 2023-11-17. Very forward-looking!! It was a deeply impressive seminar, and among the seminars I attended this year it was one of the highest-quality and most enjoyable. Seminars Open-ended environment: How can we combine the world knowledge? Foundation model for agents --> Issue: How can we ground it in real life? We should pursue it in a language manner, because prompting and delivering the concept are both straightforward. Emb..

    [Advanced Topics] 02. Representation Learning for RL

    This post is a summary I wrote for my own review and understanding after taking Professor Kimin Lee's AI707 course in the Fall 2023 semester. RL from pixels is difficult: poor sample efficiency, poor generalization to different environments. Representation learning resolves these issues. What is representation learning? Learning a representation of the data that contains useful information for ML methods. However, it is also true that optimizing a main task objective might n..
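    One common instantiation of this idea is to train the pixel encoder with an auxiliary objective alongside the RL loss. The sketch below is my own toy example, not from the lecture: it uses a reconstruction loss on hypothetical 64x64 grayscale observations, and the architecture and dimensions are made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical 64x64 grayscale observations, encoded to a 50-D latent state.
encoder = nn.Sequential(
    nn.Conv2d(1, 32, 4, stride=2), nn.ReLU(),
    nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 14 * 14, 50),
)
decoder = nn.Sequential(nn.Linear(50, 64 * 64), nn.Unflatten(1, (1, 64, 64)))

def aux_representation_loss(obs):
    """Auxiliary reconstruction loss, added to the RL objective during training."""
    z = encoder(obs)            # latent state the RL agent would consume
    recon = decoder(z)          # reconstruct the observation from the latent
    return ((recon - obs) ** 2).mean()

obs = torch.rand(8, 1, 64, 64)
loss = aux_representation_loss(obs)
```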

    [Advanced Topics] 01. RL with human feedback

    This post is a summary I wrote for my own review and understanding after taking Professor Kimin Lee's AI707 course in the Fall 2023 semester. (I consider it an honor to attend lectures given in person by world-renowned professors!!) Rough introduction to Reinforcement Learning Reinforcement Learning: finding an optimal policy for a sequential decision-making problem through interactions and learning By interacting with the environment, the agent generates roll-outs of the form $\tau = \{(s_0, a_0, r_0), \cdots, (s_H, a_H, r_H)..

    Debugging tips (continuously updated)

    The problem itself may not be well posed. The dataset may be too vague, e.g. a classification dataset where success vs. fail is hard to distinguish from the input; noisy data may have crept in, e.g. the data collected is not actually the data needed to solve the problem, or the class balance is off for the classification you need to solve. Fix: collect the dataset properly (e.g. constrain data collection so that only unambiguous data is gathered) or deal with the dataset noise. The network itself may have a bug. How to check: add an indicator that makes the answer easy for the network to predict, and see whether it predicts it correctly. --> If it correctly pre..

    (Work in progress) Optimal Estimation Algorithms: Kalman and Particle Filters

    Kalman filter: maintains a Gaussian distribution. Particle filter: a sampling-based algorithm.
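    As a rough illustration of the contrast (my own toy example, not from the post), the sketch below shows one predict/update cycle of a 1-D Kalman filter, which keeps a Gaussian belief, next to one step of a particle filter, which keeps weighted samples. The linear-Gaussian model and noise values are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Kalman filter: belief is a single Gaussian (mean, variance) ---
def kalman_step(mean, var, u, z, a=1.0, b=1.0, c=1.0, q=0.1, r=0.5):
    """One predict/update cycle for x' = a*x + b*u + N(0, q), z = c*x + N(0, r)."""
    mean_pred = a * mean + b * u                     # predict
    var_pred = a * a * var + q
    k = var_pred * c / (c * c * var_pred + r)        # Kalman gain
    mean_new = mean_pred + k * (z - c * mean_pred)   # correct with measurement z
    var_new = (1.0 - k * c) * var_pred
    return mean_new, var_new

# --- Particle filter: belief is a set of samples, updated by reweighting/resampling ---
def particle_step(particles, u, z, q=0.1, r=0.5):
    particles = particles + u + rng.normal(0.0, np.sqrt(q), size=particles.shape)  # motion
    weights = np.exp(-0.5 * (z - particles) ** 2 / r)                              # likelihood
    weights /= weights.sum()
    return rng.choice(particles, size=particles.size, p=weights)                   # resample

mean, var = kalman_step(0.0, 1.0, u=1.0, z=1.2)
particles = particle_step(np.zeros(100), u=1.0, z=1.2)
```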

    [AI614] 03. Introduction to Task planning

    This is a review post covering lectures 8-10 of a course I took at school. Outline: What does this review contain? Representations for task planning -Logic-based AI, First-order logic, and STRIPS Heuristic search for task planning 🚧 Representations for task planning -Logic-based AI, First-order logic, and STRIPS 🚧 Heuristic search for task planning

    [AI614] 02. Introduction to Motion planning

    This is a review post covering lectures 5-7 of a course I took at school. Outline: What does this review contain? Problem statement Discretization-based methods -A* Sampling-based algorithms -RRTs, PRMs Probabilistic completeness -RRT* Basic notion Configuration: A specification of the position of all points of a robot The positions of all parts of the robot - but it is the center position, so is knowing the shape enough? What is the difference between robot shape and robot configuration? If we know the joint angles (e..
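    To make the configuration question concrete: for a 2-link planar arm, the configuration is just the joint-angle vector, while the link lengths (the robot's shape) are fixed, known geometry; forward kinematics then recovers the position of every point of the robot. A minimal sketch with hypothetical link lengths:

```python
import numpy as np

def forward_kinematics(theta1, theta2, l1=1.0, l2=0.7):
    """Elbow and end-effector positions of a 2-link planar arm.

    The configuration is (theta1, theta2); the link lengths l1, l2 are part of the
    fixed, known robot geometry, not part of the configuration.
    """
    elbow = np.array([l1 * np.cos(theta1), l1 * np.sin(theta1)])
    ee = elbow + np.array([l2 * np.cos(theta1 + theta2),
                           l2 * np.sin(theta1 + theta2)])
    return elbow, ee

elbow, ee = forward_kinematics(np.pi / 4, -np.pi / 6)
```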

    [AI614] 01. Introduction to TAMP, Basics of robot manipulation

    This is a review post covering Lectures 1-4 of a course I took at school. For brevity and clarity, equation-level explanations are omitted. Outline: What does this review contain? Introduction of TAMP Planning vs. Learning based approach Planning -TAMP Positions and orientations of rigid bodies Manipulator forward and inverse kinematics Manipulator velocities and dynamics 🛝 Introduction of TAMP Difference between industrial robots and intelligent robot agents i..

    [Probabilistic Robotics] Planning and Control: Partially Observable Markov Decision Processes

    This post is a summary written after reading Probabilistic Robotics, Chapter 16: Partially Observable Markov Decision Processes. To choose the right action, we need accurate state estimation. We need information-gathering tasks, such as robot exploration, for precise state estimation (i.e. reduce uncertainty). There are two types of uncertainty: uncertainty in action, and uncertainty in perception. Uncertainty in action) D..

    [Probabilistic Robotics] Planning and Control: Uncertainty in action/Belief space

    This post is a summary written after reading Probabilistic Robotics, Chapter 15: Markov Decision Processes. To choose the right action, we need accurate state estimation. We need information-gathering tasks, such as robot exploration, for precise state estimation (i.e. reduce uncertainty). There are two types of uncertainty: uncertainty in action, and uncertainty in perception. Uncertainty..

    [AI614] Overview of Robot Task and Motion planning

    Introduction to Robotics: Mechanics and Control Positions and orientations of rigid bodies Manipulator forward and inverse kinematics Manipulator velocities and dynamics Manipulator equation Motion planning Discretization-based methods -A* Sampling-based algorithms -RRTs and PRMs Probabilistic completeness -RRT* Task planning Representations for task planning -Logic-based AI, First-order logic, and ..

    Sim2Real transfer: Domain randomization, domain adaptation, System identification

    This post was written to help with understanding domain randomization. It is an interpretation of Reference 1, which covers the topic in more detail, so I recommend reading it! One of the hard problems in robotics is getting the model itself to work in the real environment. Because of the sample inefficiency of reinforcement learning algorithms and the difficulty of data collection on real robots, we have to train by feeding large amounts of data from a simulator. However, the gap between the simulator and the real environment shows up strongly when the robot is deployed in the real world. This gap comes from physical parameters (e.g. friction, kp, damping, mass, density) or, more critically, from non-physical modeling (i.e. between surfaces..
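    A minimal sketch of what domain randomization looks like in code: at the start of every training episode, sample the uncertain physical parameters from broad ranges and push them into the simulator. The parameter ranges and the sim.set_physics(...) call below are hypothetical placeholders, not a real simulator API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ranges for the physical parameters mentioned above.
PARAM_RANGES = {
    "friction": (0.5, 1.5),
    "kp":       (50.0, 200.0),   # position gain
    "damping":  (0.5, 2.0),
    "mass":     (0.8, 1.2),      # scale factor on nominal link masses
}

def randomize_physics(sim):
    """Sample a fresh set of physical parameters and apply them to the simulator."""
    params = {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}
    sim.set_physics(**params)    # hypothetical simulator call
    return params

# for episode in range(num_episodes):
#     randomize_physics(sim)     # new dynamics every episode
#     rollout = collect_episode(sim, policy)
#     policy.update(rollout)
```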

    PDDL (Planning Domain Definition Language)

    This post was written to aid my own understanding. Reference 1, Reference 2. PDDL is the standard encoding language for "classical" planning tasks. A PDDL planning task consists of the following. problem.pddl Objects: the objects that exist Initial state: the starting state Goal specification: the goal we want # gripper-four.pddl (define (problem ) (:domain ) [...] ) domain.pddl Predicates: constraints on the objects Actions/Operators: ways to change the world; for each action, a description, precondition, and e..

    [Paper Review] Inner Monologue

    Methodology: Closed-loop (with feedback) Semantic constraints and objectives Intractable number of rules Limitations Question: Humans need to reason instead. Does not consider the scene. Cannot solve long-horizon problems Commands consist only of pick and place Fails to ground contextually dependent actions: How to plan well while being contextually aware For example, awareness of geometric constrain..

    Imitation learning

    The simplest version of imitation learning is behavior cloning.
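    Behavior cloning reduces imitation to supervised learning: regress the expert's actions from the recorded states. A minimal sketch with hypothetical dimensions and continuous actions, written in PyTorch:

```python
import torch
import torch.nn as nn

# Hypothetical expert dataset of (state, action) pairs.
states = torch.randn(1024, 10)    # 10-D states
actions = torch.randn(1024, 3)    # 3-D continuous expert actions

policy = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(100):
    pred = policy(states)
    loss = ((pred - actions) ** 2).mean()   # MSE between policy and expert actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```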

    [CS330] 03. Supervised solution of Meta-learning problem: Black-Box vs. Optimization-based vs. Non-Parametric

    The Meta-Learning Problem Given data from $\mathcal{T}_1, \cdots, \mathcal{T}_n$, quickly solve new task $\mathcal{T}_\textrm{test}$. Assume that the meta-training tasks and the meta-test task are drawn i.i.d. from the same task distribution: $\mathcal{T}_1, \cdots, \mathcal{T}_n \sim p(\mathcal{T})$, $\mathcal{T}_\textrm{test} \sim p(\mathcal{T})$. For example, the task can be: a robot performing different tasks or giving fe..

    [CS330] 02. Multi-Task Learning & Transfer learning Basics

    What is a "Task"? More formally, a task can be described in this format: $\mathcal{T}_i \equiv \{p_i(\textbf{x}), p_i(\textbf{y} \vert \textbf{x}), \mathcal{L}_i\}$, based on the data-generating distribution. Multi-task learning: Learn $\mathcal{T}_1, \mathcal{T}_2, \cdots, \mathcal{T}_T$ at once Transfer learning: Solve target task $\mathcal{T}_b$ after solving source task $\mathcal{T}_a$ by transferring knowledge..
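    A minimal sketch of the plain multi-task setup this definition suggests (my own illustration, with hypothetical dimensions): a shared encoder with one head per task, trained on the sum of the per-task losses $\mathcal{L}_i$.

```python
import torch
import torch.nn as nn

num_tasks, in_dim, out_dim = 3, 16, 1

shared = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())                   # shared across tasks
heads = nn.ModuleList([nn.Linear(64, out_dim) for _ in range(num_tasks)])  # task-specific heads
optimizer = torch.optim.Adam(list(shared.parameters()) + list(heads.parameters()), lr=1e-3)

def multitask_loss(batches):
    """batches[i] = (x_i, y_i) drawn from p_i(x), p_i(y|x); returns the sum of per-task losses."""
    total = 0.0
    for i, (x, y) in enumerate(batches):
        total = total + ((heads[i](shared(x)) - y) ** 2).mean()   # L_i for task i
    return total

batches = [(torch.randn(32, in_dim), torch.randn(32, out_dim)) for _ in range(num_tasks)]
loss = multitask_loss(batches)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```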

    [CS330] 01. Course Introduction

    Their point of view of why multi-task learning and meta-learning are important Robots can teach us things about intelligence. Faced with the real world Must generalize across tasks, objects, environments, etc Need some common sense understanding to do well Supervision can't be taken for granted Specialists vs. Generalists Specialist: Learn one task in one environment, starting from scratch using..

    [CS391R] Overview of Robot Decision Making

    Robot Decision making: Choosing the actions a robot performs in the physical world Mathematical Framework of Sequential Decision Making Markov Decision Process Learning for Decision Making reinforcement learning (model-free vs. model-based, online vs offline) Optimizes the policy by trial and error in an MDP. Goal: To maximize the long-term rewards imitation learning (behavior cloning, DAgger, I..

    [CS391R] Overview of Robot perception

    Robot perception: seeing and understanding the physical world by multimodal robot sensors Robot vision vs. Computer vision Robot vision is embodied, active and environmentally situated. Embodied: Robots have physical bodies and experience the world directly. Their actions are part of a dynamic with the world and have immediate feedback on their own sensation. Active: Robots are active perceivers..

    Proximal Policy Optimization Algorithms (PPO) Hyper-parameters

    🔖 Questions What is the difference between the advantage function and the reward and value functions? PPO-clip supposedly does not use a KL divergence, so why is there an approx_kl? 🔖 Notions to understand Reward Loss entropy loss: entropy bonus that ensures sufficient exploration. value loss $L_t^{VF} (\theta) = {(V_\theta(s_t)-V_t^{targ})}^2$ Policy gradient loss $L_t^{CLIP}$ Procedure Epoch: one pass over the entire dataset Mini batch/one batch: one mini bat..
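    To keep the three loss terms straight, here is a minimal sketch of how they are typically combined into a single PPO loss (my own paraphrase of the paper's objective, not code from the post); ratio stands for $\pi_\theta(a_t|s_t)/\pi_{\theta_\textrm{old}}(a_t|s_t)$ and adv for the advantage estimate.

```python
import torch

def ppo_loss(ratio, adv, values, value_targets, entropy,
             clip_eps=0.2, vf_coef=0.5, ent_coef=0.01):
    """Clipped surrogate + value loss - entropy bonus (sign convention: minimize)."""
    # Policy (clipped surrogate) term: L^CLIP
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    policy_loss = -torch.min(unclipped, clipped).mean()
    # Value term: L^VF = (V_theta(s_t) - V_t^targ)^2
    value_loss = ((values - value_targets) ** 2).mean()
    # Entropy bonus encourages exploration
    return policy_loss + vf_coef * value_loss - ent_coef * entropy.mean()
```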

    [CS391R] Introduction of Robot Learning

    Course website Types of robot automation custom-built robots -> human expert programming -> Special-purpose behaviors General-purpose robots -> Robot learning -> General-purpose behaviors What is robot learning? The study of methods and principles that make robots learn from data -> Learning is critical for taking robots to the real world. Robot perception: seeing and understanding the physical ..

    [Policy Gradient] Vanilla Policy Gradient, Trust region policy optimization (TRPO), Proximal Policy Optimization Algorithms (PPO)

    This post organizes the equations from the papers (without derivations). I referred to the linked document. 🔖 Simplest Policy Gradient We consider the case of a stochastic, parameterized policy $\pi_\theta$. We aim to maximize the expected return $J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta} [R(\tau)]$. For this, we want to optimize the policy by gradient ascent. $\theta_{k+1} = \theta_k + \alpha \nabla_\theta J(\pi_\theta)|_{\theta_k}$ The gradient ..
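    A minimal sketch of the resulting update (my own illustration, hypothetical tensor shapes): with the standard estimator $\nabla_\theta J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta} [\sum_t \nabla_\theta \log \pi_\theta(a_t|s_t) R(\tau)]$, gradient ascent on $J$ is implemented by minimizing the negative surrogate objective.

```python
import torch

def vanilla_pg_loss(log_probs, returns):
    """log_probs: log pi_theta(a_t|s_t) for each step of each sampled trajectory;
    returns: R(tau) broadcast to the same shape. Minimizing this loss performs
    gradient ascent on the sample estimate of J(pi_theta)."""
    return -(log_probs * returns).mean()

# One update step theta_{k+1} = theta_k + alpha * grad J, via an optimizer:
# optimizer.zero_grad()
# vanilla_pg_loss(log_probs, returns).backward()
# optimizer.step()
```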

    [CS294 Pieter Abbeel] 5. Implicit Models - GANs

    This post is a summary I wrote after taking Pieter Abbeel's Deep Unsupervised Learning 2020 course. 🗿Implicit Models? 🗿Original GAN 🗿GAN Progression

    [CS294 Pieter Abbeel] 4. Latent Variable Models - Variational AutoEncoder (VAE)

    This post is a summary I wrote after taking Pieter Abbeel's Deep Unsupervised Learning 2020 course. 🖲️ Training Latent Variable Models 🖲️ Variations of VAE 🖲️ Related Ideas

    [CS294 Pieter Abbeel] 3. Likelihood Models: Flow Models

    This post is a summary I wrote after taking Pieter Abbeel's Deep Unsupervised Learning 2020 course. This lecture deals with using a latent representation to obtain a density model $p_\theta(x)$. 🪸 Foundations of Flows (1-D) 🪸 2-D Flows 🪸 N-D Flows 🪸 Dequantization

    [CS294 Pieter Abbeel] 2. Likelihood Models: Autoregressive Models

    This post is a summary I wrote after taking Pieter Abbeel's Deep Unsupervised Learning 2020 course. This lecture is about how to obtain the data distribution, which is the most primitive goal of generative models. Generative models first came up with histograms, the most basic likelihood-based model. Then, for the neural approach, they use autoregressive models. 🫠 Likelihood-based models 🫠 Sampling-based: Hist..

    [CS330 Chelsea Finn] Deep Multi-task learning and Meta-learning Contents

    Goal: Check a higher version of the perception for robotics Contents Course introduction & start of multi-task learning 43m Supervised multi-task learning, transfer learning 1h 19m Meta-learning problem statement, black-box meta-learning 1h 18m Optimization-based meta-learning 1h 18m Few-shot learning via metric learning 1h 25m Advanced meta-learning topics 1h 28m Bayesian meta-learning 1h 27m R..

    [CS294 Pieter Abbeel] 1. Intro

    This post is a summary I wrote after taking Pieter Abbeel's Deep Unsupervised Learning 2020 course. This lecture shares the goal and pursuit of deep unsupervised learning. Deep Unsupervised Learning: capture rich patterns in raw data with deep networks in a label-free way → But how? Recreate the raw data distribution → Generative models "Puzzle" tasks that require semantic understanding → Self-supervised Learning With Pu..