[Takeaways] Robot Brains Podcast S3 E20: Jitendra Malik on Building AI from the ground up: Sensorimotor learning before language
Notes after watching this impressive interview: https://youtu.be/k_Wrd1kI1B0?si=QIqUl3Qrcx7y1FEs These are my personal thoughts.
[Grounding LLM] is open to being proven incorrect.
In brain development, words are identified later.
I actually agree with this. This implies that [incorporating control with LLM] is the key, and this area is kinda hot right now.
Instead, brain development starts with the progression of hand development.
Physical interaction:
- Begins with control -> by age 5, skills are acquired
- During this period, language is learned in context
Integrating skills in such a latent format into LLMs is important.
Skills
Skill acquisition leads to reused skills, which then focus on meta-phrasing tasks.
Skill: refers to a type of motion behavior.
Effective rapid motor adaptation occurs based on simulation. --> Rollouts for the child
Both top-down (concept-generative model) and bottom-up components are vital.
I totally agree with this. I am thinking about how to integrate such naturally discovered skills into robot learning while also connecting them to long-horizon tasks.
Computer Vision:
- Classical CV (3R): recognition, reconstruction, reorganization (such as segmentation and grouping)
- Currently, there's a scaling up and application across different areas.
- 3D reconstruction remains unsolved to a degree. (At a human level: only partially)
The next step is integration!
1. Vision in robotics
2. Vision leading to cognition: NLP
Lately, there's a lot of interest in healthcare and medicine.
Advice
It's important to stay narrow in focus (connecting fields, being adaptive) and concise in information, but passion is crucial.