Robotics & Perception

[Seminar Series on Artificial General Intelligence (AGI) #1] Dr. Jim Fan: Generalist Agents in Open-Ended Worlds

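This post is my own summary of the AGI seminar hosted by Prof. Sungjin Ahn of KAIST, which I attended on November 17, 2023 (231117). It was a forward-looking seminar that left a strong impression on me. Of all the seminars I attended this year, this one was of very high quality, and I was very happy to have attended it.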
  • Seminars
    • Open-ended environment: How can we combine world knowledge?
      • Foundation model for agents --> Issue: How can we ground it in real life?
        • We should pursue this through language, because prompting and conveying concepts are both straightforward.
        • Embodiment
          • Philosophical: Embodiment is the best way to learn a world model (action -> simulate possible actions), for example, causality and decision making
          • Operational: LLMs are running out of high-quality tokens
            • I think abstraction would be the key. --> connect with neurosymbolic learning?
            • I think internal memory would matter for active data collection/exploration
      • Representation learning from internet-scale videos
        • Pros: Dynamic perception; intuitive physics
        • Issue: The videos show human-body embodiment, and we cannot even get the actions
      • These all work in a multi-modal manner: a mixture of text, images, and videos as input --> actions as output
        • How to process arbitrary inputs can be problematic. --> Need to check whether this is already solved.
    • Future of the policy
      • Life-long / Continual learning
      • Hybrid gradient model; Bi-level model
        • Combine high-level (common) + Low-level (grounding) --> How can we realize low-level?
        • Combine a foundation model (no gradient, complicated long-horizon reasoning) + grounded low-level control (which cannot be explained via language)
        • Neuro Symbolic AI (manipulate/compose symbols) is coming back.
    • Simulators are becoming important --> in order to simulate new environments
      • Two things are important in a simulator
        1. Sim-2-real
        2. Real-2-sim-2-real --> inverse graphics, neuro symbolics, simulator communicates via code.
      • Generating high-quality dataset
        • Relying on human datasets (MimicGen) - editing the scene
          • Video synthesis using NeRF
    • Community benefits: Twitter
      • Open-source idea
      • Community service
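  • Take-aways (truly precious advice)
    1. The path I have in mind is the right one, so I should keep that thinking as it is while taking time to strengthen and concretize it. Since I am relatively new to this field, it is only natural that I need to put in more time.
    2. Need to be ambitious as well as balanced. The "balanced" part is probably about health and sustainability.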
    3. Think 3-year, not the latter (plan->step->plan)
    4. Solving (good vs. bad) --> Can we identify the question (better vs. just good)? In particular, it is important to think about what is valuable, where you are going to apply your energy, and what can be taught versus what is intrinsic. Rejecting things is also important.
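
To make the "Hybrid gradient model; Bi-level model" note above a bit more concrete, here is a minimal Python sketch of my own, not code from the talk. It assumes a toy 2D world; `LANDMARKS`, `stub_planner`, `low_level_step`, and `run_bilevel` are all hypothetical names. The high-level part stands in for a frozen foundation model (text in, symbolic subgoals out, no gradient), and the low-level part is a simple proportional controller that grounds each subgoal into motion.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float]

# Hypothetical symbol table: names the planner may emit, grounded to 2D positions.
LANDMARKS: Dict[str, Vec] = {"table": (1.0, 0.0), "shelf": (1.0, 2.0)}


def stub_planner(instruction: str) -> List[str]:
    """Stand-in for a frozen foundation model: text in, subgoal names out.

    A real system would prompt an LLM here. This stub only picks out known
    landmark names in the order they appear in the instruction.
    """
    words = instruction.lower().replace(",", " ").split()
    return [w for w in words if w in LANDMARKS]


def distance(a: Vec, b: Vec) -> float:
    """L1 distance between two 2D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def low_level_step(pos: Vec, target: Vec, gain: float = 0.5) -> Vec:
    """Grounded controller: one proportional step toward the target."""
    return (pos[0] + gain * (target[0] - pos[0]),
            pos[1] + gain * (target[1] - pos[1]))


def run_bilevel(instruction: str,
                start: Vec = (0.0, 0.0),
                tol: float = 1e-2,
                max_steps: int = 100) -> Tuple[Vec, List[str]]:
    """Plan once at the high level, then execute each subgoal at the low level."""
    pos = start
    reached: List[str] = []
    for subgoal in stub_planner(instruction):
        target = LANDMARKS[subgoal]
        for _ in range(max_steps):
            if distance(pos, target) < tol:
                break
            pos = low_level_step(pos, target)
        # Record the subgoal only if the controller actually got there.
        if distance(pos, target) < tol:
            reached.append(subgoal)
    return pos, reached


if __name__ == "__main__":
    final_pos, done = run_bilevel("go to the table, then the shelf")
    print(done, final_pos)
```

The split mirrors the notes: the planner only manipulates symbols (landmark names), while everything that is hard to express in language (the actual motion) lives in the low-level controller. In a real agent the low-level part would be a learned policy rather than a fixed gain.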