Embodied Intelligence (EI)
Embodied Artificial Intelligence that Interacts with the Real World
The Embodied Intelligence (EI) project developed AI systems with physical embodiment capable of perceiving, reasoning, and acting in the real world. The project spanned the full perception-to-action pipeline, from visual grounding and multi-agent map fusion for mobile robots to language-conditioned lifelong learning and safe manipulation for robotic arms. A key objective was building agents that adapt continually: agents that not only learn to see and understand their environment, but also retain that ability as tasks and conditions change.
Overview
Physical embodiment changes the nature of intelligence. Unlike disembodied AI that operates on static datasets, embodied agents must handle sensor noise, dynamic environments, partial observability, and the physical consequences of their actions. The EI project addressed this challenge across two complementary platforms:
- Mobile robot navigation and multi-agent coordination: developing robust visual perception and cooperative map-building strategies that allow multiple agents to rendezvous and share spatial understanding of their environment.
- Robotic arm manipulation: combining natural language understanding with continual learning so that a robot arm can follow human instructions across a stream of novel objects and manipulation tasks, without catastrophically forgetting prior capabilities.
Both tracks required tight integration of computer vision, natural language processing, and reinforcement learning, reflecting the lab’s broader commitment to multimodal AI.
Research Team
Principal Investigator
- Prof. Byoung-Tak Zhang (Seoul National University)
Researchers (Mobile Perception)
- Jaein Kim
- Dong-Sig Han
Researchers (Language-Conditioned Manipulation)
- Junghyun Kim
- Gi-Cheon Kang
- Suyeon Shin
Researchers (Safe Manipulation & Tracking)
- Hyunseo Kim
- Hye-Jung Yoon
- Minji Kim
Researchers (Inverse Reinforcement Learning)
- Hyundo Lee
- Je-Hwan Ryu
Technical Approach
Multimodal Perception for Mobile Robots
- Visual attention-based map fusion: a multi-agent rendezvous framework in which robots use visual attention to merge independently built occupancy maps, allowing cooperative localization even when the agents have explored disjoint areas.
- Object detection and visual grounding for task-relevant perception in cluttered real-world scenes.
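As a rough illustration of what merging two occupancy maps involves, the sketch below fuses two log-odds grids once their relative offset is known. It is a minimal example under simplifying assumptions, not the project's method: the alignment is assumed to be a pure integer translation that has already been estimated (for instance at a rendezvous), and the visual-attention component is not modeled at all. The function name `fuse_occupancy_maps` and its parameters are hypothetical.

```python
import numpy as np


def fuse_occupancy_maps(logodds_a, logodds_b, offset_b):
    """Fuse two log-odds occupancy grids into one grid in map A's frame.

    offset_b is the (row, col) of map B's top-left cell expressed in map A's
    frame. It is ASSUMED to be known here; estimating it is the hard part and
    is outside this sketch. Unknown cells carry log-odds 0, so they leave the
    sum unchanged.
    """
    ha, wa = logodds_a.shape
    hb, wb = logodds_b.shape
    dr, dc = offset_b

    # Bounding box that covers both maps, in A's frame.
    top, left = min(0, dr), min(0, dc)
    bottom, right = max(ha, dr + hb), max(wa, dc + wb)
    fused = np.zeros((bottom - top, right - left))

    # Treating the two maps as independent evidence, log-odds simply add.
    fused[-top:-top + ha, -left:-left + wa] += logodds_a
    fused[dr - top:dr - top + hb, dc - left:dc - left + wb] += logodds_b
    return fused


def to_probability(logodds):
    """Convert log-odds back to occupancy probabilities in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-logodds))
```

Working in log-odds keeps the fusion a plain addition and makes unexplored cells (log-odds 0, probability 0.5) neutral, which is why disjoint regions can be stitched together without special cases.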
Language-Conditioned Lifelong Manipulation
- GVCCI (Grounding Vision to Ceaselessly Created Instructions): a lifelong learning framework that allows a robotic arm to learn visual groundings of language-referred object instances over an open-ended stream of tasks, without forgetting previously acquired concepts.
- ROS-based real-world control pipeline: end-to-end system connecting natural language commands to robot arm joint control via a ROS middleware stack.
Safe and Adaptive Object Tracking
- EXOT (Exit-Aware Object Tracker): a real-time tracker that detects when a target object is about to exit the robot’s reachable workspace or camera field-of-view, enabling the robot to abort or adapt its grasp strategy before a collision or failure occurs.
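To make the idea of anticipating an exit concrete, here is a deliberately simple sketch that extrapolates a tracked box center with a constant-velocity model and checks whether it leaves the image within a few frames. This is an illustrative stand-in, not EXOT's actual mechanism; the `Track` class, the `will_exit_view` function, and the `horizon` and `margin` parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical tracker state: box center and per-frame velocity, in pixels."""
    x: float
    y: float
    vx: float
    vy: float


def will_exit_view(track, width, height, horizon=10, margin=0.0):
    """Return True if the predicted center leaves the image within `horizon` frames.

    Assumes constant velocity; k = 0 is included so a target that is already
    outside the (margin-shrunk) image is reported immediately.
    """
    for k in range(horizon + 1):
        px = track.x + k * track.vx
        py = track.y + k * track.vy
        inside = (margin <= px < width - margin) and (margin <= py < height - margin)
        if not inside:
            return True
    return False
```

A controller could poll such a check every frame and abort or re-plan a grasp when it fires. The same test could in principle be applied to a reachable-workspace volume instead of image bounds.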
Imitation and Inverse Reinforcement Learning
- Mirror-descent inverse reinforcement learning for robust imitation from suboptimal or noisy human demonstrations, supporting safe transfer of human-demonstrated manipulation skills to robotic systems.
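For readers unfamiliar with mirror descent itself, the sketch below shows the generic optimizer in its best-known form: with the negative-entropy mirror map on the probability simplex, the update reduces to a multiplicative (exponentiated-gradient) step followed by renormalization. This illustrates only the underlying optimization technique on a toy linear cost. It is not the project's IRL algorithm, and the function names and default step size are assumptions chosen for the example.

```python
import numpy as np


def mirror_descent_step(x, grad, step_size):
    """One entropic mirror-descent step on the probability simplex.

    Equivalent to x_new proportional to x * exp(-step_size * grad),
    computed in log space for numerical stability.
    """
    logits = np.log(x) - step_size * grad
    logits -= logits.max()  # shift so the largest exponent is 0
    w = np.exp(logits)
    return w / w.sum()


def minimize_linear_cost(cost, steps=200, step_size=0.5):
    """Minimize <cost, x> over the simplex, starting from the uniform point."""
    cost = np.asarray(cost, dtype=float)
    x = np.full(len(cost), 1.0 / len(cost))
    for _ in range(steps):
        # The gradient of the linear objective <cost, x> is just `cost`.
        x = mirror_descent_step(x, cost, step_size)
    return x
```

On a linear cost the iterates concentrate on the cheapest coordinate while always remaining a valid probability vector, which is the property that makes this geometry a natural fit for updating distributions.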
Publications
- Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Suyeon Shin, Byoung-Tak Zhang. “GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation.” IROS 2023. arXiv:2307.05963
- Hyunseo Kim, Hye-Jung Yoon, Minji Kim, Dong-Sig Han, Byoung-Tak Zhang. “EXOT: Exit-Aware Object Tracker for Safe Robotic Manipulation of Moving Object.” ICRA 2023. arXiv:2306.05262
- Jaein Kim, Dong-Sig Han, Byoung-Tak Zhang. “Robust Map Fusion with Visual Attention Utilizing Multi-Agent Rendezvous.” ICRA 2023. DOI:10.1109/ICRA48891.2023.10161072
- Dong-Sig Han, Hyunseo Kim, Hyundo Lee, Je-Hwan Ryu, Byoung-Tak Zhang. “Robust Imitation via Mirror-Descent Inverse Reinforcement Learning.” NeurIPS 2022. arXiv:2210.11201
- Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Seoyun Yang, Minjoon Jung, Byoung-Tak Zhang. “PGA: Personalizing Grasping Agents with Single Human-Robot Interaction.” IROS 2024. arXiv:2310.12547
- Gi-Cheon Kang, Junghyun Kim, Jaein Kim, Byoung-Tak Zhang. “PROGrasp: Pragmatic Human-Robot Communication for Object Grasping.” ICRA 2024. arXiv:2309.07759
- Hye-Jung Yoon, Juno Kim, Yesol Park, Byoung-Tak Zhang. “Visual Perception-Based Assistive Mobile Robot System for Manipulation Tasks.” ICRA 2023 Workshop on Assistive Manipulation.
- Dongwoon Song, Taewoong Kang, et al., Byoung-Tak Zhang, Jae-Bok Song, Seung-Joon Yi. “RoboCup@Home 2021 Domestic Standard Platform League Winner.” RoboCup 2021. DOI:10.1007/978-3-030-98682-7_24
Collaboration
The robotic manipulation work was developed in close collaboration with the Intelligent Robotic Systems Lab at Seoul National University (Prof. Jae-Bok Song, PI: Seung-Joon Yi). The two labs fielded a joint team in the RoboCup@Home Domestic Standard Platform League and won the 2021 competition. The joint team's home-service robotics platform (Team Tidyboy) served as a real-world testbed for the integrated perception, navigation, and manipulation capabilities developed across both labs.