- A robot developed through the RIKEN Guardian Robot Project*. It is equipped with recognition and dialogue functions created within the project to enable robots to autonomously provide unobtrusive support.
- The two aiai robots converse with each other in the Osaka dialect.
- The stone pedestal each aiai sits on is equipped with a monitor that displays real-time English translations of their conversation.
- Degrees of freedom: 8 in total, consisting of 2 in the head, 2 in each arm, 1 in the waist, and 1 for mouth opening/closing
- Sensors: One depth camera, two LiDAR sensors, and two 16-channel microphone arrays mounted around the robot's body
- Appearance: Realistic monkey-like form, including a monkey’s face and full-body fur
* Guardian Robot Project
Launched in April 2019, this initiative promotes the development and social implementation of next-generation robots that combine neuroscience with AI technologies, with the aim of creating a future society where humans, AI, and robots can coexist seamlessly. The project integrates strengths from psychology, neuroscience, cognitive science, and AI research. Whereas conventional robots typically perform only limited tasks based on detailed human instructions, this project envisions robots capable of autonomously perceiving their environment and the condition of the humans they assist, engaging appropriately without undermining human autonomy, and delivering unobtrusive support.
(1) Recognition of human position and speech content
The two LiDAR sensors measure the positions of nearby people. Using this positional data together with the audio captured by the two microphone arrays, the system isolates human speech from the surrounding noise, and a recognition system based on a large language model allows the robot to understand what is being said.
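As a rough sketch of how the LiDAR-derived speaker position could steer the microphone arrays, the Python snippet below implements a basic delay-and-sum beamformer. The array geometry, sample rate, and function names are illustrative assumptions, not details published by the project.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
SAMPLE_RATE = 16_000    # Hz (assumed)

def delay_and_sum(frames: np.ndarray, mic_positions: np.ndarray,
                  speaker_position: np.ndarray) -> np.ndarray:
    """Steer an array toward a speaker position (e.g. measured by LiDAR).

    frames: (n_mics, n_samples) time-domain signals, one row per microphone.
    mic_positions: (n_mics, 3) and speaker_position: (3,), in metres.
    """
    # Relative arrival delay at each microphone, converted to samples.
    distances = np.linalg.norm(mic_positions - speaker_position, axis=1)
    delays = (distances - distances.min()) / SPEED_OF_SOUND * SAMPLE_RATE

    # Shift each channel back by its delay (wrap-around ignored for brevity),
    # then average so the speaker's signal adds coherently and noise does not.
    aligned = np.stack([np.roll(ch, -int(round(d)))
                        for ch, d in zip(frames, delays)])
    return aligned.mean(axis=0)

# Hypothetical usage: a 16-microphone ring of radius 10 cm, speaker at 1.5 m.
angles = np.linspace(0, 2 * np.pi, 16, endpoint=False)
mics = np.stack([0.1 * np.cos(angles), 0.1 * np.sin(angles), np.zeros(16)],
                axis=1)
enhanced = delay_and_sum(np.random.randn(16, SAMPLE_RATE), mics,
                         np.array([1.5, 0.0, 1.0]))
```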
(2) Recognition of environmental context
With its depth camera and a recognition system trained on large-scale data, aiai can perceive contextual details about nearby people, such as the color of the clothes they are wearing or what they are carrying.
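A minimal sketch of one such contextual detail, naming the dominant clothing color inside a detected person region; the color palette, function name, and mask source are assumptions made for illustration.

```python
import numpy as np

# Reference palette for coarse color naming (assumed, not from the project).
PALETTE = {
    "red":    (200, 40, 40),
    "green":  (40, 160, 60),
    "blue":   (40, 70, 200),
    "yellow": (220, 200, 50),
    "black":  (20, 20, 20),
    "white":  (235, 235, 235),
}

def dominant_clothing_color(rgb: np.ndarray, person_mask: np.ndarray) -> str:
    """Name the dominant color inside a person mask (e.g. produced by a
    detector running on the depth camera's RGB stream)."""
    pixels = rgb[person_mask].astype(float)  # (n_pixels, 3)
    names = list(PALETTE)
    refs = np.array([PALETTE[n] for n in names], dtype=float)
    # Assign each pixel to its nearest palette color, then take the mode.
    dists = np.linalg.norm(pixels[:, None, :] - refs[None, :, :], axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(names))
    return names[counts.argmax()]

# Hypothetical usage: a synthetic 4x4 patch of mostly blue pixels.
img = np.full((4, 4, 3), (50, 80, 190), dtype=np.uint8)
mask = np.ones((4, 4), dtype=bool)
print(dominant_clothing_color(img, mask))  # -> "blue"
```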
(3) Dialogue function
The robot recognizes the appearance and speech of visitors while observing its surroundings. Using a large language model, it generates conversation that takes into account events happening at that moment, giving visitors a sense of sharing the space with the robot.
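One plausible way to fold live observations into an LLM prompt is sketched below; the Observation fields, the prompt wording, and the llm_generate stub are hypothetical placeholders, not the project's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """What the robot currently perceives (fields are illustrative)."""
    visitor_description: str  # e.g. output of the vision pipeline
    last_utterance: str       # e.g. output of the speech pipeline

def build_prompt(obs: Observation) -> str:
    """Fold live observations into the prompt so the model's reply can
    reference what is actually happening in front of the robot."""
    return (
        "You are aiai, a monkey robot chatting with visitors in Osaka dialect.\n"
        f"You can see: {obs.visitor_description}\n"
        f'The visitor just said: "{obs.last_utterance}"\n'
        "Reply briefly, mentioning something you can currently see."
    )

def llm_generate(prompt: str) -> str:
    """Placeholder for a real large-language-model call."""
    return "(model reply would appear here)"

obs = Observation("a visitor in a blue jacket holding a camera",
                  "Hello! What are you two talking about?")
print(llm_generate(build_prompt(obs)))
```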
(4) Two-robot dialogue function
Rather than directly addressing the visitors, the two robots engage in a conversation with each other based on their perception of the environment. In doing so, they draw the visitors into the contextual world generated by the robots themselves.
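A toy sketch of such a robot-to-robot exchange, alternating turns between two agents that comment on a shared observation of the scene; the speaker names, prompt format, and llm_generate stub are again assumptions for illustration.

```python
def llm_generate(prompt: str) -> str:
    """Placeholder for a real large-language-model call."""
    return "(model reply would appear here)"

def two_robot_dialogue(observation: str, turns: int = 4) -> list[str]:
    """Alternate turns between two agents that converse with each other
    about a shared observation instead of addressing the visitor."""
    transcript: list[str] = []
    for t in range(turns):
        speaker = ("aiai-1", "aiai-2")[t % 2]
        # Each turn sees the shared observation plus the dialogue so far,
        # so the conversation stays grounded in the scene around the robots.
        prompt = (
            f"You are {speaker}, a monkey robot chatting with the other robot "
            f"in Osaka dialect about this scene: {observation}\n"
            "Conversation so far:\n" + "\n".join(transcript) +
            f"\n{speaker}:"
        )
        transcript.append(f"{speaker}: {llm_generate(prompt)}")
    return transcript

print("\n".join(two_robot_dialogue("a visitor in a blue jacket walks up")))
```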