Lifelong Learning of Acoustic Events for Robot Audition

Bayram B., İnce G.

2023 IEEE/SICE International Symposium on System Integration, SII 2023, Georgia, United States Of America, 17 - 20 January 2023

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/sii55687.2023.10039086
  • City: Georgia
  • Country: United States Of America
  • Istanbul Technical University Affiliated: Yes


Scene analysis relies on sensing and understanding the events and objects in a dynamic environment. Lifelong robot learning for scene analysis is a continuous process of learning distinct events, actions, and noises using different sensory modalities. In real environments, the spatio-temporal data captured by sensors may not be stationary, so novel events or unseen instances of known events may occur and affect the performance of scene analysis. In this work, a robot audition framework for Auditory Scene Analysis (ASA) is proposed that enables a real robot to acoustically detect and incrementally learn novel acoustic events in a real domestic environment. To achieve source-specific analysis, a lifelong learning approach to ASA for robot audition is developed, which includes the following steps: (1) Sound Source Localization (SSL), (2) audio feature extraction, (3) Acoustic Event Recognition (AER), (4) Acoustic Novelty Detection (AND), and (5) adaptation of new event classes into the AER and AND models. The steps are performed on streaming raw audio signals captured in a domestic environment by a robot equipped with a microphone array. The self-learning process on acoustic signals stemming from different events occurs without human supervision, which gives the robot the capability for lifelong learning of novel acoustic events. The effectiveness of the proposed framework for lifelong ASA is evaluated in terms of acoustic event recognition accuracy and computational time, to assess whether it meets the demands of lifelong learning in real time.
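
As a rough illustration of how steps (3) to (5) might fit together on a stream of feature vectors, the following minimal sketch uses a nearest-centroid recognizer with a distance threshold for novelty detection. The abstract does not specify the AER or AND models, so the nearest-centroid rule, the threshold, the class name `IncrementalEventLearner`, and the auto-generated labels are all assumptions made for illustration, not the authors' method. SSL and feature extraction are assumed to have already produced the input vectors.

```python
import math


class IncrementalEventLearner:
    """Toy stand-in for recognition, novelty detection, and adaptation.

    Hypothetical: a nearest-centroid model, not the models used in the paper.
    """

    def __init__(self, novelty_threshold):
        # Assumed distance cutoff beyond which a sample is treated as novel.
        self.novelty_threshold = novelty_threshold
        self.centroids = {}  # label -> running mean feature vector
        self.counts = {}     # label -> number of samples absorbed so far
        self._next_id = 0

    def _nearest(self, features):
        """Return (label, distance) of the closest known class, if any."""
        best_label, best_dist = None, math.inf
        for label, centroid in self.centroids.items():
            dist = math.dist(features, centroid)
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label, best_dist

    def observe(self, features):
        """Process one feature vector; return (label, was_novel)."""
        label, dist = self._nearest(features)
        if label is None or dist > self.novelty_threshold:
            # Novelty detected: register a new class without supervision.
            label = f"event_{self._next_id}"
            self._next_id += 1
            self.centroids[label] = list(features)
            self.counts[label] = 1
            return label, True
        # Known event: recognise it and refine the running mean.
        n = self.counts[label]
        self.centroids[label] = [
            (c * n + f) / (n + 1)
            for c, f in zip(self.centroids[label], features)
        ]
        self.counts[label] = n + 1
        return label, False


# Made-up 2-D feature vectors standing in for extracted audio features.
learner = IncrementalEventLearner(novelty_threshold=1.0)
stream = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9], [0.0, 0.1]]
results = [learner.observe(x) for x in stream]
```

In this toy run the first and third vectors are flagged as novel and create two classes, while the remaining vectors are assigned to the nearest existing class. A real system would replace both the features and the decision rule with trained acoustic models.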