September 29, 2021
Using sound to understand complex contextual events is key to ambient computing
Hearing is a critical sense when it comes to understanding context. That seems obvious for humans: most of us use our hearing to sense danger, but also to perceive joy and happiness. The value of that ambient ability to understand what is happening through sound is now just as obvious for machines.
When consumer products can understand what is happening around them in real time, they can do genuinely helpful things for their users. This includes optimizing entertainment experiences, giving helpful alerts, adapting visual or voice interfaces, or seamlessly executing tasks on our behalf.
Our R&D team is motivated to tackle the challenges of contextual awareness through sound recognition because it opens up many opportunities to deliver exceptional value to consumers. It suits the wide range of our customers’ end products and makes user interactions with advanced AI frictionless and natural.
In some cases, sound recognition can be combined with other inputs such as image, location and movement to provide a comprehensive contextual understanding. Nevertheless, there are many situations where sound alone contains all the necessary contextual information. For example, if you have children, think about the number of times that you have listened to what they are doing in another part of the house or the park: the combination of sounds – or in some cases the lack of sounds – provides you with enough information to either intervene or prompt.
This ability to disambiguate situations at home or while you are out and about is a significant driver behind the research and development that my colleagues and I are leading in contextual awareness through machine listening. In fact, we were recently granted two patents in this area:
The first patent covers context awareness at a very broad level. It describes detecting one or more sound events and/or acoustic scenes associated with a predetermined context (e.g. getting ready to leave the house) and providing an assistive output when that context is fulfilled. For example, a smartphone or a smart speaker could recognize that you are about to leave the house, alert you that it hears a tap running, and advise that you may want to turn it off.
In this example, the benefit to the user is that the device or personal assistant only helps out when it detects a good reason to, rather than prompting you every time you walk out of the door, which would get irritating very quickly.
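To make the idea concrete, here is a minimal, purely illustrative sketch of that kind of logic. It is not the patented method: the sound labels, the ContextRule type and the assistive_output function are hypothetical names invented for this example, and it assumes a separate sound recognition system has already produced a list of detected sound labels.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ContextRule:
    """Hypothetical rule: a context plus a sound that makes a prompt worthwhile."""
    context_sounds: frozenset  # labels that together suggest the context
    trigger_sound: str         # label that justifies interrupting the user
    message: str               # assistive output to deliver


def assistive_output(detected: Iterable[str], rule: ContextRule) -> Optional[str]:
    """Return the message only when the context is fulfilled AND the trigger is heard."""
    heard = set(detected)
    context_fulfilled = rule.context_sounds <= heard  # every context sound was detected
    if context_fulfilled and rule.trigger_sound in heard:
        return rule.message
    return None  # stay silent: no good reason to prompt


# Assumed labels for the "leaving the house" example; real systems would differ.
LEAVING_HOME = ContextRule(
    context_sounds=frozenset({"keys_jangling", "footsteps"}),
    trigger_sound="tap_running",
    message="A tap seems to be running. You may want to turn it off before you leave.",
)

print(assistive_output(["footsteps", "keys_jangling", "tap_running"], LEAVING_HOME))
print(assistive_output(["footsteps", "keys_jangling"], LEAVING_HOME))
```

The second call returns nothing because no tap is heard, which reflects the point above: the assistant stays quiet unless it has a good reason to speak.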
The second patent covers smart or connected speakers that adjust their room correction when they perceive contextual changes in the audio activity around them. For example, rather than adapting to its location only when you first set it up or each time you move it, the speaker can dynamically adjust its playback settings to match the changing activity around it. So the next time you have friends round for a party, the music could automatically take on the feel of a concert at Wembley Stadium or a night at Ibiza’s Amnesia club. Or, if the goal is accurate playback, the speaker could compensate for the sound absorption that comes with the presence of a crowd.
Granted patents give a rare public glimpse into our research work, which sets the standard for the fundamental technologies that enable machines to understand and respond to their environment.
Another example is our work on new deep neural network architectures called LA-LSTMs (Look-Ahead Long Short-Term Memory), Temporal Resolution Scaling and Acoustically Smart Temporal Augmentation. These milestones enable an unparalleled degree of sophistication in user experience. They also highlight that the challenges presented by contextual awareness require innovation that spans all areas of machine learning technology, from data augmentation and labelling right through to compact inference platforms that deliver innovative user experiences.
The sense of hearing is fundamental to our human understanding of context. Likewise, as ambient computing reaches all aspects of consumers’ lives, the compact and accurate sense of hearing that our team designs and produces is essential to such a fundamentally helpful capability.