November 26, 2019
Acoustic scene technology that will disrupt smartphones and headphones
Today, we announce exciting new context-sensing technology that recognises acoustic surroundings, enabling devices like smartphones and headphones to adapt to these dynamic environments and deliver a wave of innovative benefits.
Imagine you’re at an airport and need to use your phone’s voice assistant to check your flight details in the check-in area. It’s busy: travellers are rushing around with their baggage and bored children are acting up.
Then you head to the departure gate early, hoping to finish off that presentation before the flight is due to depart.
Our acoustic scene technology will recognise the local soundscape, so that features such as your voice assistant hear you correctly the first time in the check-in area.
And then, as the departure gate gets busier, your phone notifications, alerts and calls will adapt as the scene around you moves from calm to chaotic.
The phone’s embedded AI recognises that things are changing acoustically and makes sure you don’t miss those calls and messages from your family before you board.
In another scenario, you live in a city such as New York, where it is common for workers to spend more than 90 minutes commuting each way, every day.
Every evening the traffic is horrendous, with cabs, cars, buses and trucks passing by. Once you’re on your way, the subway car is packed with commuters heading home and rowdy groups out for a night on the town.
At least you’re heading home, though there is still a walk across the park before you reach your apartment building.
Here, your headphones will adjust the noise cancellation, transparency and equalization settings intelligently as you move through different scenes, from chaotic and lively areas to those which are calm and sombre.
This lets you keep enjoying your media without any hassle: no taps, no swipes and no need to open apps on your devices.
The built-in AI running on the headphones or earbuds makes the optimal decision based on important contextual information.
Travelling through a busy airport and the grind of a daily commute are situations the vast majority of us can relate to – and Audio Analytic’s AI technology is going to disrupt how we interact with the products we all use every day as we navigate our daily environments.
There is strong consumer demand for this technology. In extensive consumer research conducted earlier this year, we found that 87% of consumers in the UK and US wanted their hearables to adapt settings, such as noise cancellation, based on the actual acoustic environment.
Dr Chris Mitchell, CEO and founder of Audio Analytic, said: “If you can better understand context, you can better help consumers. Our scene recognition technology presents an opportunity to intelligently react to the world as it is happening.
“It means that devices like smartphones and headphones don’t need our constant attention, making the experiences of using tech hassle-free and seamless, whether you want to enjoy your favourite music or avoid missing an important call.”
We are already working with leading consumer tech companies to design this amazing new capability into the next generation of products.
Next week, we will be demonstrating scene recognition at the Qualcomm Snapdragon Tech Summit 2019 in Maui, Hawaii.
What is an acoustic scene?
Acoustic scenes fit into four broad categories, which can be described by their complexity, affordance and acoustic characteristics. Complexity is essentially how busy and interesting the sounds around you are (chaotic or calm), and affordance is the importance of those sounds (lively or sombre). See Diagram 1.
Our approach is supported by academic research into how humans assess acoustic environments[1], as well as by our own significant research and development in acoustic science. The four scenes recognised by our technology are:
- Chaotic / lively. This could be a busy bar or airport.
- Chaotic / sombre. This could include travelling by train, subway or car, or being near traffic, such as a busy city centre.
- Calm / lively. This includes open spaces such as parks.
- Calm / sombre. This includes closed and less active environments such as the home and meeting rooms.
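To make the two-axis structure concrete, here is a minimal Python sketch that encodes the four scenes as combinations of complexity and affordance, with the example environments listed above. The names `Complexity`, `Affordance`, `SCENE_EXAMPLES` and `scene_label` are hypothetical illustrations, not part of Audio Analytic’s technology or any product API, and the sketch only models the taxonomy; it does not attempt to recognise a scene from audio.

```python
from enum import Enum
from typing import Dict, List, Tuple

# Hypothetical illustration of the four-scene taxonomy; not Audio Analytic's implementation.


class Complexity(Enum):
    """How busy and interesting the surrounding sounds are."""
    CHAOTIC = "chaotic"
    CALM = "calm"


class Affordance(Enum):
    """How important the surrounding sounds are."""
    LIVELY = "lively"
    SOMBRE = "sombre"


# Example environments for each quadrant, taken from the list of four scenes.
SCENE_EXAMPLES: Dict[Tuple[Complexity, Affordance], List[str]] = {
    (Complexity.CHAOTIC, Affordance.LIVELY): ["busy bar", "airport"],
    (Complexity.CHAOTIC, Affordance.SOMBRE): ["train", "subway", "car", "busy city centre"],
    (Complexity.CALM, Affordance.LIVELY): ["park"],
    (Complexity.CALM, Affordance.SOMBRE): ["home", "meeting room"],
}


def scene_label(complexity: Complexity, affordance: Affordance) -> str:
    """Return a human-readable label such as 'Chaotic / lively'."""
    return f"{complexity.value.capitalize()} / {affordance.value}"


if __name__ == "__main__":
    # Print every quadrant with its example environments.
    for (complexity, affordance), examples in SCENE_EXAMPLES.items():
        print(f"{scene_label(complexity, affordance)}: {', '.join(examples)}")
```

Treating the scene as a pair of independent dimensions, rather than a flat list of four names, mirrors how the categories are defined and leaves room for the finer granularity within each category that is anticipated below.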
We’ve found that the soundscapes around us each fit into one of these four categories. Whether the target device is a pair of headphones or a smartphone, certain features and applications can then be adapted to the recognised scene.
Consumers could even tailor these settings to their personal preferences, and the scene information could be combined with other contextual information on the device, further strengthening the value proposition.
Over time, we envisage more granularity within each of these four categories, but even this combination of scenes enables device manufacturers to offer exciting and helpful benefits. Scene recognition also still meets our targets for compactness, a key feature of all of our sound recognition technology.
So why are physical location and acoustic scene not the same? As Chris explains, the two have very different meanings and interpretations.
“Take an airport departure gate as an example,” he says. “You arrive early and you’re on your own.
“The acoustic environment is calm and sombre at first. But as other passengers slowly arrive, the soundscape becomes much more chaotic and lively.
“The consumer hasn’t changed their physical location; they haven’t even moved from their seat. But their smartphone or headphones have still adapted to what’s happening around them during that time.”
Our scene recognition technology isn’t restricted to commuters and global jet setters. We’ve identified a large number of UX scenarios where scene recognition can meet the needs of a wide range of consumer personas.
The ability for machines to understand acoustic scenes as well as individual sounds is a central pillar of the amazing world that our second-generation sound recognition technology creates.
As well as the Qualcomm Snapdragon Tech Summit, we’ll be demonstrating acoustic scene recognition and our other exciting innovations at CES 2020 in Las Vegas.
[1] Link to full article.
Dr Chris Mitchell is the CEO and founder of Audio Analytic, based in Cambridge, UK.