November 7, 2019
Second Generation Sound Recognition is here…
Sound recognition is entering an exciting new phase.
We’ve laid the foundation, convinced others that it should be a must-have feature of consumer tech, and seen our software deployed globally.
We are the world experts in sound recognition technology.
While consumers are adopting glass-break, smoke/carbon monoxide alarm, dog bark and baby cry detection in their homes, our technology has moved on considerably since those early days.
When we set out to achieve our vision of giving all machines a sense of hearing, the initial focus had to be on straightforward use cases such as smart home security (for example, glass-break and smoke alarm detection). This was largely because the concept of edge-based sound recognition was new at the time.
As you’d expect, Audio Analytic, the consumer tech sector and the fundamental technology behind sound recognition itself have all evolved since then.
Over the next month, we will talk more about this next phase of sound recognition, both in terms of its technical capabilities and the benefits it brings to our customers and consumers.
Sound recognition has now reached its second generation – and we’ve already started demonstrating the exciting possibilities with our customers.
As far as we’re concerned, the first generation has been a massive hit: consumers all around the world can now buy products that give them peace of mind and protect their property, pets and possessions.
Our mission to see sound recognition deployed as a ‘must-have’ component of an AI offering has succeeded, and we are really excited about the benefits our technology can bring to a wide range of consumer products.
So now let’s take a look at what Second Generation Sound Recognition will be like…
Engineering truly embeddable software
Across the sector, there is a drive to run more AI at the edge, on the devices themselves, and sound recognition is no exception.
Engineering compact tinyML models is becoming key as more devices move towards cloudless AI to address consumer privacy concerns, give product developers maximum freedom and reduce ongoing cloud infrastructure costs.
Device resources, in both hardware and software, are also becoming increasingly contested, and therefore constrained, as the number of features on products continues to grow.
Audio Analytic’s engineers have been developing sound recognition software that’s truly embeddable on devices, from hearables and smartphones to smart speakers and smart home security products.
Understanding acoustic scenes – not just audio events
With Second Generation Sound Recognition, devices will become more sophisticated at recognising acoustic scenes as well as increasing numbers of audio events.
To truly give machines a sense of hearing, devices need to do more than recognise specific events. They must also recognise, and intelligently react to, the acoustic environment the consumer finds themselves in.
Our sound recognition software can also understand acoustic scenes, and this capability is particularly relevant for the hearables and smartphone markets.
Think about a typical commute. You are likely to pass through several different physical environments, each with its own acoustic scene that changes over time.
A single journey can involve travelling through chaotic, lively, calm and sombre surroundings, and your electronic devices will be able to adapt accordingly.
In a later article, we’ll explain how we cracked scene recognition and give an overview of how our technology will underpin a range of exciting new value propositions for consumers.
Engineering a multi-class model that will recognise over 50 sounds
The more sounds we teach machines to recognise, and the better they understand their environments, the more helpful they can be.
We also get closer to the sense of hearing that you and I have when we’re standing in a room or out and about. In fact, once a machine can recognise around 50 sounds, it starts to approach the hearing intelligence of a young child.
We’re well on our way to a system that can recognise many, many sounds, thanks to our Alexandria™ data set. So far, it contains more than 15 million labelled sound events across more than 700 label types.
The data set grows every week, making Alexandria™ the largest of its kind in the world. Nothing of this size or quality currently exists anywhere else.
Çağdaş Bilen, our senior research engineer, walks us through the Sound Map we have created to showcase this extraordinary data set.
Where will sound recognition make an impact on our lives?
In the future, sound recognition will be a key feature in consumer electronic devices whether they sit inside a home, outside the home, in your pocket, in your ears or on the driveway.
We have a clear understanding of who the end customers are and how sound recognition can help meet their needs both now and in the future.
User experience scenarios, hero moments, personas and value propositions are helping our customers identify valuable new benefits and features for their next-generation devices and services.
I can assure you that sound recognition is no longer just about glass-breaks and smoke alarms; it opens up a whole world of new possibilities.
The question is this: are you ready?
Dr Chris Mitchell is the CEO and founder of Audio Analytic, based in Cambridge, UK.
About Audio Analytic
Audio Analytic is the pioneer of AI sound recognition technology. The company is on a mission to give machines a compact sense of hearing. This empowers them with the ability to react to the world around us, helping satisfy our entertainment, safety, security, wellbeing, convenience, and communication needs across a huge range of consumer products.
Audio Analytic’s ai3™ and ai3-nano™ sound recognition software enables device manufacturers to equip products at the edge with the ability to recognise and automatically respond to our growing list of sounds and acoustic scenes.