Infinitely scalable technology platform
We’ve built the world’s largest audio dataset for consumer products and combined it with an AI framework that extracts and models the ideophonic features of sounds.
Intelligent sound recognition requires a deep knowledge of the ideophonic features of sounds. It is the only way to teach machines how to hear.
By recognising all of the different ideophones, our technology enables our customers to deliver reliable, accurate and helpful experiences to consumers.
To teach our technology to recognise sounds, we have to expose it to real-world data. Quantity matters, but so do relevance and diversity.
That is why we record audio events and acoustic scenes in our dedicated Sound Labs, through our network of volunteers, or via our dedicated data collection team.
Sound recognition was a zero-data problem when we started this journey.
We built Alexandria, a dedicated audio dataset for machine learning, which is used to train our sound recognition algorithms.
All audio data is organised taxonomically with full data provenance built in from the start.
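To make "organised taxonomically with full data provenance" concrete, here is a minimal sketch of what one labelled clip record could look like. It is an illustration only: the class names, field names and category labels below are hypothetical and do not describe Alexandria's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    # All fields are hypothetical examples of provenance metadata.
    source: str        # e.g. "sound_lab", "volunteer", "collection_team"
    recorded_on: str   # ISO 8601 date string
    device: str        # recording device used


@dataclass(frozen=True)
class AudioClip:
    clip_id: str
    taxonomy_path: tuple   # from broad to specific, e.g. ("glass", "window_break")
    provenance: Provenance


def index_by_taxonomy(clips):
    """Map every taxonomy prefix to the ids of the clips filed under it."""
    index = defaultdict(list)
    for clip in clips:
        # Register the clip under each level of its path, so that a query
        # for ("glass",) also returns clips filed as ("glass", "window_break").
        for depth in range(1, len(clip.taxonomy_path) + 1):
            index[clip.taxonomy_path[:depth]].append(clip.clip_id)
    return dict(index)


clips = [
    AudioClip("c1", ("glass", "window_break"),
              Provenance("sound_lab", "2020-01-15", "mic_a")),
    AudioClip("c2", ("glass", "bottle_break"),
              Provenance("volunteer", "2020-02-03", "phone_b")),
    AudioClip("c3", ("animal", "dog_bark"),
              Provenance("collection_team", "2020-02-10", "mic_a")),
]
index = index_by_taxonomy(clips)
```

Keeping provenance on every record means any clip returned by a taxonomy query can be traced back to where and how it was recorded.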
We’ve built our own artificial intelligence framework specifically for sound recognition.
It enables us to extract and model the features of sounds using the ideophonic alphabet we pioneered, train our technology on them, and produce accurate descriptions of each sound, which we call sound profiles.
Speech recognition and wake words are limited by the types of sounds that the human mouth can produce and conditioned by the communicative structure of human language, both of which can be exhaustively mapped.
Similarly, music mostly results from physical resonance, and is conditioned by the rules of various musical genres.
So whilst the human ear and brain are very good at interpreting sounds in spite of acoustic variations, computers were originally designed to process repeatable tasks. Thus, teaching a machine how to recognise speech and music greatly benefits from such pre-defined rules and prior knowledge.
Sounds, on the other hand, can be much more diverse, unbounded and unstructured than speech and music.
Think about a window being smashed, and all the different ways glass shards can hit the floor randomly, without any particular intent or style. Or think about the difference between a long baby cry and a short dog bark, or the relative loudness of a naturally spoken conversation versus an explosive glass crash.
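The duration and loudness contrasts described above can be put into numbers with a small sketch. This is not our recognition method; it uses synthetic sine tones as stand-ins for real recordings, and the function names, amplitudes and sample rate are assumptions chosen for illustration.

```python
import math


def sine(freq_hz, amplitude, seconds, sample_rate=16000):
    """Generate a synthetic tone as a list of samples in the range [-1, 1]."""
    n = round(seconds * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def duration_seconds(samples, sample_rate=16000):
    """Length of the signal in seconds."""
    return len(samples) / sample_rate


def rms_dbfs(samples):
    """RMS level in decibels relative to full scale (amplitude 1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


# Stand-in for a long, quiet event (such as a conversation).
long_quiet = sine(freq_hz=440, amplitude=0.05, seconds=2.0)
# Stand-in for a short, loud event (such as a glass crash).
short_loud = sine(freq_hz=2000, amplitude=0.9, seconds=0.2)

level_gap_db = rms_dbfs(short_loud) - rms_dbfs(long_quiet)
length_ratio = duration_seconds(long_quiet) / duration_seconds(short_loud)
```

With these assumed values the two signals differ by about 25 dB in level and by a factor of ten in length, which gives a sense of the range a sound recogniser has to cope with even before the content of the sound is considered.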
Now you understand why sound recognition required us to develop a special kind of expertise: collecting sound data ourselves and tackling real-world sound recognition problems made us pioneering experts in understanding the full extent of sound variability.
Sound recognition for a wide range of products
Expertise in data collection, the world’s leading audio dataset for machine learning and a highly specialised AI framework enable us to create ai3™, a flexible software platform capable of detecting a large number of sounds on a wide range of devices.
Window glass break
Emergency vehicle siren
Vehicle reversing alert
Watch Sacha's presentation on how sound is not speech, recorded at Google's NYC HQ.
Our patent portfolio combines patents covering the uses of sound recognition in products with patents covering a small proportion of the techniques we use, and have used, for sound recognition.
DCASE (Detection and Classification of Acoustic Scenes and Events) is the world’s leading peer-based sound recognition community, encouraging academia and industry to collaborate and share research in this field.