December 19, 2019
Visualising the complex world of sounds
Our colourful and vibrant Sound Map illustrates the complexity of sound, and the challenges involved with teaching machines to hear.
From a young age, our brains are trained to recognise and attach meaning to the different sounds we come across every day. A sense of hearing may appear to be a simple and natural capability of humans, but it’s a much more complicated process for machines.
And this is what the Sound Map (Figure 1) demonstrates.
For example, alarm clocks, microwaves and smoke alarms all make ‘beeping’ sounds. We know each one is different because our understanding and interpretation of each beep goes beyond the acoustic information. We also process the temporal patterns of sounds, their tone and the context in which we hear them.
Machines have to be guided to take a different approach with their sense of hearing. The process is highly complex, and our Sound Map emphasises the complexity of the dataset needed to build a sophisticated sound recognition system.
The map has been created from our industry-leading Alexandria™ dataset, the world’s largest commercially exploitable dataset of sounds for machine learning and a foundational part of our ML pipeline. Alexandria™ contains 15 million labelled sounds across more than 700 label types, ranging from acoustic scenes to emergency vehicle sirens, and from vocal sounds like coughs, sniffing and laughter to glass breaks.
Each sound can be broken down into hundreds of different features which, when measured and projected into just two dimensions, enable us to create the Alexandria™ Sound Map. It resembles an astronomer’s map of the stars, where constellations and galaxies are displayed in 2D for textbooks and posters. Like all maps, the Alexandria™ Sound Map helps us navigate the world of sounds, and it’s useful because we hold a lot of data about a lot of sounds.
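To make the idea of projecting hundreds of features into two dimensions more concrete, here is a minimal sketch in Python. It is an illustration only: this post does not say which projection method was used for the Sound Map, so principal component analysis (PCA) stands in as one common choice. The function name project_to_2d, the synthetic data and the three made-up categories are all assumptions for the example, not part of our pipeline.

```python
import numpy as np


def project_to_2d(features: np.ndarray) -> np.ndarray:
    """Project an (n_sounds, n_features) matrix onto its top two
    principal components. PCA is an assumed stand-in here, not
    necessarily the method behind the actual Sound Map."""
    # Centre each feature so the projection captures variance, not offset.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep the two strongest directions: one (x, y) pair per sound.
    return centered @ vt[:2].T


# Synthetic stand-in data: 3 hypothetical sound categories,
# 100 sounds each, 200 features per sound.
rng = np.random.default_rng(0)
category_centres = rng.normal(scale=5.0, size=(3, 200))
labels = np.repeat(np.arange(3), 100)
features = category_centres[labels] + rng.normal(size=(300, 200))

points = project_to_2d(features)  # shape (300, 2): one dot per sound
```

Each row of points could then be drawn as a dot and coloured by its entry in labels, which is the basic recipe for a map of this kind. Other projection techniques would slot into the same place and may separate categories differently.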
Creating this map also means that we can communicate the extraordinary complexity of real-world sounds in a way that is accessible for everyone. Each dot represents an individual sound and each sound category is assigned a colour. Through the colours and clusters, you can appreciate how challenging it is for sound recognition systems to separate and identify sounds.
As laid out by Chris in an earlier blog, sound recognition has entered an exciting second generation. A key part of this is giving consumer products as diverse as smart speakers, smartphones and earbuds the ability to recognise multiple sounds and scenes – and Alexandria™ has played a critical role in this achievement.
Sound recognition may be a complex problem, but we have solved it. At next month’s CES in Las Vegas, my colleagues will be demonstrating our technology’s ability to recognise 50 different sound classes, and we are very excited to be spearheading this second generation of sound recognition.