Audio Analytic Labs

Welcome to Audio Analytic Labs' blog and information page!

AA Labs is Audio Analytic's technological brain. We've made Automatic Environmental Sound Recognition a practical reality, as evidenced by the company's products, and our mission is to constantly improve and enhance it through the application of our world class technical research and development.

Humans have a natural ability to recognise environmental sounds, but what about designing machines, computer programs and products with a similar level of acoustic intelligence? Solving this complex problem requires the application of leading-edge skills in Signal Processing, Acoustic Sciences, Machine Learning and Big Data. This is the core of AA Labs' mission.

The way AA Labs maintains the company's excellence in sound recognition innovation is through a blend of cutting edge in-house research and open collaborations with public partners, each carefully selected from the highest levels of academia. This translates into three main objectives:

  1. Carry out our own in-house core research
  2. Foster public research by influencing the academic sector and supplying a technology productisation channel for partners
  3. Integrate first-hand knowledge of the state of the art with our own in-house innovation to maintain our technological competitiveness

Our world class technical expertise is the engine that continually powers Audio Analytic's competitive dominance and its market-leading ai3 (acoustic intelligence 3) sound recognition technology.

If you have the relevant skills and would like to work with us, or if you are a research institution interested in R&D collaborations, please contact us at: AALabs@audioanalytic.com

Next Audio Analytic Tech Talk: May 10th 2016 – “Lessons in music and experimental design” – Dr. Bob Sturm, QMUL

You may know that Audio Analytic Labs is running a very successful series of Tech Talks in Cambridge. We have hosted prestigious speakers from the sound recognition research community, such as Professor Mark Plumbley from the University of Surrey and Dr. Emmanouil Benetos from Queen Mary University of London. The series is ongoing, and new speakers are regularly being lined up for talks that we hope you will enjoy.

The next Tech Talk in the series will be:
“Your machine learnings may not be learning what you think they are learning: Lessons in music and experimental design”
by Dr. Bob Sturm, Lecturer in Digital Media, School of Electronic Engineering and Computer Science (EECS), Queen Mary University of London.
Date: Tuesday May 10th, 2016, 18:30 – Venue: Metail, 50 St Andrew’s Street, CB2 3AH, Cambridge.

For more information about this talk, please have a look at the Meetup announcement:
http://www.meetup.com/The-Audio-Analytic-Tech-Talks/events/230541500/.
In order to receive AA Tech Talks announcements automatically, please join our meetup group:
http://www.meetup.com/The-Audio-Analytic-Tech-Talks/

You can also view the past AA Tech Talks on AA Labs’ YouTube channel:
http://bit.ly/aa-labs

Attendance at these talks is free. We hope to meet you in person at one of them!

AA Labs at the Cambridge Science Festival 2016

It was an honour and a delight to be invited by the IfM (Cambridge’s Institute for Manufacturing) to demonstrate automatic sound recognition as part of the Cambridge Science Festival 2016.

For the AA Labs engineers, the Science Festival is an annual occasion to break from the ordinary with a one-day hackathon to prepare the demo, followed by an enjoyable Saturday out presenting it to the general public. This year, the demo took our young visitors and their parents through a five-step story:

  1. Making their own guitar cup that they could take home as a souvenir.
  2. Understanding how sound gets generated by string instruments, with the help of a custom-made sound board.
  3. Understanding how spectrograms can be used to visualise and analyse sounds, through a fun exercise of “paint your own spectrogram”, followed by sound synthesis from the spectrogram image.
  4. Experimenting with a Moo detector, based on a sound sensor trained on the sound of a Moo box. (With all the fun challenges of sound recognition in a room full of kids…)
  5. Participating in the “What would you use that for?” competition at the end of the demo circuit.

The spectrogram painting proved particularly popular: have you ever wondered what the spelling of your name sounds like?
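
For the curious, here is a minimal sketch of how a spectrogram is computed in practice. This is our own illustration in Python with scipy, using a synthetic two-tone signal rather than a real recording; the parameters are arbitrary.

```python
import numpy as np
from scipy.signal import spectrogram

# Illustrative only: a one-second synthetic "sound" made of two tones.
fs = 16000                                    # sample rate in Hz
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# Short-time Fourier analysis: frequency on one axis, time on the other,
# with magnitude shown as colour when plotted -- the image the kids "painted".
freqs, times, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=256)
log_Sxx = 10 * np.log10(Sxx + 1e-10)          # power in decibels
print(log_Sxx.shape)                           # (frequency bins, time frames)
```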

Overall it was a very inspiring day: a chance to express the team’s creativity, of which we are very proud, and to convey our passion for sound technology to future generations of engineers. Judging from the pictures below, the visiting children and parents had a great time too!

AA Labs is already looking forward to next year’s edition…

Feel free to click on the pictures below to get a feel for the fantastic atmosphere of that joyful and inspiring day.

(PLEASE NOTE: the pictures below were taken by one of our engineers on an informal basis. If you wish for any of the pictures to be removed, please email aalabs_admin@audioanalytic.com and the picture you wish to see removed will be immediately deleted.)

 

AA Labs’ impressions of the Workshop on Auditory Neuroscience, Cognition and Modelling 2016

By Iñigo Martinez de Rituerto de Troya, Analytics Engineer at AA Labs

At Audio Analytic, we are pushing the boundaries of acoustic intelligence, developing a system that recognises semantically significant environmental sounds. Understanding sound and producing automatic sound recognition is a multifaceted challenge which requires expertise from various fields, such as signal processing, machine learning, computer science, cognitive science and auditory neuroscience. Thus, while developing in-house solutions and products which bring this cutting-edge understanding of sound into people’s lives, we keep a keen eye on developments in the academic community, remain open to academic collaborations and take an active part in cutting-edge worldwide research.

The Workshop on Auditory Neuroscience, Cognition and Modelling (WANCM2016), hosted by the Centre for Digital Music (C4DM) at Queen Mary University of London (QMUL) on February 17th, was an initiative to bring together researchers from the fields of auditory neuroscience, auditory modelling, signal processing and machine learning. The range of perspectives within auditory research that were represented fostered interesting conversations at the intersection of the multiple disciplines that work towards understanding sound. Gathered at QMUL’s Charterhouse Square campus near the iconic Barbican Centre, the workshop hosted one day of oral and poster presentations, as well as three stand-out keynote presentations from researchers at Aarhus University, IRCAM/CNRS and the University of Cambridge. In this post, we will review some highlights of the day’s proceedings.

Prof. Elvira Brattico (Aarhus University) inaugurated the workshop by presenting work being undertaken at the newly opened Center for Music in the Brain (MIB). The MIB unites expertise from a range of departments across Aarhus, including the Royal Academy of Music, along with the clinical facilities at the Centre for Functionally Integrative Neuroscience, creating new opportunities for cross-disciplinary insights into the perception of music. Prof. Brattico presented evidence for traces of mismatch negativity (MMN; a neurophysiological correlate, or neuromarker, of brain activity deviations due to the detection of irregularities in a stimulus) when listening to musical instruments playing in foreign scales or hitting wrong notes. She also presented evidence for stronger acculturation of trained musicians to high-level musical schemes, which vary greatly between the musics of different cultures, based on stronger ERAN (early right anterior negativity; a neuromarker for recognising deviations in hierarchical musical organisation). The talk ended with an enlightening animation of the electrical activity of a human brain listening to tango, revealing the macroscopic influence that organised sound has on the cortex.

Dr. Jean-Julien Aucouturier (IRCAM/CNRS) gave an intriguing demonstration of his long-running CREAM project (Cracking the Emotional Code of Music), first developed at CNRS and since moved to IRCAM. Dr. Aucouturier and colleagues designed an experiment in which participants’ voices were manipulated to imbue them with an emotive quality of either happiness, sadness or fear. The key finding was that participants remained unaware that anything had changed in the sound of their voice. Furthermore, they even took on the imposed emotion in the way they spoke, while remaining oblivious to the influence. This work demonstrates that people do not actively monitor the emotiveness of their own voice, and that it can in fact be altered by externally applied feedback. The acoustic manipulations are in fact fairly simple (some filtering, compression and amplitude modulation, among others), but how finely they were executed is quite remarkable. The software is freely available as a Max patch at http://cream.ircam.fr/?p=44
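
To give a flavour of the kinds of simple DSP blocks mentioned above, here is a hedged sketch of a naive “emotive” voice manipulation. It is our own illustration, not the CREAM software; the filter and modulation parameters are arbitrary.

```python
import numpy as np
from scipy.signal import butter, lfilter

def brighten_voice(x, fs, tremolo_hz=8.0, depth=0.15, highpass_hz=200.0):
    """Naive 'emotive' manipulation: gentle high-pass filtering plus slow
    amplitude modulation (tremolo). Loosely in the spirit of the simple
    filtering/compression/modulation blocks described above; the actual
    CREAM software is far more carefully designed."""
    b, a = butter(2, highpass_hz / (fs / 2), btype="highpass")
    filtered = lfilter(b, a, x)
    t = np.arange(len(x)) / fs
    tremolo = 1.0 + depth * np.sin(2 * np.pi * tremolo_hz * t)
    return filtered * tremolo
```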

Dr. Richard Turner’s work in sound texture synthesis contextualised his proposal of a probabilistic time-frequency demodulation algorithm which is impressively robust to low signal-to-noise ratios (SNR). Low-SNR signals often plague popular demodulation and pitch detection methods (namely approaches using the linear Hilbert transform). Demodulation is widely used in assistive hearing devices (such as hearing aids or cochlear implants) to decompose a speech signal into a slowly varying “modulation” signal that represents the amplitude envelope of the sound, and a fast-varying “carrier” signal which carries the content of the signal in its finer temporal structure. Dr. Turner’s approach employs Gaussian processes to perform Probabilistic Amplitude Demodulation, the parameters of which are inferred from the structure inherent in the signal itself and can also incorporate context-specific constraints as prior information. For example, one can use knowledge about the temporal granularity of a sound texture to steer the Gaussian processes into the feature space that best encapsulates the nature of the signal. These features lead to a demodulation algorithm that can deal with uncertain signals in noisy environments, ideal (and indeed requisite) for processing natural sounds. This work brings us closer to being able to give people with hearing loss the ability to distinguish speech in noisy environments and better enjoy the complex sonic landscapes of music.
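
As a point of reference, the classical (non-probabilistic) Hilbert-transform demodulation mentioned above can be sketched in a few lines. This is the baseline approach only, not Dr. Turner’s Gaussian-process method.

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_demodulate(x):
    """Classical envelope/carrier split via the analytic signal: the slowly
    varying envelope is the "modulation" and the remainder is the fast-varying
    "carrier". This linear baseline degrades at low SNR, which is what the
    probabilistic approach is designed to handle."""
    analytic = hilbert(x)
    envelope = np.abs(analytic)
    carrier = np.real(analytic) / np.maximum(envelope, 1e-12)
    return envelope, carrier
```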

The poster sessions went deeper into the territory explored by the oral presentations, diving into artificial neural networks and biologically-inspired computational auditory scene analysis (CASA). Transfer learning featured in several projects as a means of boosting predictive models whose domain renders collection of sufficient data difficult or impractical. Eduardo Coutinho (Imperial College London & University of Liverpool) offered an interesting example, extracting the emotional semantics embedded in music and speech, revealing common semantics between these two seemingly disparate but altogether related means of communication.

Cleo Pike (University of Surrey) and Amy Beeston (University of Sheffield) presented findings that highlight one area in which the human ear still far outperforms machine listening techniques: cognition in reverberant environments. The human ear is able to adapt to highly reverberant environments reasonably quickly (on the order of 500 ms), restoring speech intelligibility to baseline levels, while computational methods remain notoriously expensive. Furthermore, humans only need two ears to do this, while most computational systems exploit more complex microphone arrays to do the job. They proposed some physiological structures of the peripheral auditory system as potentially yielding clues to new methods of approaching the problem of dereverberation in computer systems. Interestingly, they believe this processing takes place before the auditory signals reach the cortex, and may be a virtue of binaural hearing. Their findings help us further understand the biological basis of how binaural listeners cope with reverberation.

On the ground, attendees discussed the DCASE 2016 challenge (Detection and Classification of Acoustic Scenes and Events: http://www.cs.tut.fi/sgn/arg/dcase2016/) and its emphasis on real-world sound recognition and event detection. Although not officially listed as a DCASE organiser, Audio Analytic Labs has been invited to advise on dataset labelling and evaluation methodologies for DCASE’s Task 4, “Domestic Audio Tagging”. We look forward to comparing the results of the participants’ state-of-the-art algorithms on the public-domain CHiME-Home dataset, which was built for this occasion through collaborative work between AA Labs (Dr. Sacha Krstulovic), Queen Mary University of London (Dr. Peter Foster), the University of Surrey (Prof. Mark Plumbley) and the University of Sheffield (Dr. Jon Barker).

There was also mention of the soundsoftware.ac.uk initiative, a platform by the C4DM to promote “sustainable software for audio and music research” through the provision of open-source tools, methodologies for reproducible research and more. Overall, there was a great sense of camaraderie among researchers, and a sense that different strands of auditory research were converging towards a more fruitful academic panorama.

At the end of the day, the organisers reiterated the general desire for the workshop to continue. Thanks to the great turnout, they proposed running it again in the future, possibly in a different location, and are currently looking for institutions to host next year’s edition.

Videos of the lectures are available online at: http://c4dm.eecs.qmul.ac.uk/wancm2016/programme.html

(Linked image courtesy Emmanouil Benetos, QMUL/C4DM.)

Cambridge Innovation Capital and IQ Capital co-lead £1M funding of Audio Analytic Ltd.

LISTENING LIGHTBULB USES AUDIO ANALYTIC’S SMART HOME TECH

Cambridge Innovation Capital and IQ Capital co-lead £1M funding of sound recognition company

A smart lightbulb that can recognise the sound of an intruder breaking a window has won a major industry award. This has fuelled further demand for Audio Analytic, the world’s leading sound recognition company. To address its growing order book and fund new developments, the company has raised £1 million from investors including IQ Capital Partners, Cambridge Innovation Capital and Cambridge Angels.

In January, ‘Sengled Voice’ won the 2016 CES Best of Innovation award in the Smart Home category*. It is a light bulb with an integrated microphone and speaker that allows noises such as glass breaking or a smoke alarm to be detected and analysed using Audio Analytic’s ai3 (acoustic intelligence 3) technology, and communicated to the homeowner via a mobile alert.

The Smart Home market is growing rapidly and Audio Analytic is working with the leading brands to incorporate its unique sound recognition and identification software into products – such as sophisticated baby alarms, net cams, smart home hubs, intelligent lighting – that can identify sounds associated specifically with safety and security and alert the home owner.

Dr Chris Mitchell, founder and CEO of Audio Analytic, says that the company has pioneered this technology: “I have always been interested in how sound can be detected and identified so when I finished my PhD and found there were no companies addressing this market I set up Audio Analytic.”

The Cambridge-based company is now 17 strong and the new funding will be used to bring on new people to manage the sales pipeline and to invest in further R&D.

Mitchell continues: “The Audio Analytic Labs really are the technological brain to Audio Analytic; where fundamental research is undertaken. We are leading the way at the technology level and this further investment will help keep us there.”

As leaders in the field of Automatic Environmental Sound Recognition, Audio Analytic software is much in demand and the £1 million investment will support this.

Mitchell comments: “We have a strong client-base in the smart home space. This includes Sengled, a global innovator in LED lighting; integrating our software into their product helped them win the prestigious innovation award at CES.”

The company so far has catalogued seven sound types, including breaking glass (comprising four different types), baby cries and smoke and carbon monoxide alarms, and has the world’s most extensive reference-base of these noises, allowing high precision recognition.

Mitchell explains that there are considerable differences between the different brands of alarms, so creating the database has involved importing hundreds of alarms, mostly from the US.

He says: “A major challenge was creating the world’s first sound libraries designed for sound recognition. This resulted in us breaking panes of glass at the “Hush House” at Alconbury, which was previously used for testing jet engines, or setting off smoke and carbon monoxide alarms in lots of houses.”

A report by Markets and Markets in February 2015** estimates the smart home market will reach $58.68 billion by 2020.

Victor Christou, Chief Executive Officer at Cambridge Innovation Capital, comments: “Recognition of the importance of smart audio in the home automation market has grown massively recently.  With the world’s largest audio database and key customer relationships already established, Audio Analytic is the category leader in this sector.  CIC is very pleased to be supporting another world class, home-grown, Cambridge business.”

Max Bautin, Managing Partner at IQ Capital, added: “We have tracked impressive progress at Audio Analytic over several years and are delighted to be part of this exciting company at a time when IoT and smart home tech is starting to grow so rapidly.

“It is also great to see Cambridge investors working together yet again to fund an exceptional Cambridge start-up with global potential.”

 

– ENDS –

 

Notes for editors follow

*“Best of Innovation Award CES 2016” – Sengled Voice (http://cesweb.org/innovation)

**http://www.marketsandmarkets.com/Market-Reports/smart-homes-and-assisted-living-advanced-technologie-and-global-market-121.html

 

Media information about Cambridge Innovation Capital

Rachel Holdsworth/Anna Masefield, Holdsworth Associates PR

T: 01954 202789 E: anna.masefield@holdsworth-associates.co.uk

 

Louise Rich, Head of Investor Relations and Communications

T: +44 (0)7944 981468 louise.rich@cambridgeinnovationcapital.com

 

About Audio Analytic

Audio Analytic is leading the world of acoustically connected things.

Its unique embedded software makes devices aware of the sounds around them. If a smoke alarm goes off or a glass panel is broken by intruders while no-one is at home, the software immediately recognises the sound and tells the device to alert the home owner and the smart home to take appropriate protective action.

Other software sound sensors available include smoke alarm, car alarm and carbon monoxide alarm detection, as well as baby cry. Audio Analytic is an award-winning company, founded in 2008 and headquartered in Cambridge, UK.

Audio Analytic Labs, the company’s research division, is maintaining excellence in sound recognition innovation through a balanced blend of cutting edge in-house research and open collaborations with carefully selected public partners such as the University of Surrey, QMUL and INRIA.

Find out more at www.audioanalytic.com

 

About Cambridge Innovation Capital

Cambridge Innovation Capital (“CIC”) invests in high-growth technology companies. It is led by an experienced investment team, an outstanding board and advisory panel of leading scientists and entrepreneurs.

CIC combines a strong relationship with the University of Cambridge with deep financial and industry links. It raised initial capital of £50 million in 2013 from long term institutional and strategic investors such as Invesco Perpetual, Lansdowne and the Cambridge University Endowment Fund. The company strives to build leading businesses using a long term return strategy – removing the pressure to deliver the early exits associated with the traditional venture capital model.

Positioned within the dynamic Cambridge Cluster, CIC has a unique appreciation for world-leading scientific development. The company is committed to ensuring that its investment partners can build leading businesses from brilliant technologies – with the support of some of the most influential figures in the sector.

For further information, including details of other CIC portfolio companies, see www.cambridgeinnovationcapital.com

 

About IQ Capital

IQ Capital is a technology-focused venture capital firm with £75m under management, based in Cambridge and London. We invest up to £5m into innovative growth-stage technology companies poised to scale rapidly. We also make seed investments into disruptive and IP-rich technology start-ups sourced directly or introduced through our network. We are an approachable, straightforward and extremely well-connected team that has worked together for over 15 years to help founders achieve excellent exits. Our realisations include recent trade sales to Google, Apple, Becton Dickinson and Huawei, and numerous IPOs including Autonomy. We are currently investing from the £45m IQ Capital Fund II, launched in 2015.

www.iqcapital.co.uk

Tech Talk, 23rd Feb: Matrix Decomposition Methods for Audio Analysis

Audio Analytic Labs is very excited to welcome Dr. Emmanouil Benetos from Queen Mary University of London to talk about Matrix Decomposition Methods for Audio Analysis this Tuesday, February 23rd. Not to be missed!

Matrix Decomposition Methods for Audio Analysis – Dr. Emmanouil Benetos, QMUL

Tuesday Feb. 23rd, 18:30

Venue: Metail, 50 St Andrew’s Street, CB2 3AH, Cambridge

Abstract:

Audio analysis (also called machine listening) involves the development of algorithms capable of extracting meaningful information from audio signals such as speech, music, or environmental sounds. Within the audio analysis field, matrix decomposition methods (also called spectrogram factorization methods) form a major part of current research, leading to systems that are robust, computationally efficient, and interpretable. In this talk I will present current research on matrix decomposition methods for analysing environmental sounds and music recordings. Example applications of matrix decomposition methods will be presented on sound event detection and automatic music transcription, both of which are considered fundamental and yet open problems. The final part of the talk will be a discussion on the limitations of current audio analysis technologies, as well as on identifying promising directions for future research.
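
As a rough illustration of what spectrogram factorisation means (our own hedged sketch, not code from the talk): a non-negative magnitude spectrogram S is approximated as the product of spectral templates W and their time activations H, here using scikit-learn's NMF on a placeholder signal.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.decomposition import NMF

fs = 16000
x = np.random.randn(fs)                         # placeholder one-second signal
_, _, Sxx = spectrogram(x, fs=fs, nperseg=512)  # non-negative power spectrogram

# Factorise S (freq x time) as W @ H: columns of W are spectral templates,
# rows of H are their activations over time.
model = NMF(n_components=4, init="nndsvda", max_iter=500)
W = model.fit_transform(Sxx)
H = model.components_
print(W.shape, H.shape)                          # (freq bins, 4), (4, time frames)
```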

Biography:

Emmanouil Benetos is a Royal Academy of Engineering Research Fellow at the Centre for Digital Music, Queen Mary University of London. He holds a BSc and MSc in Informatics, and a BMus in Piano Performance. After receiving his PhD in Electronic Engineering (2012), he joined City University London as a University Research Fellow (2013-15). His research interests include signal processing and machine learning methods for audio analysis, as well as applications of these methods to music information retrieval and environmental sound analysis. On research grants, he is PI for the RAEng research fellowship “A machine learning framework for audio analysis and retrieval”, and was Co-I for the AHRC-funded projects “Digital Music Lab – Analysing Big Music Data” and “An Integrated Audio-Symbolic Model of Music Similarity”. He has authored/co-authored over 50 papers in the fields of audio signal processing and music informatics, and co-organised the 2013 IEEE challenge on Detection and Classification of Acoustic Scenes and Events (D-CASE) and the 2015 MIREX challenge on Music/Speech Classification & Detection.

Website: http://www.eecs.qmul.ac.uk/~emmanouilb/

The talk will be followed by Q&A and networking around nibbles.

Venue: Metail, 50 St Andrew’s Street, CB2 3AH, Cambridge

Doors open 18:30, talk to start promptly at 18:45.

Attendance is free.

Audio Analytic Ltd. is leading the world of acoustically connected things.  Our unique software is used by smart home companies the world over to make devices aware of sounds around them.  We are delighted to invite prominent machine learning and audio processing scientists to talk about their work, and hope that these talks will provide inspiration to the Cambridge Machine Learning community. The talks will also be made available online for future reference. For more info, please visit:

http://www.audioanalytic.com

Many thanks to Metail http://www.metail.com for providing the venue.

Please RSVP on:

http://www.meetup.com/The-Audio-Analytic-Tech-Talks/events/227419873/?_af=event&_af_eid=227419873

Audio Analytic released a new paper on “Intelligent Sound Detection”

Intelligent Sound Detection with Audio Analytic

Dr. Sacha Krstulović, V.P. of Technology
Audio Analytic Ltd.

 

With the rise of the Internet of Things and the rebirth of Home Automation, the world is seeing an ever-increasing number of connected devices that provide value to people by analysing ambient data automatically. Such systems and devices are usually referred to as “smart” or “intelligent” because they include some degree of high-level interpretation of the captured data. The data itself varies in nature, ranging from images captured by connected cameras, through body signals sensed by wearables, to house parameters measured by smart meters and thermostats. However, few devices are as yet able to capture and analyse the high-value information held in ambient sound.

What is intelligent sound detection?

Humans have long dreamt of creating machines able to emulate human behaviour: sci-fi is full of talking robots such as Robby from Forbidden Planet or HAL 9000 from 2001: A Space Odyssey, and early attempts at building machines able to talk, play games or otherwise help humans date back to the 18th century. In recent times, computer systems able to emulate spoken communication and music recognition have been popularised and have met commercial success with services such as Siri, Shazam and SoundHound. The common denominator of such services is the recognition of two particularly valuable classes of sound: speech and music.

But what about the rest of the sounds around us? Wouldn’t it be useful if your surveillance camera or some other connected device could tell you if something were to go wrong in your home by the sound of it, for example if a smoke alarm went off or if someone were trying to break one of your windows? Enter Audio Analytic: our software performs intelligent sound detection.

How is intelligent sound detection different from speech and music recognition?

The challenges posed by recognising sound beyond speech and music are multiple. To begin with, the structure of a “soundscape” (a landscape of sounds) can be very different from a speech utterance or a musical piece: while sentences are structured by a grammar and musical pieces by a score, in a soundscape any sound can follow any other sound, with very loose sequential constraints, if any. Sounds can also overlap, with a sound of interest occurring on top of a noisy background. Further, while the sounds of speech are restricted to those that the human vocal tract can produce, and the sounds of music are most often structured as tuned notes and produced by resonance processes, environmental sounds can truly sound like anything and can be produced by a multiplicity of physical phenomena: crashing objects, explosions, beeping circuits, animal sounds, machines humming, which all result from different physical processes.

The quality of audio captured by devices is a further challenge. Speech is most often spoken close to the microphone of a mobile phone, or at the sweet spot of some in-car audio capture system, situations known as close-talk or near-field audio capture. Environmental sounds, on the other hand, can happen anywhere at a distance from the microphone, a situation known as far-field audio capture. In the case of music, the audio can be delivered to the system as high-quality recordings if classified from the user’s music collection, or with some degree of care for audio quality if it is captured from, say, a mobile phone’s microphone. But environmental sound recognition has to work on any embedded device, including devices whose audio circuitry was designed with cost reduction in mind rather than quality of audio transmission. For example, many consumer devices natively use mono audio capture, thus preventing the use of beamforming, a technique which may help far-field audio capture but requires an array of at least two microphones operating in stereo. Dealing with such practical constraints and suboptimal audio is thus part of the art of automatic environmental sound recognition.
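
To illustrate why at least two microphones are needed, here is a hedged sketch of the simplest beamformer, delay-and-sum over a stereo pair. This is an illustration only, not Audio Analytic’s processing.

```python
import numpy as np

def delay_and_sum(left, right, delay_samples):
    """Minimal two-microphone delay-and-sum beamformer: delay one channel so
    that sound arriving from the target direction lines up across both mics,
    then average. With a single mono capture there is nothing to align,
    which is why beamforming is unavailable on many consumer devices."""
    aligned = np.roll(right, delay_samples)    # integer-sample delay approximation
    return 0.5 * (left + aligned)
```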

In addition, while mainstream speech and music recognition services are backed up by huge computational power hosted in enormous data centres, environmental sound recognition software is most often expected to run “on the edge”, i.e., it has to work with the limited computational power available on its embedded host, for example, directly on the chip of a surveillance camera.

In order to solve these major challenges of complex soundscape, variable audio quality and low computational power, a sound recognition system must be very good at modelling a wide range of acoustical phenomena, while also being able to cope with a multiplicity of noise conditions and microphone types. Just like with speech recognition, which must be tolerant to variations in people’s voices when they have a cold, a sound recognition system must also be able to generalise between different instances of a sound, e.g., glass broken at different thicknesses and sizes, different models of smoke alarms etc. And, all this intelligence must be able to run at a very small computational cost.

How is Audio Analytic approaching this challenge?

We are experts in the development and application of sound recognition research.

The inherent complexity of intelligent sound detection means that there are no credible canned solutions. So Audio Analytic has developed intelligent solutions that involve a unique blend of state-of-the-art knowledge about acoustic modelling and machine learning, and hands-on know-how about audio data, audio capture devices, highly efficient embedded software, and a multitude of other practical aspects of sound recognition. We hold three technology patent families in the field.

Our in-house research team is highly experienced in the field. Most of the engineers hold a PhD, and their backgrounds include industrial R&D with prominent companies such as Toshiba and Nuance, as well as academic research within some of the world’s best electrical engineering university programmes.

Sound recognition is a more recent academic research topic than speech and music recognition, and there is a growing number of research centres around the world with whom Audio Analytic maintains contact. In particular, we run a constant programme of mutually valuable research projects with the top academic labs in the field, such as the University of Surrey’s Centre for Vision, Speech and Signal Processing (Prof. Mark Plumbley) and Queen Mary University of London’s Centre for Digital Music (Prof. Simon Dixon).

How does it work?

So how can sound recognition be done, and how does Audio Analytic do it? Audio Analytic’s underlying technology is known as “machine learning”: just as humans learn to recognise sound from example, our algorithms are able to learn from large masses of audio recordings. To push the human analogy further, humans perceive sounds through the combined action of the ears, which capture sounds and extract some acoustic descriptors (e.g., tone, volume etc.), and the brain, which recognises the acoustic patterns and is able to generalise this recognition across a variety of instances of a particular sound.

Similarly, the Audio Analytic system is divided into a feature extraction module which extracts acoustic characteristics as the ears do, and a pattern matching engine which learns from data as the brain does. In building this technology, Audio Analytic has brought some deep skills to bear:

  • Collecting and managing large amounts of audio recordings of environmental sounds, for training and testing purposes. Many of these were recorded as actual field data rather than lab simulations.
  • Developing a feature extraction module able to capture the rich range of acoustic characteristics exhibited by a multitude of environmental sounds, which are much richer than the characteristics of speech and music.
  • Choosing and tuning the best machine learning algorithm for sound classification.
  • Making all this technology run on small embedded devices at a manageable and practical computational cost.
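
As a hedged illustration of this overall “ears plus brain” structure, here is a generic sketch using common open-source tools. The features, labels and classifier shown are placeholders for illustration only, not Audio Analytic’s proprietary system.

```python
import numpy as np
import librosa                                   # assumed available for feature extraction
from sklearn.ensemble import RandomForestClassifier

def extract_features(y, sr):
    """The 'ears': summarise a clip with MFCCs, a common generic acoustic
    descriptor (Audio Analytic's actual features are proprietary)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# The 'brain': any standard classifier trained on labelled example clips.
# Placeholder data stands in for real feature vectors and sound labels.
X_train = np.random.randn(100, 40)
y_train = np.random.choice(["glass_break", "smoke_alarm", "background"], size=100)
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print(clf.predict(np.random.randn(1, 40)))      # label for a new (placeholder) clip
```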

The alternative approaches we see in the market are typically too simplistic for the task. The sound recognition algorithms used are mostly based solely on volume detection or auditory models, which is akin to ears without a brain: somehow able to characterise sounds, but unable to recognise them in a way that would generalise sufficiently across variations of the sounds of interest.
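
For contrast, a volume-only “detector” of the kind described above fits in a few lines, which is precisely its weakness (our own illustration):

```python
import numpy as np

def naive_volume_detector(x, frame_len=1024, threshold_db=-20.0):
    """Flag any frame whose RMS level exceeds a fixed threshold. This reacts
    to loudness, not to what the sound actually is, so it cannot generalise
    across variations of a sound of interest -- 'ears without a brain'."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms_db = 20 * np.log10(np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12)
    return rms_db > threshold_db
```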

Conclusions

Sound recognition is inherently a hard computational problem, yet when properly addressed it yields high value by informing people when sounds of interest happen while they are not present to hear them. As the Smart Home and Internet of Things market grows and develops, sound recognition is becoming recognised as an increasingly important source of valuable and actionable information. This means that any sound recognition technology deployed across connected devices in the home, workplace and general environment needs to be as accurate, reliable and efficient as possible.

We believe that Audio Analytic’s expertise and experience make us the best in the world at sound recognition, and we continually develop and evolve our technology for the benefit of our customers and end-users.