March 21, 2016

AA Lab’s impressions on the Workshop on Auditory Neuroscience, Cognition and Modelling 2016

At Audio Analytic, we are pushing the boundaries of acoustic intelligence, developing a system that recognises semantically significant environmental sounds. Understanding sound and building automatic sound recognition is a multifaceted challenge that requires expertise from many fields, including signal processing, machine learning, computer science, cognitive science and auditory neuroscience. So while we develop in-house solutions and products that bring this cutting-edge understanding of sound into people’s lives, we keep a keen eye on developments in the academic community, remaining open to academic collaborations and taking an active part in cutting-edge worldwide research.

The Workshop on Auditory Neuroscience, Cognition and Modelling (WANCM2016), hosted by the Centre for Digital Music (C4DM) at Queen Mary, University of London (QMUL) on February 17th, was an initiative to bring together researchers from the fields of auditory neuroscience, auditory modelling, signal processing and machine learning. The range of perspectives within auditory research that were represented fostered interesting conversations at the intersection of the multiple disciplines that work towards understanding sound. Gathered at QMUL’s Charterhouse Square campus near the iconic Barbican Centre, the workshop hosted one day of oral and poster presentations, as well as three stand-out keynote presentations from researchers at Aarhus University, IRCAM/CNRS and the University of Cambridge. In this post, we will review some highlights of the day’s proceedings.

Prof. Elvira Brattico (Aarhus University) inaugurated the workshop by presenting work being undertaken at the newly opened Center for Music in the Brain (MIB). The MIB unites expertise from a range of departments across Aarhus, including the Royal Academy of Music, along with the clinical facilities at the Centre for Functionally Integrative Neuroscience, creating new opportunities for cross-disciplinary insights into the perception of music. Prof. Brattico presented evidence for traces of Mismatch Negativity (MMN; a neurophysiological correlate, or neuromarker, of brain responses to detected irregularities in a stimulus) when listening to musical instruments playing foreign scales or hitting wrong notes. She also presented evidence for stronger acculturation in trained musicians to high-level musical schemes, which vary greatly between musics of different cultures, based on stronger ERAN (early right anterior negativity; a neuromarker for recognising deviations in hierarchical musical organisation). The talk ended with an enlightening animation of the electrical activity of a human brain listening to tango, revealing the macroscopic influence that organised sound has on the cortex.

Dr. Jean-Julien Aucouturier (IRCAM/CNRS) gave an intriguing demonstration of his long-running CREAM project (Cracking the Emotional Code of Music), first developed at CNRS and since moved to IRCAM. Dr. Aucouturier et al. designed an experiment in which participants’ voices were manipulated to imbue them with an emotive quality of either happiness, sadness or fear. The key finding was that participants remained unaware that the sound of their voice had changed; moreover, they took on the imposed emotion in the way they spoke, all while remaining oblivious to the influence. This work demonstrates that people do not actively monitor the emotiveness of their own voice, and that it can in fact be altered by externally applied feedback. The acoustic manipulations themselves are fairly simple (some filtering, compression and amplitude modulation, among others), but how finely they were executed is quite remarkable. The software is freely available online as a Max patch.

Dr. Richard Turner’s work in sound texture synthesis contextualised his proposal of a probabilistic time-frequency demodulation algorithm which is impressively robust to low signal-to-noise ratios (SNR). Signals with poor SNR often plague popular methods for pitch detection (namely approaches using the linear Hilbert transform). Demodulation is widely used in assistive hearing devices (such as hearing aids or cochlear implants) to decompose a speech signal into a slowly varying “modulation” signal that represents the amplitude envelope of the sound, and a fast-varying “carrier” signal which conveys the content of the signal in its finer temporal structure. Dr. Turner’s approach employs Gaussian processes to perform Probabilistic Amplitude Demodulation, the parameters of which are inferred from the structure inherent in the signal itself, and can also incorporate context-specific constraints as prior information. For example, one can use knowledge about the temporal granularity of a sound texture to steer the Gaussian processes into the feature space that best encapsulates the nature of the signal. These features lead to a demodulation algorithm that can deal with uncertain signals in noisy environments, ideal (and indeed requisite) for processing natural sounds. This work brings us closer to being able to give people with hearing loss the ability to distinguish speech in noisy environments and better enjoy the complex sonic landscapes of music.
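As a point of reference (this is the classical Hilbert-transform baseline mentioned above, not Dr. Turner’s probabilistic method), the envelope/carrier decomposition can be sketched in a few lines of Python, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(0, 1.0, 1.0 / fs)

# A toy amplitude-modulated signal: slow envelope times fast carrier
modulator = 1.0 + 0.5 * np.sin(2 * np.pi * 3 * t)   # slowly varying "modulation"
carrier = np.sin(2 * np.pi * 440 * t)               # fast-varying "carrier"
signal = modulator * carrier

# Hilbert demodulation: the magnitude of the analytic signal
# recovers the amplitude envelope
envelope = np.abs(hilbert(signal))

# Mean absolute error between the true and recovered envelope
# (edge samples excluded to avoid transform boundary effects)
err = np.mean(np.abs(envelope[200:-200] - modulator[200:-200]))
```

In this clean, synthetic setting the recovered envelope matches the true modulator closely; adding broadband noise to `signal` quickly degrades it, which is exactly the low-SNR regime that motivates the probabilistic approach described above.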

The poster sessions went deeper into the territory explored by the oral presentations, diving into artificial neural networks and biologically inspired computational auditory scene analysis (CASA). Transfer learning featured in several projects as a means of boosting predictive models in domains where collecting sufficient data is difficult or impractical. Eduardo Coutinho (Imperial College London & University of Liverpool) offered an interesting example, extracting the emotional semantics embedded in music and speech, and revealing common semantics between these two seemingly disparate but altogether related means of communication.

Cleo Pike (University of Surrey) and Amy Beeston (University of Sheffield) presented findings that highlight one area in which the human ear still far outperforms machine listening techniques: cognition in reverberant environments. The human ear is able to adapt to highly reverberant environments reasonably quickly (on the order of 500 ms), restoring speech intelligibility to baseline levels, while computational methods for dereverberation remain notoriously expensive. Furthermore, humans only need two ears to do this, while most computational systems rely on more complex microphone arrays to do the job. They proposed some physiological structures of the peripheral auditory system as potentially yielding clues to new methods of approaching the problem of dereverberation in computer systems. Interestingly, they believe this processing takes place before the auditory signals reach the cortex, and may be a virtue of binaural hearing. Their findings help further our understanding of the biological substrate of binaural dereverberation in human listeners.
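To make the dereverberation problem concrete, here is a toy simulation (our own sketch, not from the presentation): a room can be modelled, to first order, as a linear filter, so convolving a dry signal with a decaying impulse response smears its energy over time. It is this temporal smearing that both brains and algorithms must undo:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000

# A short "dry" test signal: a 100 ms noise burst
# (a crude stand-in for a speech syllable)
dry = np.zeros(fs)
dry[:800] = rng.standard_normal(800)

# Toy room impulse response: exponentially decaying noise tail,
# roughly 500 ms long (the adaptation timescale mentioned above)
t = np.arange(int(0.5 * fs)) / fs
rir = rng.standard_normal(t.size) * np.exp(-t / 0.1)
rir[0] = 1.0  # direct path from source to listener

# Reverberation as convolution with the room response
wet = np.convolve(dry, rir)[: dry.size]

# The reverberant tail: energy persists after the dry burst has ended
tail_energy = np.sum(wet[800:] ** 2)
dry_tail_energy = np.sum(dry[800:] ** 2)  # zero by construction
```

Inverting this convolution blindly, without knowing the room response, is what makes computational dereverberation so hard, and what makes the ear’s rapid, pre-cortical adaptation so impressive.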

On the ground, attendees discussed the DCASE 2016 challenge (Detection and Classification of Acoustic Scenes and Events) and its emphasis on real-world sound recognition and event detection. Although not officially listed as a DCASE organiser, Audio Analytic Labs has been invited to advise on dataset labelling and evaluation methodologies for DCASE’s Task 4, “Domestic Audio Tagging”. We look forward to seeing the compared results of the participants’ state-of-the-art algorithms on the public-domain CHiME-Home dataset, which was built for this occasion through collaborative work between AA Labs (Dr. Sacha Krstulovic), Queen Mary University of London (Dr. Peter Foster), University of Surrey (Prof. Mark Plumbley) and University of Sheffield (Dr. Jon Barker).

There was also mention of an initiative by the C4DM, a platform promoting “sustainable software for audio and music research” through the provision of open source tools, methodologies for reproducible research and more. Overall, there was a great sense of camaraderie among researchers, and a sense that different strands of auditory research were converging towards a more fruitful academic panorama.

At the end of the day, the organisers reiterated the general desire for the workshop to continue. Thanks to the great turnout, they proposed that it run again in future years, possibly at a different location, and they are currently looking for institutions to host next year’s edition.

Videos of the lectures are available online.

(Linked image courtesy Emmanouil Benetos, QMUL/C4DM.)


Like this? You can subscribe to our blog and receive an alert every time we publish an announcement, a comment on the industry or something more technical.

About Audio Analytic

Audio Analytic is the pioneer of AI sound recognition software. The company is on a mission to map the world of sounds. By transferring our sense of hearing to consumer products and digital personal assistants, we give them the ability to react to the world around us, helping satisfy our entertainment, safety, security, wellbeing and communication needs.

Audio Analytic’s ai3™ sound recognition software enables device manufacturers and chip companies to equip products with Artificial Audio Intelligence, recognizing and automatically responding to our growing list of sound profiles.