A framework for the robust evaluation of sound event detection
In our published paper, we introduce the Polyphonic Sound Detection Score (PSDS), which redefines true positives (TPs) and false positives (FPs) for sound recognition in a more task-oriented manner and summarises performance across many classes in a single robust metric. The advantages of this new metric make it an industry standard for the evaluation of sound recognition models.
PSDS source code available on GitHub
We have made the PSDS source code available on GitHub under the open source MIT license, enabling researchers to apply this scoring method freely and independently to their own sound recognition models. The PSDS GitHub repository comprises a Jupyter notebook and a Python package, which contains a library that calculates the PSDS of polyphonic sound event detection systems.
Watch our presentation from ICASSP 2020
Our research on PSDS was presented at the prestigious IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). You can watch our technical video presentation on the ICASSP website.
What is the Polyphonic Sound Detection Score?
First published and presented at ICASSP 2020, the Polyphonic Sound Detection Score (PSDS) is an industry-standard evaluation framework and metric for polyphonic sound recognition systems, designed by Audio Analytic and based on our extensive expertise in real-world sound recognition.
PSDS solves the fundamental shortcomings of previous evaluation approaches, and we believe that the wider sound recognition community will benefit from our expertise. Feel free to access the source code and use PSDS in your own work. We have made PSDS available on GitHub, and the repository comprises a Python package containing a library with all the necessary tools to calculate and exploit the PSDS of polyphonic sound event detection systems.
Audio Analytic has identified three key limitations that need to be addressed for an evaluation metric to be meaningful and robust when detecting sound events from multiple classes (for example, glass break or dog bark), which can occur simultaneously.
- Redefining sound event detection: Valid sound event detections should be defined by intersection with the ground truth labels, rather than by collars around start and end times.
- Consideration for cross-triggers: Cross-triggers, when treated as a special case of false positives, yield extra insight into data properties and also help error analysis.
- Dependence on operating point: The metric should be independent of the system’s operating point, while remaining relevant to user experience.
Our approach to evaluating the performance of polyphonic sound event detection systems revisits the definition of system errors, makes the evaluation more robust and expands the evaluation to include the factors which matter to user experience.
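To make the intersection-based definition and the idea of cross-triggers more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code from the PSDS package: the function names (`overlap`, `detection_is_valid`, `cross_trigger_labels`), the `(onset, offset, label)` event layout and the threshold values 0.5 and 0.3 are all hypothetical and chosen only for this example.

```python
from typing import List, Set, Tuple

# Hypothetical event layout for this sketch: (onset_s, offset_s, class_label)
Event = Tuple[float, float, str]


def overlap(a_on: float, a_off: float, b_on: float, b_off: float) -> float:
    """Duration in seconds shared by two intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a_off, b_off) - max(a_on, b_on))


def detection_is_valid(det: Event, ground_truth: List[Event],
                       min_fraction: float = 0.5) -> bool:
    """Accept a detection when enough of its duration intersects
    ground truth of the same class (no onset/offset collars involved).

    Assumes ground truth events of the same class do not overlap each other,
    so their overlaps with the detection can simply be summed.
    The 0.5 default is an illustrative value, not a recommended setting.
    """
    on, off, label = det
    duration = off - on
    if duration <= 0:
        return False
    covered = sum(overlap(on, off, g_on, g_off)
                  for g_on, g_off, g_label in ground_truth
                  if g_label == label)
    return covered / duration >= min_fraction


def cross_trigger_labels(det: Event, ground_truth: List[Event],
                         min_fraction: float = 0.3) -> Set[str]:
    """For a detection that is NOT valid, list the other classes whose
    ground truth covers at least min_fraction of it. A non-empty result
    marks the false positive as a cross-trigger."""
    on, off, label = det
    duration = off - on
    if duration <= 0:
        return set()
    covered_by_class = {}
    for g_on, g_off, g_label in ground_truth:
        if g_label != label:
            covered_by_class[g_label] = (covered_by_class.get(g_label, 0.0)
                                         + overlap(on, off, g_on, g_off))
    return {c for c, cov in covered_by_class.items()
            if cov / duration >= min_fraction}


ground_truth = [(1.0, 3.0, "dog_bark"), (2.0, 6.0, "glass_break")]
detections = [
    (1.5, 3.5, "dog_bark"),  # mostly inside the labelled dog bark
    (4.0, 5.0, "dog_bark"),  # fires while only glass break is present
    (7.0, 8.0, "dog_bark"),  # fires when nothing is labelled
]

for det in detections:
    if detection_is_valid(det, ground_truth):
        print(det, "-> counts towards a true positive")
    else:
        confused_with = cross_trigger_labels(det, ground_truth)
        kind = f"cross-trigger on {sorted(confused_with)}" if confused_with else "false positive"
        print(det, "->", kind)
```

In this toy data, the second detection is a false positive that coincides with a glass break label, which is exactly the kind of class confusion that reporting cross-triggers separately is meant to expose. Operating-point independence is not covered by this sketch; it would require repeating the counts over several detection thresholds.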
PSDS adopted by DCASE for Task 4 challenge
The organisers of the 2020 DCASE Challenge have included Audio Analytic’s Polyphonic Sound Detection Score (PSDS) as one of the two evaluation metrics for ‘Task 4: Sound event detection and separation in domestic environments’.