March 24, 2020
PSDS adopted by DCASE for Task 4 challenge
The organisers of the 2020 DCASE Challenge have included our Polyphonic Sound Detection Score (PSDS) as one of the two evaluation metrics for ‘Task 4: Sound event detection and separation in domestic environments’. PSDS is the subject of a research paper, which was accepted for publication at the International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020) this year.
I sat down with the authors of the research paper to discuss their thoughts on seeing our research being quickly adopted by the industry’s leading research challenge:
- CB: Dr Cagdas Bilen, Staff Research Engineer
- GF: Giacomo Ferroni, Senior Research Engineer
- FT: Francesco Tuveri, Senior Research Engineer
- JAO: Juan Azcarreta Ortiz, Research Engineer
- SK: Dr Sacha Krstulovic, Director, AA Labs
Firstly, congratulations. Less than six months after you first submitted the paper to ICASSP, it has been adopted for the DCASE Challenge and accepted by the ICASSP peer reviewers for presentation at the event later this year. What does it mean for you?
GF: “It is very exciting personally and professionally. The first step to having it adopted by researchers in academia and industry is to make it available in the most important SED challenge there is. PSDS opens new avenues to interpret the systems and it helps keep the challenge goal closer to the real-world applications. We’ve given the community a tool that is all about improving system analysis and understanding more about the systems under test, and consequently, fuelling ideas.”
FT: “I agree, evaluating results with PSDS alongside the traditional evaluation metric will trigger interesting discussions during the DCASE Workshop event later this year and in the months that follow. As a researcher you hope that your work allows others to open many doors to new ideas, so I really look forward to seeing our work achieve this.”
As PSDS is used by research groups during Task 4, what impact will it have on their work?
SK: “We expect it to uncover new insights about the models being compared, because PSDS makes it possible to explore new aspects of those models, for example their stability of performance across classes or their cross-triggering effects.
Besides, the definition of true positives and false positives introduced for PSDS is more tolerant to labelling variations and as such should re-qualify some true positives which were discarded by the collar-based approach. PSDS is also independent of the operating point settings and as such focuses on the global modelling power of each system. Model ranking might be requalified in the light of all of that.”
JAO: “Thanks to PSDS, DCASE participants will be less likely to overcook their models to only one operating point. PSDS will show participants how their sound recognition systems generalize to different operating conditions, and they will even be able to tune parameters to simulate different UX evaluation scenarios. With the previous F1-score metric this simulation was not really possible.
I also hope that the findings uncovered by using PSDS will provide the DCASE community with new insights to evolve the challenge tasks in future years. I am confident that PSDS will close the gap between sound recognition research and its many industrial applications.”
CB: “I can’t wait to see how people react to the PSDS plots and how they extract insights from them after – or even during – the challenge. I also expect that the score order and the gaps between the scores of different systems will differ greatly between the traditional F1-score and PSDS, which will surely create a lot of discussion. As Francesco said, our job as researchers is to help others open doors to new ideas and approaches.”
SK: “I agree with Cagdas regarding the impact of comparing F1-score and PSDS. When introducing a new method, whether an algorithm or a metric, it is beneficial to compare the new method to the old one in order to show and underline the benefits of the new method.”
FT: “We should see some models performing quite differently depending on the metric. For example, we might see some models that are mid-ranking with the F1-score jump to the higher part of the ranking with PSDS, and vice-versa.”
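The operating-point argument made above can be illustrated with a small Python sketch. It is not the PSDS computation itself: it uses a plain frame-level ROC area as a stand-in for a threshold-independent score, and the labels, scores, function names and the 0.5 threshold are all invented for illustration. It only shows how a ranking based on the F1-score at a single threshold can differ from a ranking based on a score that sweeps every threshold.

```python
# Toy illustration only: NOT the PSDS algorithm. All data and names are hypothetical.

def confusion(scores, labels, threshold):
    """Count TP, FP, FN, TN when a frame is 'detected' if score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp, fp, fn, tn


def f1_at(scores, labels, threshold):
    """F1-score at one fixed operating point."""
    tp, fp, fn, _ = confusion(scores, labels, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_area(scores, labels, steps=100):
    """Area under the TPR-vs-FPR curve, sweeping thresholds from 0 to 1."""
    points = {(0.0, 0.0), (1.0, 1.0)}
    for i in range(steps + 1):
        tp, fp, fn, tn = confusion(scores, labels, i / steps)
        points.add((fp / (fp + tn), tp / (tp + fn)))
    ordered = sorted(points)
    # Trapezoidal rule over consecutive curve points.
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(ordered, ordered[1:])
    )


# Eight frames: the first four contain the target sound, the last four do not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# System A separates the classes perfectly, but its scores are centred low.
system_a = [0.9, 0.8, 0.45, 0.4, 0.3, 0.2, 0.1, 0.05]
# System B looks good at threshold 0.5 but has a high-scoring false alarm.
system_b = [0.9, 0.8, 0.7, 0.6, 0.95, 0.55, 0.2, 0.1]

for name, scores in (("A", system_a), ("B", system_b)):
    print(name, round(f1_at(scores, labels, 0.5), 3), round(roc_area(scores, labels), 3))
```

With this invented data, system B has the higher F1-score at the 0.5 threshold, while system A has the larger area once all thresholds are considered, so the two measures rank the systems in opposite orders.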
Beyond DCASE and ICASSP this year, what does the future hold in store for PSDS?
CB: “I hope that after people have used it and seen the clarity that it provides, it is adopted as the standard for evaluating sound event detection systems and becomes commonplace in the community.”
SK: “Obviously there will be a bit of pride if PSDS gets adopted as the industry standard, but that outcome belongs to the scientific method and to our peers’ appreciation of the benefits brought by PSDS. We are contributing our industrial knowledge, acquired from field studies and experience from developing real products, to the global research community. Using it within the DCASE Challenge, presenting it at ICASSP and then discussing and exploring the impact with our peers will be an enriching experience for all of us. It is possible that someone will come up with something better someday, perhaps an evolution of PSDS that they will have thought of after reading our paper. For me, that is really exciting.”
GF: “We could see it applied in other fields of research that have similar constraints, which would be fascinating. To fuel other areas of cutting-edge research, even outside of our domain of sound recognition, would be really fulfilling.”
You can find the full technical paper on the Polyphonic Sound Detection Score, along with links to GitHub and the Jupyter Notebook, here.