Reverberant speech separation based on audio-visual dictionary learning and binaural cues

Liu, Qingju, Wang, Wenwu, Jackson, Philip and Barnard, Mark (2012) Reverberant speech separation based on audio-visual dictionary learning and binaural cues. In: IEEE Statistical Signal Processing Workshop (SSP); 05 - 08 Aug 2012, Michigan, U.S. (2012 IEEE Statistical Signal Processing Workshop (SSP)) ISSN (print) 2373-0803

Full text not available from this archive.

Abstract

Probabilistic models of binaural cues, such as the interaural phase difference (IPD) and the interaural level difference (ILD), can be used to obtain an audio mask in the time-frequency (TF) domain for source separation of binaural mixtures. These models are, however, often degraded by acoustic noise. In contrast, the video stream contains relevant information about the synchronous audio stream that is not affected by acoustic noise. In this paper, we present a novel method for modeling audio-visual (AV) coherence based on dictionary learning. A visual mask is constructed from the video signal using the learnt AV dictionary, and combined with the audio mask to obtain a noise-robust audio-visual mask, which is then applied to the binaural signal for source separation. We tested our algorithm on the XM2VTS database and observed considerable performance improvement for noise-corrupted signals.
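Although the full text is not available here, the masking pipeline the abstract outlines (an IPD-based audio mask combined with a visual mask, applied in the TF domain) can be illustrated with a minimal NumPy sketch. This assumes a single-Gaussian IPD model per frequency bin and a precomputed visual mask; all function names and the geometric combination rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def audio_mask_from_binaural(L, R, mu_ipd, sigma_ipd):
    """Soft audio mask from a per-frequency Gaussian IPD model (illustrative).

    L, R     : complex STFTs of the left/right channels, shape (freq, time).
    mu_ipd   : assumed mean IPD of the target source per frequency bin, shape (freq,).
    sigma_ipd: assumed IPD standard deviation per frequency bin, shape (freq,).
    """
    # Interaural phase difference at each TF point.
    ipd = np.angle(L * np.conj(R))
    # Phase distance to the model mean, wrapped to (-pi, pi].
    d = np.angle(np.exp(1j * (ipd - mu_ipd[:, None])))
    # Gaussian likelihood, normalised so the mask lies in [0, 1].
    return np.exp(-0.5 * (d / sigma_ipd[:, None]) ** 2)

def combine_masks(audio_mask, visual_mask, alpha=0.5):
    """Weighted geometric combination of audio and visual TF masks
    (one plausible fusion rule; the paper's exact rule may differ)."""
    return audio_mask ** alpha * visual_mask ** (1.0 - alpha)

def apply_mask(mask, L, R):
    """Apply the fused mask to both binaural channels to estimate the source."""
    return mask * L, mask * R
```

In use, the fused mask would multiply the mixture STFTs before an inverse STFT resynthesises the separated source; TF points whose IPD matches the target model and whose visual mask is high are retained, while noisy or interfering points are attenuated.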

Item Type: Conference or Workshop Item (Paper)
Event Title: IEEE Statistical Signal Processing Workshop (SSP)
Research Area: Computer science and informatics
Faculty, School or Research Centre: Faculty of Science, Engineering and Computing > School of Computer Science and Mathematics
Depositing User: Katrina Clifford
Date Deposited: 05 Jul 2019 14:22
Last Modified: 05 Jul 2019 14:22
DOI: https://doi.org/10.1109/SSP.2012.6319789
URI: http://eprints.kingston.ac.uk/id/eprint/43483
