Paper title
Minimum Processing Near-end Listening Enhancement
Paper authors
Paper abstract
The intelligibility and quality of speech from a mobile phone or public announcement system are often degraded by background noise in the listening environment. Pre-processing the speech signal can improve its intelligibility and quality; this is known as near-end listening enhancement (NLE). Although existing NLE techniques can greatly increase intelligibility in harsh noise environments, in favorable noise conditions the intelligibility of speech reaches a ceiling beyond which it cannot be further enhanced. Because existing methods focus solely on improving intelligibility, they process the speech signal unnecessarily in such conditions, which leads to speech distortions and quality degradations. In this paper, we provide a new rationale for NLE, where the target speech is minimally processed in terms of a processing penalty, provided that a certain performance constraint, e.g., intelligibility, is satisfied. We present a closed-form solution for the case where the performance criterion is an intelligibility estimator based on the approximated speech intelligibility index and the processing penalty is the mean-square error between the processed and the clean speech. This produces an NLE method that adapts to changing noise conditions via a simple gain rule by limiting the processing to the minimum necessary to achieve a desired intelligibility, while at the same time focusing on quality in favorable noise situations by minimizing the amount of speech distortions. Through simulation studies, we show that the proposed method attains speech quality on par with or better than existing methods in both objective measurements and subjective listening tests, while still sustaining objective speech intelligibility performance on par with existing methods.
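
The minimum-processing rationale described in the abstract can be sketched as a constrained optimization. The notation below is an illustrative assumption, not the paper's own: $s$ is the clean speech, $\tilde{s}$ the processed speech, $w$ the near-end noise, $I(\cdot,\cdot)$ an intelligibility estimator (based on the approximated speech intelligibility index in the closed-form case), and $I_{\min}$ the desired intelligibility level. Any further constraints the paper may impose are not shown.

```latex
\begin{aligned}
\min_{\tilde{s}} \quad & \mathbb{E}\!\left[ \lVert \tilde{s} - s \rVert^{2} \right] \\
\text{subject to} \quad & I(\tilde{s}, w) \ge I_{\min}
\end{aligned}
```

Under this sketch, whenever the unprocessed speech already satisfies $I(s, w) \ge I_{\min}$, the minimizer is $\tilde{s} = s$ with zero penalty. This is consistent with the abstract's claim that processing is limited to the minimum necessary in favorable noise conditions.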
