Paper title
Formant Estimation and Tracking using Probabilistic Heat-Maps
Paper authors
Paper abstract
Formants are the spectral maxima that result from acoustic resonances of the human vocal tract, and their accurate estimation is among the most fundamental speech processing problems. Recent work has shown that these frequencies can be estimated accurately using deep learning techniques. However, when presented with speech from a domain different from the one they were trained on, these methods exhibit a decline in performance, which limits their use as generic tools. The contribution of this paper is a new network architecture that performs well across a variety of speakers and speech domains. The proposed model is composed of a shared encoder that takes a spectrogram as input and outputs a domain-invariant representation. Multiple decoders then further process this representation, each responsible for predicting a different formant while taking the lower formant predictions into account. An advantage of the model is that it is based on heatmaps, which yield a probability distribution over formant predictions. Results suggest that the proposed model better represents the signal over various domains and leads to better formant frequency tracking and estimation.
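To make the heatmap idea concrete, the following is a minimal sketch of how per-formant scores over frequency bins could be turned into probability distributions and decoded in order from the lowest formant upward. It is not the paper's implementation: the function names `softmax` and `decode_formants`, the `bin_hz` parameter, and in particular the masking rule (bins at or below the previous formant's estimate are excluded) are assumptions chosen only to illustrate "considering the lower formant predictions".

```python
import math


def softmax(logits):
    """Turn raw scores over frequency bins into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_formants(heatmap_logits, bin_hz):
    """Decode one frame; heatmap_logits[k] holds the scores for formant k+1.

    Assumption (not from the paper): bins at or below the previous formant's
    estimate are masked out, a crude stand-in for conditioning each decoder
    on the lower formant predictions.
    Returns a list of (frequency_hz, probability) pairs, lowest formant first.
    """
    estimates = []
    lowest_allowed = 0  # first bin index the current formant may occupy
    for logits in heatmap_logits:
        if lowest_allowed >= len(logits):
            raise ValueError("no frequency bins left above the previous formant")
        masked = [
            v if i >= lowest_allowed else -math.inf
            for i, v in enumerate(logits)
        ]
        probs = softmax(masked)
        best = max(range(len(probs)), key=probs.__getitem__)
        estimates.append((best * bin_hz, probs[best]))
        lowest_allowed = best + 1
    return estimates


if __name__ == "__main__":
    # Hypothetical frame with 8 bins of 500 Hz each and two formants.
    frame = [
        [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 5.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0],
    ]
    for k, (freq, prob) in enumerate(decode_formants(frame, 500.0), start=1):
        print(f"F{k}: {freq:.0f} Hz (p={prob:.2f})")
```

Because each formant is represented by a full distribution rather than a single number, the probability attached to the chosen bin can serve as a rough confidence value, which a point-estimate regressor would not provide.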
