A Study on the Emotional Prosody Model Building Based on Small-Scale Emotional Data and Large-Scale Neutral Data

邵艳秋; 穗志方; 韩纪庆; 王志伟


Abstract

Building a good emotional prosody model is an important step in synthesizing emotional speech. A practical problem in emotional speech research, however, is that far less emotional data than neutral data is usually available. This paper combines a small-scale database containing three kinds of emotional speech (happiness, anger, and sadness) with a larger-scale neutral speech database to study emotional prosody modeling. The prosodic parameters that affect emotion are analyzed, and an emotional prosody model based on an artificial neural network is built. To address the over-fitting caused by the shortage of emotional data relative to neutral data, three methods are proposed: the mixed corpus method, least-squares fusion, and the cascaded (multistage) network method. All three amplify the contribution of the emotional corpus to varying degrees, and each improves emotional prediction. The cascaded network method in particular feeds the output of the neutral model into the cascaded network as one of its inputs, which amounts to enlarging the feature space of the emotional model and strengthening the effect of its input features; it gives the best results for generating each prosodic parameter across the three emotions.
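The abstract names the least-squares fusion and cascaded network ideas but does not give their formulation. The following is a minimal sketch of one plausible reading, not the authors' implementation: fusion is assumed to be a weighted sum of the neutral and emotional model predictions with weights fitted by ordinary least squares, and cascading is assumed to mean appending the neutral model's prediction to the emotional model's input features. The function names `fit_fusion_weights`, `fuse`, and `cascade_input` are hypothetical.

```python
def fit_fusion_weights(neutral_pred, emotion_pred, target):
    """Return (w_n, w_e) minimizing sum((t - w_n*n - w_e*e)**2).

    Assumption: fusion is a two-term linear combination; the weights are
    obtained by solving the 2x2 normal equations with Cramer's rule.
    """
    s_nn = sum(n * n for n in neutral_pred)
    s_ee = sum(e * e for e in emotion_pred)
    s_ne = sum(n * e for n, e in zip(neutral_pred, emotion_pred))
    s_nt = sum(n * t for n, t in zip(neutral_pred, target))
    s_et = sum(e * t for e, t in zip(emotion_pred, target))
    det = s_nn * s_ee - s_ne * s_ne
    if abs(det) < 1e-12:
        raise ValueError("predictions are collinear; weights are not unique")
    w_n = (s_nt * s_ee - s_et * s_ne) / det
    w_e = (s_et * s_nn - s_nt * s_ne) / det
    return w_n, w_e


def fuse(w_n, w_e, neutral_value, emotion_value):
    """Combine one neutral and one emotional prediction with fitted weights."""
    return w_n * neutral_value + w_e * emotion_value


def cascade_input(features, neutral_model):
    """Assumed cascade step: extend the feature vector with the neutral
    model's prediction before passing it to the emotional model."""
    return list(features) + [neutral_model(features)]
```

In this reading, the fusion weights would be fitted on the small emotional corpus only, while the neutral model itself is trained on the large neutral corpus; the cascade variant instead lets the emotional network learn how to use the neutral prediction alongside its other inputs.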
