Abstract:
Building an emotional prosody model is essential for emotional speech synthesis. A serious problem in practice, however, is that far less emotional data is available than neutral data. In this paper, a corpus covering three emotions (happiness, anger, and sadness) is built, the parameters that affect emotional prosody are analyzed, and a neural-network-based emotional prosody model is constructed. Because the emotional corpus is small, training the prosody model suffers from over-fitting caused by data sparsity. To exploit large-scale neutral data to improve the emotional prosody model, three methods are proposed: a mixed corpus, data fusion based on the least-squares algorithm, and a multistage network. All three methods amplify the influence of the emotional corpus, and each improves the prediction of the emotional parameters to some extent. The multistage network, which takes the output of the neutral model as an additional network input, in effect enlarges the feature space and strengthens the role of the emotional input features. The results show that the multistage network performs best among the three methods.
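
As a rough illustration only, two of the ideas above can be sketched in a few lines of Python with NumPy. The sketch assumes that least-squares data fusion means a weighted combination of a neutral model's prediction and an emotional model's prediction, with weights fitted on emotional data, and that the multistage network simply receives the neutral prediction as one extra input column. The function names, the bias term, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_fusion_weights(neutral_pred, emotional_pred, target):
    """Fit [w_neutral, w_emotional, bias] by ordinary least squares.

    Assumption: fusion is a linear combination of the two predictions.
    """
    design = np.column_stack(
        [neutral_pred, emotional_pred, np.ones_like(neutral_pred)]
    )
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights


def fuse(neutral_pred, emotional_pred, weights):
    """Apply the fitted fusion weights to a pair of predictions."""
    return weights[0] * neutral_pred + weights[1] * emotional_pred + weights[2]


def augment_with_neutral(features, neutral_pred):
    """Multistage idea: append the neutral model's output as an extra input."""
    return np.column_stack([features, neutral_pred])


# Synthetic stand-in data (hypothetical, e.g. one prosodic parameter per unit).
rng = np.random.default_rng(0)
neutral_pred = rng.normal(200.0, 20.0, size=50)
emotional_pred = neutral_pred + rng.normal(30.0, 10.0, size=50)
target = 0.3 * neutral_pred + 0.7 * emotional_pred + 5.0

weights = fit_fusion_weights(neutral_pred, emotional_pred, target)
fused = fuse(neutral_pred, emotional_pred, weights)

# Input features for a second-stage network, enlarged by one column.
features = rng.normal(size=(50, 4))
stage_two_inputs = augment_with_neutral(features, neutral_pred)
```

Here `stage_two_inputs` would be fed to a second-stage emotional network in place of `features`; the network itself is omitted because its architecture is not described in this abstract.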