Abstract:
This paper studies the clustering of syllable pitch contours. Clustering followed by careful sample selection can significantly reduce the size of a large speech corpus. Combined with speech coding, this makes it possible to build a small multi-sample tonal mono-syllable corpus that meets the clarity and naturalness requirements of embedded text-to-speech systems. Because pitch contours differ in length, a clustering approach for non-fixed-length contours is proposed, which incorporates dynamic programming (DP) into clustering. First, each pitch contour is normalized to zero mean. Then, the best path between two contours is found by DP. Finally, the distance between the two contours is computed along this path; if the two contours have similar shapes, the distance is very low. In the sample selection stage, the tone domain of the syllables is divided into levels by pitch mean, and typical samples are then identified according to their levels and clusters. Clustering experiments show that this approach gives better clustering results than traditional approaches, and synthesis experiments further validate it.
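
As a rough illustration of the three steps above (zero-mean normalization, best-path search by DP, distance along the path), the following minimal sketch uses a dynamic-time-warping-style alignment. The abstract does not specify the local cost, the allowed path steps, or any length normalization, so the absolute pitch difference, the three standard steps, and the division by path length are all assumptions, and the names `zero_mean` and `dp_contour_distance` are hypothetical.

```python
import math


def zero_mean(contour):
    """Subtract the mean pitch so that only the contour shape remains."""
    mean = sum(contour) / len(contour)
    return [p - mean for p in contour]


def dp_contour_distance(a, b):
    """Distance between two pitch contours of possibly different lengths.

    Assumptions (not given in the abstract): local cost is the absolute
    pitch difference, the path may move one frame forward in either
    contour or in both, and the accumulated cost is divided by the
    number of cells on the chosen path.
    """
    x, y = zero_mean(a), zero_mean(b)
    n, m = len(x), len(y)
    # cost[i][j]: minimal accumulated cost aligning x[:i+1] with y[:j+1]
    # steps[i][j]: number of cells on the path that achieves that cost
    cost = [[math.inf] * m for _ in range(n)]
    steps = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            local = abs(x[i] - y[j])
            if i == 0 and j == 0:
                cost[i][j], steps[i][j] = local, 1
                continue
            candidates = []
            if i > 0:
                candidates.append((cost[i - 1][j], steps[i - 1][j]))
            if j > 0:
                candidates.append((cost[i][j - 1], steps[i][j - 1]))
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], steps[i - 1][j - 1]))
            # Pick the cheapest predecessor; ties go to the shorter path.
            best_cost, best_steps = min(candidates)
            cost[i][j] = best_cost + local
            steps[i][j] = best_steps + 1
    return cost[n - 1][m - 1] / steps[n - 1][m - 1]


if __name__ == "__main__":
    rising = [100, 110, 120, 130]
    rising_higher = [200, 210, 220, 230]      # same shape, higher register
    rising_longer = [100, 110, 110, 120, 130]  # similar shape, one more frame
    falling = [130, 120, 110, 100]
    print(dp_contour_distance(rising, rising_higher))
    print(dp_contour_distance(rising, rising_longer))
    print(dp_contour_distance(rising, falling))
```

Under these assumptions, two contours with the same shape at different pitch registers get a distance of zero, and a contour of a different length but similar shape scores much lower than one with the opposite shape. Such a pairwise distance could then feed any clustering method that accepts a distance matrix; which clustering method the paper uses is not stated in the abstract.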