ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2014, Vol. 51, Issue (9): 1945-1954. doi: 10.7544/issn1000-1239.2014.20140189

Special Issue: 2014 Deep Learning


Audio Classical Composer Identification by Deep Neural Network

Hu Zhen, Fu Kun, Zhang Changshui   

  1. (Department of Automation, Tsinghua University, Beijing 100084) (Tsinghua National Laboratory for Information Science and Technology (TNList), Beijing 100084) (State Key Laboratory of Intelligent Technology and Systems (Tsinghua University), Beijing 100084)
  Online: 2014-09-01

Abstract: Music is a signal with a hierarchical structure. In music information retrieval (MIR), higher-level features such as emotion and genre are typically extracted from lower-level features such as pitch and spectral energy. Deep neural networks are well suited to hierarchical feature learning, which suggests that deep learning can potentially perform well on music datasets. Audio classical composer identification (ACC) is an important MIR problem that aims to identify the composer of an audio clip of classical music. In this work, we build a hybrid model based on a deep belief network (DBN) and a stacked denoising autoencoder (SDA) to identify the composer from the audio signal. The model achieves an accuracy of 76.26% on the test set, which is better than some pure (non-hybrid) models and some shallow models. After dimensionality reduction with linear discriminant analysis (LDA), it is also clear that samples from different classes move farther apart from each other as they are transformed by more layers of our model. By comparing models of different sizes, we provide some empirical guidance for the ACC problem. As with images, music features are hierarchical, and different parts of the brain process signals differently; this motivates our hybrid model, and our results suggest that the proposed model is reasonable for some applications. During the experiments, we also found some practical guidelines for choosing network parameters. For example, the number of neurons in the first hidden layer should be approximately 3 times the dimension of the input data.
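To make the SDA half of the hybrid model and the "3 times the input dimension" sizing rule concrete, here is a minimal pure-Python sketch of a single denoising autoencoder layer. It is not the authors' implementation: the class name `DenoisingAutoencoder`, the tied weights, the masking-noise corruption, the cross-entropy loss on inputs in [0, 1], the initialisation range, and the `hidden_ratio`, `corruption`, and `lr` parameters are all illustrative assumptions. The DBN (RBM-based) half of the hybrid model, and how the two parts are joined, are not shown.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function (avoids math.exp overflow)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class DenoisingAutoencoder:
    """Hypothetical tied-weight denoising autoencoder layer trained with plain SGD."""

    def __init__(self, n_visible, hidden_ratio=3, corruption=0.2, seed=0):
        self.rng = random.Random(seed)
        self.n_visible = n_visible
        # Rule of thumb from the abstract: first hidden layer ~3x the input size.
        self.n_hidden = hidden_ratio * n_visible
        self.corruption = corruption
        bound = math.sqrt(6.0 / (n_visible + self.n_hidden))
        # W[i][j] links visible unit i to hidden unit j; shared by encoder and decoder.
        self.W = [[self.rng.uniform(-bound, bound) for _ in range(self.n_hidden)]
                  for _ in range(n_visible)]
        self.b_h = [0.0] * self.n_hidden
        self.b_v = [0.0] * n_visible

    def corrupt(self, x):
        # Masking noise (assumed): each input is zeroed with probability `corruption`.
        return [0.0 if self.rng.random() < self.corruption else xi for xi in x]

    def encode(self, x):
        return [sigmoid(sum(x[i] * self.W[i][j] for i in range(self.n_visible)) + self.b_h[j])
                for j in range(self.n_hidden)]

    def decode(self, h):
        return [sigmoid(sum(self.W[i][j] * h[j] for j in range(self.n_hidden)) + self.b_v[i])
                for i in range(self.n_visible)]

    @staticmethod
    def cross_entropy(x, z, eps=1e-12):
        # Clipping keeps log() finite if a unit saturates at exactly 0 or 1.
        return -sum(xi * math.log(max(zi, eps)) + (1 - xi) * math.log(max(1 - zi, eps))
                    for xi, zi in zip(x, z))

    def reconstruction_error(self, x):
        # Error when reconstructing the clean input from the clean input.
        return self.cross_entropy(x, self.decode(self.encode(x)))

    def train_step(self, x, lr=0.1):
        """One SGD step: reconstruct the clean x from a corrupted copy of x."""
        x_tilde = self.corrupt(x)
        h = self.encode(x_tilde)
        z = self.decode(h)
        # Gradients of the cross-entropy w.r.t. the visible and hidden pre-activations.
        d_v = [z[i] - x[i] for i in range(self.n_visible)]
        d_h = [sum(self.W[i][j] * d_v[i] for i in range(self.n_visible)) * h[j] * (1 - h[j])
               for j in range(self.n_hidden)]
        # Tied weights receive both the decoder gradient and the encoder gradient.
        for i in range(self.n_visible):
            for j in range(self.n_hidden):
                self.W[i][j] -= lr * (d_v[i] * h[j] + x_tilde[i] * d_h[j])
            self.b_v[i] -= lr * d_v[i]
        for j in range(self.n_hidden):
            self.b_h[j] -= lr * d_h[j]
        return self.cross_entropy(x, z)
```

In a stacked denoising autoencoder, layers like this are usually pretrained greedily: once one layer is trained, its `encode()` outputs become the training inputs of the next layer, and the whole stack is then fine-tuned with a classifier on top. The hidden widths of the deeper layers in this paper's model are not specified here.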

Key words: ACC (audio classical composer identification), deep neural network, hybrid model, feature learning, over-fitting
