Abstract:
Music is a signal with hierarchical structure. In music information retrieval (MIR), higher-level features such as emotion and genre are typically extracted from lower-level features such as pitch and spectral energy. Deep neural networks are well suited to hierarchical feature learning, which suggests that deep learning can achieve good performance on music data. Audio classical composer identification (ACC) is an important MIR problem that aims to identify the composer of audio classical music clips. In this work, a hybrid model combining a deep belief network (DBN) and a stacked denoising autoencoder (SDA) is built to identify the composer from the audio signal. The model achieves an accuracy of 76.26% on the test set, which is better than several pure (non-hybrid) models and shallow models. After dimensionality reduction with linear discriminant analysis (LDA), it is also clear that samples from different classes move farther apart as they are transformed by more layers of our model. By comparing models of different sizes, we provide empirical guidance for the ACC problem. Like images, music features are hierarchical, and different parts of the brain handle signals differently; this motivates the hybrid model, and our results encourage us to believe that the proposed model is useful in such applications. During the experiments, we also identify practical guidelines for choosing network parameters; for example, the number of neurons in the first hidden layer should be approximately three times the dimension of the input data.
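As an illustration of the hybrid idea and the "first hidden layer about three times the input dimension" heuristic mentioned above, the sketch below combines a denoising-autoencoder-style branch with a DBN-style branch before a classifier. This is not the authors' implementation: the parallel-branch arrangement, the feature dimension, the number of composer classes, and the use of PyTorch are all assumptions made for illustration, and the real DBN/SDA components would be pretrained rather than randomly initialized.

```python
# Illustrative sketch only (assumed architecture and sizes), not the paper's model.
import torch
import torch.nn as nn

class HybridComposerNet(nn.Module):
    def __init__(self, input_dim: int, num_composers: int):
        super().__init__()
        hidden1 = 3 * input_dim      # heuristic: first hidden layer ~3x input dimension
        half = hidden1 // 2
        # SDA-style branch: in the full method these layers would be pretrained
        # as denoising autoencoders on corrupted inputs.
        self.sda_branch = nn.Sequential(
            nn.Linear(input_dim, hidden1), nn.Sigmoid(),
            nn.Linear(hidden1, half), nn.Sigmoid(),
        )
        # DBN-style branch: in the full method these layers would be pretrained
        # layer-wise as restricted Boltzmann machines.
        self.dbn_branch = nn.Sequential(
            nn.Linear(input_dim, hidden1), nn.Sigmoid(),
            nn.Linear(hidden1, half), nn.Sigmoid(),
        )
        # Classifier over the concatenated branch representations.
        self.classifier = nn.Linear(2 * half, num_composers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.cat([self.sda_branch(x), self.dbn_branch(x)], dim=1)
        return self.classifier(h)

# Usage sketch with assumed numbers: 60-dimensional audio features, 11 composers.
model = HybridComposerNet(input_dim=60, num_composers=11)
logits = model(torch.randn(8, 60))   # batch of 8 feature vectors -> class scores
```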