Abstract:
Building sentence classification models is important for research on natural language processing and understanding. In this paper, we propose SCNN, a sentence classification model based on sparse, self-taught convolutional neural networks, which improves how a CNN extracts features from data. First, each convolutional layer learns effective combinations of the feature matrices from the previous layers, so that relationships among features are learned dynamically across the scope of the sentence; this eliminates the user-defined feature-map inputs of the convolutional layers. Second, during unsupervised training, an L1-norm penalty imposes sparsity constraints, which reduces the complexity of the model while increasing its accuracy. Finally, K-Max Pooling in the feature extraction layer selects the sequence of maximal features while preserving their relative order. SCNN handles sentences of variable length and, because it does not depend on linguistic features such as syntax or parse trees, it can be applied to any language. Experiments on a standard corpus dataset show that the proposed model is effective for sentence classification.
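
To illustrate the K-Max Pooling step mentioned above, the following is a minimal sketch in plain Python, not the SCNN implementation. The function name `k_max_pooling` and the sample feature values are assumptions made for illustration. It keeps the k largest activations of a one-dimensional feature sequence in their original order, and returns the whole sequence when it is shorter than k.

```python
def k_max_pooling(values, k):
    """Return the k largest entries of `values`, kept in their original order.

    Hypothetical helper for illustration; if the sequence has fewer than
    k entries, all of them are returned.
    """
    if k <= 0:
        return []
    # Rank positions by activation, largest first; ties go to the earlier position.
    ranked = sorted(range(len(values)), key=lambda i: (-values[i], i))
    # Keep the top-k positions, then restore their left-to-right order.
    kept = sorted(ranked[:k])
    return [values[i] for i in kept]


# Made-up activations for one feature map over a six-token sentence.
feature_map = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]
print(k_max_pooling(feature_map, 3))  # [0.9, 0.7, 0.8]
```

Because the output length depends only on k (for sentences with at least k positions), a pooling step of this kind yields a fixed-size representation for sentences of different lengths.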