Abstract:
International patent classification (IPC) and Chinese library classification (CLC), as important classification marks, play an important role in the organization and management of patent information and journal literature respectively. How to accurately establish the mapping relationship between two classifications is of great significance to the realization of cross-browsing and retrieval of patent information and journal resources. In the paper, a siamese network based on BERT pre-training contextual language model is proposed to establish the mapping relationship between IPC and CLC. A siamese network model is used to abstract the description texts of two classification categories respectively, and the sentence vectors of the same dimension are calculated by average pooling the word representation after abstraction, and the similarity score between sentences is calculated based on cosine similarity to complete classification mapping. The mapping corpus between IPC category and CLC category is manually annotated. The experimental results on the corpus show that the proposed method is significantly better than the rule-based method and other deep neural network methods, such as Sia-Multi, Bi-TextCNN, Bi-LSTM etc. The relevant code, models, and manual annotation corpus are publicly released.