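• Excellent Sci-Tech Journal of China
  • CCF-recommended Class A Chinese-language journal
  • High-quality sci-tech journal in the computing field, Tier T1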
He Xianmin, Li Maoxi, He Yanqing. Siamese BERT-Networks Based Classification Mapping of Scientific and Technological Literature[J]. Journal of Computer Research and Development, 2021, 58(8): 1751-1760. DOI: 10.7544/issn1000-1239.2021.20210323

Siamese BERT-Networks Based Classification Mapping of Scientific and Technological Literature

Funds: This work was supported by the National Natural Science Foundation of China (61662031) and the Fund of the Institute of Scientific and Technical Information of China (ZD2020-18).
  • Published Date: July 31, 2021
  • The International Patent Classification (IPC) and the Chinese Library Classification (CLC) are important classification schemes that play a key role in organizing and managing patent information and journal literature, respectively. Accurately establishing the mapping between the two schemes is of great significance for cross-browsing and cross-retrieval of patent information and journal resources. This paper proposes a Siamese network built on the pre-trained contextual language model BERT to establish the mapping between IPC and CLC. The Siamese network encodes the description texts of the two classification categories separately, average-pools the encoded word representations into sentence vectors of the same dimension, and computes a cosine-similarity score between the two sentence vectors to complete the classification mapping. A mapping corpus between IPC and CLC categories was manually annotated. Experimental results on this corpus show that the proposed method significantly outperforms the rule-based method and other deep neural network methods such as Sia-Multi, Bi-TextCNN, and Bi-LSTM. The related code, models, and manually annotated corpus are publicly released.
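The scoring pipeline described in the abstract (one shared encoder for both category descriptions, average pooling into fixed-size sentence vectors, cosine similarity between them) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' released code. The BERT encoder is replaced by a hypothetical static lookup table `TOY_EMBEDDINGS`, so the token vectors are not contextual. The labels `CLC-A` and `CLC-B` and all tokenized descriptions are invented. Picking the single highest-scoring CLC category is also an assumed decision rule; the paper may score or threshold candidate pairs differently.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical 3-d word vectors standing in for BERT outputs (assumption:
# a real Siamese BERT yields contextual token vectors; this table is static).
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "computer": [1.0, 0.0, 0.0],
    "data": [0.8, 0.2, 0.0],
    "processing": [0.6, 0.4, 0.0],
    "technology": [0.5, 0.5, 0.0],
    "chemistry": [0.0, 0.1, 0.9],
    "industry": [0.1, 0.3, 0.6],
}
DIM = 3


def encode(tokens: Sequence[str], max_len: int = 4) -> Tuple[List[Vector], List[int]]:
    """Hypothetical shared encoder: both Siamese branches call this same
    function (shared weights). Truncates/pads to max_len, returns a mask."""
    vectors = [list(TOY_EMBEDDINGS.get(t, [0.0] * DIM)) for t in tokens[:max_len]]
    mask = [1] * len(vectors)
    while len(vectors) < max_len:
        vectors.append([0.0] * DIM)  # padding position
        mask.append(0)
    return vectors, mask


def mean_pool(token_vectors: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Average only the token vectors whose mask entry is 1 (padding ignored)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [x / count for x in total]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_vector(tokens: Sequence[str]) -> Vector:
    """Encode a category description and mean-pool it into one vector."""
    vectors, mask = encode(tokens)
    return mean_pool(vectors, mask)


def best_mapping(ipc_vector: Vector, clc_vectors: Dict[str, Vector]) -> str:
    """Assumed decision rule: return the most similar CLC label."""
    return max(clc_vectors, key=lambda label: cosine_similarity(ipc_vector, clc_vectors[label]))


# Hypothetical, already-tokenized CLC category descriptions.
CLC_CANDIDATES: Dict[str, List[str]] = {
    "CLC-A": ["computer", "technology"],
    "CLC-B": ["chemistry", "industry"],
}

if __name__ == "__main__":
    ipc_vec = sentence_vector(["data", "processing", "computer"])
    clc_vecs = {label: sentence_vector(toks) for label, toks in CLC_CANDIDATES.items()}
    for label, vec in clc_vecs.items():
        print(label, round(cosine_similarity(ipc_vec, vec), 3))
    print("best match:", best_mapping(ipc_vec, clc_vecs))  # CLC-A in this toy setup
```

Because both descriptions pass through the same `encode` function, the two branches share parameters, which is what makes the network Siamese. Pooling under the attention mask keeps padding positions from diluting the sentence vector, so descriptions of different lengths still yield comparable vectors of the same dimension.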
