Mining Patent Knowledge for Automatic Keyword Extraction
Abstract: Keywords are important clues that help a reader quickly decide whether to skip, scan, or read a document in detail, and automatic keyword extraction plays an increasingly important role in information retrieval, natural language processing, and other text-related research. This paper designs a novel automatic keyword extraction approach that makes use of patent knowledge, so that a computer can learn and understand a target document against its background knowledge, as a human expert would, and then pick out the keywords automatically. Patent data are chosen as the external knowledge repository because of their large volume, rich content, accurate expression, and professional authority. The paper discusses the characteristics of patent data, mines the relations between different patent documents, and provides an algorithm that constructs a background knowledge repository for a given knowledge domain, on which keyword extraction for the target document is based. The patents related to the target document form the starting point of the repository. The inventors, assignees, citations, and classification information of each of these patents are used to discover hidden relations between patent documents, and the related knowledge found in this way is imported to extend the repository, yielding background knowledge for the target document in each of its related knowledge domains. Novel word knowledge features are then derived from this background knowledge; they reflect how important each word is within the background knowledge of the target document. Finally, keyword extraction is cast as a classification problem, and a support vector machine (SVM) is used to extract the keywords. Experiments on a patent data set and an open data set show that, with these word features, the approach clearly outperforms other state-of-the-art approaches in keyword extraction.
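The repository-building step described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the toy patent records, the field names, the `related` linking rule, the `max_hops` parameter, and the document-frequency form of `word_knowledge_feature` are all assumptions made for illustration. The sketch only shows the general idea of starting from the patents related to a target document, following shared inventors, assignees, citations, and classification codes to pull in further patents, and then scoring a word against the resulting background set.

```python
from collections import deque

# Hypothetical toy patent records; the field names are illustrative assumptions.
PATENTS = {
    "P1": {"inventors": {"Li"}, "assignees": {"Acme"}, "cites": {"P2"},
           "classes": {"G06F"}, "text": "keyword extraction from text documents"},
    "P2": {"inventors": {"Wang"}, "assignees": {"Beta"}, "cites": set(),
           "classes": {"G06N"}, "text": "support vector machine text classification"},
    "P3": {"inventors": {"Li"}, "assignees": {"Gamma"}, "cites": set(),
           "classes": {"H04L"}, "text": "keyword index for document retrieval"},
    "P4": {"inventors": {"Zhao"}, "assignees": {"Delta"}, "cites": set(),
           "classes": {"A61K"}, "text": "pharmaceutical compound preparation"},
}


def related(a, b, patents):
    """Assumed linking rule: one patent cites the other, or the two share
    an inventor, an assignee, or a classification code."""
    pa, pb = patents[a], patents[b]
    if b in pa["cites"] or a in pb["cites"]:
        return True
    return any(pa[key] & pb[key] for key in ("inventors", "assignees", "classes"))


def expand_repository(seeds, patents, max_hops=1):
    """Breadth-first expansion of the seed patents along metadata links."""
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        pid, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for other in patents:
            if other not in seen and related(pid, other, patents):
                seen.add(other)
                frontier.append((other, hops + 1))
    return seen


def word_knowledge_feature(word, repo_ids, patents):
    """Assumed feature: fraction of background patents whose text contains the word."""
    if not repo_ids:
        return 0.0
    hits = sum(1 for pid in repo_ids if word in patents[pid]["text"].split())
    return hits / len(repo_ids)


# Seed with the patent assumed to be related to the target document.
repo = expand_repository({"P1"}, PATENTS, max_hops=1)
print(sorted(repo))
print(word_knowledge_feature("keyword", repo, PATENTS))
```

In this toy data, `P2` is reached through a citation and `P3` through a shared inventor, while `P4` shares nothing with the seed and stays out of the repository. In the approach the abstract describes, feature values of this kind would be among the inputs to the SVM classifier that labels each candidate word as keyword or non-keyword.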