Searching Semantic Web Documents Based on RDF Sentences

Wu Honghan (吴鸿汉), Qu Yuzhong (瞿裕忠), and Li Huiying (李慧颖)

Abstract: Keyword-based Semantic Web document search is an important means of discovering Semantic Web data. Most existing approaches rely on traditional IR techniques, in which documents are modeled as bags of words. The authors identify three difficulties these techniques face when processing RDF documents: preserving data structures, processing linked data, and generating snippets. To address them, an approach is proposed that models a Semantic Web document from its abstract syntax, the RDF graph, and builds document term vectors from RDF sentences. First, a document is modeled as a set of RDF sentences, so that RDF sentence-based structure is preserved during document analysis and indexing. Second, the authoritative descriptions of named resources are introduced, which makes linked data across document boundaries searchable. Furthermore, to help users quickly judge whether a result is relevant, the traditional inverted index structure is extended so that more readable and understandable snippets can be extracted from matched documents. Experiments on large-scale real-world data show that this approach can significantly improve the precision and recall of Semantic Web document search. Precision at the top-one result improves by up to 19%, and a steady improvement of nearly 10% is observed. Over 50 random queries, recall increases by up to 60% on average. Clear improvements in system usability are also obtained.
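To make the idea of sentence-level indexing concrete, the following Python sketch groups a document's triples into RDF sentences and records postings per sentence rather than per document, so that a match can be returned together with the sentence that contains it. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define an RDF sentence, so the sketch assumes it is a maximal group of triples connected through shared blank nodes; the `_:` blank-node prefix, the local-name tokenizer, and all function names are hypothetical. Authoritative descriptions and term weighting are not modeled.

```python
import re
from collections import defaultdict


def is_blank(node):
    # Assumption: blank nodes carry the "_:" prefix, as in N-Triples.
    return node.startswith("_:")


def tokens(node):
    # Hypothetical tokenizer: lowercase alphanumeric runs of the local name
    # (the part after the last '#' or '/'); literals are used whole.
    return re.findall(r"[a-z0-9]+", re.split(r"[#/]", node)[-1].lower())


def rdf_sentences(triples):
    """Group triples so that triples sharing a blank node stay together."""
    parent = list(range(len(triples)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # blank node -> index of the first triple mentioning it
    for i, (s, _, o) in enumerate(triples):
        for node in (s, o):
            if is_blank(node):
                if node in first_seen:
                    parent[find(i)] = find(first_seen[node])
                else:
                    first_seen[node] = i

    groups = defaultdict(list)
    for i, triple in enumerate(triples):
        groups[find(i)].append(triple)
    return list(groups.values())


def build_index(docs):
    """Map each term to the (doc_id, sentence_no) pairs that contain it."""
    index = defaultdict(set)
    sentences = {}
    for doc_id, triples in docs.items():
        for n, sentence in enumerate(rdf_sentences(triples)):
            sentences[(doc_id, n)] = sentence
            for triple in sentence:
                for node in triple:
                    if not is_blank(node):
                        for term in tokens(node):
                            index[term].add((doc_id, n))
    return index, sentences


def search(index, sentences, query):
    """Return {doc_id: [matching sentences]} for any query term."""
    hits = set()
    for term in tokens(query):
        hits |= index.get(term, set())
    result = defaultdict(list)
    for key in sorted(hits):
        result[key[0]].append(sentences[key])
    return dict(result)


# Toy data, invented for illustration only.
docs = {
    "doc1": [
        ("http://ex.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
        ("http://ex.org/alice", "http://xmlns.com/foaf/0.1/knows", "_:b1"),
        ("_:b1", "http://xmlns.com/foaf/0.1/name", "Bob"),
    ],
    "doc2": [
        ("http://ex.org/carol", "http://xmlns.com/foaf/0.1/name", "Carol"),
    ],
}
index, sentences = build_index(docs)
snippets = search(index, sentences, "Bob")
```

In this toy data a query for "Bob" matches the sentence that also states who knows the anonymous resource named Bob, which is the kind of context a flat bag-of-words index would lose.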
