Searching Semantic Web Documents Based on RDF Sentences
-
Graphical Abstract
-
Abstract
Keyword-based semantic Web document search is one of the most efficient approaches to find semantic Web data. Most existing approaches are based on traditional IR technologies, in which documents are modeled as bag of words. The authors identify the difficulties of these technologies in processing RDF documents, namely, preserving data structures, processing linked data and generating snippets. An approach is proposed to model the semantic Web document from its abstract syntax: RDF graph. In this approach, a document is modeled as a set of RDF sentences. It preserves the RDF sentence-based structures in the processes of document analyzing and indexing. The authoritative descriptions of named resources are also introduced and it enables the linked data across document boundaries to be searchable. Furthermore, to help users quickly determine whether one result is relevant or not, The traditional inverse index structure is extended to enable more understandable snippet extraction from matched documents. Experiments on real world data show that this approach can significantly improve the precision and recall of semantic Web document search. The precision at top one result is improved up to 19% and a steady improvement (near 10%) is observed. According to 50 random queries, the recall increases up to 60% averagely. Remarkable improvements in system usability are also obtained.
-
-