Abstract:
Relevance ranking is key to Web search, as it determines how results are retrieved and ordered. Because keyword-based search does not guarantee relevance in meaning, semantic search has been put forward as an attractive and promising approach. Recently, several kinds of semantic information have each been adopted in search separately, such as thesauruses, ontologies, and semantic markups, as well as folksonomies and social annotations. However, although integrating more semantics should logically yield better search results, a search mechanism that fully exploits different kinds of semantic information is still lacking and remains to be researched. To this end, an integrated semantic search mechanism is proposed that combines textual information and keyword search with heterogeneous semantic information and semantic search. A statistics-based measure of semantic relevance, defined as semantic probabilities, is introduced to integrate keywords with four kinds of semantic information: thesauruses, categories, ontologies, and folksonomies. It is calculated from all textual and semantic information and stored in a newly proposed index structure called the semantic-keyword dual index. Based on this uniform measure, a search mechanism is developed that fully utilizes existing keyword and semantic search mechanisms to enhance heterogeneous semantic search. Experiments show that the proposed approach can effectively integrate both keyword-based information and heterogeneous semantic information in search.