ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (9): 1954-1964. doi: 10.7544/issn1000-1239.2015.20140686

• Software Technology •

  1. (Department of Computer Science and Technology, Tsinghua University, Beijing 100084)
  • Published: 2015-09-01

Semantic-Enhanced Spatial Keyword Search

Han Jun, Fan Ju, Zhou Lizhu   

  1. (Department of Computer Science and Technology, Tsinghua University, Beijing 100084)
  • Online: 2015-09-01

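Abstract: Spatial keyword search aims to find points of interest (POIs) that both satisfy the user's query intent and are spatially close to the query location, and it is widely used in applications such as map search. Traditional spatial keyword search methods consider only how well a POI textually matches the query keywords and ignore the semantics of the query, which causes relevant results to be missed and irrelevant results to be returned. To overcome this limitation, we propose S3 (semantic-enhanced spatial keyword search). S3 analyzes the semantic information contained in the query keywords and ranks POIs effectively by combining semantic relevance with spatial distance. S3 addresses two main technical challenges. 1) How to analyze semantic information. To this end, S3 uses a knowledge base to semantically enrich the POI data and proposes a graph-based semantic distance measure; by combining semantic distance with spatial distance, S3 provides a unified ranking scheme for POIs. 2) How to return top-k results instantly over large-scale data. To address this challenge, we propose GRTree (graph rectangle tree), a novel hybrid semantic-spatial index structure, and study effective pruning strategies. Experiments on a large-scale real-world dataset show that S3 not only returns more relevant results but also achieves good efficiency and scalability.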

Keywords: spatial keyword search, semantic enhancement, knowledge base, semantic distance, instant search
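The abstract describes a graph-based semantic distance over a knowledge base that is combined with spatial distance to rank POIs, but it gives no formula. The following is a minimal sketch under stated assumptions, not S3's actual definition: the semantic distance is taken to be the breadth-first-search hop count in a toy knowledge graph from the query concept to the nearest concept attached to a POI, and the two distances are merged by a hypothetical weighted sum with weight `alpha`. The names `KG`, `semantic_distance`, `combined_score`, and `top_k` are illustrative only.

```python
from collections import deque
import heapq
import math

# Hypothetical toy knowledge graph: concept -> neighboring concepts (undirected).
KG = {
    "restaurant": {"chinese_restaurant", "cafe"},
    "chinese_restaurant": {"restaurant", "hotpot"},
    "hotpot": {"chinese_restaurant"},
    "cafe": {"restaurant"},
    "museum": set(),
}


def semantic_distance(graph, source, targets):
    """Hop count from `source` to the nearest concept in `targets` (BFS).

    Assumption: this stands in for S3's graph-based semantic distance.
    Returns math.inf when no target concept is reachable."""
    if source in targets:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt in targets:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return math.inf


def combined_score(sem, spatial, alpha=0.5):
    """Hypothetical linear combination of the two distances; lower is better."""
    if math.isinf(sem):
        return math.inf  # semantically unreachable POIs are treated as irrelevant
    return alpha * sem + (1 - alpha) * spatial


def top_k(graph, query_concept, query_loc, pois, k, alpha=0.5):
    """Brute-force top-k: score every POI and keep the k smallest scores.

    Each POI is (name, (x, y), set_of_concepts)."""
    scored = []
    for name, loc, concepts in pois:
        sem = semantic_distance(graph, query_concept, concepts)
        spatial = math.dist(query_loc, loc)
        scored.append((combined_score(sem, spatial, alpha), name))
    return heapq.nsmallest(k, scored)
```

For example, querying `restaurant` at the origin over a POI tagged `hotpot` at (1, 0), one tagged `museum` at (0.5, 0), and one tagged `cafe` at (3, 0) ranks the hotpot POI first even though the museum is spatially closest, because the museum is semantically unreachable. In practice the two terms live on different scales (hops versus coordinate units), so some normalization would be needed before mixing them; how S3 does this is not stated in the abstract.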

Abstract: Spatial keyword search finds points of interest (POIs) that are not only relevant to the user's query intent but also close to the query location. It has many important applications, such as map search. Previous methods for spatial keyword search consider only the textual relevance of POIs to the query keywords and neglect query semantics, so they may miss relevant results or return many irrelevant ones. To address this problem, this paper introduces a semantic-enhanced spatial keyword search method named S3 (semantic-enhanced spatial keyword search). Given a query, S3 analyzes the semantics of the query keywords to measure the semantic distance of each POI to the query. It then ranks POIs with a novel mechanism that combines semantic distance and spatial distance for effective POI search. S3 faces two challenges. First, it must capture query semantics: S3 introduces knowledge bases for this purpose, together with a ranking function that considers both semantic distance and spatial distance. Second, it must support instant search over large-scale POI datasets. To address this challenge, we devise a novel index structure, GRTree, and develop effective pruning techniques based on it. Extensive experiments on a real dataset show that S3 not only produces high-quality results but also has good efficiency and scalability.

Key words: spatial keyword search, semantic enhancement, knowledge base, semantic distance, instant search
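The abstract mentions pruning for instant top-k search but does not describe how GRTree prunes. As a generic illustration only (not the GRTree technique), the sketch below assumes a hypothetical score `alpha * sem + (1 - alpha) * spatial` with a non-negative semantic distance. Under that assumption, `(1 - alpha) * spatial` lower-bounds the score of the current POI and of every POI farther away, so a scan in ascending spatial distance can stop once that bound reaches the current k-th best score. A plain sort stands in for the incremental nearest-neighbor order a spatial index would provide; `top_k_pruned` and its parameters are hypothetical names.

```python
import heapq
import math


def top_k_pruned(query_loc, pois, sem_dist, k, alpha=0.5):
    """Top-k POIs by score = alpha*sem + (1-alpha)*spatial (lower is better).

    Assumptions: sem_dist(name) >= 0, and pois is a list of (name, (x, y)).
    Returns (results sorted by score, number of semantic-distance evaluations)."""
    # Stand-in for a spatial index: visit POIs in ascending spatial distance.
    ordered = sorted(pois, key=lambda p: math.dist(query_loc, p[1]))
    heap = []  # max-heap of the current best k, stored as (-score, name)
    evaluated = 0
    for name, loc in ordered:
        spatial = math.dist(query_loc, loc)
        # Lower bound for this and all later POIs, since sem >= 0.
        if len(heap) == k and (1 - alpha) * spatial >= -heap[0][0]:
            break  # no remaining POI can beat the current k-th best score
        evaluated += 1
        score = alpha * sem_dist(name) + (1 - alpha) * spatial
        if len(heap) < k:
            heapq.heappush(heap, (-score, name))
        elif score < -heap[0][0]:
            heapq.heapreplace(heap, (-score, name))
    results = sorted((-neg, name) for neg, name in heap)
    return results, evaluated
```

With four POIs at distances 1, 2, 5, and 9 from the query and `k = 2`, the scan can stop at the third POI once its spatial bound exceeds the second-best score, so only two semantic distances are computed. The fewer expensive semantic-distance evaluations such a bound allows, the closer the search gets to the "instant" behavior the abstract targets.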