ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2017, Vol. 54 ›› Issue (7): 1577-1591. doi: 10.7544/issn1000-1239.2017.20160005

• Software Technology •

Web Database top-k Diverse Keyword Query Suggestion Approach

Meng Xiangfu1, Bi Chongchun1, Zhang Xiaoyan1, Tang Xiaoliang2, Tang Yanhuan1

  1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105
  2. School of Software, Liaoning Technical University, Huludao, Liaoning 125105
  (marxi@126.com)
  • Published: 2017-07-01
  • Online: 2017-07-01
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China (61401185); Natural Science Foundation of Liaoning Province (20170540418); Scientific and Technological Research Project of the Education Department of Liaoning Province (LJYL018)

Abstract: Web database users usually express their query intentions with keywords that are familiar to them, which may lead to results that do not satisfy their needs well because the users' knowledge is limited. Providing the top-k candidate queries that are semantically related to the initial query yet diverse can broaden the users' knowledge scope and help them express their query intentions more accurately and completely. To this end, this paper proposes a top-k diverse keyword query suggestion approach. 1) It leverages the co-occurrence frequency of, and the correlations between, different keywords in the query history to measure the intra- and inter-keyword couplings. 2) It builds a semantic matrix that preserves the coupling relationships between keywords and, based on this matrix, measures the semantic similarity between keyword queries with a kernel function. To return the top-k diverse and semantically related queries quickly, the approach uses a probability density function, computed from the semantic similarities between queries, to analyze the typicality of each query, and uses an approximation algorithm to find the typical queries in the query history. From the set of typical queries it selects a small number of representative queries and, for each representative query, creates an order of the remaining typical queries according to their semantic similarity to that representative query. When a new query arrives, its semantic similarities to the representative queries are computed, and the threshold algorithm (TA) is then run over the pre-built orders to quickly select the top-k diverse queries that are semantically related to the given query. The experimental results and analysis show that the proposed methods for computing keyword coupling relationships and for measuring the semantic similarity between queries achieve high accuracy, and that the top-k diverse selection method is effective and efficient.

Key words: Web database, diverse suggestion, coupling relationship, typicality analysis, top-k selection

CLC Number:
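
The final selection step in the abstract runs the threshold algorithm (TA) over one pre-built order per representative query. The following is a minimal sketch of the classic TA only, not the authors' implementation. It assumes that each order is a list of (query, similarity) pairs sorted by descending similarity, and that a candidate's score is the weighted sum of its similarities to the representatives, with the new query's similarity to each representative as the weight. The function name `threshold_top_k`, the data layout, and the weighted-sum scoring are all hypothetical, and the diversity criterion of the paper is not modeled here.

```python
import heapq


def threshold_top_k(sorted_lists, weights, k):
    """Classic threshold algorithm (TA) over per-representative orders.

    sorted_lists: dict rep_id -> list of (query_id, sim), sorted by sim descending
    weights:      dict rep_id -> non-negative weight (assumed: similarity of the
                  new query to that representative query)
    Returns up to k (score, query_id) pairs, best first.
    """
    if k <= 0 or not sorted_lists:
        return []

    # Random-access index: rep_id -> {query_id: sim}
    lookup = {rep: dict(lst) for rep, lst in sorted_lists.items()}

    def full_score(qid):
        # Assumed aggregation: weighted sum, monotone because weights >= 0.
        return sum(weights[rep] * lookup[rep].get(qid, 0.0) for rep in sorted_lists)

    seen = set()
    top = []  # min-heap holding the best k (score, query_id) pairs so far
    depth = max((len(lst) for lst in sorted_lists.values()), default=0)

    for i in range(depth):
        threshold = 0.0  # upper bound on the score of any query not yet seen
        for rep, lst in sorted_lists.items():
            if i >= len(lst):
                continue  # list exhausted: unseen queries score 0 here
            qid, sim = lst[i]  # sorted access at depth i
            threshold += weights[rep] * sim
            if qid not in seen:
                seen.add(qid)
                item = (full_score(qid), qid)  # random access to other lists
                if len(top) < k:
                    heapq.heappush(top, item)
                elif item > top[0]:
                    heapq.heapreplace(top, item)
        # Stop once the k-th best score can no longer be beaten.
        if len(top) == k and top[0][0] >= threshold:
            break

    return sorted(top, reverse=True)


if __name__ == "__main__":
    # Toy orders for two hypothetical representative queries r1 and r2.
    orders = {
        "r1": [("a", 0.9), ("b", 0.8), ("c", 0.1)],
        "r2": [("b", 0.7), ("c", 0.6), ("a", 0.2)],
    }
    print(threshold_top_k(orders, {"r1": 0.5, "r2": 0.5}, 2))
```

Because the orders are built offline, the per-query work is limited to computing the weights and scanning the orders until the stopping bound is reached, which is consistent with the efficiency goal stated in the abstract.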