Web Database top-k Diverse Keyword Query Suggestion Approach

孟祥福; 毕崇春; 张霄雁; 唐晓亮; 唐延欢

doi:10.7544/issn1000-1239.2017.20160005

Abstract: Web database users usually express their query intentions with keywords they already know, so the results may not satisfy their needs well. Suggesting top-k candidate queries that are semantically related to the initial query and also diverse can broaden users' knowledge and help them express their query intentions more accurately and completely. This paper proposes a top-k diverse keyword query suggestion approach. (1) It uses the co-occurrence frequencies and correlations of keywords in the query history to measure the intra- and inter-keyword couplings. (2) It builds a semantic matrix that preserves these coupling relationships, and then uses the semantic matrix together with a kernel function to measure the semantic similarity between keyword queries. To return the top-k diverse and semantically related queries quickly, the approach uses a probability density function over the query similarities to analyze how typical each query is, and applies an approximation algorithm to find the typical queries in the query history. From the typical queries it selects a small number of representative queries, and for each representative query it builds an order (a query sequence) of the remaining typical queries ranked by their semantic similarity to that representative query. When a new query arrives, its semantic similarity to each representative query is computed, and the threshold algorithm (TA) is run over the precomputed orders to quickly select the top-k diverse queries that are semantically related to the given query. Experimental results and analysis show that the keyword coupling and query semantic similarity measures achieve high accuracy, and that the top-k diverse selection method is effective and efficient.
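The final selection step relies on the threshold algorithm (TA) over precomputed orders. The following Python block is a minimal sketch of a generic TA run, not the authors' implementation. The function name `threshold_topk`, the weighted-sum aggregation, and the use of the new query's similarity to each representative query as the list weight are all assumptions made for illustration; the abstract does not specify the aggregation function or how diversity is enforced, so only the relevance-ranking part is shown.

```python
import heapq


def threshold_topk(orders, weights, k):
    """Generic threshold algorithm (TA) over several sorted lists.

    orders  : one list per representative query (hypothetical layout), each a
              list of (candidate_query, similarity) pairs sorted by similarity
              in descending order.
    weights : one non-negative weight per list; assumed here to be the
              similarity of the new query to that representative query.
    k       : number of candidates to return.
    """
    # Dictionaries give the "random access" that TA needs.
    lookup = [dict(order) for order in orders]

    def aggregate(candidate):
        # Assumed monotone aggregation: weighted sum of per-list scores.
        return sum(w * table.get(candidate, 0.0)
                   for w, table in zip(weights, lookup))

    seen = set()
    top = []  # min-heap of (aggregate_score, candidate), at most k items
    depth = max((len(order) for order in orders), default=0)

    for i in range(depth):
        threshold = 0.0
        for w, order in zip(weights, orders):
            if i >= len(order):
                continue  # list exhausted: contributes 0 to the threshold
            candidate, score = order[i]  # sorted access
            threshold += w * score
            if candidate not in seen:
                seen.add(candidate)
                item = (aggregate(candidate), candidate)
                if len(top) < k:
                    heapq.heappush(top, item)
                elif item > top[0]:
                    heapq.heapreplace(top, item)
        # No unseen candidate can score above the threshold, so stop early.
        if len(top) == k and top[0][0] >= threshold:
            break

    return sorted(top, reverse=True)


# Toy data: two representative queries, three typical queries "a", "b", "c".
orders = [
    [("a", 0.9), ("b", 0.8), ("c", 0.1)],
    [("b", 0.7), ("c", 0.6), ("a", 0.2)],
]
result = threshold_topk(orders, weights=[0.5, 0.5], k=2)
print([candidate for _, candidate in result])  # ['b', 'a']
```

Because the weighted sum is monotone in each list's score, the early-stopping test is safe: once the k-th best aggregate score reaches the threshold, no candidate further down the lists can displace it.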
