ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2014, Vol. 51 ›› Issue (10): 2206-2215.doi: 10.7544/issn1000-1239.2014.20130827

• 软件技术 • 上一篇    下一篇



  1. 1(国防科学技术大学计算机学院 长沙 410073);2(装备指挥学院 北京 101416) (
  • 出版日期: 2014-10-01
  • 基金资助: 

An Algorithm for Top-k Query Refinement Based on User’s Feedback

Zhang Jianfeng1, Han Weihong1, Fan Hua1, Zou Peng2, Jia Yan1   

  1. 1(College of Computer, National University of Defense Technology, Changsha, 410073); 2(Academy of Equipment Command and Technology, Beijing, 101416)
  • Online: 2014-10-01

摘要: top-k查询主要用来从海量的数据中返回用户最为偏好的k个对象.目前已经有大量的研究工作致力于top-k查询中的性能研究,近年来针对top-k查询结果进行解释的研究逐渐得到了广泛的关注.在top-k查询中,由于用户不能精确地指定自己的偏好,因此针对top-k查询的结果用户可能产生这样的质疑:“既然连对象p都出现在top-k结果中,为什么我期望的对象m块没有出现在top-k结果?”针对用户这样的疑问,提出了一种基于用户反馈的top-k查询修改算法,该算法首先定义了用来衡量初始化top-k查询变化的评估模型函数,基于该评估模型函数,使用抽样方法得到候选权重集合,针对每一个候选权重通过渐进式top-k算法来得到新的最优化查询.最后在模拟数据上验证了提出算法的效率.

关键词: top-k查询, 用户疑问, 用户反馈, 偏好修正, 查询修改

Abstract: Top-k queries are mainly used to return the most important objects in the potentially huge answer space. After huge effort working on top-k query performance, an explain capability has attract more attentions in recent years. In top-k query, since users can not precisely specify their own preferences, he/she may feel frustrated with the top-k results and propose a question such that “why my expecting tuple m is not appeared in top-k results as long as tuple p has been appeared in top-k results”. To solve this problem, a novel method is proposed to answer user’s why-not question by modifying original top-k query. Firstly, an assessment model is defined to evaluate the difference between the original top-k query and the modified new query. Then based on the assessment model, a sample method is used to sample the candidate weight vectors from an accuracy subspace of weight space. After that, for each candidate weight vector, a progressive top-k algorithm is used to get the optimal new query, and the terminal conditions of the progressive top-k algorithm are improved by the assessment model. Furthermore, the new query with the best evaluated score is returned as the solution. Finally, an extensive performance study using synthetic data set is reported to verify the efficiency of the proposed algorithm.

Key words: top-k query, why-not question, user’s feedback, preference refinement, query refinement