    Citation: Zhang Dongzhan, Su Zhifeng, Lin Ziyu, and Xue Yongsheng. top-k Aggregation Keyword Search over Relational Databases[J]. Journal of Computer Research and Development, 2014, 51(4): 918-929.

    top-k Aggregation Keyword Search over Relational Databases

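    • Abstract (translated from the Chinese): Keyword search over relational databases lets users query a relational database conveniently without knowing a structured query language or the database schema. Given a keyword query, existing methods follow the primary-foreign-key relationships in the database to find sets of tuples that contain the keywords. In many real applications, however, the aggregation result of a tuple set is more valuable to users. This paper studies top-k aggregation keyword search over relational databases and proposes a recursion-based algorithm for enumerating aggregation cells, recursion-based full search (RFS). To obtain better query performance, it designs a new ranking method, a two-dimensional index, and a fast search algorithm, output-based quick search (OQS), so that the top-k aggregation cells can be enumerated efficiently. Extensive experiments on different datasets show that the OQS algorithm achieves good query performance.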

       

      Abstract: Structured query language (SQL) is the classical way to query relational databases. However, ordinary users who are unfamiliar with SQL and the underlying database schema find it difficult to search for information. By contrast, the keyword search technology used in information retrieval (IR) systems lets users simply enter a set of keywords to get the required results. It is therefore desirable to integrate DB and IR so that users can search relational databases without any knowledge of the database schema or query languages. Given a keyword query, existing approaches use the primary-foreign-key relationships in the database to find individual tuples that match the set of query keywords. In many real applications, however, the aggregation result of tuples is more useful to users, and existing methods cannot handle this case. This paper therefore focuses on the problem of top-k aggregation keyword search over relational databases. A recursion-based full search algorithm, RFS, is proposed to obtain all aggregation cells. To achieve high performance, new ranking techniques, a keyword-tuple-based two-dimensional index, and a quick search algorithm, OQS, are developed for effectively identifying the top-k aggregation cells. Extensive experiments were conducted on two large real datasets, and the experimental results show the benefits of the approach.

       
