高级检索

    一种大规模图数据上已知项搜索的优化方法

    An Optimization Approach of Known-Item Search on Large-Scale Graph Data

    • 摘要: 近年来,在社交网络、生物信息、软件工程、知识工程等领域,以图为天然组织结构的数据开始大量涌现,从而使得图数据的查询、搜索、挖掘等问题迅速成为研究热点.然而,由于图的计算复杂度高,现有的图数据关键词搜索方法的可伸缩性差,难以应用于大规模图数据.创新性地从对用户搜索意图的探索出发,探讨了可能存在的不同类型的图搜索及其优化潜力,提出了根据不同类型搜索的特点采用专门的优化策略的思想;并针对其中非常重要和常见的“已知项搜索”提出了一种启发式优化方法,利用图中局部拓扑信息构建索引,并使用MapReduce技术处理大规模图数据,实现在搜索前裁剪匹配顶点,以少量可能存在的top-k答案丢失为代价来显著缩减搜索空间.实验证明该方法能够极大地减少已知项搜索的响应时间.

       

      Abstract: Recently, a large amount of naturally graph-structured data are generated in various fields, such as social network, bioinformatics, software engineering, knowledge engineering, and etc. Consequently, querying, searching and mining such “graph data” have become the popular research topics. However, due to the extremely high computational complexity, existing keyword search approaches for graph data do not scale well on large graphs. This paper is motivated by a novel study of the user information needs of graph search. We discuss some possible types of graph search and their potentials of being optimized, and propose an idea that we should adopt customized optimization strategies according to the features of different types of graph search. Based on that, we propose an effective heuristic optimization approach for an important and usual particular type of search called “known-item search”. We construct an index to capture the local topology information in the graph. For large-scale graph data, we can use the MapReduce technique to handle the indexing task. By using the index, we can prune matched vertices before the search, and thereby significantly reduce the search space with a little possible loss of the top-k answers as trade-offs. Our experiments testify that the approach can decrease the response time of the known-item search dramatically.

       

    /

    返回文章
    返回