Abstract:
With the size of graph data increasing explosively, the form of graph data is much more complicated. Heterogeneous information networks can be modeled as graphs, which contain multiple types of nodes and multiple types of edges, e.g., bibliographic database, online shopping website and knowledge graphs. Reachability query in heterogeneous information networks is investigated in this paper. By using different types of relationships between nodes, the reachability of nodes is queried while satisfying specific path schema. This problem is polynomial. However, the time costing can’t be tolerant by scanning the big network for answering one query. The existing reachability work can be classified into two categories: K-hop reachability query and label-constraint reachability query. But these techniques can’t be used for processing path-based reachability query in heterogeneous information networks. Therefore, in order to respond online queries efficiently, a novel index structure is proposed, which decomposes path schemas and precomputes the reachability of nodes in sub-path schemas. Online query is computed efficiently by decomposing the query path schema and using the reachability of the indexes. A path schema decomposition strategy is developed by searching the partial order graph of path schemas in order to minimize the query time. Experiments on real world and synthetic data demonstrate the effectiveness of algorithms for reachability query in heterogeneous information networks.