ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (2): 282-294.doi: 10.7544/issn1000-1239.2015.20140229

Special Issue: 2015大数据管理

Previous Articles     Next Articles

Functional Dependencies Discovering in Distributed Big Data

Li Weibang, Li Zhanhuai, Chen Qun, Jiang Tao, Liu Hailong, Pan Wei   

  1. (College of Computer Science, Northwestern Polytechnical University, Xi’an 710072)
  • Online:2015-02-01

Abstract: Discovering functional dependencies (FDs) from relational databases is an important database analysis technique, which has a wide range of applications in knowledge discovery, database semantics analysis, data quality assessment and database design. Existing functional dependencies discovery algorithms are mainly applied in centralized data, and are suitable to the case of small data size only. However, it is far more challenging to discover functional dependencies in distributed databases, especially with big data. In this paper, we propose a novel functional dependencies discovering approach in distributed big data. Firstly we execute functional dependencies discovering algorithm in parallel in each node, then prune the candidate set of functional dependencies based on the results of discovery. Secondly we group the candidate set of functional dependencies according to the features of candidate functional dependencies’ left hand side, and execute functional dependencies discovery algorithm based on each candidate set in parallel, and get all the functional dependency eventually. We analyze the number of candidate functions with regard to different groups, and data shipment and load balance are taken into account when discovering functional dependencies. Experiments on real-world big datasets demonstrate that compared with previous discovering methods, our approach is more effective in efficiency.

Key words: discovering functional dependencies, functional dependencies, big data, knowledge discovery, parallel computing

CLC Number: