ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (12): 2857-2865.doi: 10.7544/issn1000-1239.2015.20140685

Previous Articles     Next Articles

An Algorithm on Mining Approximate Functional Dependencies in Probabilistic Database

Miao Dongjing, Liu Xianmin, Li Jianzhong   

  1. (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001)
  • Online:2015-12-01

Abstract: An approximate functional dependency (AFD) is a functional dependency almost hold, and the most existing works are only able to mine AFDs from general data. Sometimes, data is stored in probabilistic database, in order to mine AFDs from such type of data, we define the probabilistic AFD, namely (λ, δ)-AFD which is different from the previous definition. We propose a dynamic programming to compute the confidence probability of a candidate AFD and check if the confidence probability is more than the probability threshold, however, as the high time complexity of dynamic programming, we derive the lower bound based on Chernoff bound to prune candidates as much as possible. Then, under help of the anti-monotone property, we propose a mining algorithm based on lexicographical order and some pruning criterions to speed up the mining process. At last, experiments are performed on the synthetic and the real-life data sets, and the results show the effectiveness of the pruning criterions and the scalability of our mining algorithm, and we show the interesting results mined from DBLP data set.

Key words: approximate functional dependency (AFD), data mining, probabilistic database, data quality, inconsistency

CLC Number: