ISSN 1000-1239 CN 11-1777/TP

• 软件技术 •

### 一种扩展条件函数依赖的发现算法

1. (哈尔滨工业大学 哈尔滨 150001) (xianmliu@gmail.com)
• 出版日期: 2015-01-01
• 基金资助:
基金项目：国家“九七三”重点基础研究发展计划基金项目(2012CB316200)|国家自然科学基金青年基金项目(61003046)

### Discovering Extended Conditional Functional Dependencies

Liu Xianmin, Li Jianzhong

1. (Harbin Institute of Technology, Harbin 150001)
• Online: 2015-01-01

Abstract: eCFD (extended conditional functional dependency) is proposed as the extension of CFD (conditional functional dependency) for data cleaning. Compared with CFD, eCFD can take more patterns of values and catch more semantic information. However, there are only few works about eCFD. This paper focuses on the problem of eCFD discovering, whose counterpart of CFD has been studied very much. As we know, this paper is the first work about eCFD discovering. To avoid inconsistencies and remove redundancies, based on the definitions of strongly validated and weakly non-redundant eCFDs, formal definition of eCFD discovering problem is given and MeCFD method is proposed to solve this problem. MeCFD first generates all basic eCFDs which are weakly non-redundant and semantically equivalent to all strongly validated eCFDs, then constructs compound eCFDs through merging basic eCFDs. Searching candidate space in depth-first order makes MeCFD use only constant memory space to maintain data partitions. Efficient pruning strategies are proposed to improve the performance of MeCFD. Theoretical analysis shows the correctness of MeCFD. Experiments over real data sets show the good scalability of MeCFD and the effectiveness of pruning strategies and optimizing methods.