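• Outstanding Sci-Tech Journal of China
  • CCF-recommended Class A Chinese-language journal
  • T1-class high-quality sci-tech journal in the computing field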
Li Weibang, Li Zhanhuai, Chen Qun, Jiang Tao, Liu Hailong, Pan Wei. Functional Dependencies Discovering in Distributed Big Data[J]. Journal of Computer Research and Development, 2015, 52(2): 282-294. DOI: 10.7544/issn1000-1239.2015.20140229

Functional Dependencies Discovering in Distributed Big Data

More Information
  • Published Date: January 31, 2015
  • Discovering functional dependencies (FDs) from relational databases is an important database analysis technique with a wide range of applications in knowledge discovery, database semantics analysis, data quality assessment, and database design. Existing FD discovery algorithms mainly target centralized data and are suitable only for small data sizes. Discovering FDs in distributed databases is far more challenging, especially with big data. In this paper, we propose a novel approach to discovering FDs in distributed big data. First, we execute an FD discovery algorithm in parallel on each node and prune the candidate set of FDs based on the local results. Second, we group the remaining candidate FDs according to the features of their left-hand sides, execute the FD discovery algorithm on each candidate group in parallel, and eventually obtain all functional dependencies. We analyze the number of candidate FDs with regard to different groups, and take data shipment and load balance into account when discovering FDs. Experiments on real-world big datasets demonstrate that our approach is more efficient than previous discovery methods.
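
The two-phase idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names (`holds`, `candidates`, `discover`), the `max_lhs` bound, and the use of a simple concatenation in place of real data shipment are all assumptions made for illustration. The sketch relies on one fact that is safe to state: if an FD is violated on any single fragment, it cannot hold on the whole relation, so it can be pruned after the local phase.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # setdefault stores the first rhs value seen for this lhs value;
        # any later, different rhs value is a violation.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def candidates(attrs, max_lhs):
    """Enumerate candidate FDs with at most max_lhs left-hand-side attributes."""
    for size in range(1, max_lhs + 1):
        for lhs in combinations(attrs, size):
            for rhs in attrs:
                if rhs not in lhs:
                    yield lhs, rhs


def discover(fragments, attrs, max_lhs=1):
    """Hypothetical two-phase discovery over horizontally partitioned data."""
    # Phase 1: check every candidate locally on each fragment and prune
    # those violated anywhere (a local violation is a global violation).
    survivors = [
        (lhs, rhs)
        for lhs, rhs in candidates(attrs, max_lhs)
        if all(holds(frag, lhs, rhs) for frag in fragments)
    ]

    # Phase 2: group surviving candidates by left-hand side and verify
    # each group against the combined data. Concatenating the fragments
    # is an assumption standing in for the data-shipment step.
    groups = {}
    for lhs, rhs in survivors:
        groups.setdefault(lhs, []).append(rhs)
    merged = [row for frag in fragments for row in frag]
    return [
        (lhs, rhs)
        for lhs, rhss in groups.items()
        for rhs in rhss
        if holds(merged, lhs, rhs)
    ]


# Two toy fragments, as if stored on two nodes.
node1 = [{"id": 1, "city": "A", "zip": "10"}, {"id": 2, "city": "B", "zip": "20"}]
node2 = [{"id": 3, "city": "A", "zip": "30"}, {"id": 4, "city": "C", "zip": "30"}]
fds = discover([node1, node2], ["id", "city", "zip"], max_lhs=1)
print(fds)
```

In this toy data, `zip -> city` is pruned in the first phase because it already fails on `node2`, while `city -> zip` holds on each fragment separately and is rejected only in the second phase, once rows from both nodes are compared. The sketch reports every FD that holds within the `max_lhs` bound and makes no attempt to keep only minimal ones or to balance load across nodes.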