• China Quality Sci-Tech Journal
  • CCF-Recommended Class A Chinese Journal
  • High-Quality Sci-Tech Journal in Computing, Tier T1
Wang Ning, Li Jie. Two-Tiered Correlation Clustering Method for Entity Resolution in Big Data[J]. Journal of Computer Research and Development, 2014, 51(9): 2108-2116. DOI: 10.7544/issn1000-1239.2014.20131345

Two-Tiered Correlation Clustering Method for Entity Resolution in Big Data

  • Published Date: August 31, 2014
  • Volume, velocity, variety, and veracity are four striking features of big data, and they bring new challenges to data integration. Entity resolution is one of the most important steps in data integration, yet for big data, conventional entity resolution methods tend to be inefficient and ineffective in practice, especially in terms of noise immunity. To address the inconsistency in resolution results caused by these four features of big data, we introduce the concept of common neighborhood into the correlation clustering problem. Our top tier, which performs pre-partitioning, is built on this neighborhood concept and can quickly and effectively produce a preliminary partition into blocks. Introducing the concept of a kernel gives a more precise definition of the correlation degree between a node and a cluster; as a result, our bottom tier, which performs adjustment, can cluster nodes accurately and improves the accuracy of correlation clustering. Because it relies on a coarse similarity function, our two-tiered entity resolution method is simple and efficient, and the use of the neighborhood gives it good noise immunity. Extensive experiments demonstrate that, compared with traditional methods, the proposed two-tiered method achieves high accuracy and good noise immunity, and that it also scales to big data.
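The abstract describes the method only at a high level. As a rough illustration, the Python sketch below shows one way a two-tier correlation clustering pipeline of this shape could be organized. It is not the authors' algorithm: the Jaccard-based common-neighborhood score, the highest-degree pivot rule, the threshold `tau`, the "adjacent to at least half of the other members" kernel, and the 0.5 cutoff for node-to-cluster correlation are all assumptions made for illustration, and every function name (`build_graph`, `nbhd_sim`, `top_tier`, `kernel`, `degree_to`, `bottom_tier`, `resolve`) is hypothetical.

```python
from itertools import combinations


def build_graph(records, similar):
    """Positive-edge graph: i-j is an edge when the coarse similarity test passes."""
    adj = {i: set() for i in range(len(records))}
    for i, j in combinations(range(len(records)), 2):
        if similar(records[i], records[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def nbhd_sim(adj, u, v):
    """ASSUMED common-neighborhood score: Jaccard overlap of closed neighborhoods N[u], N[v]."""
    a, b = adj[u] | {u}, adj[v] | {v}
    return len(a & b) / len(a | b)


def top_tier(adj, tau=0.5):
    """Pre-partition (assumed rule): take the highest-degree unassigned node as pivot,
    then add only unassigned neighbors whose neighborhood score with it is >= tau."""
    unassigned, blocks = set(adj), []
    while unassigned:
        pivot = max(unassigned, key=lambda v: (len(adj[v]), -v))
        block = {pivot} | {u for u in adj[pivot] & unassigned
                           if nbhd_sim(adj, pivot, u) >= tau}
        unassigned -= block
        blocks.append(block)
    return blocks


def kernel(adj, cluster):
    """ASSUMED kernel: members adjacent to at least half of the other members."""
    need = (len(cluster) - 1) / 2
    core = {v for v in cluster if len(adj[v] & cluster) >= need}
    return core or set(cluster)


def degree_to(adj, v, cluster):
    """ASSUMED node-cluster correlation degree: share of the cluster's kernel adjacent to v."""
    others = cluster - {v}
    if not others:
        return 0.0
    k = kernel(adj, others)
    return len(adj[v] & k) / len(k)


def bottom_tier(adj, clusters, max_rounds=10):
    """Adjustment: move each node to the cluster it correlates with most strongly
    (ties favor its current cluster); isolate it if no degree reaches 0.5."""
    clusters = [set(c) for c in clusters]
    for _ in range(max_rounds):
        moved = False
        for v in sorted(adj):
            home = next(c for c in clusters if v in c)
            best = max(clusters, key=lambda c: (degree_to(adj, v, c), c is home))
            if degree_to(adj, v, best) < 0.5:
                best = None  # no cluster kernel agrees with v
            if best is home or (best is None and len(home) == 1):
                continue
            home.discard(v)
            if best is None:
                clusters.append({v})
            else:
                best.add(v)
            clusters = [c for c in clusters if c]  # drop emptied clusters
            moved = True
        if not moved:
            break
    return clusters


def resolve(records, similar, tau=0.5):
    """Hypothetical end-to-end pipeline: coarse graph, then top tier, then bottom tier."""
    adj = build_graph(records, similar)
    return bottom_tier(adj, top_tier(adj, tau))


if __name__ == "__main__":
    names = ["jon smith", "john smith", "j smith", "mary jones", "maria jones", "bob lee"]
    share_token = lambda a, b: bool(set(a.split()) & set(b.split()))
    print(sorted(sorted(c) for c in resolve(names, share_token)))
    # prints [[0, 1, 2], [3, 4], [5]]
```

In the demo, the coarse similarity function only checks whether two name strings share a word. In this sketch, the top tier's neighborhood test is what stops a single noisy edge between two otherwise separate groups from merging them, and the bottom tier compares each node against a cluster's kernel rather than every member, so a few spurious edges carry less weight when a node is reassigned.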