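• China Outstanding Sci-Tech Journal
  • CCF-recommended Class A Chinese journal
  • T1-class high-quality sci-tech journal in the computing field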
Zhao Ming, Luo Jizhou, Li Jianzhong, and Gao Hong. XCluster: A Cluster-Based Queriable Multi-Document XML Compression Method[J]. Journal of Computer Research and Development, 2010, 47(5): 804-814.

XCluster: A Cluster-Based Queriable Multi-Document XML Compression Method

More Information
  • Published Date: May 14, 2010
  • XML is the de facto standard for data exchange and storage in network applications. The main problem in managing XML data is the redundancy caused by interleaving structure with data, which makes XML costly to store, exchange, and process. Data compression can reduce this redundancy. However, most existing XML compression methods reduce only the redundancy within a single document and ignore the redundancy across documents. This paper presents XCluster, a new XML compression method that exploits the similarity among XML documents. Queries can be evaluated directly on the compressed documents that XCluster generates. XCluster first clusters the input XML documents hierarchically, using an improved pq-gram approximate distance between rooted ordered tag trees. It then compresses the structures in each cluster by merging them into a single representative structure. Finally, it places the data of nodes with the same tag into the same bucket and encodes each bucket with an algorithm suited to its data type. Extensive experiments on real and synthetic datasets show that XCluster outperforms XGrind and XQilla in both compression ratio and query-processing efficiency.
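
To make the clustering step more concrete, the following is a minimal sketch of the standard pq-gram distance between the tag trees of two XML documents. It is not the improved variant used by XCluster, whose details the abstract does not give. The function names `pq_gram_profile` and `pq_gram_distance`, the defaults p=2 and q=3, and the sample documents are assumptions made for illustration only.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def pq_gram_profile(root, p=2, q=3):
    """Return the bag (Counter) of pq-grams of the tag tree rooted at `root`.

    Each pq-gram is a tuple of p stem labels (a node and its ancestors)
    followed by q base labels (consecutive children). "*" pads missing nodes.
    Text content and attributes are ignored: only tags are compared.
    """
    profile = Counter()

    def visit(node, ancestors):
        # Stem: the current tag preceded by its p-1 nearest ancestors.
        stem = (ancestors + (node.tag,))[-p:]
        base = ("*",) * q
        children = list(node)
        if not children:
            # A leaf contributes one pq-gram with an all-dummy base.
            profile[stem + base] += 1
            return
        for child in children:
            # Slide a window of width q over the children.
            base = base[1:] + (child.tag,)
            profile[stem + base] += 1
            visit(child, stem)
        for _ in range(q - 1):
            # Flush the window with trailing dummy nodes.
            base = base[1:] + ("*",)
            profile[stem + base] += 1

    visit(root, ("*",) * p)
    return profile


def pq_gram_distance(xml_a, xml_b, p=2, q=3):
    """Standard pq-gram distance in [0, 1]: 0 means identical profiles."""
    pa = pq_gram_profile(ET.fromstring(xml_a), p, q)
    pb = pq_gram_profile(ET.fromstring(xml_b), p, q)
    shared = sum((pa & pb).values())             # bag intersection size
    total = sum(pa.values()) + sum(pb.values())  # bag union size
    return 1.0 - 2.0 * shared / total


# Hypothetical sample documents (not from the paper's datasets).
doc_a = "<book><title>A</title><author>X</author></book>"
doc_b = "<book><title>B</title><author>Y</author></book>"
doc_c = "<order><item/><item/></order>"

same_structure = pq_gram_distance(doc_a, doc_b)
different_structure = pq_gram_distance(doc_a, doc_c)
```

Documents that share a tag structure get a small distance even when their text differs, which is the property a structure-based clustering step relies on. A hierarchical clustering routine could then take the pairwise distances as input.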
