ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2016, Vol. 53, Issue (3): 571-581. doi: 10.7544/issn1000-1239.2016.20150620

A Compression-Based Approximate Query Method for Massive Incomplete Data

Wang Yan1,2, Liu Genghao1, Wang Junlu1, Song Baoyan1

  1(School of Information, Liaoning University, Shenyang 110036); 2(School of Information Science and Engineering, Northeastern University, Shenyang 110819)
  • Online: 2016-03-01

Abstract: With the explosive growth of data, incomplete data have become widespread. Traditional data repair methods incur high processing costs on massive data and cannot fully restore the missing values. Approximate querying over such large volumes of incomplete data, subject to given requirements, has therefore attracted increasing attention from researchers. This paper proposes a compression-based approximate query method for massive incomplete data. The method tags the fields with missing attribute values, identifies the frequent query conditions, compresses the data according to these statistically frequent query conditions, and builds the corresponding indexes. The index files are then compressed again according to attribute partition rules to save further storage space. In the query stage, the method uses an encoding dictionary to perform selection and projection operations on the compressed index files, yielding approximate query results over the incomplete data. Experimental results show that the method can quickly locate the position of the compressed incomplete data, improve query efficiency, save storage space, and ensure the integrity of the query results.
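The general idea of tagging missing values, dictionary-encoding an attribute, and answering a selection from an index can be illustrated with a minimal sketch. This is not the paper's algorithm: the reserved code MISSING, the helper names build_dictionary, build_index, and select_eq, and the sample rows are all assumptions made for illustration, and the sketch omits the frequent-query-condition statistics and the second compression pass described in the abstract.

```python
MISSING = 0  # hypothetical reserved code that tags a missing attribute value


def build_dictionary(rows, attr):
    """Map each distinct non-missing value of `attr` to a small integer code."""
    values = sorted({r[attr] for r in rows if r[attr] is not None})
    return {v: i + 1 for i, v in enumerate(values)}


def build_index(rows, attr, dictionary):
    """Inverted index: code -> row positions; MISSING collects the gaps."""
    index = {}
    for pos, row in enumerate(rows):
        code = MISSING if row[attr] is None else dictionary[row[attr]]
        index.setdefault(code, []).append(pos)
    return index


def select_eq(index, dictionary, value):
    """Return (certain, possible) row positions for the predicate attr == value.

    Rows whose attribute is missing cannot be ruled out, so they are
    reported separately as possible matches (the approximate part).
    """
    code = dictionary.get(value)
    certain = index.get(code, []) if code is not None else []
    possible = index.get(MISSING, [])
    return certain, possible


# Illustrative data only; None marks a missing attribute value.
rows = [
    {"city": "Shenyang", "temp": 21},
    {"city": None, "temp": 19},
    {"city": "Dalian", "temp": None},
    {"city": "Shenyang", "temp": None},
]
city_dict = build_dictionary(rows, "city")
city_index = build_index(rows, "city", city_dict)
certain, possible = select_eq(city_index, city_dict, "Shenyang")
```

Here the selection is answered from the dictionary and the index alone, without scanning the rows; rows with a missing city are kept apart as possible matches rather than dropped or repaired.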

Key words: incomplete data, approximate query, data compression, index, encoding dictionary

CLC Number: