Index of Meta-Data Set of the Similar Files for Inline De-Duplication in Distributed Storage Systems
-
Abstract
Distributed storage systems have been widely adopted in cloud storage and enterprise storage infrastructure because of their high scalability and cost-effectiveness. In these systems, data de-duplication can save a substantial amount of storage space and improve the efficiency of data transmission. The key to inline de-duplication in distributed storage systems is a high-performance, scalable meta-data index that does not hurt write throughput. This paper proposes an index of the meta-data sets of similar files. The index uses a locality-sensitive hashing function to organize the meta-data sets, and it accesses the disk only once to look up all the chunks of a file. Consequently, the index improves indexing performance while offering high scalability and a small memory footprint, which makes it suitable for cloud and enterprise storage.
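The idea of one disk access per file can be illustrated with a minimal sketch. It is not the paper's implementation. Everything in it is an assumption made for illustration: fixed-size chunking, SHA-1 chunk fingerprints, the minimum fingerprint used as a simple min-hash style locality-sensitive key, a Python dict standing in for the on-disk store, and the hypothetical names chunk_fingerprints, similarity_key and SimilarFileIndex.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def chunk_fingerprints(data: bytes, chunk_size: int = 4096) -> List[str]:
    """Split data into fixed-size chunks and fingerprint each one (assumed scheme)."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def similarity_key(fingerprints: List[str]) -> str:
    """Min-hash style key: files sharing most chunks tend to share the minimum."""
    return min(fingerprints)


class SimilarFileIndex:
    """Toy index that groups chunk meta-data by the similarity key of the file."""

    def __init__(self) -> None:
        # Stand-in for the on-disk store: similarity key -> set of fingerprints.
        self.disk: Dict[str, Set[str]] = {}
        self.disk_reads = 0

    def _load(self, key: str) -> Set[str]:
        # Each call models one disk access that fetches a whole meta-data set.
        self.disk_reads += 1
        return set(self.disk.get(key, ()))

    def write_file(self, data: bytes, chunk_size: int = 4096) -> Tuple[List[str], List[str]]:
        """Return (new, duplicate) fingerprints using a single simulated disk read."""
        fps = chunk_fingerprints(data, chunk_size)
        if not fps:
            return [], []
        key = similarity_key(fps)
        known = self._load(key)  # the only lookup for all chunks of this file
        new, dup = [], []
        for fp in fps:
            if fp in known:
                dup.append(fp)
            else:
                new.append(fp)
                known.add(fp)
        self.disk[key] = known  # write the updated meta-data set back
        return new, dup


index = SimilarFileIndex()
file_a = b"".join(bytes([i]) * 4096 for i in range(8))
first_new, first_dup = index.write_file(file_a)
second_new, second_dup = index.write_file(file_a)
```

In this sketch, writing the same file a second time finds every chunk in the single meta-data set loaded for that file, so the number of simulated disk reads equals the number of files written, not the number of chunks. A chunk that is stored under a different similarity key would be missed, which is the usual trade-off of similarity-based indexing.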
-
-