高级检索

    支持分布式存储删冗的相似文件元数据集合索引

    Index of Meta-Data Set of the Similar Files for Inline De-Duplication in Distributed Storage Systems

    • 摘要: 分布式存储技术因其良好的可扩展性、高性价比在当前云存储系统和企业存储中心得到广泛应用.在分布式存储系统中进行内嵌删冗可以有效降低系统存储开销,提高数据存取效率,实现内嵌删冗的关键在于高性能和可扩展的元数据索引方法.该方法应确保删冗操作不影响存储性能.通过分析影响索引性能的关键因素,提出一种分布式相似文件元数据集合索引的构建方法.该方法使用位置敏感Hash函数,将具有相同数据片的相似文件元数据组成集合并建立索引,使一个文件所有数据片元数据检索只需要访问一次外存,有效提高元数据检索效率.并且所生成的索引具有良好可扩展性和很小的内存开销,适合在采用分布式存储结构的云存储系统或者企业存储系统中进行应用.

       

      Abstract: Distributed storage systems have been widely adopted in the cloud storages and enterprise storage infrastructure, because of their high scalability and cost effectiveness. In the storage systems, data de-duplication can save most of storage space for the devices, and can improve the efficiency of data transmission. The key of de-duplicating in the distributed storage systems is how to implement a high performance and scalability meta-data index that should not hurt the writing throughput. This paper proposes an index of meta-data sets of the similar files. The index uses a locality sensitive Hashing function to organize meta-data set, and accesses the disk only one time for the lookups for the chunks of a file. Consequently, the index improves the indexing performance with high scalability and a small memory footprint, which is suitable for the cloud and enterprise storages.

       

    /

    返回文章
    返回