    A Reading Performance Improvement Method in Deduplication Based on Pipeline

    (Original Chinese title: 一种基于流水线的重复数据删除系统读性能优化方法)

    • Abstract: Data deduplication is increasingly applied in primary storage systems such as cloud computing platforms. Because these systems place strict demands on read response time, read performance has become an important problem for deduplication systems, yet existing research has paid little attention to it. In this paper, we quantitatively analyze the read process and its performance bottlenecks in deduplication systems and propose a reading model based on pipeline (RMBP). We further optimize the model with a parallel computing mechanism and analyze theoretically its effect on read speed. Based on this model, we design and implement a parallel, pipelined deduplication system and run experiments on it with three different kinds of data. The experimental results show that the system using RMBP increases read speed for all of the experimental data; that for network security monitoring logs and virtual machine image files, read speed can improve by more than 5 times; and that RMBP clearly improves read speed for data with different deduplication ratios, which indicates broad applicability.
