    Efficient Optimization of Erasure Coding for Storage Library

      Abstract: In the information age, the reliability, consistency, security, and real-time availability of stored data are critical. Erasure codes (EC) are widely used in data storage because they tolerate the failure of multiple storage devices while keeping storage overhead minimal. EC encoding and decoding are computationally intensive, however, and their performance directly affects the efficiency of the storage system. The most time-consuming part of both operations is Galois field (GF) multiplication nested inside multi-layer loops, which makes it a focal point for EC optimization. We first analyze the strengths and weaknesses of three common approaches to GF multiplication: the log table lookup method, the complete multiplication table lookup method, and the shift decomposition method. We then optimize the existing table lookup methods for GF(2^8) and propose a 4-bit splitting (SP) method that greatly reduces lookup overhead. Building on SP, we exploit the characteristics of modern 64-bit processor architectures to optimize data access granularity in the multi-layer loops from two angles: widening the data access granularity, and using single instruction multiple data (SIMD) vectorization to achieve data-level parallelism. This improves encoding and decoding performance. All optimizations are implemented on top of the open-source Intel storage acceleration library (ISA-L) and validated on the Sunway and x86 platforms. The results show a speedup across different data sizes: relative to the original ISA-L, the average speedup is 3.28x on the Sunway platform and 2.36x on the x86 platform.
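
      The abstract does not give the details of the 4-bit splitting method, so the following is only a minimal sketch of the general idea it names, not the authors' implementation. It assumes the reduction polynomial 0x11D for GF(2^8), and all function names are hypothetical. For a fixed coefficient c, each input byte is split into its low and high 4-bit halves, and each half indexes a 16-entry table. Because GF multiplication distributes over XOR, the two partial products XOR together to give the full product.

```python
GF_POLY = 0x11D  # assumed reduction polynomial: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8); slow reference version."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        a <<= 1                  # multiply a by x
        if a & 0x100:
            a ^= GF_POLY         # reduce back into 8 bits
        b >>= 1
    return result


def build_split_tables(c: int) -> tuple[bytes, bytes]:
    """Precompute two 16-entry tables for the fixed coefficient c."""
    lo = bytes(gf_mul(c, i) for i in range(16))        # c * (low nibble)
    hi = bytes(gf_mul(c, i << 4) for i in range(16))   # c * (high nibble << 4)
    return lo, hi


def gf_mul_split(tables: tuple[bytes, bytes], b: int) -> int:
    """Multiply byte b by the coefficient using two 4-bit lookups and one XOR."""
    lo, hi = tables
    return lo[b & 0x0F] ^ hi[b >> 4]


def scale_region(c: int, data: bytes) -> bytes:
    """Multiply every byte of a data region by c (the inner loop of encoding)."""
    tables = build_split_tables(c)
    return bytes(gf_mul_split(tables, b) for b in data)
```

      Under these assumptions, each coefficient needs only 2 x 16 = 32 bytes of table, compared with 256 x 256 = 64 KiB for a complete multiplication table. Tables of 16 one-byte entries also fit a 128-bit vector register, which is what makes this layout a natural fit for SIMD byte-shuffle instructions.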

       
