    Shuffle-SRAM: In-SRAM Parallel Bitwise Data Shuffle

    Abstract: Vector processing units (VPUs) are widely used in processors for neural networks, signal processing, and high-performance computing, but their overall performance is still limited by the shuffle operations dedicated to data alignment. Traditionally, a processor handles shuffle operations with its data shuffle unit. This incurs expensive data-movement overhead, and the data shuffle unit can only shuffle data serially. Because shuffle operations only change the layout of data, they should ideally be performed entirely within memory. With advances in in-memory computing, SRAM can serve not only as a storage component but also as a computing unit. To enable in-memory shuffling, we propose Shuffle-SRAM, which shuffles multiple vectors in parallel, bit by bit, within an SRAM bank. The key idea is to exploit the data-movement ability of the bit-lines in an SRAM bank to change the data layout: the same bit of different data elements located on the same bit-line can be moved simultaneously, which gives shuffle operations a high degree of parallelism. With a suitable data layout and support from vector shuffle extension instructions, Shuffle-SRAM efficiently handles commonly used shuffle operations. Our evaluation shows that Shuffle-SRAM achieves an average performance gain of 28× for commonly used shuffle operations and an average gain of 3.18× for real-world applications such as FFT, AlexNet, and VggNet. Its area overhead relative to conventional SRAM is only 4.4%.
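    The bit-line idea in the abstract can be illustrated with a small software model. The sketch below is not the paper's circuit or instruction set: it assumes a hypothetical layout in which each row of the bank holds one vector element and each column is one bit-line carrying a single bit position of every element. The function names (`to_bit_rows`, `shuffle_bitwise`, `shuffle_vector`) and the `perm` encoding are illustrative assumptions. The point it shows is that a shuffle only rearranges bits along each bit-line, and that every bit-line can apply the same rearrangement independently.

```python
from typing import List


def to_bit_rows(values: List[int], width: int) -> List[List[int]]:
    """Assumed layout: row r stores element r; column b is bit-line b (bit b of each element)."""
    return [[(v >> b) & 1 for b in range(width)] for v in values]


def from_bit_rows(rows: List[List[int]]) -> List[int]:
    """Reassemble integers from the row/bit-line layout."""
    return [sum(bit << b for b, bit in enumerate(row)) for row in rows]


def shuffle_bitwise(rows: List[List[int]], perm: List[int]) -> List[List[int]]:
    """Apply dst[i] = src[perm[i]] separately on every bit-line.

    The outer loop is sequential here only because this is software;
    in the modeled hardware each bit-line would move its bits concurrently.
    """
    n, width = len(rows), len(rows[0])
    out = [[0] * width for _ in range(n)]
    for b in range(width):                         # one iteration per bit-line
        column = [rows[r][b] for r in range(n)]    # bits currently on bit-line b
        for dst, src in enumerate(perm):
            out[dst][b] = column[src]              # move a bit along the same bit-line
    return out


def shuffle_vector(values: List[int], perm: List[int], width: int = 8) -> List[int]:
    """Shuffle a vector of `width`-bit unsigned integers via the bit-line model."""
    return from_bit_rows(shuffle_bitwise(to_bit_rows(values, width), perm))
```

    For example, `shuffle_vector([10, 20, 30, 40], [3, 2, 1, 0])` reverses the vector. Since no bit ever leaves its bit-line, the work per bit-line does not depend on the element width, which is the source of the parallelism the abstract describes.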

       
