ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (2): 426-437. doi: 10.7544/issn1000-1239.2018.20160775

• System Architecture •


马超1, 戴紫彬2, 李伟3, 南龙梅3, 金羽2   

  1. 1(国家高性能集成电路(上海)设计中心 上海 201204); 2(解放军信息工程大学 郑州 450001); 3(集成电路国家重点实验室(复旦大学) 上海 200433)
  • Published: 2018-02-01

RPRU: A Unified Architecture for Rotation and Bit-Extraction Operations in General-Purpose Processor

Ma Chao1, Dai Zibin2, Li Wei3, Nan Longmei3, Jin Yu2   

  1. 1(National High Performance Integrated Circuit Design Center, Shanghai 201204); 2(PLA Information Engineering University, Zhengzhou 450001); 3(State Key Laboratory of ASIC and System (Fudan University), Shanghai 200433)
  • Online: 2018-02-01

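Abstract (from the Chinese text): Bit extraction and rotation can both be performed by bit-level permutation. In current hardware implementations the two are mostly designed separately and independently, which wastes hardware logic resources. Although some studies design them as a unified unit, the circuits that implement their routing algorithms remain separate and consume considerable logic. We therefore study how these two bit-level operations, rotation and bit extraction, map onto the multistage dynamic interconnection network Inverse Butterfly and, combining the network's self-routing and recursive properties, propose a unified routing algorithm for the two operations. The algorithm is highly parallel, and its hardware implementation is simple, which favors integration into processor architectures. On this basis, a reconfigurable parallel bit extraction-rotation hardware unit (RPRU) is constructed and its critical-path circuits are optimized. Logic synthesis is then completed in a 90 nm CMOS process. Experimental results show that the hardware unit built with this routing algorithm reduces area by nearly 30% compared with previous designs of the same kind.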

Key words (from the Chinese text): bit extraction, rotation, unified routing algorithm, hardware unit, Inverse Butterfly network

Abstract: Both parallel bit extraction and rotation (cyclic shift) can be carried out as bit-level permutations. In current hardware they are mostly implemented as separate, independent units, which wastes logic resources. Some earlier designs merge the two operations into a single hardware unit, but they still need a dedicated circuit to implement the routing algorithm of each operation, so logic consumption remains high. To address this, we study how rotation and parallel bit extraction map onto the Inverse Butterfly network, a dynamic multistage interconnection network, and exploit the network's self-routing and recursive properties to propose a unified routing algorithm for both operations. The algorithm is highly parallel and simple to implement in hardware, which makes it conducive to integration into a general-purpose processor architecture. On this basis, we build RPRU, a reconfigurable parallel bit extraction-rotation hardware unit, and optimize its critical-path circuitry. We then synthesize the unit in a 90 nm CMOS process. The experimental results show that the RPRU built on the unified routing algorithm occupies nearly 30% less area than previous designs with the same functions.

Key words: parallel bit extraction, rotation-shift operations, unified routing algorithm, hardware unit, Inverse Butterfly network
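
The abstract's claim that rotation and parallel bit extraction both map onto an Inverse Butterfly network can be illustrated with a small software model. The sketch below is not the paper's routing algorithm or the RPRU circuit; it is a minimal illustration under stated assumptions: the word width `n` is a power of two, stage `k` pairs positions `2**k` apart, and all function names (`inverse_butterfly`, `control`, `rotl_ibfly`, `pex_ibfly`) are hypothetical. It relies on the network's recursive property: after stage `k`, every block of `2**(k+1)` bits is already rotated by a count taken modulo the block size, so one shared rule can produce the switch settings for both operations from a single count (the shift amount for rotation, a running population count of the mask for extraction).

```python
def to_bits(x, n):
    """Little-endian bit list: index i holds bit i of x."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def inverse_butterfly(bits, swap):
    """Model of the network: stage k pairs index j with j + 2**k inside each
    block of 2**(k+1). swap(k, base, j) returns True to exchange the pair."""
    out = list(bits)
    n = len(out)
    k = 0
    while (1 << k) < n:
        d = 1 << k
        for base in range(0, n, 2 * d):
            for j in range(d):
                if swap(k, base, j):
                    lo, hi = base + j, base + j + d
                    out[lo], out[hi] = out[hi], out[lo]
        k += 1
    return out

def control(count, k, j, toward_low):
    """Shared control rule (an assumption of this sketch): a threshold
    pattern at count mod 2**k whose polarity is flipped by bit k of count."""
    r = count % (1 << k)
    pattern = (j >= r) if toward_low else (j < r)
    return pattern != bool((count >> k) & 1)

def rotl_ibfly(x, s, n):
    """Rotate the n-bit word x left by s; the count is s for every block."""
    s %= n
    routed = inverse_butterfly(to_bits(x, n),
                               lambda k, base, j: control(s, k, j, False))
    return from_bits(routed)

def pex_ibfly(x, mask, n):
    """Gather the bits of x selected by mask into the low-order positions."""
    def swap(k, base, j):
        # Count the mask bits below the upper half of this block.
        below = mask & ((1 << (base + (1 << k))) - 1)
        return control(bin(below).count("1"), k, j, True)
    return from_bits(inverse_butterfly(to_bits(x & mask, n), swap))

def rotl_ref(x, s, n):
    """Direct reference for rotate-left, used only for checking."""
    s %= n
    return ((x << s) | (x >> (n - s))) & ((1 << n) - 1)

def pex_ref(x, mask, n):
    """Direct reference for parallel bit extraction, used only for checking."""
    out, pos = 0, 0
    for i in range(n):
        if (mask >> i) & 1:
            out |= ((x >> i) & 1) << pos
            pos += 1
    return out

if __name__ == "__main__":
    assert rotl_ibfly(0b00000001, 3, 8) == rotl_ref(0b00000001, 3, 8)
    assert pex_ibfly(0b10110100, 0b11110000, 8) == pex_ref(0b10110100, 0b11110000, 8)
```

In this model the two operations differ only in where the count comes from and in the direction flag passed to `control`, which is the kind of sharing a unified routing circuit can exploit. A hardware design would compute all switch settings of a stage in parallel rather than in nested loops.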