Abstract:
Hybrid bit-width block floating point (BFP) offers a flexible solution for low bit-width convolution computations, balancing storage efficiency against computational precision. By assigning higher bit-widths to numerically sensitive layers and lower bit-widths to redundant or stable regions, this approach preserves near-floating-point accuracy at substantially reduced computational and storage cost. Recent studies have deployed hardware platforms such as field-programmable gate arrays (FPGAs) for hybrid bit-width BFP convolution acceleration, but they tend to underutilize FPGA resources by overlooking the full potential of digital signal processing (DSP) blocks. This underutilization wastes resources and limits computational throughput, restricting the overall performance of FPGA-based BFP convolution accelerators. This work develops a novel FPGA-based BFP convolution processing unit, termed “BE-HB”, that couples two sets of BFP convolution calculations in dual-mode bit-width (i.e., 8 b or 16 b) on a single DSP for high performance. We further introduce a novel mapping method that reuses the shared exponents and private mantissas of BFP representations to perform two sets of BFP convolution computations within 8 b or 16 b DSP data paths. By leveraging exponent sharing, data packing, and data reuse, the proposed approach significantly reduces hardware resource overhead. Compared with representative baseline designs, the proposed design achieves an average reduction of 61.4% in LUT utilization while maintaining model accuracy, delivering superior performance and resource efficiency that make it well suited to resource-constrained FPGA-based edge computing platforms.
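The general DSP-packing idea behind such designs (extracting two independent low bit-width products from one wide multiplication) can be illustrated with a simplified sketch. This is a minimal unsigned-mantissa illustration under assumed parameters (8-bit operands, a 16-bit guard offset, and the function name `packed_dual_multiply`); it is not the paper's actual BE-HB mapping, which must also handle signed mantissas and shared-exponent alignment:

```python
# Illustrative sketch (assumption): pack two unsigned 8-bit mantissas a and b
# into one wide operand so that a single multiply by c yields both products.
# The real BE-HB signed BFP mapping and exponent handling are more involved.

def packed_dual_multiply(a: int, b: int, c: int) -> tuple[int, int]:
    """Compute (a*c, b*c) with one wide multiplication; all inputs 8-bit unsigned."""
    assert 0 <= a < 256 and 0 <= b < 256 and 0 <= c < 256
    packed = (a << 16) | b      # 16-bit guard band: b*c < 2**16, so it cannot
                                # overflow into the a*c partial product
    product = packed * c        # one multiplier produces both results
    return product >> 16, product & 0xFFFF

# In a BFP dot product, the two lanes share a block exponent, so only the
# mantissa products need the multiplier; exponents are handled separately.
hi, lo = packed_dual_multiply(13, 200, 77)
```

The same trick underlies common FPGA DSP-sharing schemes, where the wide multiplier of a single DSP slice is time- or bit-shared between two narrow operands.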