Citation: Lü Qianru, Xu Jinwei, Jiang Jingfei, Li Dongsheng. DAQ: Divide-and-Conquer Strategy Based Adaptive Low-Bit Quantization Method for Vision Transformer[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550145

    DAQ: Divide-and-Conquer Strategy Based Adaptive Low-Bit Quantization Method for Vision Transformer

Vision Transformers (ViTs) have demonstrated remarkable success in computer vision tasks, but their complex architecture and computational demands hinder deployment on edge devices. While post-training quantization (PTQ) is widely adopted for model compression, existing PTQ methods exhibit severe performance degradation in 4-bit ultra-low-bitwidth scenarios. This work systematically addresses two fundamental limitations: 1) the spatial mismatch between quantization-sensitive layers (e.g., Softmax) and compute-intensive layers (e.g., linear projections), where quantizing Softmax causes an 80% accuracy loss despite contributing merely 8% of the computational load; 2) non-Gaussian activation distributions that nonetheless exhibit hidden Gaussian-like clustering (97% of values fall within three z-scores of the mean). We propose DAQ (divide-and-conquer and adaptive quantization), a hardware-friendly PTQ method. DAQ adopts a z-score-driven dynamic partitioning algorithm that separates the data into normal-range and abnormal-range groups and quantizes the two groups with connected parameters. DAQ further exploits hardware-accelerated kernels such as Tensor Cores to speed up quantized ViT models. Experimental results demonstrate that DAQ achieves a maximum improvement of 4.37% in ImageNet Top-1 accuracy under 4-bit quantization. In object detection tasks, its average error margin remains below 0.4% compared with the baseline, with a maximum improvement of 8.2%, even surpassing the full-precision model by 0.1% in specific cases, thereby realizing near-lossless low-bit-width quantization. Through hardware implementation optimization, DAQ achieves 43%~86% computational acceleration without significantly increasing computational overhead. This approach establishes a synergistic algorithm-hardware co-optimized quantization deployment paradigm for resource-constrained scenarios, effectively balancing model efficiency and precision retention.
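As a rough illustration of the divide-and-conquer idea described in the abstract, the sketch below partitions an activation tensor by z-score and quantizes the normal-range and abnormal-range (outlier) groups with linked scales. It is not the authors' implementation: the function name daq_like_quantize, the tunable 3-sigma threshold, and the power-of-two link between the two scales (standing in for the paper's "connected parameters") are assumptions made only for illustration.

```python
import numpy as np

def daq_like_quantize(x, n_bits=4, z_thresh=3.0):
    """Illustrative z-score-partitioned quantization (a sketch, not the authors' code).

    Splits values into a normal-range group (|z| <= z_thresh) and an
    abnormal-range (outlier) group, then quantizes each group with scales
    linked by a power-of-two factor so dequantization stays hardware-friendly.
    """
    mean, std = x.mean(), x.std()
    z = (x - mean) / (std + 1e-8)
    normal_mask = np.abs(z) <= z_thresh      # ~97% of values per the paper's observation

    qmax = 2 ** (n_bits - 1) - 1             # e.g., 7 for signed 4-bit

    # Scale for the normal-range group, derived from its own max magnitude.
    s_normal = np.abs(x[normal_mask]).max() / qmax + 1e-8

    # Assumed "connected parameter": the outlier scale is the normal scale
    # times a power of two chosen to cover the outliers' range.
    if (~normal_mask).any():
        ratio = np.abs(x[~normal_mask]).max() / (s_normal * qmax)
    else:
        ratio = 1.0
    k = int(np.ceil(np.log2(max(ratio, 1.0))))
    s_outlier = s_normal * (2 ** k)

    # Quantize each group with its own scale, clamp to the signed n-bit range.
    q = np.where(normal_mask,
                 np.clip(np.round(x / s_normal), -qmax - 1, qmax),
                 np.clip(np.round(x / s_outlier), -qmax - 1, qmax))
    deq = np.where(normal_mask, q * s_normal, q * s_outlier)
    return q.astype(np.int8), deq, normal_mask

# Example usage on synthetic activations with heavy-tailed outliers.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = np.concatenate([rng.normal(0, 1, 10000), rng.normal(0, 8, 300)])
    q, deq, mask = daq_like_quantize(acts, n_bits=4)
    print("outlier fraction:", 1 - mask.mean(), "reconstruction MSE:", np.mean((acts - deq) ** 2))
```

Tying the two scales by a power of two keeps the grouped dequantization to a shift plus one multiply, which is one plausible way such a scheme could map onto integer Tensor Core kernels.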
