Lü Qianru, Xu Jinwei, Jiang Jingfei, Li Dongsheng. DAQ: Divide-and-Conquer Strategy Based Adaptive Low-Bit Quantization Method for Vision Transformer[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550145

DAQ: Divide-and-Conquer Strategy Based Adaptive Low-Bit Quantization Method for Vision Transformer

Funds: This work was supported by the National Natural Science Foundation of China (62025208, 62421002, 62472435, 62172430), the Science and Technology Innovation Program of Hunan Province (2022RC3065), and the National Key Laboratory of Parallel and Distributed Computing Foundation (2023-JKWPDL-02).

More Information
  • Author Bio:

    Lü Qianru: born in 1993. PhD candidate. Her main research interests include computer architecture and artificial intelligence

    Xu Jinwei: born in 1990. PhD, assistant research fellow. Member of CCF. His main research interests include artificial intelligence and reconfigurable computing

    Jiang Jingfei: born in 1974. PhD, professor. Member of CCF. Her main research interests include reconfigurable computing, artificial intelligence, and computer architecture (jingfeijiang@nudt.edu.cn)

    Li Dongsheng: born in 1978. PhD, professor. Member of CCF. His main research interests include parallel computing, artificial intelligence, and computer architecture (dsli@nudt.edu.cn)

  • Received Date: February 28, 2025
  • Revised Date: April 09, 2025
  • Available Online: April 15, 2025
  • Vision Transformers (ViTs) have demonstrated remarkable success in computer vision tasks, but their complex architecture and computational demands hinder deployment on edge devices. While post-training quantization (PTQ) is widely adopted for model compression, existing PTQ methods exhibit severe performance degradation in 4-bit ultra-low-bit-width scenarios. This work systematically addresses two fundamental limitations: 1) the spatial mismatch between quantization-sensitive layers (e.g., Softmax) and compute-intensive layers (e.g., linear projections), where quantizing Softmax causes an 80% accuracy loss despite contributing merely 8% of the computational load; 2) non-Gaussian activation distributions with hidden Gaussian-like clustering properties (97% of values fall within a z-score of 3). We propose DAQ (divide-and-conquer and adaptive quantization), a hardware-friendly PTQ method. DAQ adopts a z-score-driven dynamic partitioning algorithm to separate data into normal-range and abnormal-range groups, and quantizes the two groups with connected parameters. DAQ further exploits hardware-accelerated kernels such as Tensor Cores to speed up quantized ViT models. Experimental results demonstrate that DAQ achieves a maximum improvement of 4.37% in ImageNet Top-1 accuracy under 4-bit quantization. In object detection tasks, its average error margin remains below 0.4% compared with the baseline, with a maximum improvement of 8.2%, even surpassing the full-precision model by 0.1% in specific cases, thereby realizing near-lossless low-bit-width quantization. Through hardware implementation optimization, DAQ achieves 43%~86% computational acceleration without significantly increasing computational overhead. This approach establishes a synergistic algorithm-hardware co-optimized quantization deployment paradigm for resource-constrained scenarios, effectively balancing model efficiency and precision retention.
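To make the divide-and-conquer idea above concrete, here is a minimal NumPy sketch of z-score-driven partitioning with two linked ("connected") quantization scales. It is an illustration under stated assumptions, not the authors' implementation: the z-score threshold of 3, the power-of-two ratio tying the outlier scale to the normal scale, and the names daq_partition_quantize, z_thresh, and scale_ratio are all assumptions made for this example.

```python
import numpy as np

def daq_partition_quantize(x, bits=4, z_thresh=3.0, scale_ratio=16):
    """Toy z-score-driven divide-and-conquer quantizer (illustrative only).

    Splits activations into a normal-range group and an abnormal-range
    (outlier) group, then quantizes both with two scales linked by a
    fixed ratio, mimicking the "connected parameters" idea at a high level.
    """
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / (sigma + 1e-12)           # z-score of each activation

    # Divide: ~97% of values fall within |z| < 3 per the paper's observation.
    normal = np.abs(z) < z_thresh

    qmax = 2 ** (bits - 1) - 1               # 7 for signed 4-bit

    # Conquer: fine scale for the dense normal group; coarser linked scale
    # (a fixed power-of-two multiple -- an assumption here) for the outliers.
    s_norm = np.abs(x[normal]).max() / qmax
    s_out = s_norm * scale_ratio

    scale = np.where(normal, s_norm, s_out)  # per-element effective scale
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

# Usage on heavy-tailed synthetic activations (roughly post-GELU-like):
x = np.random.standard_t(df=3, size=4096).astype(np.float32)
q, scale = daq_partition_quantize(x)
print("mean |dequant - x|:", np.abs(q * scale - x).mean())
```

Because the outlier scale here is a power-of-two multiple of the normal scale, dequantization of both groups can share a single integer path followed by one shift, which is the general property that makes such split-range schemes compatible with integer Tensor Core kernels.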
