Citation: | Lü Qianru, Xu Jinwei, Jiang Jingfei, Li Dongsheng. DAQ: Divide-and-Conquer Strategy Based Adaptive Low-Bit Quantization Method for Vision Transformer[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550145 |
Vision Transformers (ViTs) have demonstrated remarkable success in computer vision tasks, but their complex architecture and computational demands hinder deployment on edge devices. While post-training quantization (PTQ) is widely adopted for model compression, existing PTQ methods suffer severe performance degradation in 4-bit ultra-low-bit-width scenarios. This work systematically addresses two fundamental limitations: 1) the spatial mismatch between quantization-sensitive layers (e.g., Softmax) and compute-intensive layers (e.g., linear projections): quantizing Softmax causes an 80% accuracy loss despite contributing merely 8% of the computational load; 2) non-Gaussian activation distributions that nevertheless exhibit hidden Gaussian-like clustering (97% of values lie within three standard deviations, i.e., |z-score| < 3). We propose DAQ (divide-and-conquer and adaptive quantization), a hardware-friendly PTQ method. DAQ adopts a z-score-driven dynamic partitioning algorithm that separates activations into a normal-range group and an abnormal-range (outlier) group, and quantizes the two groups with connected parameters. DAQ further exploits hardware-accelerated kernels such as Tensor Cores to speed up quantized ViT models. Experimental results demonstrate that DAQ achieves a maximum improvement of 4.37% in ImageNet Top-1 accuracy under 4-bit quantization. In object detection tasks, its average error margin remains below 0.4% relative to the baseline, with a maximum improvement of 8.2%, even surpassing the full-precision model by 0.1% in specific cases, thereby realizing near-lossless low-bit-width quantization. Through hardware implementation optimization, DAQ achieves 43%~86% computational acceleration without significantly increasing computational overhead. This approach establishes a synergistic algorithm-hardware co-optimized quantization deployment paradigm for resource-constrained scenarios, effectively balancing model efficiency and precision retention.
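The z-score-driven divide-and-conquer step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the power-of-two coupling between the two groups' scales, and the symmetric uniform quantizers are all assumptions made for the sketch, chosen only to show how a "connected parameter" between the normal-range and outlier groups can stay hardware-friendly (dequantization differs only by a bit shift).

```python
import numpy as np

def daq_partition_quantize(x, bits=4, z_thresh=3.0):
    """Hypothetical sketch of z-score-based divide-and-conquer quantization.

    Splits values into a normal-range group (|z| < z_thresh; ~97% of
    activations per the paper's observation) and an outlier group, then
    quantizes each group uniformly. The outlier scale is connected to the
    normal scale by a power-of-two factor (an assumed coupling), so the
    two groups share one multiplier and differ only by a shift.
    """
    mu, sigma = x.mean(), x.std()
    z = np.abs((x - mu) / (sigma + 1e-8))
    normal = z < z_thresh

    qmax = 2 ** (bits - 1) - 1  # e.g., 7 for signed 4-bit

    # Scale for the normal-range group from its own max magnitude.
    s_normal = np.abs(x[normal]).max() / qmax

    # Connect the outlier scale to the normal one via a power of two.
    if (~normal).any():
        ratio = np.abs(x[~normal]).max() / (s_normal * qmax)
        k = int(np.ceil(np.log2(max(ratio, 1.0))))
    else:
        k = 0
    s_outlier = s_normal * (2 ** k)

    # Fake-quantize each group with its own (connected) scale.
    q = np.empty_like(x)
    q[normal] = np.clip(np.round(x[normal] / s_normal),
                        -qmax - 1, qmax) * s_normal
    q[~normal] = np.clip(np.round(x[~normal] / s_outlier),
                         -qmax - 1, qmax) * s_outlier
    return q, normal
```

On a Gaussian-like tensor with a few injected outliers, the normal-range group keeps a fine quantization step while the outliers are covered by the coarser, shift-related scale, which is the intuition behind the divide-and-conquer split.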