A Return Address Predictor Based on Persistent Stack

Tan Hongze; Wang Jian

doi:10.7544/issn1000-1239.202111274

Tan Hongze, Wang Jian. A Return Address Predictor Based on Persistent Stack[J]. Journal of Computer Research and Development, 2023, 60(6): 1337-1345. DOI: 10.7544/issn1000-1239.202111274

State Key Lab of Processors (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190
University of Chinese Academy of Sciences, Beijing 100049

Funds: This work was supported by the Strategic Priority Research Program of Chinese Academy of Sciences (XDC05020100).

Author Bio:
Tan Hongze: born in 1996. PhD candidate. Student member of CCF. His main research interest is computer architecture.

Wang Jian: born in 1971. PhD, professor. Senior member of CCF. His main research interests include processor micro-architecture, software-hardware co-designed virtual machines, and operating systems.
Received Date: December 23, 2021
Revised Date: August 07, 2022
Available Online: February 26, 2023

Abstract

Branch prediction is essential to both the performance and the power of modern processors because it lets the instructions that follow a branch execute speculatively before the branch resolves. Unlike general branches, procedure returns can be predicted with a return-address stack (RAS). By speculatively emulating the call stack under the last-in-first-out rule of procedure calls and returns, the RAS predicts return addresses accurately. In real processors, however, speculative execution down wrong paths corrupts the RAS, so a repair mechanism is needed to keep its contents accurate. For embedded processors, which are sensitive to area, the accuracy of a repair mechanism must be carefully traded off against its overhead. To address the redundancy of RAS storage, we introduce the hybrid RAS, a return-address predictor based on a persistent stack. By integrating a classical stack, a persistent stack, and backup prediction with overflow detection, our proposal eliminates wrong-path corruption and storage redundancy at the same time, reducing the return misprediction rate effectively and efficiently. In addition, the classical stack is decoupled from the persistent stack to further reduce area. On benchmarks from the SPEC CPU 2000 suite, experiments show that the proposed RAS reduces MPKI (mispredictions per kilo-instructions) to 2.4×10⁻³ with a design area of only 1.1×10⁴ μm² under Design Compiler, cutting mispredictions by over 96% compared with the state-of-the-art RAS.
Keywords: return address prediction, speculative execution, corruption recovery, persistence, backup prediction
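
The wrong-path corruption described in the abstract, and the way a persistent (never-overwritten) stack avoids it, can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' hardware design: the class names `ClassicRAS` and `PersistentRAS`, the helper `run_wrong_path`, the buffer size, and the addresses are all hypothetical, and the finite storage, overflow detection, and backup prediction of the proposed hybrid RAS are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """One immutable entry: a return address plus a link to the entry below it."""
    addr: int
    below: Optional["Node"]


class ClassicRAS:
    """Circular-buffer RAS that checkpoints only the top-of-stack index (assumed repair scheme)."""

    def __init__(self, size: int = 8) -> None:
        self.buf = [0] * size
        self.top = 0  # index of the next free slot

    def push(self, addr: int) -> None:
        self.buf[self.top % len(self.buf)] = addr
        self.top += 1

    def pop(self) -> int:
        self.top -= 1
        return self.buf[self.top % len(self.buf)]

    def checkpoint(self) -> int:
        return self.top

    def restore(self, cp: int) -> None:
        self.top = cp  # entries overwritten on the wrong path stay corrupted


class PersistentRAS:
    """Linked stack: each push allocates a new node, so older versions are never overwritten."""

    def __init__(self) -> None:
        self.top: Optional[Node] = None

    def push(self, addr: int) -> None:
        self.top = Node(addr, self.top)

    def pop(self) -> Optional[int]:
        if self.top is None:
            return None  # empty stack: no prediction available in this sketch
        addr, self.top = self.top.addr, self.top.below
        return addr

    def checkpoint(self) -> Optional[Node]:
        return self.top

    def restore(self, cp: Optional[Node]) -> None:
        self.top = cp  # restoring one pointer recovers the whole earlier stack


def run_wrong_path(ras):
    """Call once, speculate down a wrong path (return, then call), recover, then return."""
    ras.push(0x1000)       # correct path: call with return address 0x1000
    cp = ras.checkpoint()  # a later branch is predicted; save the RAS state
    ras.pop()              # wrong path: a return pops 0x1000
    ras.push(0x2000)       # wrong path: a call reuses the freed slot
    ras.restore(cp)        # misprediction detected; roll back
    return ras.pop()       # correct path: the return should predict 0x1000


print(hex(run_wrong_path(ClassicRAS())))     # 0x2000: the slot was overwritten
print(hex(run_wrong_path(PersistentRAS())))  # 0x1000: the old version survived
```

In this model the classical stack mispredicts because restoring the top index cannot undo the overwrite, whereas the persistent stack recovers the full pre-speculation state from a single saved pointer. The cost is that entries are never reused in place, which is the storage pressure that a bounded hardware design has to manage.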
