ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue 2: 421-430. doi: 10.7544/issn1000-1239.2019.20170657

• Software Technology •

Dynamic Binary Translation and Instrumentation Based Function Call Tracing

Lu Shuaibing, Zhang Ming, Lin Zhechao, Li Hu, Kuang Xiaohui, Zhao Gang

  1. (National Key Laboratory of Science and Technology on Information System Security (Academy of Military Sciences), Beijing 100101) (datadancer@163.com)
  • Published: 2019-02-01
  • Online: 2019-02-01

Abstract: Dynamic function call tracing is an important technique for debugging and analyzing the Linux kernel. Existing dynamic tracing tools support only a limited range of instruction set architectures (ISAs) and run inefficiently. Building on binary translation, we design and implement a dynamic function call tracing tool that supports multiple ISAs. First, the binary translation system loads the kernel image, analyzes it, and identifies the type of branch instruction in each basic block. Second, we design instrumentation (stub) code for each ISA and insert it when function call and return instructions are translated, so that timestamps, process IDs, thread IDs and function addresses are collected in real time while programs run and while the kernel boots. Finally, once the kernel has finished booting and the shell appears, we process the collected information and generate function call graphs. Supporting a new platform only requires designing stub code that collects this information for that ISA and inserting it into the translated code of call instructions, so the method is simple to implement and easy to port. Because it works on binary translation, the tool analyzes the text, symbol and string sections of the program or kernel image directly and does not need source code. The extended intermediate code and the extra target code remain compatible with binary translation optimizations such as block chaining, redundant code elimination and hot path optimization, which keeps the overhead low. Experiments with QEMU and the Linux 4.9 kernel show that the traced behavior is consistent with the source code. Recording information in the stub code adds 15.24% time overhead relative to unmodified QEMU, while processing the information and writing it to disk files adds 165.59%; this is a substantial performance improvement over existing tools.

Key words: dynamic binary translation, code instrumentation, function call tracing, Linux kernel analysis, cross-platform
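The branch-type recognition and per-ISA stub design described in the abstract can be illustrated with a small table-driven decoder. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it classifies a 32-bit instruction word as a call, a return or something else using (mask, value) patterns for two example ISAs, AArch64 (BL, BLR, RET) and uncompressed RISC-V (JAL/JALR that link through ra, and ret). The names `BranchKind`, `PATTERNS` and `classify` are hypothetical, and the RISC-V rules deliberately ignore compressed instructions and treat only jumps that link through ra (x1) as calls. In a translator such as QEMU, a check like this would run on the instruction that ends a basic block, and a stub would be emitted only for CALL or RETURN.

```python
from enum import Enum


class BranchKind(Enum):
    CALL = "call"
    RETURN = "return"
    OTHER = "other"


# Hypothetical per-ISA pattern tables: (mask, expected value, kind).
# A 32-bit instruction word matches a pattern when (word & mask) == value.
PATTERNS = {
    "aarch64": [
        (0xFC000000, 0x94000000, BranchKind.CALL),    # BL imm26
        (0xFFFFFC1F, 0xD63F0000, BranchKind.CALL),    # BLR Xn (Rn bits masked out)
        (0xFFFFFC1F, 0xD65F0000, BranchKind.RETURN),  # RET Xn (Rn bits masked out)
    ],
    "riscv64": [
        # Uncompressed encodings only; a "call" is a jump whose rd is ra (x1).
        (0x00000FFF, 0x000000EF, BranchKind.CALL),    # JAL  ra, offset
        (0x00007FFF, 0x000000E7, BranchKind.CALL),    # JALR ra, imm(rs1)
        (0xFFFFFFFF, 0x00008067, BranchKind.RETURN),  # JALR x0, 0(ra), i.e. ret
    ],
}


def classify(isa: str, word: int) -> BranchKind:
    """Return the branch kind of a 32-bit instruction word for the given ISA."""
    for mask, value, kind in PATTERNS[isa]:
        if word & mask == value:
            return kind
    return BranchKind.OTHER


# Example: a few hand-assembled instruction words.
for isa, word in [("aarch64", 0x94000010), ("aarch64", 0xD65F03C0), ("riscv64", 0x000280E7)]:
    print(isa, hex(word), classify(isa, word).value)
```

Keeping the per-ISA knowledge in a data table mirrors the porting argument in the abstract: supporting another ISA means adding its patterns, while the code that consumes the classification stays unchanged.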
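The final step, turning the recorded events into function call graphs, can be sketched as a replay of the trace with one shadow call stack per thread. The record layout below (timestamp, process ID, thread ID, event kind, callee address) follows the fields listed in the abstract, but the exact format, the names `Record`, `make_symbolizer` and `build_call_graph`, and the nearest-preceding-symbol lookup are assumptions for illustration, not the paper's method. A real kernel trace also has to cope with interrupts, context switches and tail calls, which this sketch ignores (unmatched returns are simply dropped).

```python
import bisect
from collections import Counter, defaultdict
from typing import Callable, Iterable, NamedTuple


class Record(NamedTuple):
    """One trace record as a call/return stub might write it (hypothetical layout)."""
    timestamp: int
    pid: int
    tid: int
    event: str  # "call" or "ret"
    addr: int   # callee entry address for "call"; ignored for "ret" here


def make_symbolizer(symbols: Iterable[tuple[int, str]]) -> Callable[[int], str]:
    """Map an address to the symbol whose start address is the nearest one at or below it."""
    table = sorted(symbols)
    starts = [start for start, _ in table]

    def symbolize(addr: int) -> str:
        i = bisect.bisect_right(starts, addr) - 1
        return table[i][1] if i >= 0 else hex(addr)

    return symbolize


def build_call_graph(records, symbolize, root="<root>"):
    """Replay records per (pid, tid) with a shadow stack and count caller->callee edges."""
    stacks = defaultdict(list)  # (pid, tid) -> names of currently active functions
    edges = Counter()
    for rec in sorted(records, key=lambda r: r.timestamp):
        stack = stacks[(rec.pid, rec.tid)]
        if rec.event == "call":
            callee = symbolize(rec.addr)
            caller = stack[-1] if stack else root
            edges[(caller, callee)] += 1
            stack.append(callee)
        elif rec.event == "ret" and stack:
            stack.pop()  # a return seen with an empty stack is ignored
    return edges


# Usage with made-up addresses and timestamps.
sym = make_symbolizer([(0x1000, "start_kernel"), (0x2000, "setup_arch"), (0x3000, "printk")])
trace = [
    Record(1, 0, 0, "call", 0x1000),
    Record(2, 0, 0, "call", 0x2000),
    Record(3, 0, 0, "call", 0x3000),
    Record(4, 0, 0, "ret", 0),
    Record(5, 0, 0, "ret", 0),
    Record(6, 0, 0, "call", 0x3040),
]
for (caller, callee), count in sorted(build_call_graph(trace, sym).items()):
    print(f"{caller} -> {callee}: {count}")
```

With the made-up addresses above, the replay yields four caller-to-callee edges, each seen once: `<root>` to `start_kernel`, `start_kernel` to `setup_arch`, `setup_arch` to `printk`, and `start_kernel` to `printk`.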
