Abstract:
Integrating C/C++ with assembly language is common practice in software development, particularly in operating system-level software. However, programming with inline assembly is error-prone because of its complex architecture-specific details and the lack of compiler checks. Modern software engineering relies on static analysis to detect bugs and security vulnerabilities, yet analyzing inline assembly is beyond the capabilities of LLVM IR-based static analysis tools. If the effects of inline assembly on data flow and control flow are not modeled, false positives can rise and precision can decline. This paper presents LASM, a tool that translates x86 inline assembly code into LLVM IR and integrates it with the IR generated from the surrounding C/C++ source code. LASM emulates the hardware and the semantics of assembly instructions in LLVM IR, which enables downstream static analysis tools to perform whole-program analysis on applications containing inline assembly fragments, thereby reducing false positives and improving precision. Optional optimizations that simplify the semantics of atomic instructions and enhance pointer arithmetic semantics allow LASM to support static analysis tools more effectively. Because LASM emulates assembly instructions completely and accurately, the LLVM IR it lifts can be compiled into an executable that runs correctly. LASM's main strength is its ease of integration: LLVM IR-based static analysis tools can analyze inline assembly code by adding LASM to their existing workflows, without any modification or adaptation of their implementations. LASM thus facilitates the reuse of the many existing tools in the ecosystem. Our experimental results indicate that LASM effectively supports SVF-based static analysis on the Linux kernel.
This analysis categorizes global variables according to whether they are accessed during the initialization phase or the post-initialization phase. With LASM, misclassifications fall by 62.03% and precision rises by 8.06%. LASM also helps identify 14.80% more protectable variables without introducing new misclassifications. These results highlight LASM's potential to make static analysis more effective in the presence of inline assembly code.