ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 (Journal of Computer Research and Development) ›› 2015, Vol. 52 ›› Issue (3): 681-690. doi: 10.7544/issn1000-1239.2015.20131255

• Information Security •

基于并行字符索引的多步长正则表达式匹配算法

丁麟轩1,黄昆2,张大方1   

  1(湖南大学信息科学与工程学院 长沙 410082); 2(中国科学院计算技术研究所 北京 100190) (lxding@hnu.edu.cn)
  • Published: 2015-03-01
  • Funding: National Basic Research Program of China (973 Program) (2012CB315805); National Natural Science Foundation of China (61173167, 61100171)

Multi-Stride Regular Expression Matching Using Parallel Character Index

Ding Linxuan1, Huang Kun2, Zhang Dafang1   

  1(College of Computer Science and Electronics Engineering, Hunan University, Changsha 410082); 2(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)
  • Online: 2015-03-01

摘要: 深度包检测(deep packet inspection, DPI)是网络入侵检测与防御系统(network intrusion detection and prevention system, NIDPS)的核心.基于三态内容可寻址存储器(ternary content addressable memory, TCAM)的正则表达式匹配算法提高了数据包的处理速度,成为DPI技术的一个重要研究方向.TCAM具有查找速度快、存储空间小等特性,且能耗与存储空间成正比.由于DFA的存储空间开销比较大,且存储空间大小随着DFA步长数的增加而指数倍增,基于TCAM的DFA面临高能耗的问题,特别是多步长DFA.提出一种基于并行字符索引的多步长正则表达式匹配算法(multi-stride parallel character-indexed DFA, PCIDFA),对确定型有限自动机(deterministic finite automaton, DFA)构造并行字符索引,通过比特位图取交集,减少匹配时激活的TCAM块数,显著降低TCAM能耗.实验结果表明:与多步长DFA相比,多步长PCIDFA在TCAM能耗上减少了99.8%以上,在TCAM存储空间开销上减少了48.5%~65.3%,在吞吐量上提高了1.9~2.6倍.

关键词: 正则表达式匹配, 三态内容可寻址存储器, 并行字符索引, 分块存储, 低能耗

Abstract: Deep packet inspection (DPI) is a key function of network intrusion detection and prevention systems (NIDPS). Regular expression matching based on ternary content addressable memory (TCAM) speeds up packet processing and has become an important research direction in DPI. TCAM offers fast lookup and small storage space, and its power consumption is proportional to the storage space used. A deterministic finite automaton (DFA) requires large storage space, and that space grows exponentially with the stride of the DFA, so a TCAM-based DFA, and a multi-stride DFA in particular, consumes a lot of power. This paper presents a multi-stride regular expression matching algorithm based on parallel character indexes, the multi-stride parallel character-indexed DFA (PCIDFA). The algorithm builds parallel character indexes according to the stride of the DFA and intersects their bitmaps to reduce the number of TCAM blocks activated during matching, which significantly lowers TCAM power consumption. Experimental results show that, compared with multi-stride DFA, multi-stride PCIDFA reduces TCAM power consumption by more than 99.8%, reduces TCAM space usage by 48.5% to 65.3%, and improves matching throughput by 1.9 to 2.6 times.

Key words: regular expression matching, ternary content addressable memory (TCAM), parallel character index, block-based storage, low power
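
The bitmap-intersection idea described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's implementation: the abstract does not give the index layout, block sizing, or TCAM encoding, so the rule format, the fixed-size blocks, and the names `split_into_blocks`, `build_index`, `active_block_mask`, and `lookup` are all hypothetical. Each stride position gets its own character index that maps a character to a bitmap of the blocks holding a rule that can match it; ANDing one bitmap per position leaves only the blocks worth activating.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule format: (current state, k input symbols, next state).
# A symbol of None is a wildcard, standing in for a TCAM "don't care" entry.
Rule = Tuple[int, Tuple[Optional[str], ...], int]


def split_into_blocks(rules: List[Rule], block_size: int) -> List[List[Rule]]:
    """Partition the rule table into fixed-size blocks (assumed layout)."""
    return [rules[i:i + block_size] for i in range(0, len(rules), block_size)]


def build_index(blocks: List[List[Rule]], stride: int,
                alphabet: str) -> List[Dict[str, int]]:
    """Build one character index per stride position.

    index[pos][c] is an integer bitmap: bit b is set when block b holds
    at least one rule whose symbol at position pos can match character c.
    """
    index: List[Dict[str, int]] = [dict.fromkeys(alphabet, 0) for _ in range(stride)]
    for b, block in enumerate(blocks):
        for _state, symbols, _next_state in block:
            for pos, sym in enumerate(symbols):
                # A wildcard can match every character of the alphabet.
                for c in (alphabet if sym is None else [sym]):
                    index[pos][c] |= 1 << b
    return index


def active_block_mask(index: List[Dict[str, int]], chunk: str) -> int:
    """Intersect (AND) the per-position bitmaps for one k-character chunk."""
    mask = -1  # all bits set before the first intersection
    for pos, c in enumerate(chunk):
        mask &= index[pos].get(c, 0)
    return mask


def lookup(blocks: List[List[Rule]], index: List[Dict[str, int]],
           state: int, chunk: str) -> Tuple[Optional[int], int]:
    """Return (next state or None, number of activated blocks)."""
    mask = active_block_mask(index, chunk)
    activated = bin(mask).count("1")
    for b, block in enumerate(blocks):
        if not (mask >> b) & 1:
            continue  # this block stays powered down
        for s, symbols, next_state in block:
            if s == state and all(sym is None or sym == c
                                  for sym, c in zip(symbols, chunk)):
                return next_state, activated
    return None, activated


# Toy 2-stride rule table over the alphabet "abcd" (made-up data).
rules: List[Rule] = [
    (0, ("a", "b"), 1), (0, ("a", "c"), 2),
    (1, ("c", "d"), 0), (1, ("d", "d"), 3),
    (2, ("b", None), 0), (3, ("b", "a"), 1),
]
blocks = split_into_blocks(rules, block_size=2)
index = build_index(blocks, stride=2, alphabet="abcd")
```

In this toy table, calling `lookup(blocks, index, 1, "cd")` activates one of the three blocks, where a plain search would have touched all of them. The model only counts activated blocks; it makes no claim about real TCAM power figures.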
