基于存储优化的多模式串匹配算法

刘燕兵; 刘  萍; 谭建龙; 郭  莉

基于存储优化的多模式串匹配算法

A Multiple String Matching Algorithm Based on Memory Optimization

摘要

摘要: 多模式串匹配算法是网络内容过滤系统的核心技术之一.自动机的存储空间大小和Cache性能是影响多模式串匹配算法速度的关键因素.随着模式串规模的扩大，自动机的巨大存储开销导致现有的串匹配算法性能大幅度下降.从压缩存储空间以提高Cache命中率的思想出发，提出了一种对经典SBOM算法的优化策略，它用Suffix Tree代替SBOM算法中的Factor Oracle结构，同时用剪枝的方法将Suffix Tree降低为近似线性的空间复杂度，然后用双数组Trie表示之,以压缩存储空间.与SBOM算法相比，改进算法不仅能够有效地节省存储空间，而且显著地提高了串匹配的速度，非常适合于在线高速匹配的应用环境.

Abstract: Multiple string matching algorithms play a fundamental role in many network security systems, such as intrusion detection and prevention systems, anti-virus systems, anti-spam systems, firewall, etc. It has been observed that the memory space usage and cache locality of automata are critical factors affecting multiple string matching algorithms searching speed. As the pattern set size grows larger and larger, classical multiple string matching algorithms suffer from great performance degradation because of the massive storage usage of string matching automata. The authors propose optimization strategies for the classical string matching algorithm SBOM to reduce its automata size and improve its cache locality, which results in a great promotion in searching speed. More specifically, the Factor Oracle of SBOM algorithm is first replaced with a suffix tree structure, and then the rarely accessed automata nodes are removed through the pruning method to reduce suffix tree to nearly linear space complexity, and finally the pruned suffix tree is represented with double-array trie structure to compress its memory space. Compared with SBOM, this algorithm can greatly reduce memory usage and improve searching speed. Experiments on random data sets show that the proposed algorithm uses memory less than 5% of SBOM and achieves 100% performance improvement over SBOM algorithm. The proposed algorithm is especially suitable for high-speed online pattern matching.

HTML全文

参考文献(0)

施引文献

资源附件(0)