高级检索

    高级AC自动机的快速构建方法

    Fast Building Method of Advanced AC Automaton

    • 摘要: 高级AC(advanced AC, AAC)是一种基于自动机的多模式串匹配算法,应用极为广泛.在大规模匹配时AAC自动机构建耗时较大.改进了经典精确单模式匹配算法——DFA算法自动机构建过程,并将其扩展到多模式匹配领域,提出Set DFA自动机,并证明Set DFA自动机和AAC自动机一致.该自动机构建方法简单清晰,无需计算失败函数,自动机内每个状态在生成后只需访问一次即可完成自动机构建.实验表明Set DFA构建时间只有AAC自动机的一半左右.

       

      Abstract: Advanced AC (AAC) is the most widely used multi-patterns string matching algorithm. The building cost of AAC automaton is high for large-scale matching. In this article, the automaton building method of DFA (a basic single pattern string matching algorithm) is improved and it is expanded into multi-patterns matching field. Thus, an automaton for multi-pattern string matching called Set DFA is proposed and it is proved to be same as AAC automaton. The building method of Set DFA can be described in two steps: 1)Building trie and setting all undefined trans of the initial state point to the initial state. 2)Accessing all states of the trie in level-order, and setting all undefined trans. Let the string along the path from the initial state to the accessed state be u\-1u\-2…u\-k, and let the trans character of the undefined trans be c, this undefined trans should be set to point to the result state after the string of u\-2…u\-kc being inputted from the initial state on existing automaton. This building method can be also improved to liner time building. The experimental results indicate that building Set DFA is about two times faster than building AAC automaton, because AC failure function is not used in buliding Set DFA and each state in Set DFA just is accessed once after being generated.

       

    /

    返回文章
    返回