Private Protocol Reverse Engineering Based on Network Traffic: A Survey
Abstract: Protocol reverse engineering is an important means of analyzing private protocols. It infers protocol constraints and specifications with little or no prior knowledge, and therefore has practical value in malware supervision, protocol fuzz testing, vulnerability detection, and understanding communication behavior. Because network traffic embodies protocol specifications and carries the inherent characteristics of a protocol, private protocol reverse engineering based on network traffic is better suited to discovering, analyzing, and monitoring private protocols on the network. This paper reviews existing traffic-based private protocol reverse engineering techniques. First, we propose a framework for traffic-based private protocol reverse engineering consisting of four steps: pre-inference, protocol format inference, semantic analysis, and protocol state machine inference. We describe the main tasks of each step and propose a classification structure based on the core of each research method. Second, we describe the workflow of each technique in detail and compare the techniques from several perspectives, including applicable protocol type, technical core, and inference algorithm, giving a systematic overview of traffic-based private protocol reverse engineering. Finally, we summarize the shortcomings of existing research and the main influencing factors, and discuss future research directions and application scenarios of private protocol reverse engineering.
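Table 1 Summary of Protocol Message Clustering Methods
| Target | Ref. | Protocol type | Similarity computation | Clustering algorithm | Comparative analysis |
| --- | --- | --- | --- | --- | --- |
| Format-oriented message clustering | [25,35] | Text | Length metric based on the longest common subsequence (LCS) | Agglomerative hierarchical clustering | Simple to compute and introduces LCS-based similarity, but the similarity does not reflect the structure of the whole message. |
| | [36] | General | Improved sequence alignment based on semantic information | Improved UPGMA based on semantic information | Taking the semantic information of messages into account makes it fairly accurate, but collecting that information is complex and the semantic information is user-defined. |
| | [37] | General | Metric based on field probability distributions | UPGMA | Novel use of field distributions generated by a probabilistic model to compute message similarity. |
| | [38] | Text | Metric based on segmentation points produced by ProWord | Rough-set partitioning | Novel use of rough-set partitioning, but the similarity computation depends heavily on ProWord's segmentation points. |
| Protocol-type-oriented message clustering | [40] | General | Improved sequence alignment based on TFD similarity | Parameter-guided DBSCAN | Separates messages of different protocol types, but similarity values among messages of the same type differ little; computing similarity from header byte sequences alone is also debatable and not entirely appropriate. |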
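Several Table 1 entries combine a longest-common-subsequence (LCS) similarity with agglomerative hierarchical clustering, and [36-37] use UPGMA, i.e. average-linkage clustering. The Python sketch below illustrates that combination under our own assumptions: the distance is 1 minus the LCS length divided by the length of the longer message, and merging stops at a hypothetical threshold `max_distance`. It is not the exact method of any cited paper.

```python
from itertools import combinations


def lcs_length(a: bytes, b: bytes) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_distance(a: bytes, b: bytes) -> float:
    """Assumed distance: 1 - LCS length / length of the longer message."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 1.0 - lcs_length(a, b) / longest


def upgma_clusters(messages: list[bytes], max_distance: float) -> list[list[int]]:
    """Average-linkage (UPGMA) agglomerative clustering over message indices."""
    pair_dist = {
        (i, j): lcs_distance(messages[i], messages[j])
        for i, j in combinations(range(len(messages)), 2)
    }

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Mean distance over all cross-cluster message pairs.
        total = sum(pair_dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(messages))]
    while len(clusters) > 1:
        (x, y), best = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > max_distance:  # hypothetical stopping threshold
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters


msgs = [b"USER alice", b"USER bob", b"PASS secret", b"PASS hunter2"]
print(upgma_clusters(msgs, max_distance=0.55))  # → [[2, 3], [0, 1]]
```

The two `USER` messages and the two `PASS` messages form separate clusters because their LCS distances (0.5 and about 0.42) fall below the threshold, while the average cross-cluster distance (about 0.76) exceeds it.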
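Table 2 Summary of Protocol Message Segmentation Methods
| Segmentation approach | Ref. | Protocol type | Basis | Segmentation algorithm | Comparative analysis |
| --- | --- | --- | --- | --- | --- |
| Information-theoretic voting | [39] | General | Intra-word entropy and word-boundary entropy | Unsupervised voting-experts algorithm based on information entropy | Novel use of an unsupervised voting-experts algorithm, but its time complexity is high and it is not very effective on binary protocols. |
| | [41] | Binary | Byte entropy and mutual information | - | Considers the mutual information between bytes together with the regularity of inter-byte entropy, so segmentation points are chosen more reasonably; however, it is not very effective on binary protocols, and entropy regularities vary across protocol types, so they cannot serve as a general criterion. |
| | [42] | Binary | Byte entropy | Nearest-neighbor clustering to decide segmentation points | Considers inter-byte similarity together with clustering, but its time complexity is high and the resulting segmentation points are not very reliable. |
| Decision-model-based | [43] | General | Hidden semi-Markov model | Maximum-likelihood estimation with a hidden semi-Markov model | Novel use of a hidden semi-Markov model with some tolerance to noise, but frequent sequences can affect the result. |
| | [45] | Binary | Bayesian decision model | Bayesian decision estimation of gap segmentation points from sequence alignment | Novel use of Bayesian decision estimation, but it depends heavily on sequence alignment, and some segmentation points cannot be obtained correctly. |
| | [46-47] | Binary | Time-series change-point detection | Segmentation-point detection based on multiple cumulative sums (CUSUM) over a time series | Novel use of time-series change-point detection, but the computation is fairly complex and requires two passes, over the forward and the reversed sequence. |
| Bit-structure-based | [48] | Binary | Bit congruence | Segmentation points at local maxima of several bit-congruence value sequences | Novel use of bit congruence and simple to compute, but bit congruence lacks theoretical justification. |
| | [49] | Binary | Bit-flip rate | Segmentation points at local maxima of the bit-flip rate | Effective only for simple binary protocols; better suited to IoT protocol analysis. |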
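The bit-structure rows of Table 2 place boundaries at local maxima of a per-bit statistic. Below is a simplified Python sketch of the bit-flip-rate idea listed for [49]: for fixed-length messages in capture order, count how often each bit changes between consecutive messages and start a new field right after any bit whose rate is higher than the next bit's. The frame layout, the big-endian bit order, and this "drop means boundary" rule are assumptions for illustration, not the complete heuristics of READ.

```python
def bit_flip_rates(frames: list[bytes]) -> list[float]:
    """Fraction of consecutive-frame transitions in which each bit changes.

    Bit 0 is the most significant bit of the first byte (big-endian order assumed).
    """
    if len(frames) < 2 or len({len(f) for f in frames}) != 1:
        raise ValueError("need at least two frames of equal length")
    n_bits = 8 * len(frames[0])
    flips = [0] * n_bits
    for prev, cur in zip(frames, frames[1:]):
        diff = int.from_bytes(prev, "big") ^ int.from_bytes(cur, "big")
        for pos in range(n_bits):
            if (diff >> (n_bits - 1 - pos)) & 1:
                flips[pos] += 1
    return [count / (len(frames) - 1) for count in flips]


def segment_by_flip_rate(frames: list[bytes]) -> list[tuple[int, int]]:
    """Half-open bit ranges; a cut follows each bit whose rate exceeds the next bit's."""
    rates = bit_flip_rates(frames)
    cuts = [0]
    cuts += [i + 1 for i in range(len(rates) - 1) if rates[i] > rates[i + 1]]
    cuts.append(len(rates))
    return list(zip(cuts, cuts[1:]))


# Hypothetical 1-byte frames: two 4-bit counters that carry the same value.
frames = [bytes([(i << 4) | i]) for i in range(16)]
print(segment_by_flip_rate(frames))  # → [(0, 4), (4, 8)]
```

Within a big-endian counter the low-order bits flip most often, so the rate climbs toward a field's last bit and falls at the next field's first bit. A constant field has a rate of 0 everywhere, so this simplified rule cannot separate it from a field that follows it; a complete method needs extra rules for such cases.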
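Table 3 Summary of Protocol Format Inference Methods
| Inference approach | Basis | Ref. | Protocol type | Inference algorithm | Comparative analysis |
| --- | --- | --- | --- | --- | --- |
| Sequence-alignment-based format inference | Conventional sequence alignment | [25,52,54] | General | Progressive NW sequence alignment based on SW similarity | Novel use of sequence alignment, but the results depend heavily on the sequences being aligned. |
| | | [53] | Text | NW sequence alignment | Clusters using only the first 4 bytes of the message header, which leaves considerable redundancy and yields inaccurate alignments. |
| | | [55] | Text | Incremental near-real-time format inference based on the PI method | Uses the PI method at its core to infer a basic format and refines it incrementally; real-time analysis, however, is limited by the network environment and is not yet practical. |
| | Alignment with optimized alignment rules | [57] | General | Field-type-based NW sequence alignment | Aligning field by field is more reasonable, but the effect on binary protocols is limited. |
| | | [58] | General | DiAlign multiple sequence alignment based on TF-IDF and position information | Pre-filters candidate protocol fields to remove redundancy, which improves the subsequent alignment. |
| | Alignment with an optimized scoring matrix | [36] | General | NW sequence alignment based on semantic information | Uses semantic information to improve the accuracy of the result, but collecting that information requires manual effort and considerable prior knowledge. |
| | | [60] | Binary | NW sequence alignment based on Pair-HMM | Reworks the matching rules so that alignment scores are given by probabilities, reducing the distortion caused by special fields. |
| | | [61] | Binary | Hirschberg alignment based on inter-field dissimilarity | Novel adaptation of sequence alignment to inter-field dissimilarity, used to measure the dissimilarity between messages and thereby infer the format. |
| Probability-statistics-based format inference | Conventional statistics for header formats | [63] | General | Format inference based on the K-S statistical test | Extracts protocol keywords and header formats in preparation for state machine inference. |
| | | [64] | Binary | Format inference based on statistical features of byte sequences | Novel description of the protocol format as a state transition probability graph whose states are byte sequences. |
| | Conventional statistics for protocol keywords | [39] | General | Keyword extraction by unified ranking over multidimensional attributes | Scores keywords on several attributes and ranks them on a normalized scale, but some attributes dominate the score; the attribute weights are not balanced. |
| | | [65] | Binary | Keyword extraction with relocation of segmentation points | Segments messages with ProWord, which is weak on binary protocols, but re-confirms the segmentation points by relocation; suited to relatively simple industrial control protocols. |
| | | [46-47] | Binary | Keyword extraction based on minimum description length and position information | Novel use of time-series change-point detection to segment messages, with position information used to extract keywords; depends on accurate segmentation points and may leave redundancy. |
| | | [66] | General | Probabilistic keyword extraction based on an explicit measure of clustering quality | Novel explicit model of clustering quality, used to select the best keywords, which yields the best message clusters and a more accurate format. |
| | LDA-based statistics | [69] | General | Format inference based on the LDA probabilistic model | Novel use of LDA to extract protocol keywords; the selected keywords are accurate and complete, with few erroneous or redundant fields. |
| | | [70] | Text | LDA-based FP-Growth frequent itemset mining | Combines frequent itemset mining to infer the format; time complexity is high, the target protocols are simple, and the binary fields it produces are meaningless. |
| | HMM-based statistics | [71,74] | Text | Minimal HMM built from time series and state classification | Novel use of HMMs, effective on text protocols, but extracts only the header format, so the result is incomplete. |
| | | [72] | General | Format inference based on optimal message segmentation with an HsMM | Segments messages with an HsMM, clusters them with affinity propagation (AP), and then infers the format. |
| | | [73] | General | Keyword extraction based on statistics and an HsMM | Pre-filters candidate fields before HsMM modeling to remove redundancy, so the resulting HsMM is simpler. |
| Frequent-pattern-mining-based format inference | Apriori-based mining | [78] | General | Frequent itemset mining based on multiple supports and position information | Requires many support thresholds and has high time complexity, but infers formats fairly accurately; the inferred state machine is not representative. |
| | | [79] | Text | Apriori mining based on information entropy and multiple supports | Classifies the delimiters of text protocol messages in detail and extracts keywords from statistics. |
| | | [80-81] | General | Format inference based on the CSP algorithm | Divides protocol fields into 4 types and fills in the format, but the resulting format is not representative. |
| | | [82] | Binary | Apriori mining based on multiple field lengths | Mines frequent items over several field lengths to minimize under- and over-segmented keywords, at high time complexity. |
| | FP-Growth-based mining | [83] | General | Frequent itemset mining based on entropy segmentation and positional entropy | Segments messages with ProWord and filters candidate fields before mining, but cannot recover the complete format. |
| | | [84] | Text | CFSM and CFGM algorithms based on frequent itemset mining | Mines parallel, sequential, and hierarchical relations among protocol keywords and represents them as a tree. |
| | PrefixSpan-based mining | [85] | Binary | Plaintext format inference for encrypted protocols based on frequent pattern mining | Novel extraction of the plaintext format of encrypted protocols, but keyword extraction by position offset has limitations. |
| | Mining with multi-pattern matching | [87-88] | Binary | FP-Growth mining with Aho-Corasick (AC) multi-pattern matching | Highly usable for simple binary protocols such as UAV protocols; not applicable to text protocols. |
| | | [89] | Binary | Apriori mining with AC multi-pattern matching | Substantially reduces the time complexity of Apriori. |
| Deep-learning-based format inference | LSTM-FCN-based inference | [90] | Binary | Deep learning with an LSTM-FCN model | Novel introduction of deep learning into protocol reverse engineering; defines 5 field types and needs many known public protocol keywords with their temporal relations to train the model; the format is obtained by identifying and partitioning fields. |
| | | [92] | Binary | - | Defines 6 field types and classifies message content as fields of several candidate lengths; somewhat feasible for inferring unknown formats. |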
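Table 4 Summary of Semantic Analysis Methods
Semantic groups: environment-parameter semantics, identifying semantics, indicative semantics, and special semantics. Field semantic types covered: port, address, timestamp, node name, parameter, counter, ID, message type, length, offset, offset pointer, checksum, string, constant, encrypted field, enumeration, function code.
| Analysis method | Ref. | Inferable field semantics | Comparative analysis |
| --- | --- | --- | --- |
| Field-value-based semantic analysis | [56] | ● ● ● ● ● | Requires correct message segmentation; builds sets of field values and describes the relations among those sets. |
| | [57] | ● ● ● ● ● ● ● | |
| | [93-94] | ● ● ● ● ● | |
| | [95] | ● ● ● ● ● ● | Provides targeted detection methods for special semantic fields. |
| | [96] | ● ● ● ● ● | |
| | [65] | ● ● ● ● | |
| | [97] | ● ● ● ● ● | |
| Template-matching-based semantic analysis | [25] | ● | Requires correct prior knowledge to build semantic templates, and the templates have low compatibility. |
| | [62] | ● ● ● ● | |
| | [98] | ● ● ● ● ● ● | |
Note: ● marks one field semantic type that the method reports it can infer.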
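The field-value-based rows of Table 4 infer semantics from relations among the values a field takes across messages. The Python sketch below applies three hypothetical rules to an already segmented field, given as a byte offset and width: a field with a single value is a constant, a field whose value differs from the message length by a fixed amount is a length field, and a field that grows by exactly one between consecutive messages is a counter. The rules, their order, and the big-endian encoding are assumptions chosen for illustration; as the table notes, rules of this kind only work if the segmentation is already correct.

```python
def field_values(messages: list[bytes], offset: int, width: int) -> list[int]:
    """Read an unsigned big-endian field from every message."""
    return [int.from_bytes(m[offset:offset + width], "big") for m in messages]


def infer_field_semantics(messages: list[bytes], offset: int, width: int) -> str:
    """Hypothetical value-based rules, checked in a fixed order: constant, length, counter."""
    values = field_values(messages, offset, width)
    if len(set(values)) == 1:
        return "constant"
    # Length field: message length minus field value is the same for every message.
    if len({len(m) - v for m, v in zip(messages, values)}) == 1:
        return "length"
    # Counter: consecutive messages differ by exactly 1, modulo the field width.
    modulus = 1 << (8 * width)
    if all((b - a) % modulus == 1 for a, b in zip(values, values[1:])):
        return "counter"
    return "unknown"


# Hypothetical layout: 1-byte type, 1-byte sequence number, 2-byte payload length, payload.
payloads = [b"a", b"bcd", b"efgh"]
msgs = [bytes([1, seq]) + len(p).to_bytes(2, "big") + p for seq, p in enumerate(payloads)]
for offset, width in [(0, 1), (1, 1), (2, 2)]:
    print(offset, width, infer_field_semantics(msgs, offset, width))
```

The rule order matters: a counter in a capture where every message is one byte longer than the previous one would also pass the length test, so a practical method needs more messages or further checks to tell them apart.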
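Table 5 Summary of Protocol State Machine Inference Methods
| Inference approach | Ref. | Protocol type | State labeling | State machine construction | State machine simplification | Comparative analysis |
| --- | --- | --- | --- | --- | --- | --- |
| Conventional state machine inference | [17] | Binary | - | Protocol interaction, with a preset maximum threshold | Merging of similar states based on two-stage (macro- and micro-) clustering results | The method is simple, but it uses entire protocol sessions, so the generated state machine is large and burdensome to simplify. |
| | [99] | Binary | Message-type partition using state-related fields selected by byte-level VDV | Protocol interaction based on a protocol-state splitting algorithm | Rule-based state machine simplification | Defines a state-splitting algorithm driven by changes in the byte variance distribution, but its sliding-window mechanism makes the time complexity high. |
| | [35] | Text | Message-type partition by agglomerative hierarchical clustering | Protocol interaction | Simplification based on message direction and state uniqueness | Infers only an ordering model over message sequences, which departs from the principle of a protocol state machine and is not representative. |
| Probabilistic state machine inference | [63] | General | Selection of state messages with the PAM algorithm | Probabilistic analysis on a directed graph | - | Proposes an extended probabilistic protocol state machine that includes transition probabilities, but does not merge states to simplify it. |
| | [18,100] | General | Message-type partition by nearest-neighbor clustering | Probabilistic analysis based on a Markov model | Moore machine minimization | Uses a Markov model to generate the state machine, with probabilities attached to transitions, but the model is fairly complex and its simplification is incomplete. |
| Heuristic tree-based state machine inference | [53,101] | Text | - | Heuristic construction of an ordinary tree | Tree pruning based on state equivalence | Filters protocol sessions and removes looping messages before construction, as a preliminary simplification. |
| | [102] | Text | Message-type partition by nearest-neighbor clustering | - | State merging based on state compatibility checks | Introduces a state compatibility principle, with the focus on simplifying the state machine. |
| | [103] | Text | Message-type partition based on a PTA | Heuristic tree construction based on a PTA | Aggressive state merging based on causal relations between message types | Uses a prefix tree acceptor (PTA) to distinguish message types and build the state machine; considering causality between message types helps state merging. |
| | [104] | General | Message-type partition using the Apriori and K-means algorithms | - | K-tail state merging | Proposes a new kind of protocol state machine that introduces data guards and annotates the machine with constraints among protocol keywords and dependencies between messages; this makes the model more complete but raises the time complexity of inference. |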
[1] China Internet Network Information Center. The 48th China statistical report on Internet development[R/OL]. (2021-08-27) [2021-10-12]. http://www.cnnic.net.cn/hlwfzyj/hlwxzbg/hlwtjbg/202108/P020210827326243065642.pdf (in Chinese)
[2] Duchene J, Le Guernic C, Alata E, et al. State of the art of network protocol reverse engineering tools[J]. Journal of Computer Virology and Hacking Techniques, 2018, 14(1): 53−68 doi: 10.1007/s11416-016-0289-8
[3] Sija B D, Goo Y H, Shim K S, et al. A survey of automatic protocol reverse engineering approaches, methods, and tools on the inputs and outputs view[J]. Security and Communication Networks, 2018, 2018: 8370341
[4] Newsome J, Brumley D, Franklin J, et al. Replayer: Automatic protocol replay by binary analysis[C] //Proc of the 13th ACM Conf on Computer and Communications Security. New York: ACM, 2006: 311−321
[5] Caballero J, Yin Heng, Liang Zhenkai, et al. Polyglot: Automatic extraction of protocol message format using dynamic binary analysis[C] //Proc of the 14th ACM Conf on Computer and Communications Security. New York: ACM, 2007: 317−329
[6] Dupont P, Lambeau B, Dames C, et al. The QSM algorithm and its application to software behavior model induction[J]. Applied Artificial Intelligence, 2008, 22(1/2): 77−115
[7] Caballero J, Poosankam P, Kreibich C, et al. Dispatcher: Enabling active botnet infiltration using automatic protocol reverse-engineering[C] //Proc of the 16th ACM Conf on Computer and Communications Security. New York: ACM, 2009: 621−634
[8] Comparetti P M, Wondracek G, Kruegel C, et al. Prospex: Protocol specification extraction[C] //Proc of the 30th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2009: 110−125
[9] Wang Zhi, Jiang Xuxian, Cui Weidong, et al. ReFormat: Automatic reverse engineering of encrypted messages[C] //Proc of the 14th European Symp on Research in Computer Security. Berlin: Springer, 2009: 200−215
[10] Ying Lingyun, Yang Yi, Feng Dengguo, et al. Syntax and behavior semantics analysis of network protocol of malware[J]. Journal of Software, 2011, 22(7): 1676−1689 (in Chinese) doi: 10.3724/SP.J.1001.2011.03858
[11] Caballero J, Song D. Automatic protocol reverse-engineering: Message format extraction and field semantics inference[J]. Computer Networks, 2013, 57(2): 451−474
[12] Zeng Junyuan, Lin Zhiqiang. Towards automatic inference of kernel object semantics from binary code[C] //Proc of the 18th Int Symp on Research in Attacks, Intrusions and Defenses. Berlin: Springer, 2015: 538−561
[13] The China Academy of Information and Communications Technology, Alliance of Industrial Internet. The overview of industrial Internet security situation in the first half of 2020[EB/OL]. (2020-09-19) [2021-10-12]. http://www.caict.ac.cn/kxyj/qwfb/qwsj/202009/P020200919706881804206.pdf (in Chinese)
[14] IETF. RFC 8922: A Survey of the Interaction Between Security Protocols and Transport Services[S/OL]. (2019-10-04) [2021-10-12]. https://datatracker.ietf.org/doc/rfc8922/?include_text=1
[15] Kleber S, Maile L, Kargl F. Survey of protocol reverse engineering algorithms: Decomposition of tools for static traffic analysis[J]. IEEE Communications Surveys and Tutorials, 2019, 21(1): 526−561 doi: 10.1109/COMST.2018.2867544
[16] Cho C Y, Babic D, Shin E C R, et al. Inference and analysis of formal models of botnet command and control protocols[C] //Proc of the 17th ACM Conf on Computer and Communications Security. New York: ACM, 2010: 426−439
[17] Leita C, Mermoud K, Dacier M. ScriptGen: An automated script generation tool for honeyd[C] //Proc of the 21st Annual Computer Security Applications Conf. Piscataway, NJ: IEEE, 2005: 203−214
[18] Krueger T, Gascon H, Kramer N. Learning stateful models for network honeypots[C] //Proc of the 5th ACM Workshop on Security and Artificial Intelligence. New York: ACM, 2012: 37−48
[19] Gascon H, Wressnegger C, Yamaguchi F, et al. PULSAR: Stateful black-box fuzzing of proprietary network protocols[C] //Proc of the 11th Int Conf on Security and Privacy in Communication Networks. Berlin: Springer, 2015: 330−347
[20] Blumbergs B, Vaarandi R. Bbuzz: A bit-aware fuzzing framework for network protocol systematic reverse engineering and analysis[C] //Proc of the 36th IEEE Military Communications Conf. Piscataway, NJ: IEEE, 2017: 707−712
[21] Kim J, Choi H, Namkung H, et al. Enabling automatic protocol behavior analysis for android applications[C] //Proc of the 12th Int Conf on Emerging Networking Experiments and Technologies. New York: ACM, 2016: 281−295
[22] Choi K, Son Y, Noh J, et al. Dissecting customized protocols: Automatic analysis for customized protocols based on IEEE 802.15.4[C] //Proc of the 9th ACM Conf on Security and Privacy in Wireless and Mobile Networks. New York: ACM, 2016: 183−193
[23] Stute M, Kreitschmann D, Hollick M. Reverse engineering and evaluating the apple wireless direct link protocol[J]. GetMobile: Mobile Computing and Communications, 2019, 23(1): 30−33 doi: 10.1145/3351422.3351432
[24] Yang Zhi, Gou Xiantai, Jin Weidong, et al. Reverse engineering for UAV control protocol based on detection data[C] //Proc of the 2nd Int Conf on Multimedia and Image Processing. Piscataway, NJ: IEEE, 2017: 301−304
[25] Ji Ran, Wang Jian, Tang Chaojing, et al. Automatic reverse engineering of private flight control protocols of UAVs[J]. Security and Communication Networks, 2017, 2017: 1308045
[26] Wressnegger C, Kellner A, Rieck K. ZOE: Content-based anomaly detection for industrial control systems[C] //Proc of the 48th Annual IEEE/IFIP Int Conf on Dependable Systems and Networks. Piscataway, NJ: IEEE, 2018: 127−138
[27] Marin E, Singelee D, Yang Bohan, et al. On the feasibility of cryptography for a wireless insulin pump system[C] //Proc of the 6th ACM Conf on Data and Application Security and Privacy. New York: ACM, 2016: 113−120
[28] Marin E, Singelee D, Yang Bohan, et al. Securing wireless neurostimulators[C] //Proc of the 8th ACM Conf on Data and Application Security and Privacy. New York: ACM, 2018: 287−298
[29] Pan Wubin, Cheng Guang, Guo Xiaojun, et al. Review and perspective on encrypted traffic identification research[J]. Journal on Communications, 2016, 37(9): 154−167 (in Chinese) doi: 10.11959/j.issn.1000-436x.2016187
[30] Fukunaga K, Narendra P M. A branch and bound algorithm for computing k-nearest neighbors[J]. IEEE Transactions on Computers, 1975, 24(7): 750−753
[31] Sokal R R, Michener C D. A statistical method of evaluating systematic relationships[J]. The University of Kansas Science Bulletin, 1958, 38(22): 1409−1438
[32] Kaufman L, Rousseeuw P J. Partitioning around medoids[M] //Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken, NJ: John Wiley & Sons, 2005: 68−125
[33] Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C] //Proc of the 2nd Int Conf on Knowledge Discovery and Data Mining. Palo Alto, CA: AAAI, 1996: 226−231
[34] Frey B J, Dueck D. Clustering by passing messages between data points[J]. Science, 2007, 315(5814): 972−976 doi: 10.1126/science.1136800
[35] Shevertalov M, Mancoridis S. A reverse engineering tool for extracting protocols of networked applications[C] //Proc of the 14th Working Conf on Reverse Engineering. New York: ACM, 2007: 229−238
[36] Bossert G, Guihery F, Hiet G. Towards automated protocol reverse engineering using semantic information[C] //Proc of the 9th ACM Symp on Information, Computer and Communications Security. New York: ACM, 2014: 51−62
[37] Luo Xin, Chen Dan, Wang Yongjun, et al. A type-aware approach to message clustering for protocol reverse engineering[J]. Sensors, 2019, 19(3): 716−729 doi: 10.3390/s19030716
[38] Li Yihao, Hong Zheng, Feng Wenbo, et al. A message clustering method based on rough set theory[C] //Proc of the 4th Advanced Information Technology, Electronic and Automation Control Conf. Piscataway, NJ: IEEE, 2019: 1128−1133
[39] Zhang Zhuo, Zhang Zhibin, Lee P P C, et al. Toward unsupervised protocol feature word extraction[J]. IEEE Journal on Selected Areas in Communications, 2014, 32(10): 1894−1906 doi: 10.1109/JSAC.2014.2358857
[40] Sun Fanghui, Wang Shen, Zhang Chunrui, et al. Clustering of unknown protocol messages based on format comparison[J]. Computer Networks, 2020, 179: 107296
[41] Sun Fanghui, Wang Shen, Zhang Chunrui, et al. Unsupervised field segmentation of unknown protocol messages[J]. Computer Communications, 2019, 146: 121−130
[42] Jiang Dongxiao, Li Chenggang, Ma Lixin, et al. ABInfer: A novel field boundaries inference approach for protocol reverse engineering[C] //Proc of the 6th IEEE Int Conf on Big Data Security on Cloud (BigDataSecurity), IEEE Int Conf on High Performance and Smart Computing (HPSC), and IEEE Int Conf on Intelligent Data and Security (IDS). Piscataway, NJ: IEEE, 2020: 19−23
[43] Li Min, Yu Shunzheng. Noise-tolerant and optimal segmentation of message formats for unknown application-layer protocols[J]. Journal of Software, 2013, 24(3): 604−617 (in Chinese)
[44] Yu Shunzheng. Hidden semi-Markov models[J]. Artificial Intelligence, 2010, 174(2): 215−243 doi: 10.1016/j.artint.2009.11.011
[45] Tao Siyu, Yu Hongyi, Li Qing. Bit-oriented format extraction approach for automatic binary protocol reverse engineering[J]. IET Communications, 2016, 10(6): 709−716 doi: 10.1049/iet-com.2015.0797
[46] Cai Jun, Luo Jianzhen, Ruan Jianliang, et al. Toward fuzz test based on protocol reverse engineering[C] //Proc of the 13th Int Conf on Information Security Practice and Experience. Berlin: Springer, 2017: 892−897
[47] Luo Jianzhen, Shan Chun, Cai Jun, et al. IoT application-layer protocol vulnerability detection using reverse engineering[J]. Symmetry, 2018, 10(11): 561−574 doi: 10.3390/sym10110561
[48] Kleber S, Kopp H, Kargl F. NEMESYS: Network message syntax reverse engineering by analysis of the intrinsic structure of individual messages[C/OL] //Proc of the 12th USENIX Workshop on Offensive Technologies. Berkeley, CA: USENIX Association, 2018 [2021-10-12]. https://www.usenix.org/conference/woot18/presentation/kleber
[49] Marchetti M, Stabili D. READ: Reverse engineering of automotive data frames[J]. IEEE Transactions on Information Forensics and Security, 2019, 14(4): 1083−1097 doi: 10.1109/TIFS.2018.2870826
[50] Needleman S B, Wunsch C D. A general method applicable to the search for similarities in the amino acid sequence of two proteins[J]. Journal of Molecular Biology, 1970, 48(3): 443−453 doi: 10.1016/0022-2836(70)90057-4
[51] Smith T F, Waterman M S. Identification of common molecular subsequences[J]. Journal of Molecular Biology, 1981, 147(1): 195−197 doi: 10.1016/0022-2836(81)90087-5
[52] Beddoe M A. Network protocol analysis using bioinformatics algorithms[EB/OL]. 2004 [2021-10-12]. http://phreakocious.net/PI/PI.pdf
[53] Gorbunov S, Rosenbloom A. AutoFuzz: Automated network protocol fuzzing framework[J]. International Journal of Computer Science and Network Security, 2010, 10(8): 239−245
[54] Razo S I V, Anaya E A, Ambrosio P J E. Reverse engineering with bioinformatics algorithms over a sound android covert channel[C] //Proc of the 11th Int Conf on Malicious and Unwanted Software (MALWARE). Piscataway, NJ: IEEE, 2016: 3−9
[55] Zhang Xiaoming, Qiang Qian, Wang Weisheng, et al. IPFRA: An online protocol reverse analysis mechanism[C] //Proc of the 4th Int Conf on Cloud Computing and Security. Berlin: Springer, 2018: 324−333
[56] Cui Weidong, Paxson V, Weaver N, et al. Protocol-independent adaptive replay of application dialog[C/OL] //Proc of the 13th Network and Distributed System Security Symp(NDSS). Piscataway, NJ: IEEE, 2006 [2021-10-12]. https://www.ndss-symposium.org/ndss2006/protocol-independent-adaptive-replay-application-dialog
[57] Cui Weidong, Kannan J, Wang H J. Discoverer: Automatic protocol reverse engineering from network traces[C/OL] //Proc of the 16th USENIX Security Symp. Berkeley, CA: USENIX Association, 2007 [2021-10-12]. https://www.usenix.org/conference/16th-usenix-security-symposium/discoverer-automatic-protocol-reverse-engineering-network
[58] Esoul O, Walkinshaw N. Using segment-based alignment to extract packet structures from network traces[C] //Proc of the 2017 IEEE Int Conf on Software Quality, Reliability and Security. Piscataway, NJ: IEEE, 2017: 398−409
[59] Morgenstern B. DIALIGN 2: Improvement of the segment-to-segment approach to multiple sequence alignment[J]. Bioinformatics, 1999, 15(3): 211−218 doi: 10.1093/bioinformatics/15.3.211
[60] Meng Fanzhi, Zhang Chunrui, Wu Guo. Protocol reverse based on hierarchical clustering and probability alignment from network traces[C] //Proc of the 3rd IEEE Int Conf on Big Data Analysis. Piscataway, NJ: IEEE, 2018: 443−447
[61] Kleber S, van der Heijden R W, Kargl F. Message type identification of binary network protocols using continuous segment similarity[C] //Proc of the 39th IEEE Conf on Computer Communications. Piscataway, NJ: IEEE, 2020: 2243−2252
[62] Kleber S, Kargl F. Poster: Network message field type recognition[C] //Proc of the 2019 ACM SIGSAC Conf on Computer and Communications Security. New York: ACM, 2019: 2581−2583
[63] Wang Yipeng, Zhang Zhibin, Yao Danfeng, et al. Inferring protocol state machine from network traces: A probabilistic approach[C] //Proc of the 9th Int Conf on Applied Cryptography and Network Security. New York: ACM, 2011: 1−18
[64] Wang Yipeng, Li Xingjian, Meng Jiao, et al. Biprominer: Automatic mining of binary protocol features[C] //Proc of the 12th Int Conf on Parallel and Distributed Computing, Applications and Technologies. Berlin: Springer, 2011: 179−184
[65] Wang Xiaowei, Lv Kezhi, Li Bo. IPART: An automatic protocol reverse engineering tool based on global voting expert for industrial protocols[J]. International Journal of Parallel, Emergent and Distributed Systems, 2020, 35(3): 376−395 doi: 10.1080/17445760.2019.1655740
[66] Ye Yapeng, Zhang Zhuo, Wang Fei, et al. NETPLIER: Probabilistic network protocol reverse engineering from message traces[C/OL] //Proc of the 28th Network and Distributed Systems Security Symp(NDSS). Piscataway, NJ: IEEE, 2021 [2021-10-12]. https://www.ndss-symposium.org/ndss-paper/netplier-probabilistic-network-protocol-reverse-engineering-from-message-traces/
[67] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993−1022
[68] Baum L E, Petrie T. Statistical inference for probabilistic functions of finite state Markov chains[J]. The Annals of Mathematical Statistics, 1966, 37(6): 1554−1563 doi: 10.1214/aoms/1177699147
[69] Wang Yipeng, Yun Xiaochun, Shafiq M Z, et al. A semantics aware approach to automated reverse engineering unknown protocols[C/OL] //Proc of the 20th IEEE Int Conf on Network Protocols. Piscataway, NJ: IEEE, 2012 [2021-10-12]. https://ieeexplore.ieee.org/document/6459963
[70] Li Haifeng, Shuai Bo, Wang Jian, et al. Protocol reverse engineering using LDA and association analysis[C] //Proc of the 11th Int Conf on Computational Intelligence and Security. New York: ACM, 2015: 312−316
[71] Whalen S, Bishop M, Crutchfield J P. Hidden Markov models for automated protocol learning[C] //Proc of the 6th Int ICST Conf on Security and Privacy in Communication Networks. Berlin: Springer, 2010: 415−428
[72] Cai Jun, Luo Jianzhen, Lei Fangyuan. Analyzing network protocols of application layer using hidden semi-Markov model[J]. Mathematical Problems in Engineering, 2016, 2016: 9161723
[73] Li Baichao, Yu Shunzheng. Keyword mining for private protocols tunneled over websocket[J]. IEEE Communications Letters, 2016, 20(7): 1337−1340
[74] He Yunhua, Shen Jialong, Xiao Ke, et al. A sparse protocol parsing method for IIoT protocols based on HMM hybrid model[C] //Proc of the 2020 IEEE Int Conf on Communications. Piscataway, NJ: IEEE, 2020: 1−6
[75] Agrawal R, Srikant R. Fast algorithms for mining association rules[C] //Proc of the 20th VLDB Conf. New York: ACM, 1994: 487−499
[76] Han Jiawei, Pei Jian, Yin Yiwen. Mining frequent patterns without candidate generation[C] //Proc of the 2000 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2000: 1−12
[77] Pei Jian, Han Jiawei, Mortazavi-Asl B, et al. Mining sequential patterns by pattern-growth: The PrefixSpan approach[J]. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(11): 1424−1440 doi: 10.1109/TKDE.2004.77
[78] Luo Jianzhen, Yu Shunzheng. Position-based automatic reverse engineering of network protocols[J]. Journal of Network and Computer Application, 2013, 36(3): 1070−1077 doi: 10.1016/j.jnca.2013.01.013
[79] Lee M S, Goo Y H, Shim K S, et al. A method for extracting static fields in private protocol using entropy and statistical analysis[C/OL] //Proc of the 20th Asia-Pacific Network Operations and Management Symp. Piscataway, NJ: IEEE, 2019 [2021-10-12]. https://ieeexplore.ieee.org/document/8893038
[80] Shim K S, Goo Y H, Lee M S, et al. Inference of network unknown protocol structure using CSP(contiguous sequence pattern) algorithm based on tree structure[C/OL] //Proc of the 2018 IEEE/IFIP Network Operations and Management Symp. Piscataway, NJ: IEEE, 2018 [2021-10-12]. https://ieeexplore.ieee.org/document/8406311
[81] Goo Y H, Shim K S, Lee M S, et al. Protocol specification extraction based on contiguous sequential pattern algorithm[J]. IEEE Access, 2019, 7: 36057−36074
[82] Qin Zhongyuan, Lu Kai, Zhang Qunfang, et al. Approach of field format extraction in binary private protocol[J]. Journal of Chinese Computer Systems, 2019, 40(11): 2318−2323 (in Chinese) doi: 10.3969/j.issn.1000-1220.2019.11.014
[83] Li Gaochao, Qiang Qian, Wang Zhonghua, et al. Protocol keywords extraction method based on frequent item-sets mining[C] //Proc of the 2018 Int Conf on Information Science and System. New York: ACM, 2018: 53−58
[84] Lin Peihong, Hong Zheng, Wu Lifa, et al. Protocol format extraction based on an improved CFSM algorithm[J]. China Communications, 2020, 17(11): 156−180 doi: 10.23919/JCC.2020.11.014
[85] Zhu Yuna, Han Jihong, Yuan Lin, et al. SPFPA: A format parsing approach for unknown security protocols[J]. Journal of Computer Research and Development, 2015, 52(10): 2200−2211 (in Chinese) doi: 10.7544/issn1000-1239.2015.20150568
[86] Aho A V, Corasick M J. Efficient string matching: An aid to bibliographic search[J]. Communications of the ACM, 1975, 18(6): 333−340 doi: 10.1145/360825.360855
[87] Wang Yong, Zhang Nan, Wu Yanmei, et al. Protocol formats reverse engineering based on association rules in wireless environment[C] //Proc of the 12th IEEE Int Conf on Trust, Security and Privacy in Computing and Communications. Piscataway, NJ: IEEE, 2013: 134−141
[88] Ji Ran, Li Haifeng, Tang Chaojing. Extracting keywords of UAVs wireless communication protocols based on association rules learning[C] //Proc of the 12th Int Conf on Computational Intelligence and Security. Berlin: Springer, 2016: 309−313
[89] Hei Xinhong, Bai Binbin, Wang Yichuan, et al. Feature extraction optimization for bitstream communication protocol format reverse analysis[C] //Proc of the 18th IEEE Int Conf on Trust, Security and Privacy in Computing and Communications. Piscataway, NJ: IEEE, 2019: 662−669
[90] Zhao Rui, Liu Zhaohui. Analysis of private industrial control protocol format based on LSTM-FCN model[C] //Proc of the 2020 Int Conf on Aviation Safety and Information Technology. New York: ACM, 2020: 330−335
[91] Karim F, Majumdar S, Darabi H, et al. LSTM fully convolutional networks for time series classification[J]. IEEE Access, 2018, 6: 1662−1669
[92] Yang Chenglong, Fu Cao, Qian Yekui, et al. Deep learning-based reverse method of binary protocol[C] //Proc of the 1st Int Conf on Security and Privacy in Digital Economy. Berlin: Springer, 2020: 606−624
[93] Bermudez I, Tongaonkar A, Iliofotou M, et al. Automatic protocol field inference for deeper protocol understanding[C/OL] //Proc of the 14th IFIP Networking Conf. Piscataway, NJ: IEEE, 2015 [2021-10-12]. https://ieeexplore.ieee.org/document/7145307
[94] Bermudez I, Tongaonkar A, Iliofotou M, et al. Towards automatic protocol field inference[J]. Computer Communications, 2016, 84: 40−51
[95] De Carli L, Torres R, Modelo-Howard G, et al. Botnet protocol inference in the presence of encrypted traffic[C/OL] //Proc of the 36th IEEE Conf on Computer Communications. Piscataway, NJ: IEEE, 2017 [2021-10-12]. https://ieeexplore.ieee.org/document/8057064
[96] Ladi G, Buttyan L, Holczer T. Message format and field semantics inference for binary protocols using recorded network traffic[C/OL] //Proc of the 26th Int Conf on Software, Telecommunications and Computer Networks. Piscataway, NJ: IEEE, 2018 [2021-10-12]. https://ieeexplore.ieee.org/document/8555813
[97] Zhang Weiyao, Zhang Lei, Mao Jianling, et al. An automated method of unknown protocol fuzzing test[J]. Chinese Journal of Computers, 2020, 43(4): 653−667 (in Chinese) doi: 10.11897/SP.J.1016.2020.00653
[98] Wang Qun, Sun Zhonghua, Wang Zhangquan, et al. A practical format and semantic reverse analysis approach for industrial control protocols[J]. Security and Communication Networks, 2021, 2021: 6690988
[99] Trifilo A, Burschka S, Biersack E. Traffic to protocol reverse engineering[C/OL] //Proc of the 2009 IEEE Symp on Computational Intelligence in Security and Defense Applications. Piscataway, NJ: IEEE, 2009 [2021-10-12]. https://ieeexplore.ieee.org/document/5356565
[100] Krueger T, Kramer N, Rieck K. ASAP: Automatic semantics-aware analysis of network payloads[C] //Proc of Privacy and Security Issues in Data Mining and Machine Learning (Workshop of the 21st Int ECML/14th PKDD). Berlin: Springer, 2010: 50−63
[101] Hsu Y, Shu Guoqiang, Lee D. A model-based approach to security flaw detection of network protocol implementations[C] //Proc of the 16th IEEE Int Conf on Network Protocols. Piscataway, NJ: IEEE, 2008: 114−123
[102] Lee C, Bae J, Lee H. PRETT: Protocol reverse engineering using binary tokens and network traces[C] //Proc of the 33rd IFIP Int Conf on ICT Systems Security and Privacy Protection. Piscataway, NJ: IEEE, 2018: 141−155
[103] Antunes J, Neves N, Verissimo P. Reverse engineering of protocols from network traces[C] //Proc of the 18th Working Conf on Reverse Engineering. New York: ACM, 2011: 169−178
[104] Lin Yingdar, Lai Yukuen, Bui Quantien, et al. ReFSM: Reverse engineering from protocol packet traces to test generation by extended finite state machines[J]. Journal of Network and Computer Applications, 2020, 171: 102819
[105] Wang Yipeng, Yun Xiaochun, Zhang Yongzheng, et al. Rethinking robust and accurate application protocol identification[J]. Computer Networks, 2017, 129(P1): 64−78
[106] Liu Kaizheng, Yang Ming, Ling Zhen, et al. On manually reverse engineering communication protocols of Linux-based IoT systems[J]. IEEE Internet of Things Journal, 2021, 8(8): 6815−6827 doi: 10.1109/JIOT.2020.3036232