Robust Few-Label Misinformation Detection Based on Information Bottleneck Theory

王吉宏; 赵书庆; 罗敏楠; 刘欢; 赵翔; 郑庆华

doi:10.7544/issn1000-1239.202330506

Robust Few-Label Misinformation Detection Based on Information Bottleneck Theory

Abstract

Misinformation detection is crucial for safeguarding online public opinion and social stability. Research shows that misinformation differs substantially from real information in both content and propagation structure. Consequently, recent work has focused on mining information content and propagation structure to improve detection accuracy. However, in real-world settings, labeling misinformation typically requires extensive comparison against official reports and other evidence, which is expensive, and the heavy reliance of existing methods on labeled data limits their practical use. Moreover, misinformation spreaders can adversarially manipulate the information content and propagation structure, for example by controlling comment sections, which makes detection harder. To address these problems, we propose a robust few-label misinformation detection method based on information bottleneck theory. Specifically, to reduce the dependence on labeled data, the method incorporates information from unlabeled samples through mutual information maximization. Furthermore, to improve robustness against manipulation by spreaders, it uses adversarial training to simulate their behavior and learns robust representations based on information bottleneck theory. The learned representations capture the essential information in misinformation while discarding the adversarial information introduced by spreaders. Experiments show that the method outperforms baseline methods in both few-label detection and robustness.
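For orientation, the generic information bottleneck objective that methods of this kind build on can be written as below. This is the textbook formulation, not necessarily the exact loss used in the paper. The symbols are notational assumptions introduced here: $X$ is the input (information content plus propagation structure), $Y$ the label, $Z$ the learned representation, and $\beta$ a trade-off weight.

```latex
% Standard information bottleneck objective (generic form, assumed notation):
% keep what Z tells us about the label Y, compress away the rest of the input X.
\max_{p(z \mid x)} \; I(Z; Y) \;-\; \beta \, I(Z; X), \qquad \beta > 0
```

Under this reading, a larger $\beta$ pushes the representation to discard more of the input, which is the mechanism a robust variant could use to suppress adversarially injected content.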
