ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (11): 2328-2336. doi: 10.7544/issn1000-1239.2020.20200413

Special topic: 2020 Research Topic on Cryptography and Data Privacy Protection

• Information Security •

MSRD: Multi-Modal Web Rumor Detection Method

Liu Jinshuo1, Feng Kuo1, Jeff Z. Pan2, Deng Juan1, Wang Lina1

  1. Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 430072
  2. University of Aberdeen, Aberdeen, Scotland AB24 3FX
  • Published: 2020-11-01
  • Online: 2020-11-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (U1936107, 6187613, 61672393).

Abstract: Multi-modal web rumors that combine images and text are more deceptive and inflammatory, and therefore pose a greater threat to national security and social stability. Existing web rumor detection work focuses on the text that accompanies a post and ignores both the image content and the text embedded in the image. This paper therefore proposes MSRD, a multi-modal web rumor detection method based on deep neural networks that covers the image, the text embedded in the image, and the accompanying text. The method uses a VGG-19 network to extract image content features, DenseNet to extract the text embedded in the image, and an LSTM network to extract text content features. The text features are concatenated with the image features, and a fully connected layer then produces the mean and variance vectors of the shared image-text representation. A random variable sampled from a Gaussian distribution is used to form a reparameterized multi-modal feature, which serves as the input to the rumor detector. Experiments show that the method achieves accuracies of 68.5% and 79.4% on the Twitter and Weibo datasets, respectively.

Key words: multimodal, rumor detection, embedded text in image, natural language processing, deep neural network
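
The fusion step summarized in the abstract (concatenate text and image features, map them to mean and variance vectors with fully connected layers, then sample a reparameterized feature) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the toy dimensions, and the random weights are hypothetical, the input vectors stand in for the outputs of the VGG-19 and LSTM extractors, and predicting the log-variance instead of the variance itself is an assumption borrowed from common variational autoencoder practice.

```python
import math
import random


def linear(x, weights, bias):
    """Fully connected layer y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def reparameterize(mean, log_var, rng):
    """Sample z = mean + sigma * eps, with eps ~ N(0, 1).

    Assumption: the layer outputs log-variance, so sigma = exp(log_var / 2).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def fuse(image_feat, text_feat, w_mean, b_mean, w_logvar, b_logvar, rng):
    """Concatenate the two modalities and draw a reparameterized feature."""
    shared = text_feat + image_feat  # list concatenation of the features
    mean = linear(shared, w_mean, b_mean)
    log_var = linear(shared, w_logvar, b_logvar)
    return mean, log_var, reparameterize(mean, log_var, rng)


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
image_feat = [0.2, 0.4, 0.1, 0.3]   # placeholder for VGG-19 image features
text_feat = [0.5, 0.0, 0.7]         # placeholder for LSTM text features
latent_dim = 2
in_dim = len(image_feat) + len(text_feat)

mean, log_var, z = fuse(
    image_feat, text_feat,
    random_matrix(latent_dim, in_dim, rng), [0.0] * latent_dim,
    random_matrix(latent_dim, in_dim, rng), [0.0] * latent_dim,
    rng,
)
print("multi-modal feature passed to the rumor detector:", z)
```

Writing the sample as the mean plus scaled noise keeps the randomness in a separate noise term, so in a trained model gradients can flow through the mean and variance layers.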