ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2021, Vol. 58 ›› Issue (1): 48-59.doi: 10.7544/issn1000-1239.2021.20200264

• 人工智能 • 上一篇    下一篇

基于时空融合图网络学习的视频异常事件检测

周航,詹永照,毛启容   

  1. (江苏大学计算机科学与通信工程学院 江苏镇江 212013) (henrryzh@qq.com)
  • 出版日期: 2021-01-01
  • 基金资助: 
    国家自然科学基金项目(61672268)

Video Anomaly Detection Based on Space-Time Fusion Graph Network Learning

Zhou Hang, Zhan Yongzhao, Mao Qirong   

  1. (School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013)
  • Online: 2021-01-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61672268).

摘要: 视频中异常事件所体现的时空特征存在着较强的相关关系.针对视频异常事件发生的时空特征相关性而影响检测性能问题,提出了基于时空融合图网络学习的视频异常事件检测方法,该方法针对视频片段的特征分别构建空间相似图和时间连续图,将各片段对应为图中的节点,考虑各节点特征与其他节点特征的Top-k相似性动态形成边的权重,构成空间相似图;考虑各节点的m个时间段内的连续性形成边的权重,构成时间连续图.将空间相似图和时间连续图进行自适应加权融合形成时空融合图卷积网络,并学习生成视频特征.在排序损失中加入图的稀疏项约束降低图模型的过平滑效应并提升检测性能.在UCF-Crime和ShanghaiTech等视频异常事件数据集上进行了实验,以接收者操作曲线(receiver operating characteristic curve, ROC)以及曲线下面积(area under curve, AUC)值作为性能度量指标.在UCF-Crime数据集下,提出的方法在AUC上达到80.76%,比基准线高5.35%;在ShanghaiTech数据集中,AUC达到89.88%,比同类最好的方法高5.44%.实验结果表明:所提出的方法可有效提高视频异常事件检测的性能.

关键词: 视频异常事件检测, 空间相似图, 时间连续图, 自适应加权, 图卷积网络

Abstract: There are strong correlations among spatial-temporal features of abnormal events in videos. Aiming at the problem of performance for abnormal event detection caused by these correlations, a video anomaly detection method based on space-time fusion graph network learning is proposed. In this method, spatial similarity graph and temporal trend graph for the segments are constructed in terms of the features of the segments. The spatial similarity graph is built dynamically by treating the features of the video segments as the vertexes in graph. In this graph, the weights of edges are dynamically formed by taking the relationship between vertex and its Top-k similarity vertexes into account. The temporal trend graph is built by taking the time distance for m sequential segments into account. The space-time fusion graph convolutional network is constructed by adaptively weighting the spatial similarity graph and temporal trend graph. The video embedding features are learnt and generated by using this graph convolutional network. A graph sparse regularization is added to the ranking loss, in order to reduce the over-smoothing effect of graph model and improve detection performance. The experiments are conducted on two challenging video datasets: UCF-Crime(University of Central Florida crime dataset) and ShanghaiTech. ROC(receiver operating characteristic curve) and AUC (area under curve) are taken as performance metrics. Our method obtains the AUC score of 80.76% rising by 5.35% compared with the baseline on UCF-Crime dataset, and also gets the score of 89.88% rising by 5.44% compared with SOTA(state of the art) weakly supervised algorithm on ShanghaiTech. The experimental results show that our proposed method can improve the performance of video abnormal event detection effectively.

Key words: video anomaly detection, spatial similarity graph, temporal trend graph, adaptive weighting, graph convolutional network

中图分类号: