ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (5): 1092-1100. doi: 10.7544/issn1000-1239.2019.20180472

• Artificial Intelligence •

Static Restart Stochastic Gradient Descent Algorithm Based on Image Question Answering

Li Shengdong1,2, Lü Xueqiang3

  1(School of Information, Renmin University of China, Beijing 100872); 2(Department of Computer Engineering, Langfang Yanjing Vocational Technical College, Langfang, Hebei 065200); 3(Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science and Technology University), Beijing 100101) (lsd@ruc.edu.cn)
  • Published online: 2019-05-01
  • Supported by:
    the National Natural Science Foundation of China (61671070); the 2017 Key Project of the 13th Five-Year Scientific Research Plan of the State Language Commission (ZDI135-53); the Open Project of the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICDD201505)

Abstract: Image question answering is a multimodal learning task at the intersection of computer vision and natural language processing. With the breakthroughs in deep neural networks, it has become a focus of research, and numerous models have been proposed for it. Stacked attention networks (SANs) are among the most typical of these models and achieve state-of-the-art results on four public visual question answering datasets. Despite this performance, the diversity of questions and the sparsity of answers prevent SANs from fully learning the general regularities of the corpus; the model easily falls into poor local optima, which leads to a higher question answering error rate. By analyzing the causes of the errors and observing in detail how the model processes image question answering, we find that momentum-based stochastic gradient descent (the baseline) has some defects when optimizing SANs. To address this, we propose a static restart stochastic gradient descent algorithm based on image question answering. The experimental results show that its accuracy is 0.29% higher than that of the baseline, but that it converges more slowly than the baseline. To verify the significance of the improvement, we conduct a statistical hypothesis test on the experimental results; the t-test results indicate that the improvement is extremely significant in the process of converging toward the global optimum. To verify its effectiveness among algorithms of the same kind, we compare it experimentally with state-of-the-art first-order optimization algorithms; the results and analysis indicate that it is more effective for image question answering. To verify its generalization performance and its value for wider use, we conduct an image recognition experiment on the classic Cifar-10 dataset. The experimental results and the t-test results indicate that it generalizes well and is of good value for wider use.
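The abstract names the technique but does not state the restart rule. As a rough illustration only, the sketch below assumes that "static restart" means resetting the learning rate to its initial value on a fixed, pre-set period, here combined with cosine decay inside each period and a plain momentum update. The function names, the cosine shape, the period, and the toy quadratic objective are all assumptions made for this example; they are not taken from the paper and do not reproduce its experiments.

```python
import math


def static_restart_lr(step, base_lr=0.1, period=100, min_lr=0.0):
    """Learning rate that decays along a cosine curve and is reset to
    base_lr every `period` steps (the fixed period is the assumed
    'static' part of the restart)."""
    t = step % period  # position inside the current restart cycle
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t / period))


def sgd_momentum_with_restarts(grad_fn, w0, steps=300, momentum=0.9, **schedule):
    """Momentum SGD on a scalar parameter, using the restart schedule above.
    `grad_fn` returns the gradient at w; the velocity is kept across restarts."""
    w, v = w0, 0.0
    for step in range(steps):
        lr = static_restart_lr(step, **schedule)
        v = momentum * v - lr * grad_fn(w)  # classical momentum update
        w = w + v
    return w


def toy_grad(w):
    """Gradient of the hypothetical toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


if __name__ == "__main__":
    print(sgd_momentum_with_restarts(toy_grad, w0=0.0))
```

On this toy objective the iterate moves toward the minimizer at w = 3. A restart briefly raises the step size again, which is the usual motivation for restart schedules when an optimizer has settled into a poor region.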

Key words: image question answering, stacked attention networks (SANs), momentum, static restart, stochastic gradient descent (SGD)

CLC number: