高级检索

    基于图片问答的静态重启随机梯度下降算法

    Static Restart Stochastic Gradient Descent Algorithm Based on Image Question Answering

    • 摘要: 图片问答是计算机视觉与自然语言处理交叉的多模态学习任务.为了解决该任务,研究人员提出堆叠注意力网络(stacked attention networks, SANs).研究发现该模型易陷入不好的局部最优解,引发较高的问答错误率.为了解决该问题,提出基于图片问答的静态重启随机梯度下降算法.实验结果和分析表明:它的准确率比基准算法提高0.29%,但其收敛速度慢于基准算法.为了验证改善性能的显著性,对实验结果进行统计假设检验.T检验结果证明它的改善性能是极其显著的.为了验证它在同类算法中的有效性,将该算法和当前最好的一阶优化算法进行有效性实验,实验结果和分析证明它更有效.为了验证它的泛化性能和推广价值,在经典的Cifar-10数据集上进行图像识别实验.实验结果和T检验结果证明:它具有良好的泛化性能和较好的推广价值.

       

      Abstract: Image question answering is a multimodal learning task intersecting computer vision and natural language processing. With the breakthroughs in the deep neural networks, it has been the hotspot and focus of many researchers’ attention. To solve the task, researchers put forward numerous excellent models. Stacked attention networks (SANs) is one of the most typical models, and gets the state-of-the-art results in the test of four public visual question answering datasets. Although it has the excellent performance, because of the diversity of question and the sparsity of answer, it cannot fully learn the universal law of the corpus, and easily fall into the poor local optimal solution, which leads to the higher question answering error rate. By analyzing the causes of the error and observing the details of the model processing image question answering, we find that stochastic gradient descent based on momentum (baseline) has some defects in the optimization of SANs. To solve it, we propose static restart stochastic gradient descent based on image question answering. The experimental results show that its accuracy is 0.29% higher than baseline, but its convergence rate is slower than baseline. To verify the significance of the improved performance, we conduct statistical hypothesis test on the experimental results. The results of T test prove that its improved performance is extremely significant in the process of converging to the global optimal solution. To verify its effectiveness in the same kind of algorithm, we conduct effectiveness experiments with it and the state-of-the-art first-order optimization algorithms. The experimental results and analysis prove that it is more effective in solving image question answering. To verify its generalization performance and promotion value, we conduct the image recognition experiment on the classic Cifar-10 for the image recognition task. The experimental results and the results of T test prove that it has good generalization performance and promotion value in the process of converging to the global optimal solution.

       

    /

    返回文章
    返回