ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2015, Vol. 52 ›› Issue (11): 2545-2554. doi: 10.7544/issn1000-1239.2015.20148280

• Artificial Intelligence •

面向大数据流的多任务加速在线学习算法

李志杰,李元香,王峰,匡立   

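  1. (State Key Laboratory of Software Engineering (Wuhan University), Wuhan 430072) (lzj0019@163.com)
  • Published: 2015-11-01
  • Funding: National Natural Science Foundation of China (61070009, 61103125); National High Technology Research and Development Program of China (863 Program) (2007AA01Z290)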

Accelerated Multi-Task Online Learning Algorithm for Big Data Stream

Li Zhijie, Li Yuanxiang, Wang Feng, Kuang Li   

  1. (State Key Laboratory of Software Engineering (Wuhan University), Wuhan 430072)
  • Online: 2015-11-01

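Abstract (translated from the Chinese): The multi-task online learning framework uses a stream computing mode that processes data directly, which makes it a promising tool for big data stream analysis. However, existing multi-task online learning algorithms converge slowly, at a rate of only $O(1/\sqrt{T})$, where T is the number of iterations. This paper proposes a novel accelerated multi-task online learning algorithm, ADA-MTL (accelerated dual averaging method for multi-task learning), which attains the optimal convergence rate $O(1/T^2)$ while retaining the fast computation of multi-task online learning. A closed-form expression for the iterative update of the multi-task weight matrix $W_t$ is derived, and the convergence of the proposed algorithm is analyzed in detail. Experiments show that the proposed algorithm better guarantees the real-time performance and scalability of big data stream processing and has broad practical value.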

Keywords (translated from the Chinese): big data stream, multi-task, acceleration, online learning, convergence analysis

Abstract: Conventional machine learning and data mining techniques that rely on batch computing suffer from many limitations when applied to big data stream analytics. The multi-task online learning framework, which uses a stream computing mode, is a promising tool for big data stream analysis. However, current multi-task online learning algorithms converge slowly, at a rate of only $O(1/\sqrt{T})$ after T iterations, and this slow convergence has become a bottleneck for online algorithm performance. In this paper, we propose a novel accelerated multi-task online learning algorithm, called ADA-MTL (accelerated dual averaging method for multi-task learning), which achieves both low computational time complexity and the optimal convergence rate $O(1/T^2)$. We prove a closed-form solution theorem that allows the weight matrix $W_t$ to be updated efficiently at each iteration, and we give a detailed theoretical analysis of the algorithm's convergence rate. Experimental results on real-world datasets demonstrate the merits of the proposed algorithm for large-scale dynamic data stream problems. Because it markedly improves the real-time performance and scalability of big data stream analysis, the algorithm is a practical method for big data stream analytics tasks.
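
To make the gap between the two convergence rates quoted in the abstract concrete, the following is a minimal arithmetic sketch, not code from the paper. It assumes the error bounds are exactly 1/sqrt(T) and 1/T^2 with a hidden constant of 1 (real bounds carry problem-dependent constants), and the function names are hypothetical. It counts how many iterations T each bound needs before it drops below a target accuracy of 1/inv_eps.

```python
import math


def iters_for_sqrt_rate(inv_eps: int) -> int:
    """Smallest T with 1/sqrt(T) <= 1/inv_eps, i.e. T >= inv_eps**2.

    Assumes a bound constant of 1 (an illustrative simplification).
    """
    return inv_eps ** 2


def iters_for_square_rate(inv_eps: int) -> int:
    """Smallest T with 1/T**2 <= 1/inv_eps, i.e. T >= ceil(sqrt(inv_eps)).

    Uses integer arithmetic: ceil(sqrt(n)) == isqrt(n - 1) + 1 for n >= 1.
    """
    if inv_eps < 1:
        raise ValueError("inv_eps must be a positive integer")
    return math.isqrt(inv_eps - 1) + 1


if __name__ == "__main__":
    for inv_eps in (100, 10_000, 1_000_000):
        slow = iters_for_sqrt_rate(inv_eps)
        fast = iters_for_square_rate(inv_eps)
        print(f"target 1/{inv_eps}: O(1/sqrt(T)) needs T={slow}, O(1/T^2) needs T={fast}")
```

Under these assumptions, a target accuracy of 1/10000 takes 100,000,000 iterations at the slower rate but only 100 at the accelerated rate, which is why the abstract describes slow convergence as a bottleneck for stream processing.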

Key words: big data stream, multi-task, accelerated, online learning, convergence analysis
