ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (8): 1707-1721. doi: 10.7544/issn1000-1239.2015.20150185

Special topic: 2015 Artificial Intelligence Techniques for Big Data

• Survey •

Online Learning Algorithms for Big Data Analytics: A Survey

Li Zhijie, Li Yuanxiang, Wang Feng, He Guoliang, Kuang Li

  (State Key Laboratory of Software Engineering (Wuhan University), Wuhan 430072) (Computer School, Wuhan University, Wuhan 430072) (lzj0019@whu.edu.cn)
  • Published online: 2015-08-01
  • Funding: National Natural Science Foundation of China (61070009, 61103125); National High Technology Research and Development Program of China ("863" Program) (2007AA01Z290)

Abstract: The advent of big data has brought a large array of applications that require real-time processing of massive, high-velocity data. Mining big data streams and turning them into useful information for a wide range of real-world applications is becoming increasingly important. Conventional batch machine learning techniques suffer from many limitations when applied to big data analytics. Online learning, which follows a stream computing mode and performs real-time computation directly in memory, is a promising tool for learning from data streams. In this survey, we first introduce the motivation and background of big data analytics, and then present the family of classical and recent online learning methods and algorithms, which are promising for tackling the emerging challenges of mining big data in real-world applications. The main technical content consists of three parts: 1) online learning for linear models; 2) kernel-based online learning for nonlinear models; and 3) non-traditional online learning methods. For each class of methods, we give detailed models and pseudocode where possible. This is followed by a discussion of key problems in large-scale machine learning research and applications for big data analytics. Finally, we present three typical application scenarios of online learning for big data streams and discuss possible directions for current and future research in this area.

Key words: online learning algorithm, streaming data, big data analytics, supervised learning, kernel, multi-task
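The abstract distinguishes online learning for linear models from kernel-based online learning for nonlinear models, both of which consume a stream one example at a time in memory. The sketch below is illustrative only: it uses the classic Perceptron and its kernelized variant as stand-ins chosen for this example, not the specific algorithms covered in the survey. The class names, the `rbf` and `run_stream` helpers, and the XOR toy stream are hypothetical.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, int]  # (features, label in {-1, +1})


class OnlinePerceptron:
    """Hypothetical online linear model: keeps only a weight vector, O(d) memory."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.mistakes = 0

    def predict(self, x: Vector) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if score >= 0 else -1

    def update(self, x: Vector, y: int) -> None:
        # Predict first; adjust the weights only when the prediction is wrong.
        if self.predict(x) != y:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
            self.mistakes += 1


def rbf(a: Vector, b: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


class KernelPerceptron:
    """Hypothetical online nonlinear model: f(x) = sum of y_i * k(x_i, x) over stored mistakes."""

    def __init__(self, kernel: Callable[[Vector, Vector], float] = rbf) -> None:
        self.kernel = kernel
        self.support: List[Example] = []  # grows by one entry per mistake
        self.mistakes = 0

    def predict(self, x: Vector) -> int:
        score = sum(yi * self.kernel(xi, x) for xi, yi in self.support)
        return 1 if score >= 0 else -1

    def update(self, x: Vector, y: int) -> None:
        # Same mistake-driven rule, but the "weights" are stored examples.
        if self.predict(x) != y:
            self.support.append((x, y))
            self.mistakes += 1


def run_stream(model, stream: Iterable[Example]) -> None:
    """Feed examples to the model one at a time, in arrival order."""
    for x, y in stream:
        model.update(x, y)


# XOR with a constant bias feature appended: not linearly separable.
XOR: List[Example] = [
    ((0.0, 0.0, 1.0), -1),
    ((1.0, 1.0, 1.0), -1),
    ((0.0, 1.0, 1.0), 1),
    ((1.0, 0.0, 1.0), 1),
]

if __name__ == "__main__":
    linear, kernel = OnlinePerceptron(dim=3), KernelPerceptron()
    for _ in range(3):  # replay the tiny stream; a real stream would be one pass
        run_stream(linear, XOR)
        run_stream(kernel, XOR)
    print("linear correct:", sum(linear.predict(x) == y for x, y in XOR), "/ 4")
    print("kernel correct:", sum(kernel.predict(x) == y for x, y in XOR), "/ 4")
```

On XOR, no linear threshold can classify all four points, whereas the kernel version with this RBF kernel classifies them all after a few replays. The trade-off is that the kernel model's support list grows with every mistake, so its memory and prediction cost are unbounded on a long stream, which is the usual motivation for budgeted kernel online learning.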
