Abstract:
The advent of big data has given rise to a large array of applications that require real-time processing of massive, high-velocity data. Mining big data streams across a wide range of real-world applications is therefore becoming increasingly important. Conventional batch machine learning techniques suffer from many limitations when applied to big data analytics tasks. Online learning, which operates in a stream computing mode, is a promising tool for learning from data streams. In this survey, we first introduce the motivation and background of big data analytics, and then present the family of classical and recent online learning methods and algorithms that are promising for tackling the emerging challenges of mining big data in real-world applications. The main technical content of this survey consists of three parts: 1) online learning for linear models; 2) kernel-based online learning for nonlinear models; 3) non-traditional online learning methods. This is followed by a discussion of some key problems of large-scale machine learning for big data analytics applications. Finally, we present a few typical scenarios of online learning for big data streams and discuss possible directions for ongoing and future research in this area.