Abstract:
Manifold learning seeks low-dimensional embeddings of observed data that lie in a high-dimensional space. As an effective class of nonlinear dimensionality reduction methods, it has been widely applied in machine learning, for example in data mining and pattern recognition. However, when processing large-scale data streams, the time complexity of many traditional manifold learning algorithms, including out-of-sample, incremental, and online learning algorithms, is too high. This paper presents a novel self-adaptive learning algorithm based on incremental tangent space alignment (named SLITSA) for big data stream processing. SLITSA uses incremental PCA to construct the subspace incrementally and can detect the intrinsic low-dimensional manifold structure of data streams online or incrementally. To ensure convergence and reduce the reconstruction error, SLITSA can also construct a new tangent space for adjustment during the iterative process. Experiments on artificial and real data sets show that the proposed algorithm achieves better classification accuracy and time efficiency than other manifold learning algorithms. The approach can be extended to streaming data applications and real-time big data analytics.
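To make the idea of building a subspace incrementally concrete, the following is a minimal sketch of a streaming principal-direction estimate. It is not the SLITSA algorithm and not the incremental PCA variant used in the paper. It only shows how a mean and a scatter matrix can be updated one sample at a time, with the leading direction recovered by power iteration. The names `IncrementalSubspace`, `update`, `leading_direction`, and `make_stream` are hypothetical and chosen for illustration.

```python
import math
import random


class IncrementalSubspace:
    """Running mean and scatter matrix, updated one sample at a time.

    Illustrative assumption: a plain Welford-style co-moment update,
    not the incremental PCA procedure of the paper.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        # Sum of outer products of deviations from the running mean.
        self.scatter = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        """Fold one new sample into the mean and the scatter matrix."""
        self.n += 1
        before = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, before)]
        after = [xi - mi for xi, mi in zip(x, self.mean)]
        for i, bi in enumerate(before):
            row = self.scatter[i]
            for j, aj in enumerate(after):
                row[j] += bi * aj

    def leading_direction(self, iters=200):
        """Estimate the top principal direction by power iteration.

        The start vector is uniform; if it happens to be orthogonal to
        the true leading direction, the iteration will not find it.
        """
        dim = len(self.mean)
        v = [1.0 / math.sqrt(dim)] * dim
        for _ in range(iters):
            w = [sum(self.scatter[i][j] * v[j] for j in range(dim))
                 for i in range(dim)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm == 0.0:
                break  # no variance observed yet
            v = [wi / norm for wi in w]
        return v


def make_stream(n, seed=0):
    """Synthetic stream: points near a line in 3-D with small noise."""
    rng = random.Random(seed)
    base = [1.0 / math.sqrt(5.0), 2.0 / math.sqrt(5.0), 0.0]
    points = []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        points.append([t * b + rng.gauss(0.0, 0.01) for b in base])
    return points


if __name__ == "__main__":
    est = IncrementalSubspace(3)
    for point in make_stream(500):
        est.update(point)  # one sample at a time, no batch recomputation
    print(est.leading_direction())
```

Each call to `update` costs a number of operations quadratic in the ambient dimension and independent of how many samples have already been seen, which is the property that makes incremental subspace construction attractive for data streams. A tangent-space method would apply such an estimate to local neighborhoods rather than to the whole stream.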