A Two Phases Unsupervised Sequential Forward Fractal Dimensionality Reduction Algorithm
Abstract
Both the dimensionality and the volume of data to be processed are increasing rapidly with advances in data collection and storage capabilities. Reducing the dimensionality of the attribute vectors to improve the performance of downstream techniques is therefore a popular way to tackle the infamous curse of dimensionality. The fractal dimension of a dataset remains stable as its embedding dimension varies, so it can serve as an indicator to guide dimensionality reduction. The authors therefore use the fractal dimension of each individual attribute, together with the change in fractal dimension when attributes are merged, as the criterion of attribute correlation. They then transform dimensionality reduction into an optimization problem: find the attribute subset with the maximal fractal dimension while satisfying a restriction on the number of attributes. To solve this problem, a two-phase unsupervised sequential forward fractal dimensionality reduction algorithm is proposed, which integrates relevance analysis and redundancy analysis based on the fractal dimensions of individual attributes and of attribute subsets. An elementary analysis of the algorithm's time and space complexity is presented. Experimental results on synthetic and real-life datasets show that the algorithm obtains a satisfactory attribute subset with a relatively low workload of fractal dimension calculations.
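The abstract does not specify the fractal dimension estimator, the thresholds, or the exact stopping rules, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes the correlation fractal dimension D2, estimated by box counting over a few dyadic grids, and it takes the data as a list of attribute columns. Phase 1 (relevance) drops attributes whose own D2 is below a hypothetical `relevance_min`, such as constant attributes. Phase 2 (redundancy) greedily adds the attribute that most increases the D2 of the current subset and stops when the best gain is below a hypothetical `min_gain` or when `max_attrs` attributes (the attribute number restriction) have been chosen. The names `normalize`, `correlation_fd`, `two_phase_forward_selection`, `levels`, `relevance_min`, `min_gain` and `max_attrs` are illustrative and are not the authors' method or API.

```python
import math
from collections import Counter


def normalize(column):
    """Min-max scale one attribute to [0, 1]; a constant attribute maps to all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def correlation_fd(columns, levels=4):
    """Estimate the correlation fractal dimension D2 of the points formed by `columns`.

    Assumed box-counting estimator: at level j the unit hypercube is split into
    cells of side r = 2**-j, S(r) = sum over cells of (count / n)**2, and D2 is
    the least-squares slope of log S(r) against log r.
    """
    n = len(columns[0])
    xs, ys = [], []
    for j in range(1, levels + 1):
        cells_per_axis = 2 ** j
        # Cell coordinates of every point; values equal to 1.0 go into the last cell.
        occupancy = Counter(
            tuple(min(int(col[i] * cells_per_axis), cells_per_axis - 1) for col in columns)
            for i in range(n)
        )
        s = sum(c * c for c in occupancy.values()) / (n * n)
        xs.append(math.log(1.0 / cells_per_axis))
        ys.append(math.log(s))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def two_phase_forward_selection(data, relevance_min=0.5, min_gain=0.5,
                                max_attrs=None, levels=4):
    """Hypothetical two-phase unsupervised sequential forward selection.

    data: list of attribute columns (each a list of numbers of equal length).
    Returns the indices of the selected attributes in the order they were added.
    """
    cols = [normalize(c) for c in data]

    # Phase 1 (relevance): keep attributes whose own D2 reaches the threshold.
    individual = [correlation_fd([c], levels) for c in cols]
    candidates = [a for a, fd in enumerate(individual) if fd >= relevance_min]

    # Phase 2 (redundancy): greedily add the attribute with the largest D2 gain.
    selected, current_fd = [], 0.0
    limit = len(candidates) if max_attrs is None else max_attrs
    while candidates and len(selected) < limit:
        best_attr, best_fd = None, -math.inf
        for a in candidates:
            fd = correlation_fd([cols[b] for b in selected + [a]], levels)
            if fd > best_fd:  # strict '>' keeps the lowest index on ties
                best_attr, best_fd = a, fd
        if best_fd - current_fd < min_gain:
            break  # the best remaining attribute adds little new information
        selected.append(best_attr)
        candidates.remove(best_attr)
        current_fd = best_fd
    return selected


if __name__ == "__main__":
    n = 1000
    phi = (math.sqrt(5) - 1) / 2
    a0 = [(i + 0.5) / n for i in range(n)]    # evenly spread over [0, 1]
    a1 = [(i * phi) % 1.0 for i in range(n)]  # golden-ratio sequence, fills the square with a0
    a2 = [2 * v + 1 for v in a0]              # linear copy of a0: redundant
    a3 = [5.0] * n                            # constant: irrelevant
    print(two_phase_forward_selection([a0, a1, a2, a3]))
```

On this toy data, phase 1 discards the constant attribute `a3` (its D2 is 0), and phase 2 is expected to keep attributes 0 and 1 (in some order). Adding the linear copy `a2` leaves the box counts, and hence D2, unchanged, so its gain is zero. Re-estimating D2 for every candidate subset is the dominant cost of such a greedy search, which is why the number of fractal dimension calculations matters.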