    A Two Phases Unsupervised Sequential Forward Fractal Dimensionality Reduction Algorithm

    • Abstract: Advances in data collection and storage are rapidly increasing both the dimensionality and the volume of the data to be processed. Reducing the dimensionality of the attribute vectors is therefore a common way to mitigate the curse of dimensionality and to improve the performance of the underlying techniques. The fractal dimension of a dataset remains stable as its embedding dimension varies, so it can serve as an indicator that guides the dimensionality reduction process. The authors use the multifractal dimension of each individual attribute, together with the change in fractal dimension after attributes are merged, as the measure of attribute correlation, and they use the difference between the fractal dimension of the resulting attribute subset and that of the full attribute set as the criterion for judging the quality of the subset. This transforms fractal dimensionality reduction into an optimization problem: find the attribute subset with the maximal fractal dimension, that is, a maximal set of mutually uncorrelated attributes, subject to a limit on the number of attributes. To cope with the combinatorial explosion of searching a high-dimensional attribute space, a two-phase unsupervised sequential forward fractal dimensionality reduction algorithm is proposed that combines relevance analysis with redundancy analysis, based on the fractal dimensions of individual attributes and of attribute subsets. A preliminary analysis of the time and space complexity of the algorithm is presented. Experimental results on standard real-life datasets and on synthetic datasets show that the algorithm obtains a good attribute subset with a relatively low workload of fractal dimension calculation.
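
The abstract gives no pseudocode, so the following is only a minimal sketch of how a two-phase forward selection driven by fractal dimension might look. Everything specific in it is an assumption made for illustration and is not taken from the paper: the box-counting estimator of the correlation dimension, the function names `correlation_dimension`, `project` and `two_phase_selection`, and the parameters `levels` and `min_gain`. Phase 1 ranks attributes by their individual fractal dimension (relevance); phase 2 adds an attribute only if merging it raises the fractal dimension of the selected subset (redundancy check), up to the allowed number of attributes.

```python
import math
import random
from collections import Counter


def correlation_dimension(points, levels=(1, 2, 3, 4)):
    """Estimate the correlation fractal dimension by box counting.

    points: list of equal-length tuples with coordinates scaled to [0, 1].
    For each grid with cell side r = 2**-level, S(r) is the sum of squared
    cell occupancy fractions; the estimate is the least-squares slope of
    log S(r) against log r. (Assumed estimator, not the paper's.)
    """
    n = len(points)
    xs, ys = [], []
    for level in levels:
        cells = 2 ** level
        counts = Counter(
            tuple(min(int(v * cells), cells - 1) for v in p) for p in points
        )
        s = sum((c / n) ** 2 for c in counts.values())
        xs.append(math.log(1.0 / cells))
        ys.append(math.log(s))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def project(data, attrs):
    """Keep only the attribute columns listed in attrs."""
    return [tuple(row[a] for a in attrs) for row in data]


def two_phase_selection(data, max_attrs, min_gain=0.3):
    """Hypothetical two-phase sequential forward selection."""
    n_attrs = len(data[0])
    # Phase 1 (relevance): rank attributes by individual fractal dimension.
    individual = {
        a: correlation_dimension(project(data, [a])) for a in range(n_attrs)
    }
    ranked = sorted(individual, key=individual.get, reverse=True)
    # Phase 2 (redundancy): add an attribute only if merging it raises the
    # fractal dimension of the selected subset by at least min_gain.
    selected, current = [], 0.0
    for a in ranked:
        if len(selected) >= max_attrs:
            break
        merged = correlation_dimension(project(data, selected + [a]))
        if merged - current >= min_gain:
            selected.append(a)
            current = merged
    return selected, current


# Synthetic data: attributes 0 and 1 are independent, attribute 2 is a
# copy of attribute 0, and attribute 3 is constant.
random.seed(0)
data = []
for _ in range(2000):
    x, y = random.random(), random.random()
    data.append((x, y, x, 0.5))

selected, dim = two_phase_selection(data, max_attrs=3)
# Evaluation criterion from the abstract: full-set dimension minus subset dimension.
gap = correlation_dimension(data) - dim
print(sorted(selected), round(dim, 2), round(gap, 2))
```

On this synthetic data the duplicated and the constant attribute add nothing to the fractal dimension of the subset, so the sketch keeps only the two independent attributes, and the gap between the full-set and subset dimensions is zero. The sketch computes one fractal dimension per attribute in phase 1 and at most one per attribute in phase 2, which is how a sequential forward search avoids enumerating all attribute subsets.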

       
