    Citation: Zhang Xiang, Deng Zhaohong, Wang Shitong, Choi Kupsze. Maximum Entropy Relief Feature Weighting[J]. Journal of Computer Research and Development, 2011, 48(6): 1038-1048.

    Maximum Entropy Relief Feature Weighting

    • Abstract: Recent advances in Relief feature weighting show that it can be approximately expressed as a margin maximization problem, so its properties can be studied with optimization theory. Although Relief feature weighting is widely used, it lacks a mechanism for handling outliers, and it remains unclear how to improve the robustness and adaptability of the algorithm in noisy environments. To improve both, this paper integrates the maximum entropy principle into margin-maximization-based Relief feature weighting and investigates new, more robust and adaptive algorithms. First, a new margin-based objective function incorporating maximum entropy is proposed within the optimization framework, in which two maximum entropy terms control the feature weights and the sample force coefficients, respectively. Then, by applying optimization theory, several useful theoretical results are derived from this objective function, and a set of robust Relief feature weighting algorithms is developed for two-class data, multi-class data, and online data. Extensive experiments on UCI benchmark datasets and gene expression datasets show that the proposed algorithms perform competitively with state-of-the-art algorithms and are considerably more robust on datasets containing noise and/or outliers.
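
    For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal sketch of classic two-class Relief only, not the maximum entropy variant proposed in the paper, whose objective function is not given on this page. The function name `relief_weights`, the L1 distance, and the final clipping and normalization step (borrowed from the margin-maximization view of Relief) are assumptions made for illustration. Each class is assumed to contain at least two samples.

```python
import numpy as np

def relief_weights(X, y):
    """Classic Relief for two-class data (illustrative sketch, L1 distance).

    For every sample, find its nearest hit (same class) and nearest miss
    (other class), and reward features that separate the miss more than the hit.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for i in range(n_samples):
        diffs = np.abs(X - X[i])        # per-feature |x_j - x_i| for all j
        dists = diffs.sum(axis=1)       # L1 distance to every sample
        dists[i] = np.inf               # never pick the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dists, np.inf))
        miss = np.argmin(np.where(~same, dists, np.inf))
        w += diffs[miss] - diffs[hit]   # accumulated per-feature margin
    # Assumption: clip negatives and scale to unit L2 norm, as in the
    # margin-maximization reading of Relief.
    w = np.maximum(w, 0.0)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.5], [0.1, 0.9], [1.0, 0.4], [0.9, 1.0]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

    Because every sample contributes equally to the accumulated margin in this baseline, a single mislabeled or outlying sample can distort the weights, which is the sensitivity the abstract sets out to reduce.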

       
