Exploiting Hybrid Kernel-Based Fuzzy Complementary Mutual Information for Selecting Features

Yuan Zhong; Chen Hongmei; Wang Zhihong; Li Tianrui

doi:10.7544/issn1000-1239.202111272

Exploiting Hybrid Kernel-Based Fuzzy Complementary Mutual Information for Selecting Features

Abstract

Fuzzy rough set theory is currently receiving wide attention in data mining and machine learning. The theory offers an effective way around the discretization problem and can be applied directly to numerical or mixed-attribute data: in the fuzzy rough set model, fuzzy relations are defined to measure the similarity between objects, so numerical attribute values no longer need to be discretized. The theory has been applied successfully in many areas, such as attribute reduction, rule extraction, cluster analysis, and outlier detection. Information entropy has been introduced into fuzzy rough set theory to represent fuzzy and uncertain information, giving rise to different forms of fuzzy uncertainty measures, such as fuzzy information entropy, fuzzy complementary entropy, and fuzzy mutual information. However, most of the fuzzy mutual information measures proposed with respect to the decision are non-monotonic, which may lead to a learning algorithm that does not converge. To address this, fuzzy complementary mutual information with respect to the decision is defined on the basis of hybrid kernel fuzzy complementary entropy, and it is proved to vary monotonically with the features. A feature selection method based on hybrid kernel fuzzy complementary mutual information is then explored, and a corresponding algorithm is designed. Experimental results show that, in most cases, the proposed algorithm selects fewer features while maintaining or improving classification accuracy.
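To make the notion of a fuzzy uncertainty measure on mixed data concrete, the following is a minimal sketch and not the paper's hybrid kernel definition, which the abstract does not spell out. It assumes a form of fuzzy complementary entropy that is common in the literature, E(R) = (1/n) Σ_i (1 − |[x_i]_R| / n), where |[x_i]_R| is the sum of row i of the fuzzy relation matrix. The Gaussian similarity for the numerical feature, the value of `sigma`, the 0/1 equality relation for the nominal feature, the use of `min` to combine features, and the toy data are all assumptions made for illustration.

```python
import math

def gaussian_relation(values, sigma=0.5):
    """Fuzzy similarity matrix for one numerical feature.
    Assumption: values are already scaled to [0, 1]; sigma is illustrative."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values]
            for a in values]

def crisp_relation(values):
    """0/1 equality relation for one nominal feature (assumed choice)."""
    return [[1.0 if a == b else 0.0 for b in values] for a in values]

def intersect(r1, r2):
    """Combine two relations element-wise with min (a standard fuzzy intersection)."""
    return [[min(x, y) for x, y in zip(row1, row2)]
            for row1, row2 in zip(r1, r2)]

def fuzzy_complementary_entropy(rel):
    """E(R) = (1/n) * sum_i (1 - |[x_i]_R| / n), with |[x_i]_R| the i-th row sum."""
    n = len(rel)
    return sum(1.0 - sum(row) / n for row in rel) / n

# Hypothetical toy data: one numerical and one nominal feature.
height = [0.10, 0.15, 0.80, 0.85]
colour = ["red", "blue", "red", "blue"]

r_height = gaussian_relation(height)
r_both = intersect(r_height, crisp_relation(colour))

e_height = fuzzy_complementary_entropy(r_height)
e_both = fuzzy_complementary_entropy(r_both)
print(e_height, e_both)
```

Under these assumptions, adding a feature can only shrink the entries of the combined relation, so the entropy of the feature subset cannot decrease. This is the kind of monotone behaviour the abstract contrasts with non-monotonic fuzzy mutual information, although the sketch covers only the entropy and not the mutual information defined in the paper.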
