Optimal Scale Combination Selection Based on Relatively Relevant Attribute Subsets
Abstract: Knowledge acquisition from multi-scale hybrid data is an important research direction in data modeling under a multi-granularity environment, and selecting an optimal scale is a key step in that process. However, when computing an optimal scale, most multi-scale granular computing models use all attributes and do not consider how the order in which attributes are selected affects the resulting optimal scale, which weakens the robustness and effectiveness of the models. To address this, a new method that performs a directed search for an optimal scale combination based on relatively relevant attribute subsets is proposed for optimal scale combination selection in multi-scale hybrid data. First, based on conditional entropy, quantitative measures of the relative relevance of conditional attributes with respect to the decision are introduced in a generalized multi-scale hybrid decision system (GMHDS). Second, the concept of a positive region optimal scale combination based on relatively relevant attribute subsets is defined, so that the optimal scale combination is searched for within highly relevant attribute subsets. Furthermore, a new stepwise algorithm for the directed search of positive region optimal scale combinations is designed for a GMHDS; during the search it selects, as far as possible, the scales corresponding to conditional attributes that are highly relevant to the decision. Finally, experimental results show that, in most cases, the proposed model achieves better classification performance than the comparison models.
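As a rough illustration of the idea of ranking conditional attributes by a conditional-entropy criterion, the sketch below computes H(D | a) for each attribute and orders the attributes so that those leaving the least uncertainty about the decision come first. It is not the relevance measure defined in the paper: it assumes nominal attribute values at a single fixed scale, and the names `conditional_entropy`, `rank_attributes`, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(attr_values, decisions):
    """Return H(D | a) in bits for one nominal attribute column.

    Assumption: values are nominal and taken at a single scale;
    numeric or multi-scale attributes are outside this sketch.
    """
    n = len(decisions)
    joint = Counter(zip(attr_values, decisions))  # counts of (value, decision)
    marginal = Counter(attr_values)               # counts of each value
    return -sum(
        (count / n) * log2(count / marginal[value])
        for (value, _), count in joint.items()
    )


def rank_attributes(columns, decisions):
    """Sort attribute names by ascending H(D | a).

    A lower conditional entropy is treated here as higher relevance
    to the decision (an illustrative choice, not the paper's definition).
    """
    return sorted(
        columns,
        key=lambda name: conditional_entropy(columns[name], decisions),
    )


# Toy decision table (hypothetical): a1 determines the decision, a2 does not.
columns = {
    "a1": ["x", "x", "y", "y"],
    "a2": ["p", "q", "p", "q"],
}
decisions = [0, 0, 1, 1]

ranking = rank_attributes(columns, decisions)
print(ranking)
```

In a directed search of the kind the abstract describes, an ordering like this could decide which attribute's scale is examined first; how the paper actually combines relevance with the positive-region criterion is not specified in the abstract.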