ISSN 1000-1239 CN 11-1777/TP

面向分布式应用管理的混合故障诊断模型

李云春 秦先龙   

  1. (School of Computer Science and Engineering, Beihang University, Beijing 100191) (lych@buaa.edu.cn)
  • Published: 2010-03-15

A Hybrid Fault Diagnosis Model in Distributed Application Management

Li Yunchun and Qin Xianlong   

  1. (School of Computer Science and Engineering, Beihang University, Beijing 100191)
  • Online: 2010-03-15

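摘要 (Abstract): Because distributed applications are dynamic and complex, traditional manual management can no longer provide good fault management, and applying the ideas of autonomic computing to management has become one way to solve the problem. This work studies how to achieve system self-awareness based on fault diagnosis techniques. First, based on an analysis of fault management for distributed applications, a hybrid fault diagnosis model is proposed that divides the diagnosis process into two stages: application service fault diagnosis and network service fault diagnosis. Second, because observations of network fault symptoms are uncertain and inaccurate, the fault diagnosis model is mapped onto a Bayesian network for reasoning under uncertainty. Finally, the work focuses on inference algorithms for the multi-layer FPM model and gives an improved algorithm based on the variable elimination algorithm; experiments demonstrate that the improved algorithm speeds up inference.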

关键词 (Key words): distributed application management, fault diagnosis, FPM model, autonomic computing, Bayesian network

Abstract: Fault management is a key research topic in distributed application management. Because distributed applications are dynamic and complex, traditional manual management can no longer handle faults adequately. Autonomic computing offers a way to address this problem by making systems self-managing. Self-management consists of two procedures: self-awareness and self-adaptation. This paper focuses on realizing system self-awareness through fault diagnosis. First, a hybrid fault diagnosis model is proposed based on an analysis of fault propagation in distributed application management; under this model, the diagnosis process is divided into two stages: application service fault diagnosis and network service fault diagnosis. Second, because observations of network fault symptoms are uncertain and inaccurate, the fault diagnosis model is mapped onto a Bayesian network to carry out reasoning under uncertainty. Finally, because exact inference in Bayesian networks is computationally expensive, the paper studies root-cause inference in the multi-layer Bayesian network that corresponds to the multi-layer FPM model and presents an improved algorithm based on variable elimination. Experiments show that the improved algorithm accelerates inference.

Key words: distributed application management, fault diagnosis, FPM model, autonomic computing, Bayesian network
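
The abstract describes mapping a fault diagnosis model onto a Bayesian network and inferring the root cause with a variable-elimination-style algorithm. The following is a minimal sketch of plain (unimproved) variable elimination on a tiny fault/symptom network, not the paper's improved algorithm. The node names (`link`, `load`, `timeout`, `loss`), the network shape, and all probabilities are hypothetical values chosen only for illustration.

```python
from itertools import product

# A factor is (variables, table): a tuple of binary variable names and a dict
# mapping each 0/1 assignment (in that variable order) to a number.

def make_factor(variables, fn):
    table = {vals: fn(dict(zip(variables, vals)))
             for vals in product((0, 1), repeat=len(variables))}
    return (tuple(variables), table)

def multiply(f, g):
    (fv, ft), (gv, gt) = f, g
    variables = fv + tuple(v for v in gv if v not in fv)
    return make_factor(
        variables,
        lambda a: ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)])

def sum_out(f, var):
    fv, ft = f
    keep = tuple(v for v in fv if v != var)
    table = {}
    for vals, p in ft.items():
        key = tuple(x for v, x in zip(fv, vals) if v != var)
        table[key] = table.get(key, 0.0) + p
    return (keep, table)

def restrict(f, evidence):
    # Keep only rows consistent with the observed symptoms.
    fv, ft = f
    keep = tuple(v for v in fv if v not in evidence)
    table = {}
    for vals, p in ft.items():
        a = dict(zip(fv, vals))
        if all(a[v] == evidence[v] for v in fv if v in evidence):
            table[tuple(a[v] for v in keep)] = p
    return (keep, table)

def posterior(factors, query, evidence, order):
    # `order` must list every hidden variable other than `query`.
    factors = [restrict(f, evidence) for f in factors]
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        rest = [f for f in factors if var not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(prod, var)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    assert result[0] == (query,)
    total = sum(result[1].values())
    return {vals[0]: p / total for vals, p in result[1].items()}

# Hypothetical two-layer network: faults (link, load) cause symptoms (timeout, loss).
P_LINK, P_LOAD = 0.05, 0.10  # assumed fault priors

def timeout_cpt(a):
    # Noisy-OR with a small leak probability (assumed parameters).
    p_on = 1 - 0.99 * (0.1 ** a["link"]) * (0.3 ** a["load"])
    return p_on if a["timeout"] else 1 - p_on

def loss_cpt(a):
    p_on = 0.8 if a["link"] else 0.02
    return p_on if a["loss"] else 1 - p_on

factors = [
    make_factor(("link",), lambda a: P_LINK if a["link"] else 1 - P_LINK),
    make_factor(("load",), lambda a: P_LOAD if a["load"] else 1 - P_LOAD),
    make_factor(("link", "load", "timeout"), timeout_cpt),
    make_factor(("link", "loss"), loss_cpt),
]

# Posterior over the link fault given that both symptoms were observed.
post = posterior(factors, "link", {"timeout": 1, "loss": 1}, order=["load"])
print(post)
```

In this sketch the evidence is applied first, so each symptom factor shrinks before any multiplication. The cost of the remaining steps depends on the elimination order, which is the kind of overhead an improved inference algorithm would aim to reduce.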