Abstract:
Fault management is a key research topic in distributed application management. Because distributed applications are dynamic and complex, traditional methods cannot meet the needs of fault management. Autonomic computing offers a solution by enabling system self-management. Self-management is basically divided into two procedures: self-awareness and self-adaptation. This paper mainly deals with realizing system self-awareness based on fault diagnosis. First, a hybrid fault diagnosis model is proposed after analyzing fault propagation in distributed application management. According to this model, the fault diagnosis process is divided into two steps: application service fault diagnosis and network service fault diagnosis. Second, because observations of network faults are uncertain and inaccurate, the fault diagnosis model is mapped to a Bayesian network to carry out reasoning under uncertainty. Finally, because exact inference in Bayesian networks is computationally complex, the original inference algorithm is improved to diagnose the root cause using a multi-layer Bayesian network corresponding to the multi-layer FPM model. Experiments show that the improved algorithm accelerates the inference procedure.
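The idea of mapping a fault model to a Bayesian network and reasoning from uncertain observations to a root cause can be illustrated with a minimal sketch. It is not the paper's model or its improved algorithm: the two-layer structure, the fault and symptom names, the noisy-OR combination rule, and every probability below are assumptions chosen only for illustration. The sketch uses plain exact inference by enumeration, whose cost grows exponentially with the number of fault nodes, which is the kind of complexity that motivates a faster inference procedure.

```python
from itertools import product

# Hypothetical priors: probability that each network-level fault is present.
PRIOR = {"link_down": 0.02, "router_overload": 0.05}

# Hypothetical causal strengths: P(symptom triggered | that fault alone is present).
CAUSES = {
    "service_timeout": {"link_down": 0.9, "router_overload": 0.6},
    "slow_response": {"router_overload": 0.8},
}
LEAK = 0.01  # assumed chance that a symptom appears with no modeled fault


def symptom_prob(symptom, faults):
    """Noisy-OR: P(symptom observed | truth assignment of the faults)."""
    p_absent = 1.0 - LEAK
    for fault, strength in CAUSES[symptom].items():
        if faults[fault]:
            p_absent *= 1.0 - strength
    return 1.0 - p_absent


def posterior(evidence):
    """Exact inference by enumeration: P(fault present | evidence) for each fault."""
    names = list(PRIOR)
    joint_total = 0.0
    marginals = dict.fromkeys(names, 0.0)
    # Enumerate all 2^n fault assignments (exponential in the number of faults).
    for values in product([False, True], repeat=len(names)):
        faults = dict(zip(names, values))
        weight = 1.0
        for name in names:
            weight *= PRIOR[name] if faults[name] else 1.0 - PRIOR[name]
        for symptom, observed in evidence.items():
            p = symptom_prob(symptom, faults)
            weight *= p if observed else 1.0 - p
        joint_total += weight
        for name in names:
            if faults[name]:
                marginals[name] += weight
    return {name: marginals[name] / joint_total for name in names}


# Observed: the application service times out, but responses are not slow.
result = posterior({"service_timeout": True, "slow_response": False})
likely_root_cause = max(result, key=result.get)
```

With these assumed numbers, observing a timeout without slow responses makes `link_down` the most probable root cause, even though its prior is lower than that of `router_overload`.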