Abstract:
Distributed systems enable today's Internet services. These systems usually involve multiple tiers, a hierarchy of functional abstractions, and high degrees of concurrency. A user request is processed as a task in a series of stages across multiple machines, processors, and threads. Verifying the behavior of such individual tasks is therefore very challenging, because developers have to reconstruct the task flow by linking together pieces of its execution scattered throughout the system. Understanding system runtime behavior is key to verifying system design and to debugging logic and performance problems. Existing work extracts system runtime behavior as causal paths, but it requires either manual annotation or developer-provided execution structures. Manual annotation is tedious and error-prone when the system design is complex and the code base evolves rapidly. Providing execution structures requires a deep understanding of the system design, which limits this approach to core designers or developers. In this paper, the authors propose inferring hierarchical task models from system logs. They first extract task information from logs and infer the hierarchical relationships among tasks. The resulting task hierarchies are then combined into hierarchical task models. Their experience shows that the inferred task models help both in understanding system design and in debugging performance problems.
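The three steps summarized above (extract tasks, infer their hierarchy, merge hierarchies into a model) can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes hypothetical log records of the form `(timestamp, task_id, task_type, event)` with explicit `"start"`/`"end"` events. It infers nesting purely from time-interval containment, which only works when tasks are properly nested (for example, within a single thread). It represents the "model" as counts of child-type sequences per task type. All names (`Task`, `extract_tasks`, `build_hierarchy`, `merge_into_model`) are illustrative.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    task_type: str
    start: float
    end: float
    children: list = field(default_factory=list)


def extract_tasks(records):
    """Step 1 (assumed log format): pair start/end records into Task objects."""
    open_tasks = {}
    tasks = []
    for ts, task_id, task_type, event in records:
        if event == "start":
            open_tasks[task_id] = (task_type, ts)
        elif event == "end" and task_id in open_tasks:
            task_type, start = open_tasks.pop(task_id)
            tasks.append(Task(task_id, task_type, start, ts))
    return tasks


def build_hierarchy(tasks):
    """Step 2: attach each task to the tightest earlier interval containing it.

    Assumes properly nested intervals; overlapping concurrent tasks are not handled.
    """
    roots, stack = [], []
    # Earlier start first; on ties, longer interval first so parents precede children.
    for task in sorted(tasks, key=lambda t: (t.start, -(t.end - t.start))):
        while stack and not (stack[-1].start <= task.start and task.end <= stack[-1].end):
            stack.pop()
        if stack:
            stack[-1].children.append(task)
        else:
            roots.append(task)
        stack.append(task)
    return roots


def merge_into_model(roots):
    """Step 3: for each task type, count the observed sequences of child types."""
    model = defaultdict(Counter)

    def visit(task):
        child_types = tuple(c.task_type for c in sorted(task.children, key=lambda c: c.start))
        model[task.task_type][child_types] += 1
        for child in task.children:
            visit(child)

    for root in roots:
        visit(root)
    return model


# Usage: two hypothetical requests, one that parses and queries, one that only parses.
records = [
    (0.0, "r1", "Request", "start"),
    (0.1, "p1", "Parse", "start"),
    (0.2, "p1", "Parse", "end"),
    (0.3, "q1", "Query", "start"),
    (0.6, "q1", "Query", "end"),
    (0.7, "r1", "Request", "end"),
    (1.0, "r2", "Request", "start"),
    (1.1, "p2", "Parse", "start"),
    (1.2, "p2", "Parse", "end"),
    (1.5, "r2", "Request", "end"),
]
model = merge_into_model(build_hierarchy(extract_tasks(records)))
```

Counting child-type sequences per task type is one simple way to merge many concrete task trees into a single summary; a real model would likely need to generalize over repetition and optional steps, which this sketch does not attempt.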