Abstract:
Researchers have proposed and implemented many distributed runtime systems to help users build distributed applications, but each of these runtimes is usually good at processing only certain types of workloads in specific scenarios. In the things-edge-cloud scenario, the components of things-edge-cloud collaborative applications have heterogeneous quality requirements, runtime environments, and communication protocols. This makes it difficult to build high-performance and robust things-edge-cloud collaborative applications on a single runtime. Deploying application components independently to different runtimes, on the other hand, makes the application harder to manage and leaves it without unified performance and fault-tolerance support. This paper proposes the Grip system to address this problem. Grip supports unified access to and utilization of multiple runtimes by introducing a virtual runtime adapter layer and a virtual runtime API layer; these two virtual layers specify the interfaces that must be implemented to integrate a runtime. Grip supports unified management of multi-runtime applications through the Griplet and Grip abstractions, and it uses ownership to provide mechanisms that support user-defined fault-tolerance and scaling policies. Experiments in a things-edge-cloud environment show that, compared with using a single runtime such as Ray, Docker, or Kubernetes, Grip reduces the average end-to-end latency by 31% to 77%, the 90th-percentile tail latency by 25% to 78%, and the 95th-percentile tail latency by 22% to 78%.