A Parallel Data Processing Middleware Based on Clusters
-
Abstract
Urgent performance requirements in large-scale database applications have led to the use of parallelism in database processing. In this setting, a method for manipulating data in parallel can greatly improve the efficiency of data processing. HPDPM is a middleware system for shared-nothing cluster architectures that allows a database system to take advantage of the performance of parallel shared-nothing systems. This paper presents a new approach that uses parallel data processing middleware, rather than a parallel database system, to provide high-performance computing capability. A framework for realizing parallel data manipulation is given, and the primary modules of the middleware are described. Key techniques used to improve system performance, including data placement and semantic caching, are discussed in detail. The working principles and processing steps of the middleware are then presented. The implementation and experiments show that this approach improves system performance, enhances system availability and maintainability, and achieves a high performance-to-price ratio. At present, the middleware system, which uses data placement strategies and semantic caching, has been applied to some large national engineering projects whose data volume slightly exceeds 1000 gigabytes.
-
-