A Parallel Data Processing Middleware Based on Clusters

王念滨; 宋益波; 姚念民; 刘大昕

Abstract

Demanding performance requirements in large-scale database applications have led to the use of parallelism in database processing, and a way to manipulate data in parallel can greatly improve the efficiency of data processing. The HPDPM system is a middleware that supports parallel data processing on a shared-nothing cluster architecture, allowing database systems to exploit the performance of parallel shared-nothing hardware. Using parallel data processing middleware in place of a parallel database system is presented as a new route to high-performance computing. The paper gives the architecture and main functional modules of the middleware and describes in detail how parallel data processing is realized with it, including its working principles and steps. The strategies and methods behind the key techniques for improving system performance, namely data placement and cache management (semantic caching), are discussed. Results from experiments and on-site tests are reported. The middleware provides users with a unified service interface and management platform; it improves system performance, enhances availability and maintainability, protects users' existing investment, and achieves a high performance-to-price ratio. The system, with its data placement strategies and semantic caching, is currently deployed in large national engineering projects whose data volume reaches the terabyte scale (slightly more than 1000 gigabytes).
