Research Advances in Programming Models for Data-Intensive Computing

Wang Peng (王鹏); Meng Dan (孟丹); Zhan Jianfeng (詹剑锋); Tu Bibo (涂碧波)

Review of Programming Models for Data-Intensive Computing

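Abstract (Chinese-language version): As an emerging computing paradigm, cloud computing has attracted wide attention from both academia and industry. Cloud computing centers on Internet services and applications, and service providers need to store and analyze massive amounts of data. To process Web-scale data cost-effectively and efficiently, the major Internet companies have developed distributed programming systems on large-scale clusters built from commodity servers. A programming model can make it easier for developers to program large-scale clusters and can let programs make full use of cluster resources, but designing such a model is a major challenge. This paper first describes the characteristics of data-intensive computing and identifies the basic problems that a programming model must solve. It then presents representative programming models in depth and compares and analyzes their features. Finally, it summarizes the open problems and discusses future trends.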

Abstract: Advances in communication, computation, and storage have created large amounts of data. The ability to collect, organize, and analyze massive amounts of data could lead to breakthroughs in business, science, and society. As a new computing paradigm, cloud computing focuses on Internet services, and Internet service providers have an increasing need to store and analyze massive data sets. To perform Web-scale analysis cost-effectively, several Internet companies have recently developed distributed programming systems on large-scale clusters composed of shared-nothing commodity servers, which we call a cloud platform. It is a great challenge to design a programming model and system that enable developers to easily write reliable programs that use cluster-wide resources efficiently and achieve the maximum degree of parallelism on the cloud platform. Many challenging and exciting research problems arise when scaling systems and computations up to handle terabyte-scale datasets. Recent advances in programming models for massive data processing are reviewed in this context. First, the distinguishing characteristics of data-intensive computing are presented, and the fundamental issues that a programming model for massive data processing must address are identified. Second, several state-of-the-art programming systems for data-intensive computing are described in detail. Third, the pros and cons of the classic programming models are compared and discussed. Finally, open issues and future work in this field are explored.
