ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (2): 318-332. doi: 10.7544/issn1000-1239.2015.20140268

Special Issue: 2015 Big Data Management

Distributed Stream Processing: A Survey

Cui Xingcan, Yu Xiaohui, Liu Yang, Lü Zhaoyang   

  1. (School of Computer Science and Technology, Shandong University, Jinan 250101)
  • Online: 2015-02-01

Abstract: The rapid growth of computing and networking technologies, together with increasingly rich means of data acquisition, has given rise to a wide range of applications that require real-time processing of massive, high-velocity data. Because processing such data often exceeds the capacity of existing technologies, a class of approaches following the distributed stream processing paradigm has emerged. In this survey, we first review the application background of distributed stream processing and discuss how the technology has evolved into its current form. We then contrast it with other big data processing technologies to help readers better understand its characteristics. We provide an in-depth discussion of the main issues involved in distributed stream processing, such as data models, system models, storage management, semantic guarantees, load control, and fault tolerance, and point out the pros and cons of existing solutions. This is followed by a systematic comparison of several popular distributed stream processing platforms, such as S4, Storm, and Spark Streaming. Finally, we present a few typical applications of distributed stream processing and discuss possible directions for future research in this area.

Key words: big data, data stream, distributed stream processing, real-time processing, distributed system
