ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2017, Vol. 54 ›› Issue (1): 80-93. doi: 10.7544/issn1000-1239.2017.20150492

Big-Data Platform Based on Open Source Ecosystem

Lei Jun1,2, Ye Hangjun2, Wu Zesheng2, Zhang Peng2, Xie Long2, He Yanxiang1,3   

  1(Computer School, Wuhan University, Wuhan 430072); 2(Xiaomi Inc, Beijing 100085); 3(State Key Laboratory of Software Engineering (Wuhan University), Wuhan 430072)
  • Online: 2017-01-01

Abstract: Large-scale data collection and processing have been widely studied in recent years, and several released big data processing platforms now play important roles in the operations of many Internet businesses. Open source ecosystems, the engine of big data innovation, have evolved so rapidly that a number of open source projects have been adopted as components of mainstream data processing platforms. In practice, however, open source software is still far from perfect when dealing with real large-scale data. Based on industrial practice at Xiaomi Inc, this paper proposes an improved platform for collecting and processing large-scale data in the face of varied business requirements. We focus on problems with the functionality, consistency, and availability of the software when it is used for data collection, storage, and processing. In addition, we propose a series of optimizations for load balancing, failover, data compression, and multi-dimensional scheduling that significantly improve the efficiency of the current system. All the designs and optimizations described in this paper have been implemented and deployed to support various Internet services provided by Xiaomi Inc.

Key words: Hadoop, open source ecosystem, big data, data center, network virtualization
