Abstract:
To bridge the gap between big data technologies and industry applications, this paper proposes models of scalability, customizability and multi-type processing for big data appliances, based on which the in-cloud smart data appliance (iSDA) is designed and implemented. First, iSDA is assembled by optimizing the cooperative computing units, heterogeneous storage and high-speed switching network to take full advantage of both scale-out and scale-up architectures. Second, iSDA is designed to satisfy the diverse requirements of industry big data applications through hardware customization ranging from light-weight to heavy-load configurations, as well as a hybrid software stack covering real-time, interactive, streaming and batch processing, all accelerated by an in-memory computing engine. Furthermore, to address the HDFS metadata service bottleneck, MapReduce load skew and the HBase cross-domain issue, this paper also introduces the multiple metadata server, load balancing algorithm and cross-datacenter big table technologies used in iSDA. Practical use cases in the telecommunication, finance and environmental protection industries show that the proposed architecture and key technologies are feasible and effective. Comprehensive comparisons with traditional MPP databases and other mainstream Hadoop distributions are also given to detail the advantages of iSDA in both hardware and software.