Abstract:
The era of big data brings new challenges to data storage and management. As data volumes grow dramatically, automatic data distribution has become both a key technique and one of the most intractable problems for distributed systems. Building on existing studies of data, workloads, and nodes in this field, this work abstracts the data distribution problem as a triangle model called DaWN (data, workload, node) and summarizes the relationships among these three elements as data fragmentation, data allocation, and workload processing. Based on DaWN, it proposes an automatic data distribution solution for large-scale online transaction processing (OLTP) applications and discusses the details and interactions of each module in this consolidated architecture. Building on our previous research, it puts the optimal equilibrium of the Nash-Pareto strategy into practice. Results from a series of experiments show that the proposed approach achieves good overall performance and effectiveness. In addition, this work implements a prototype tool called ADDvisor to support automatic data distribution, with the aim of helping to bring more research into real-world practice and of effectively coordinating automatic data distribution in large-scale distributed OLTP applications.