Abstract:
In data grid systems, data usage patterns play an important role in system performance. Recent traces from real systems show that both data requests and replica distribution exhibit clustering properties. This paper considers the relationship between request distribution and replica distribution in a data grid where requests exhibit clustering properties. First, a formal model of replication strategies in a federated data grid system is given, with cumulative hit ratios and average access latency as the performance metrics. The paper then investigates how to replicate data optimally, with the objective of minimizing average access latency, when requests exhibit clustering properties. To minimize average access latency, the more popular a file is in a subgrid, the more replicas of it should be created in that subgrid; furthermore, when requests are uniformly distributed across the system, replicas should be uniformly distributed as well. The optimization model is solved using the Lagrange multiplier method and the bisection method, yielding an optimal downloading replication strategy for clustered demands. The performance of this strategy is compared through simulation with that of the uniform replication, proportional replication, square-root replication, and LRU caching strategies. The simulation results validate the effectiveness of the optimal strategy: compared with these popular strategies, it requires the least wide area network bandwidth and achieves the lowest average access latency.
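
As an illustration of how a Lagrange multiplier and bisection can be combined for replica allocation, the following is a minimal sketch under stated assumptions, not the model developed in this paper. It assumes a stand-in objective, the sum of q_i / r_i (the expected-search-size objective commonly associated with square-root replication), a fixed total replica budget, and per-file bounds; the function name `allocate_replicas` and its parameters are hypothetical.

```python
import math


def allocate_replicas(popularity, budget, lo=1.0, hi=None):
    """Minimize sum(q_i / r_i) subject to sum(r_i) == budget and lo <= r_i <= hi.

    ASSUMPTION: this objective is an illustrative stand-in, not the paper's
    latency model. Replica counts are treated as continuous values.
    """
    n = len(popularity)
    if hi is None:
        hi = float(budget)
    if not (n * lo <= budget <= n * hi):
        raise ValueError("budget is infeasible for the given bounds")

    def alloc(lam):
        # Stationarity of the Lagrangian: -q_i / r_i**2 + lam = 0,
        # so r_i = sqrt(q_i / lam), clipped to the allowed range.
        return [min(hi, max(lo, math.sqrt(q / lam))) for q in popularity]

    # The total allocation is non-increasing in lam, so bisect on lam.
    # ASSUMPTION: the bracket [1e-12, 1e12] is wide enough for the inputs.
    lam_lo, lam_hi = 1e-12, 1e12
    for _ in range(200):
        lam = math.sqrt(lam_lo * lam_hi)  # geometric midpoint
        if sum(alloc(lam)) > budget:
            lam_lo = lam  # too many replicas: increase the multiplier
        else:
            lam_hi = lam  # within budget: decrease the multiplier
    return alloc(lam_hi)


if __name__ == "__main__":
    skewed = allocate_replicas([0.64, 0.16, 0.16, 0.04], budget=10)
    uniform = allocate_replicas([0.25, 0.25, 0.25, 0.25], budget=8)
    print(skewed)
    print(uniform)
```

Under this stand-in objective, and as long as no bound is active, the allocation is proportional to the square root of popularity, so the skewed example yields replica counts in the ratio 4:2:2:1, while uniform popularity yields equal replica counts.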