ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (2): 431-442. doi: 10.7544/issn1000-1239.2016.20148327

• System Architecture •

A Dynamic Replica Management Mechanism Based on File Support Degree

Xiao Zhongzheng1, Chen Ningjiang1, Jia Jionghao1, Zhang Wenbo2

  1(School of Computer and Electronic Information, Guangxi University, Nanning 530004); 2(Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190) (village.fm@hotmail.com)
  • Published: 2016-02-01
  • Online: 2016-02-01
  • Supported by: 
    National Natural Science Foundation of China (61063012, 61363003); National Key Technology R&D Program of China (2015BAH55F02); Guangxi Natural Science Foundation (2012GXNSFAA053222); Program for Excellent Talents in Guangxi Higher Education Institutions ([2011] 40); Guangxi Scientific Research and Technology Development Program (桂科软13180015, 桂科攻1348020-7)

Abstract: Replica management is an important fault-tolerance mechanism in large-scale distributed storage systems. To meet the demand for dynamic replication management in networked environments, this paper proposes a file popularity index, called file support degree, together with a model for computing it dynamically. The model collects file parameters periodically and combines the self-correlation of the file support degree with the file's hits in the previous collection cycle, its share of total accesses, the volume of data accessed, and the file's grade, yielding a model that describes each file's replication requirement fairly accurately. To adapt to a changing system load, the model adjusts its parameters dynamically so that replication decisions reflect the actual system state as closely as possible. On this basis, algorithms for data storage node load balancing, replica adjustment, and replica clearing are designed. To keep any single data storage node from being overloaded, the load-balancing strategy divides data storage nodes into three groups: a holding group, an acceptable group, and a begging group. The system runs two periodic procedures: a replica adjusting procedure and a replica clearing procedure. In the adjusting procedure, the top P files are replicated to data storage nodes selected by the load-balancing strategy. The clearing procedure runs on a longer period, because many adjusting procedures are needed to empty the begging group. Experiments verify the effectiveness of the proposed dynamic replica management mechanism.

Key words: distributed storage, dynamic replication management, load balancing, file support degree, fault tolerance
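
The abstract names the inputs of the file support degree but not its formula, so the following Python sketch is only an illustration of the general idea, not the paper's model. It assumes a hypothetical update rule that blends the previous support degree (the self-correlation term) with an equally weighted mix of normalized hits, access share, and accessed data volume, scaled by the file grade. The names `FileStats`, `update_support`, `top_p`, the smoothing factor `alpha`, and the equal weights are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    hits: int              # accesses in the previous collection cycle
    bytes_read: int        # data volume accessed in that cycle
    grade: float           # file grade (hypothetical scale, 1.0 = normal)
    support: float = 0.0   # support degree carried over from the last cycle


def update_support(files, alpha=0.5):
    """Blend each file's previous support degree with this cycle's activity.

    alpha and the equal weighting of the three activity terms are
    illustrative assumptions, not values taken from the paper.
    """
    total_hits = sum(f.hits for f in files) or 1
    max_hits = max((f.hits for f in files), default=0) or 1
    max_bytes = max((f.bytes_read for f in files), default=0) or 1
    for f in files:
        activity = (
            f.hits / max_hits          # normalized hit count
            + f.hits / total_hits      # share of all accesses
            + f.bytes_read / max_bytes # normalized accessed volume
        ) / 3
        f.support = alpha * f.support + (1 - alpha) * f.grade * activity
    return files


def top_p(files, p):
    """Return the p files with the highest support degree."""
    return sorted(files, key=lambda f: f.support, reverse=True)[:p]
```

Under these assumptions, a replica adjusting procedure would call `update_support` once per collection cycle and pass the result of `top_p` to whatever node-selection strategy is in use. Because the previous support degree is carried forward, a file that was popular in earlier cycles can still outrank a file with only light current activity.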

CLC Number: