ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (2): 438-446. doi: 10.7544/issn1000-1239.2018.20160796

• System Architecture •

Proxy Based Metadata Optimization and Implementation in Parallel Filesystem

Yi Jianliang1, Chen Zhiguang1, Xiao Nong1,2, Lu Yutong3

  1(College of Computer, National University of Defense Technology, Changsha 410073); 2(State Key Laboratory of High Performance Computing (National University of Defense Technology), Changsha 410073); 3(School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275) (jianliang.yi@foxmail.com)
  • Published: 2018-02-01
  • Supported by: 
    National Key Research and Development Program of China (2016YFB1000302); National Natural Science Foundation of China (U1611261, 61433019, 61402503); Guangdong Province Program for Introducing Innovative and Entrepreneurial Teams (2016ZT06D211)
  • Online: 2018-02-01

Abstract: In high-performance computing (HPC) environments, a parallel file system may serve clients numbering in the millions. These clients often issue large numbers of concurrent I/O requests within the same period, placing the metadata servers under heavy load. Moreover, the concurrent read and write requests frequently target the same directory, which makes it difficult to distribute the metadata load across multiple servers. To address this, we add a proxy layer between the clients and the metadata servers and propose corresponding optimizations to reduce metadata server load. Two optimizations are implemented on the metadata proxy. First, because HPC programs often access a large number of files concurrently, the proxy aggregates many metadata requests into a single request before sending it to the metadata server. Second, traditional metadata load balancing mechanisms generally use subtree partitioning to distribute load across multiple metadata servers, which cannot balance metadata operations that all target a single directory; the proxy instead schedules operations on the same directory across multiple metadata servers, achieving fine-grained load balancing.

Key words: proxy server, high concurrency, high performance computing, parallel file system, load balancing
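
The two optimizations named in the abstract, request aggregation and per-directory load balancing, can be illustrated with a small sketch. The abstract does not say how the proxy chooses a server or when it forwards a merged request, so the CRC32 hash over the full path, the fixed batch size, and the names `MetadataProxy`, `pick_server`, and `demo` are all assumptions made for illustration, not the authors' implementation.

```python
import zlib
from collections import defaultdict


def pick_server(directory: str, name: str, num_servers: int) -> int:
    """Map one file's metadata to a server by hashing its full path.

    Hashing the file name (not only the directory) is an assumption of
    this sketch; it spreads the entries of one directory over all servers,
    which subtree partitioning cannot do.
    """
    key = f"{directory}/{name}".encode("utf-8")
    return zlib.crc32(key) % num_servers


class MetadataProxy:
    """Hypothetical proxy that buffers client requests and forwards batches."""

    def __init__(self, num_servers: int, batch_size: int):
        self.num_servers = num_servers
        self.batch_size = batch_size
        self.pending = defaultdict(list)  # server id -> buffered requests
        self.sent = []                    # (server id, batch) pairs forwarded

    def submit(self, op: str, directory: str, name: str) -> None:
        """Buffer one request; forward the buffer once it reaches batch_size."""
        server = pick_server(directory, name, self.num_servers)
        self.pending[server].append((op, directory, name))
        if len(self.pending[server]) >= self.batch_size:
            self._flush(server)

    def _flush(self, server: int) -> None:
        batch = self.pending.pop(server, [])
        if batch:
            # A real proxy would issue a single RPC here; the sketch records it.
            self.sent.append((server, batch))

    def flush_all(self) -> None:
        """Forward every partially filled buffer."""
        for server in list(self.pending):
            self._flush(server)


def demo() -> MetadataProxy:
    """Simulate 1000 processes creating files in one shared directory."""
    proxy = MetadataProxy(num_servers=4, batch_size=64)
    for rank in range(1000):
        proxy.submit("create", "/scratch/job42", f"out.{rank}")
    proxy.flush_all()
    return proxy
```

Calling `demo()` forwards the 1000 create requests as a few merged batches spread over several servers, rather than 1000 individual requests to a single server. A production proxy would also need a timeout so that a partially filled buffer does not delay requests indefinitely; that is omitted here.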

CLC Number: