

    Network Declustering BWRAID: Faster Scalability, Recovery and IO Performance

      Abstract: A storage area network (SAN) is an important approach to network storage. BWRAID is a distributed RAID that we built on a SAN from commodity hardware. The original BWRAID has a fully symmetric architecture, which causes three problems. First, expansion reads existing data to recalculate parity, so it generates a heavy IO load and takes a long time. Second, recovery rebuilds all data onto a single storage node (SN) instead of distributing the work across nodes in parallel. Third, the data layout is poorly suited to IO: the internal RAID4 performs many costly read-modify-write updates, even for sequential writes. To solve these problems, we propose network declustering BWRAID. The new system adopts the asymmetric architecture of a declustering RAID, but the unit of declustering is a small, equal-size virtual disk rather than a data block. Expansion only migrates virtual disks between nodes and requires no parity calculation. Because each recovery involves fewer nodes than the system contains, multiple recoveries can run in parallel. To optimize IO, a new data layout organizes the user storage space by internal RAID4 stripes. We also give algorithms that search for suitable virtual disks during system allocation, expansion, and recovery. Experiments show that network declustering BWRAID outperforms the original: expansion, with no parity recalculation, is five to eight times faster; parallel recovery is several times faster; and the new data layout improves IO performance.
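      The claim that expansion needs no parity calculation can be illustrated with a minimal sketch. It assumes each parity group is a fixed set of virtual disks (data plus one RAID4-style XOR parity disk) placed on distinct nodes, so moving a virtual disk changes only the location map and never the group's contents. All names here (`build_layout`, `fill_groups`, `expand`, `parity_ok`) and the round-robin placement are hypothetical and are not the paper's search algorithms.

```python
from collections import Counter
from functools import reduce
import os


def xor_blocks(blocks):
    """XOR equal-length byte strings column by column (RAID4-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_layout(num_nodes, width, num_groups):
    """Map each parity group to the nodes holding its virtual disks.

    Assumption: a simple round-robin placement, used only for illustration.
    """
    assert width <= num_nodes, "members of one group must sit on distinct nodes"
    return {g: [(g * width + i) % num_nodes for i in range(width)]
            for g in range(num_groups)}


def fill_groups(num_groups, width, size=16):
    """Give every group width-1 random data disks plus one parity disk."""
    contents = {}
    for g in range(num_groups):
        data = [os.urandom(size) for _ in range(width - 1)]
        contents[g] = data + [xor_blocks(data)]
    return contents


def parity_ok(contents):
    """A group is consistent when the XOR of all its members is all zeros."""
    return all(not any(xor_blocks(disks)) for disks in contents.values())


def expand(layout, new_node):
    """Rebalance onto new_node by relocating whole virtual disks.

    At most one member per group moves, so members of a group stay on
    distinct nodes. Only the location map changes; no data is read and
    no parity is recomputed.
    """
    load = Counter(n for nodes in layout.values() for n in nodes)
    target = sum(load.values()) // (len(load) + 1)  # fair share for new_node
    moved = 0
    for nodes in layout.values():
        if moved >= target:
            break
        # Take the member that sits on the currently most-loaded node.
        i = max(range(len(nodes)), key=lambda j: load[nodes[j]])
        load[nodes[i]] -= 1
        nodes[i] = new_node
        moved += 1
    return moved


layout = build_layout(num_nodes=4, width=3, num_groups=8)
contents = fill_groups(num_groups=8, width=3)
moved = expand(layout, new_node=4)
print("virtual disks moved:", moved)
print("parity still consistent:", parity_ok(contents))
```

      In this sketch the cost of expansion is proportional to the number of virtual disks migrated, whereas a layout that restripes data across all nodes would have to read the data and rewrite parity.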


