ISSN 1000-1239 CN 11-1777/TP

Table of Content

15 January 2013, Volume 50 Issue 1
Energy-Efficient Algorithms for Distributed Storage System Based on Data Storage Structure Reconfiguration
Liao Bin, Yu Jiong, Sun Hua, and Nian Mei
2013, 50(1):  3-18. 
Asbtract ( 838 )   HTML ( 7)   PDF (3127KB) ( 1118 )  
Related Articles | Metrics
As an underlying core infrastructure and important component of cloud computing, distributed storage system is the foundation of all kinds of cloud services or applications. However, with the expanding of system scale and energy consumption factors being ignored by its designers, the problem of high energy consumption is exposed. Because of data availability requirements, we cannot simply use the existing energy-saving technologies to solve the distributed storage system’s high energy consumption problem. To ensure all data’s availability is the premise of designing energy-efficient algorithms for distributed storage system. In this paper, we create a system and data availability model. After studying the storage structure and mechanism, the relationship between server’s status and data’s availability, the method to solve the problem of full covering by constructing a data’s availability measurement matrix is proposed. The energy saving model for distributed storage system is defined. The RACK is divided into Active-Zone and Sleep-Zone distinct storage area. According to the different data access frequency and regularity, we calculate the different activity factors for different data which decide how many replicas are stored in Active-Zone. The servers in Sleep-Zone are turn to sleep for energy saving while the work load of data center is low. Experimental results show that, the energy-efficient algorithms in this paper adapt the access rule and the availability of all the data in system, improve the energy-efficient of distributed storage system, and the algorithm is more efficient when the system’s work load and average activity factor is low.
Energy-Efficient Replacement Schemes for Heterogeneous Drive
Yang Lianghuai, Zhou Jian, Gong Weihua, and Chen Lijun
2013, 50(1):  19-36. 
Asbtract ( 536 )   HTML ( 1)   PDF (2407KB) ( 507 )  
Related Articles | Metrics
Much attention has recently been put on the energy-saving scheme for heterogeneous drive(H-Drive) which combines SSD and HDD. This paper focuses on the energy-efficient file buffering schemes for H-Drive while ensuring disks lifespan. We propose a frequency-energy based replacement scheme (FEBR for short) by adapting previous replacement algorithm FBR with the help of an energy-cost model. And based on the sliding-window scheme, we also present a self-adaptive disk power management scheme by taking the disk lifespan into account, which adjusts timeout threshold according to the statistical behavior of user accesses. To explore the applicability of the existing replacement schemes ranging from page-based to file-based buffering scheme, we evaluate their effectiveness on energy-efficiency, performance, and HDD lifetime and compare them with our proposed scheme. With extensive experiments on four real-world file usage traces collected in our office, some useful conclusions are drawn: energy-saving in H-Drive is feasible, it can reach as high as 70%~80%; FBR and its variant FEBR, and GDS are the best ones among all those online buffering schemes evaluated while FEBR has some advantages over FBR and GDS; the proposed self-adaptive disk power management scheme can effectively control the disks lifetime and it is inappropriate to power disk on or off by using those fixed-timeout threshold scheme prevailed previously.
Hybrid S-RAID: An Energy-Efficient Data Layout for Sequential Data Storage
Liu Jingyu, Zheng Jun, Li Yuanzhang, Sun Zhizhuo, Wang Wenming, and Tan Yuan,
2013, 50(1):  37-48. 
Asbtract ( 658 )   HTML ( 1)   PDF (3183KB) ( 627 )  
Related Articles | Metrics
Power consumption of storage systems can not be ignored any more with the increment of the scale of storage systems, and the research in energy-saving of storage systems becomes more important. A data layout for storage systems in order to reduce power consumption is proposed in this paper: Hybrid S-RAID (HS-RAID). HS-RAID includes two parts: one is RAID 1 which is composed of two SSDs, and the other is S-RAID that is composed of a group of hard disks. Hard disks are grouped in S-RAID, and parallel in the group. Remapping the random I/O requests on the RAID 1, sequential data I/O write on the S-RAID by the group order. When only one group of hard disks are busy under continuous data access mode, the others can be shut down or put into standby mode for energy-saving because of no requests on them. That can save energy in HS-RAID and only increases the cost of the storage system slightly. HS-RAID is designed for applications whose I/O characteristics are sequential access. Experiment shows that the HS-RAID with 12 hard disks and 2 SSDs consumes 28% power of such a RAID 5 system.
Survey on Flash-Based Storage Systems
Lu Youyou and Shu Jiwu,
2013, 50(1):  49-59. 
Asbtract ( 1084 )   HTML ( 12)   PDF (2024KB) ( 1283 )  
Related Articles | Metrics
Flash memory has gained great popularity for its properties of low latency, high parallelism, energy efficient, and small volume. But the direct substitution of flash-based solid state drives for hard disk drives has hidden these properties from the operating system, preventing the operating system from optimizations. In this paper, comparison and analysis are made on the storage systems built on the raw flash, including flash accelerating cards, flash arrays, and flash-based clusters, which have achieved the goals of low latency, high reliability and energy efficiency through hardware interface change, software or controller module refinement, or computation and I/O capacities matching. Based on the comparison of these recent raw flash based storage systems, challenges and design issues are discussed on three aspects. Firstly, performance optimizations with I/O stack redesign are surveyed from the hardware interface, notification mechanisms, the system software refinement or redesign, and new storage interfaces for rich semantics. Secondly, the reliability approaches from the system level are described. Finally, the energy efficiency and the volume gains from the flash based storage are presented. After the discussion, the research works are summarized and the possible research directions are pointed out.
GC-RAIS: Garbage Collection Aware and Redundant Array of Independent SSDs
Wu Suzhen, Chen Xiaoxi, and Mao Bo
2013, 50(1):  60-68. 
Asbtract ( 648 )   HTML ( 1)   PDF (2284KB) ( 597 )  
Related Articles | Metrics
SSDs are popular in large-scale storage systems to accelerate the system performance because a single SSD cannot satisfy the performance, capacity and reliability requirements of data-intensive computing applications. Thus applying RAID algorithms to SSDs is necessary and promising to build high performance, high capacity and highly reliable SSD-based storage systems. However, garbage collection operations in SSDs have a significant impact on the SSD performance, thus leading to the performance variability in redundant array of independent SSDs (RAIS). To address this problem, GC-RAIS exploits the high random-read performance characteristics of SSDs and the hot-spare SSD in RAIS to alleviate the negative impact of GC operations on the RAIS performance. When an SSD is in the GC state, the incoming read requests to this SSD are serviced by reconstructing the read data from the other SSDs in the same stripe (read reconstruction), while the incoming write data is temporally stored on the hot-spare SSD and the corresponding parity is concurrently updated (write redirection). After the GC process completes, the redirected write data is reclaimed to its correct SSD. The original DiskSim and the MSR SSD simulator are extended to implement the proposed GC-RAIS and the GC-RAIS performance is evaluated with the HPC-like and enterprise realistic workloads. The simulation results show that GC-RAIS significantly outperforms the local garbage collection (LGC) and the global garbage collection (GGC) by 55% and 25% on average, respectively. Moreover, GC-RAIS reduces the performance variability for a wide variety of HPC-like and enterprise realistic workloads.
NVMMDS—Metadata Management Method Based on Non-Volatile Memory
Cai Tao, Niu Dejiao, Liu Yangkuan, Li Shuai, and Ju Shiguang
2013, 50(1):  69-79. 
Asbtract ( 614 )   HTML ( 1)   PDF (2975KB) ( 705 )  
Related Articles | Metrics
Metadata management methods are important factors to affect the performance of file system. A novel metadata management method based on non-volatile memory (NVMMDS) is designed to concern the efficiency and adaptability of current metadata management methods. The structure of NVMMDS and metadata management flow are presented according to the metadata accessing characteristic and management demands. The metadata is stored in non-volatile memory and DRAM is used to cache the update metadata, which provides the foundation for improving performance, avoiding metadata loss and enhancing the metadata operation atomicity. A metadata lookup algorithm based on NVBB-tree is used to uniformly search the metadata in non-volatile memory and metadata cache in DRAM with low overhead. A metadata cache algorithm based on active writing back is used to extend the life of non-volatile memory and avoid the loss of metadata. Compared with the current metadata management methods, our algorithms greatly improve the efficiency and adaptability of metadata management, and avoid the loss of metadata. The NVMMDS prototype is realized on ReiserFs and pNFS. The FileBench and several standard data sets are used to evaluate. The experimental results show that NVMMDS can improve maximum 35% IOPS and I/O performance of file system.
A High Performance Reliable Storage System Using HDDs as the Backup of SSDs
Chen Zhiguang, Xiao Nong, Liu Fang, and Du Yimo
2013, 50(1):  80-89. 
Asbtract ( 741 )   HTML ( 0)   PDF (1660KB) ( 624 )  
Related Articles | Metrics
SSD (solid state drive) has been widely deployed due to its high performance. But, its cost and reliability have not met the demand of large-scale storage systems. RAID (redundant arrays of inexpensive disks) is a conventional scheme to enhance reliability. However, existing RAID schemes are not effective for SSDs anymore. Instead, we propose a hybrid array consisting of SSDs and HDDs (hard disk drive): SSDs are used for responding to I/O requests, and HDDs supply backup for SSDs. The hybrid array guarantees both performance and reliability at a reasonable cost. However, HDDs lose to SSDs in term of both latency and bandwidth. Therefore, we employ a nonvolatile memory to bridge the latency gap, and take other two measures to improve HDD’s bandwidth. Firstly, workloads within an HDD are reconfigured to be more sequential. Secondly, multiple HDDs collaborate with each other to supply much higher aggregate bandwidth. By these ways, HDDs can replicate SSDs’ data in time. We implement a prototype to evaluate the proposed array. First of all, we demonstrate that the hybrid array is feasible. Then, we compare the hybrid array with other solutions. Experimental results show that the hybrid array covers both performance and reliability, and is also cost-effective.
A Reading Performance Improvement Method in Deduplication Based on Pipeline
Li Chao, Wang Shupeng, Yun Xiaochun, Zhou Xiaoyang, and Chen Ming,5
2013, 50(1):  90-100. 
Asbtract ( 607 )   HTML ( 0)   PDF (2835KB) ( 562 )  
Related Articles | Metrics
The application of data deduplication has been extended to the primary storage systems like cloud computing. In those systems,the reading performance has become a very important factor because of the high demand of reading response time. However, not so much attention has been paid to reading performance in the area of data deduplication. In this paper, we analyze the reading process and bottleneck in this area,and propose a reading model based on pipeline (RMBP). And we additionally improve this reading model using the mechanism of parallel calculation. Then we do theoretical analysis to evaluate its effect in the improvement of reading speed. Furthermore, we design a paralleled and pipelined data deduplication system based on this reading model. We also do experiments using three different kinds of data in this system. The experimental results show that: the system using RMBP can increase the reading speed with all kinds of the experimental data; for the network security logs and the virtual machine image data, the system using RMBP can get a 5 times higher reading throughput; RMBP can significantly improve the reading performance in scenarios of different data deduplicaiton ratio, and has good extensive applicability hence.
FAIDA: A Fast and Accurate Image Deduplication Approach
Chen Ming, Wang Shupeng, Yun Xiaochun, Wu Guangjun, and Li Chao
2013, 50(1):  101-110. 
Asbtract ( 922 )   HTML ( 3)   PDF (3206KB) ( 533 )  
Related Articles | Metrics
Deduplication is an effective way to improve storage utilization by eliminating redundant copies of duplicate data and replacing them with logical pointers to the unique copy. At present, it has been widely used in backup and archive systems. However, most of the existing deduplication systems use hashing to compute and compare data chunks to determine whether they are redundant. The Hash-based exact match is too strict for many applications, for example image deduplication. To solve this problem, a fast and accurate image deduplication approach is presented. We firstly give the definition of duplicate images according to the characteristics of Web images, and then divide image deduplication into two stages: duplicate image detection and duplicate image deduplication. In the first stage, we use perceptual hashing to improve image retrieval speed and multiple filters to improve image retrieval accuracy. In the second stage, we use fuzzy logic reasoning to select the proper centroid-images from duplicate image sets by simulating the process of human thinking. Experimental results demonstrate that the proposed approach not only has a fast and accurate ability to detect duplicate images, but also meets users’perceptive requirements in the selection of centroid-images.
MapReduce Intermediate Result Cache for Concurrent Data Stream Processing
Qi Kaiyuan, Han Yanbo, Zhao Zhuofeng, and Fang jun
2013, 50(1):  111-121. 
Asbtract ( 1302 )   HTML ( 1)   PDF (2072KB) ( 766 )  
Related Articles | Metrics
With the development of Internet of Things applications, real-time processing of sensor data stream over large scale history data brings a new challenge. The traditional MapReduce programming model is designed for batch-based large-scale data processing and cannot satisfy the real-time requirement. To extend the real-time data processing capability of MapReduce by preprocessing, pipelining and localizing, an immediate result cache for keyvalue data type, which can avoid repeated remote IO overhead and computation cost by taking full use of local memory and storage, localize stream processing by distributing data across the clusters and support frequent reads and writes of data stream processing, needs to be designed. This paper proposes a scalable, extensible and efficient keyvalue intermediate result cache, which consists of Hash B-tree structures and SSTable files. Furthermore, to optimize the high concurrency performance, this paper also devises a probability-based B-tree structure as well as its multiplexing search algorithm through the B-tree balance property, and improves the file readwrite strategy and replacement algorithm by utilization of the overhead estimation and buffered information. The theoretical analysis and benchmark experiments show that the proposed structures and algorithms further optimize the concurrency performance of MapReduce immediate results, and the immediate result cache is effective to support data stream processing over large-scale data.
A Node-Link Based Cache Deployment Algorithm for P2P Traffic in ISP Networks
Zhai Haibin, Jiang Hai, Sun Yi, Li Jun, and Li Zhongcheng
2013, 50(1):  122-135. 
Asbtract ( 704 )   HTML ( 0)   PDF (4298KB) ( 560 )  
Related Articles | Metrics
Peer-to-peer (P2P) systems are generating a large portion of the total Internet traffic and imposing a heavy burden on Internet services providers (ISPs). P2P caching is an effective means of easing this burden. We focus on the cache deployment problem as it has a significant impact on the effectiveness of caching. An ISP backbone network topology is usually abstracted to a graph comprised of nodes representing core routers and links connecting adjacent core routers. While deploying caches at nodes (node-based cache deployment, NCD) can reduce the amount of P2P traffic transmitted from lower access networks to the ISP backbone network, deploying caches on links (link-based cache deployment, LCD) can directly reduce the amount of P2P traffic on the ISP backbone network. It is difficult to conclude whether NCD or LCD is better, because the ISP cost is dynamically changing as the P2P traffic matrix changes. In this paper, we propose a node-link based cache deployment method (NLCD). Firstly, we propose an analysis model and a corresponding deployment algorithm for NLCD. Then, we prove this problem is NP-complete and develop an optimal cache deployment problem based on this model. Experimental results show that NLCD outperforms NCD and LCD for both Hub and Spoke (H&S) networks and Ladder networks. The average link utilization of NLCD is 5%~15% lower than that of LCD, and 7%~30% lower than that of NCD.
Survey of Secure Cloud Storage System and Key Technologies
Fu Yingxun, Luo Shengmei, and Shu Jiwu
2013, 50(1):  136-145. 
Asbtract ( 1471 )   HTML ( 14)   PDF (1438KB) ( 1845 )  
Related Articles | Metrics
With the rapid development of cloud storage, more and more people prefer to store their owner data in remote cloud storage to avoid troublesome data management in local storage systems. The most famous feature of cloud storage is the concept that storage as a service, users can store their own data into clouds by public APIs. However, because of losing absolute control of data, users storing their own data in cloud storage will suffer a series of security problems, such as data peeping, data tampering, and so on. In order to solute those security problems and improve the quality of secure cloud system based on enhance its security, researchers have investigated lots about cloud security problem in recent years, which established a research branch of the cloud storage- -secure cloud storage system. This paper introduces the security demand of secure cloud storage system; expounds the current status of cloud storage system; summarizes the key technologies of currently secure cloud storage systems, such as encryption key’s distribution and management, attribute-based encryption, searchable encryption, ciphertext-based data deduplication, provable data possession and proof of retrievability mechanism, data assured delete, etc. At the end of paper we discuss the future research directions of secure cloud storage system.
Big Data Management: Concepts,Techniques and Challenges
Meng Xiaofeng and Ci Xiang
2013, 50(1):  146-169. 
Asbtract ( 8825 )   HTML ( 363)   PDF (3405KB) ( 268114 )  
Related Articles | Metrics
Data type and amount in human society is growing in amazing speed which is caused by emerging new services such as cloud computing, internet of things and social network, the era of big data has come. Data has been fundamental resource from simple dealing object, and how to manage and utilize big data better has attracted much attention. Evolution or revolution on database research for big data is a problem. This paper discusses the concept of big data, and surveys its state of the art. The framework of big data is described and key techniques are studied. Finally some new challenges in the future are summarized.
A Utility Based Cache Optimization Mechanism for Multi-Thread Workloads
Tang Yixuan, Wu Junmin, Chen Guoliang, Sui Xiufeng, and Huang Jing
2013, 50(1):  170-180. 
Asbtract ( 899 )   HTML ( 0)   PDF (4915KB) ( 510 )  
Related Articles | Metrics
Modern multi-core processors usually employ shared level 2 cache to support fast data access among concurrent threads. However, under the pressure of high resource demand, the commonly used LRU policy may result in interferences among threads and degrades the overall performance. Partitioning the shared cache is a relatively flexible resource allocation method, but most previous partition approaches aimed at multi-programmed workloads and they ignored the difference between shared and private data access patterns of multi-threaded workloads, leading to the utility decrease of the shared data. Most traditional cache partitioning methods aim at single memory access pattern, and neglect the frequency and recency information of cachelines. In this paper, we study the access characteristics of private and shared data in multi-thread workloads, and propose a utility-based pseudo partition cache partitioning mechanism (UPP). UPP dynamically collects utility information of each thread and shared data, and takes the overall marginal utility as the metric of cache partitioning. Besides, UPP exploits both frequency and recency information of a workload simultaneously, in order to evict dead cachelines early and filter less reused blocks through dynamic insertion and promotion mechanism.
Fast and Efficient Prediction of L2 Cache Reliability
Cheng Yu, Ma Anguo, Wang Yongwen, Tang Yuxing, and Zhang Minxuan
2013, 50(1):  181-187. 
Asbtract ( 654 )   HTML ( 1)   PDF (1755KB) ( 565 )  
Related Articles | Metrics
With continuous technology scaling, microprocessors are becoming more and more susceptible to soft errors. Architectural vulnerability factor(AVF), which has been introduced to quantify the vulnerability of on-chip structures to soft errors, has demonstrated to exhibit significant runtime variations. While traditional fault tolerant techniques which take no account of the dynamic characteristics of AVF provide protection throughout the entire lifetime of programs, possibly leading to the over-protection and inducing significant costs. AVF prediction based dynamic fault tolerant techniques provide error protection only at the execution points with high AVF rather than the whole execution lifetime of programs, thereby maintaining the reliability goal with minimum cost. In this paper, we aim at developing an efficient online AVF predictor which can be used in dynamic fault tolerant management schemes for L2 Cache. We firstly improve the method of Cache AVF computation and characterize the dynamic vulnerability behavior of L2 Cache. Then based on the observations of the dynamic behavior of L2 Cache AVF, we propose to employ the Bayesian additive regression trees(BART) method to accurately model the variation of L2 Cache AVF and employ bump hunting technique to extract some simple selecting rules on several key performance metrics, thus enabling a fast and efficient prediction of L2 Cache AVF.
An Efficient Memory Access Strategy for Transposition and Block Operation in Image Processing
Shen Huanghui, Wang Zhensong, and Zheng Weimin
2013, 50(1):  188-196. 
Asbtract ( 735 )   HTML ( 0)   PDF (1721KB) ( 706 )  
Related Articles | Metrics
Image transposition and image block processing are common operations in image processing. The efficiencies of image transposition and image block processing, which are closely related to memory access efficiency, directly affect the real-time of image processing, while memory access efficiency has direct relationship with computer architecture, memory structure and operational strategy. A segmented storage strategy for efficient memory accessing is proposed in this paper. The segmented storage is based on the readwrite characteristics of memory, and the proper segmented length is calculated by theoretic analysis. When the image is transposed, the segmented length is related to the type of DDR2 SDRAM. When the image is partitioned, the segmented length is related to the size of image block. Combined with engineering application, universal mapping between bus address and memory address is deduced in this paper. Meanwhile, a hardware module is given to implement DDR2 SDRAM controller. It is only necessary to update address mapping module for different applications. So the segmented storage strategy has certain universality and expansibility, and is verified in the SAR real-time image process system.
Index of Meta-Data Set of the Similar Files for Inline De-Duplication in Distributed Storage Systems
Sun Jing, Yu Hongliang, and Zheng Weimin
2013, 50(1):  197-205. 
Asbtract ( 557 )   HTML ( 0)   PDF (1805KB) ( 808 )  
Related Articles | Metrics
Distributed storage systems have been widely adopted in the cloud storages and enterprise storage infrastructure, because of their high scalability and cost effectiveness. In the storage systems, data de-duplication can save most of storage space for the devices, and can improve the efficiency of data transmission. The key of de-duplicating in the distributed storage systems is how to implement a high performance and scalability meta-data index that should not hurt the writing throughput. This paper proposes an index of meta-data sets of the similar files. The index uses a locality sensitive Hashing function to organize meta-data set, and accesses the disk only one time for the lookups for the chunks of a file. Consequently, the index improves the indexing performance with high scalability and a small memory footprint, which is suitable for the cloud and enterprise storages.
Analysis of Inter-Thread Interference on Shared Cache Multi-Core Architectures Based on Instruction Fetch Timing Frame
Chen Fangyuan, Zhang Dongsong, Liu Cong, and Wang Zhiying
2013, 50(1):  206-217. 
Asbtract ( 535 )   HTML ( 0)   PDF (1622KB) ( 503 )  
Related Articles | Metrics
In real-time systems, designers need to obtain a safe and tight worst-case execution time (WCET) of applications. It is very challenging for multi-core processors due to the possible inter-thread interference caused by shared resources. For multi-core platforms with shared caches, instructions may be evicted by another co-running thread, which results in inter-thread interference in shared cache. In this case the execution time of applications can not be analyzed independently. The inter-thread interference should be taken into consideration in WCET analysis on multi-core platforms. This paper proposes a sufficient and unnecessary condition of non-interference to analyze the worst-case inter-thread interference by considering the timing frame of instruction fetch. Our approach introduces the consideration of timing frames into the conservative address analysis. Therefore, the worst-case shared cache misses can be reasonably estimated to determine the latency of inter-thread interference in shared cache. Our experiments indicate that the proposed approach improves the tightness of WCET estimation by 12% as compared with the representative related work which proposes an lifetime-based method, and by 7% as compared with another representative related work which is based on the structure order of shared cache accesses.
Constructing Direct Scalable Router with High Radix Router Node
Zhang Xiaoping, Duan Wuqing, Li Menghan, and Zhao Youjian
2013, 50(1):  218-224. 
Asbtract ( 735 )   HTML ( 0)   PDF (1821KB) ( 536 )  
Related Articles | Metrics
Scalable router is a very hot research spot for Internet core router, and the method which constructs scalable router with a direct network is a very important research direction in this area. However, most current researches concentrate on the method with low radix router node to realize the direct network. This method results in the direct network with high increasing diameter and low increasing bisection bandwidth. To solve this problem, we propose an idea of constructing direct network with high radix router node in scalable router. That is, on some packet latency condition, the radix of a router node in scalable router should be determined by the switch throughput of the router node. Then the direct network can get the lowest diameter and the highest bisection bandwidth, and router gets the highest switch throughput under this packet latency constraint. With high radix router node, the direct scalable router may get many different aspects on performance, such as switch throughput, packet latency, internal routing algorithm, load balancing, flow control and fault tolerance, etc. Here, we only study the switch throughput and packet latency, which are two very important factors on performance. We first give theory analysis on switch throughput and packet latency,comparing the change rule of these two factors. Then, we give the simulation results to verify the correctness of the theory analysis.