ISSN 1000-1239 CN 11-1777/TP

Table of Content

15 April 2008, Volume 45 Issue 4
Support Vector Machine Ensemble Based on Evidence Theory for Multi-Class Classification
Li Ye, Cai Yunze, Yin Rupo, and Xu Xiaoming
2008, 45(4):  571-578. 
Asbtract ( 534 )   HTML ( 0)   PDF (425KB) ( 546 )  
Related Articles | Metrics
Ensemble learning has become a main research topic in the field of machine learning recently. By training and combining some accurate and diverse classifiers, ensemble learning provides a novel approach for improving the generalization performance of classification systems. Studied in this paper are the architectures and methods for combination of multiple classifiers in support vector machine (SVM) ensemble for multi-class classification. After analyzing the defects of the known architectures including multi-class-level SVM ensemble and binary-class-level SVM ensemble, a two-layer architecture is proposed to construct SVM ensemble. Then fusion methods of the measurement-level output information of SVMs are studied based on the evidence theory. Different basic probability assignment functions are defined respectively in terms of the used strategy for multi-class extension, i.e. one-against-all and one-against-one, and different evidence combination rules are adopted according to the degree of conflicts among evidence. In the case of one-against-all strategy, the classical Dempster's rule can be used while in the case of one-against-one strategy a new rule is proposed to combine the heavily conflicting evidence. The experimental results show that the two-layer architecture is better than the multi-class-level architecture. Moreover, the evidence theory based methods can effectively utilize the measurement-level output information of binary SVMs so as to gain satisfactory classification accuracies.
A Comprehensive Computational Model of Emotions
Yang Hongwei, Pan Zhigeng, and Liu Gengdai
2008, 45(4):  579-587. 
Asbtract ( 476 )   HTML ( 1)   PDF (468KB) ( 656 )  
Related Articles | Metrics
Emotion is an important aspect of human intelligence and is shown to play a significant role in creating virtual agent. Therefore, how to use emotion to enhance the intelligence and believability of virtual agent is an urgent issue to solve. Taken into account not only cognitive reasoning for generating emotion, but also physical interactions with its environment, a comprehensive computing emotional model is proposed. A complete architecture of emotion process is presented to describe the dynamic characteristics of emotional state and how to filter the mixture of emotions to generate coherent emotional state. The emotion structure with cause events is added to determine the specific effects the emotion will have on the decision making and behavior of the agent. In addition, a social learning component for predicting other agent's behaviours and relationships among agents is presented to improve virtual agents with the ability of adaptability in the dynamic environment. It is demonstrated experimentally through a computer simulation of virtual agents that the proposed emotional model can enhance the virtual agent's believability effectively.
A Fast Classification Strategy for Large Class Sets
Xu Lei, Xiao Baihua, Dai Ruwei, and Wang Chunheng
2008, 45(4):  588-595. 
Asbtract ( 464 )   HTML ( 0)   PDF (403KB) ( 515 )  
Related Articles | Metrics
Proposed in this paper is a fast multi-stage classification strategy for large class sets, such as handwriting Chinese character recognition. The key issue for multi-stage classification is how to select the candidate subset for fine classification, so a group-based candidate selection rule is provided. The whole class set is first divided into several groups by clustering algorithms. Then, the nearest neighbors of each class are added to the same group and consequently the adjacent groups overlap each other. As a result, for any unlabeled sample, its confusing classes will be totally included in its nearest group. Under this circumstance, the nearest group of any unlabeled sample can be taken as the candidate set. Because the number of the groups is definite and the average group size is rather small, it is feasible to design a special fine classifier for every group using all kinds of complementary features and classifiers. A hierarchical learning vector quantization is also utilized to optimize the global prototypes, local prototypes and group centroids. Furthermore, the risk-zone criterion is introduced to improve the hit rate of the samples which are located near the group boundaries. Experimental results on a handwriting Chinese character database show that the proposed method can reach a reasonable tradeoff between efficiency and accuracy.
A Study on Constraints for Feature Selection in Text Categorization
Xu Yan, Li Jintao, Wang Bin, Sun Chunming, and Zhang Sen
2008, 45(4):  596-602. 
Asbtract ( 629 )   HTML ( 0)   PDF (404KB) ( 604 )  
Related Articles | Metrics
Text categorization (TC) is the process of grouping texts into one or more predefined categories based on their content. Due to the increased availability of documents in digital form and the rapid growth of online information, TC has become a key technique for handling and organizing text data. One of the most important issues in TC is feature selection (FS). Many FS methods have been put forward and widely used in the TC field, such as information gain (IG), document frequency thresholding (DF) and mutual information. Empirical studies show that some of these (e.g. IG, DF) produce better categorization performance than others (e.g. MI). A basic research question is why these FS methods cause different performance. Many existing works seek to answer this question based on empirical studies. In this paper, a theoretical performance evaluation function for FS methods is put forward in text categorization, Some basic desirable constraints that any reasonable FS function should satisfy are defind and then these constraints on some popular FS methods are checked, including IG, DF and MI. It is found that IG satisfies these constraints, and that there are strong statistical correlations between DF and the constraints, whilst MI does not satisfy the constraints. Experimental results on Reuters 21578 and OHSUMED corpora show that the empirical performance of a feature selection method is tightly related to how well it satisfies these constraints.
Dynamic Multi-Objective Optimization Evolutionary Algorithm Based on New Model
Liu Chun'an, and Wang Yuping
2008, 45(4):  603-611. 
Asbtract ( 740 )   HTML ( 1)   PDF (566KB) ( 573 )  
Related Articles | Metrics
Dynamic multi-objective optimization problems (DMOPs) often involve incommensurable, competing and varying objectives with time, and the number of their optimal solutions is usually infinite. Thus how to find a sufficient number of uniformly distributed and representative Pareto optimal solutions under the condition of the continuously changing time for the decision maker is very important. In this paper, the continuously changing time period of DMOPs is divided into several random subperiods. In each subperiod, the dynamic multi-objective optimization problem is approximated by a static multi-objective optimization problem. At the same time, the static rank variance and the static density variance of the population is defined in each subperiod. Then, by using the static rank variance and the static density variance of the population, the dynamic multi-objective optimization problem with random objective functions is transformed into a bi-objective static optimization problem. A new dynamic multi-objective optimization evolutionary algorithm is proposed based on a new self-check operator which can automatically check out the time variation and its convergence is proved. The simulations are made and the results demonstrate the effectiveness of the proposed algorithm.
Relative Transformation for Machine Learning
Wen Guihua
2008, 45(4):  612-618. 
Asbtract ( 592 )   HTML ( 0)   PDF (386KB) ( 566 )  
Related Articles | Metrics
Recently developed machine learning approaches such as manifold learning and the support vector machine learning work well on the clean data sets even if these data sets are highly folded, twisted, or curved. However, they are much sensitive to noises or outliers contained in the data set, as these noises or outliers easily distort the real topological structure of the underlying data manifold. To solve the problem, the relative transformation on the original data space is proposed by modeling the cognitive relative laws. It is proved that the relative transformation is a kind of nonlinear enlarging transformation so that it makes the transformed data more distinguishable. Meanwhile, the relative transformation can weaken the influence of noise on data and make data relative denser. To measure the similarity and distance between data points in relative space is more consistent with the intuition of people, which can be then applied to improve the machine learning approach. The relative transformation is simple, general and easy to implement. It also has clear physical meaning and does not add any parameter. The theoretical analysis and conducted experiments validate the proposed approach.
Fuzzy Description Logic L-ALCN
Li Shuying, Li Mei, Jiang Yuncheng, Wang Ju, and Liu Zhenhuan
2008, 45(4):  619-625. 
Asbtract ( 601 )   HTML ( 0)   PDF (483KB) ( 563 )  
Related Articles | Metrics
Description logics are a logical reconstruction of the frame-based knowledge representation languages, with the aim of providing a simple well-established declarative semantics to capture the meaning of structured representation of knowledge. In order to make description logics deal with more general fuzzy information, U. Straccia has presented a fuzzy description logic L-ALC based on a certainty lattice. Based on the work of U. Straccia, a new description logic L-ALCN with the constructor of unqualified number restriction is proposed in this paper, and the syntax of L-ALCN and the semantics of the two concept (≥n R) and (≤n R) are defined. In classical description logics, when the constructor of unqualified number restriction is introduced, the role R has more than one successor. Because the description logic based on certainty lattice L-ALCN contains the constructor of unqualified number restriction, so the value of the assertion may have more than one. In order to ensure to gain the reasonable reasoning algorithm and its feasible complexity, a special set D\-L(c) is introduced. Two properties of certainty lattice are obtained by taking advantage of the set D\-L(c). After these preparations, the constraint propagation calculus for the system is studied, and soundness and completeness of the constraint propagation calculus are proved. Compared with L-ALC, the system L-ALCN is more expressive and it can be proved that reasoning tasks of the system L-ALCN are Pspace-complete.
Wireless Sensor Network Energy-Efficient Placement Algorithm Based on Vector
Wang Dashan, Huang Liusheng, Xu Hongli, Wu Junmin, and Zhang Junxia
2008, 45(4):  626-635. 
Asbtract ( 533 )   HTML ( 0)   PDF (532KB) ( 628 )  
Related Articles | Metrics
The lifetime of a wireless sensor network (WSN) directly lies in the power consumption of the network. Thus one of the challenging problems is how to place the sensor nodes efficiently to prolong the lifetime of the sensor network as long as possible. First analyzed in this paper is the optimal placement for single source under the linear network model. Based on the analyses, an energy-efficient deployment algorithm under the linear network model is designed, then it is extended to adapt to the planar network model. Based on the above work, a novel vector-based placement algorithm is proposed. In this algorithm, by calculating the positions of the relay nodes to gather the data from the sources, the energy-efficient placement in the whole wireless sensor network is obtained, which contains fixed sensing nodes and given number of relay nodes. The results of simulation experiment illustrate that the network deployed by the algorithm can save about 50% energy of that deployed by regular algorithm when the ratio of source and relay number is 1∶2. Now this algorithm is used in the fire monitoring system implemented in the laboratory. In the practical application systems, the numbers of nodes are often limited because of the cost, so this algorithm is of significant importance to construct low cost application system of wireless sensor networks.
MAWA: An Efficient and Reliable Web Access Scheme in Mobile Wireless Environments Based on Mobile Agents
Hu Haiyang, Hu Hua, and Ling Yun
2008, 45(4):  636-645. 
Asbtract ( 521 )   HTML ( 1)   PDF (602KB) ( 402 )  
Related Articles | Metrics
Web access in mobile wireless environments suffers more challenge because of mobile device's limited storage, the mobility of mobile device, and the unreliable wireless network environments. It requires some facilities between the wired network and the terminals to help overcome the difficulties of unreliable wireless link, mobility and mobile device capability limitation. Presented in this paper is a new Web access scheme named MAWA, which uses mobile agents to facilitate seamless logging of mobile unit's access activities, thus avoiding unnecessary wireless communication cost. There are three classes of agents in MAWA: Terminal agent, base station agent, and MU agent. Terminal agent runs in the mobile unit, and requests for Web pages from browsers performed by users are transferred to this agent. Base station agent resides on base station with responsibility for creating, storing, sending and receiving an instance of MuAg according to the request from MU or other BsAg. It also maintains the local MuAg queue, and transfers the message or data between MU and its MuAg respectively. MU agent is responsible for a certain MU. This scheme can also help MU handoff among base stations and recover when mobile client suffers a failure. Compared with other schemes by simulation tests, this scheme shows its efficiency and reliability.
Network-Constrained Moving Objects Database Based Traffic Flow Statistical Analysis Model
Ding Zhiming, Han Jingyu, Li Man, and Yu Bo
2008, 45(4):  646-655. 
Asbtract ( 584 )   HTML ( 0)   PDF (583KB) ( 582 )  
Related Articles | Metrics
With the advancement and integration of mobile computing technology and intelligent transportation systems (ITS), automatic traffic flow analysis has become an important research issue. However, current traffic flow statistical analysis methods, such as stationary sensor/camera based methods (monitoring from traffic sensors or optical devices), air/space borne methods (monitoring from airplanes or satellites), and floating-car based methods (monitoring from floating/probe cars), have a lot of limitations in terms of data sampling costs, data processing efficiency, and information analysis accuracy. To solve these problems, a new traffic flow analysis framework, network-constrained moving objects database based traffic flow statistical analysis (NMOD-TFSA) model is proposed, in this paper. Through an online statistical analysis mechanism based on the spatial-temporal trajectory data submitted by moving objects, NMOD-TFSA can get the real-time traffic parameters of the transportation network. By taking the topology of the traffic network into consideration, NMOD-TFSA can reduce the communication and computation costs in data sampling. Besides, in data analysis, NMOD-TFSA can provide better accuracy since the analysis is based on the precise spatial-temporal trajectories of moving objects. The experimental results show that compared with the floating-vehicle method, which is widely used in current traffic flow analysis systems, NMOD-TFSA provides improved performances in terms of communication and computation costs, flexibility, and accuracy.
A Cooperative Caching Optimized Policy of Mobile Peer-to-Peer Networks
Niu Xinzheng, She Kun, Qin Ke, and Zhou Mingtian
2008, 45(4):  656-665. 
Asbtract ( 667 )   HTML ( 0)   PDF (541KB) ( 485 )  
Related Articles | Metrics
Mobile P2P network is topology dynamic, mobile nodes resource limited, and network resource short. Considering the limitation of network resources and shortage of peer resources, how to make full use of mobile nodes' cooperative cache resources effectively and increase mobile nodes' cooperation is a challenging task to reduce network delay, save bandwidth and prevent congestion in mobile P2P networks. To solve this problem, the limited cooperative cache replacement policy establishment, the sufficient cooperative cache resource use and the key data buffering are discussed in detail in this paper. Moreover, according to the idea of ant colony algorithm, a cooperative cache replacement policy based on the pheromone is proposed, and a mathematical model and the derivation of choosing appropriate memory space as cooperative cache are established. The theory analysis and experiment results of this paper show that the new policy can significantly improve cooperative cache resource utility and improve network performance efficiently. It also can promote the mobile nodes' cooperation and success ratio of resource obtainment from neighbor nodes and reduce the average response delay of cooperative cache resource request.
Separation of Duty in Privileged Operating Systems
Cai Jiayong,5, Qing Sihan,, and Liu Wei,5
2008, 45(4):  666-676. 
Asbtract ( 477 )   HTML ( 0)   PDF (593KB) ( 537 )  
Related Articles | Metrics
In operating systems, privilege is used to control the most important resources and functions, so administrators must enforce separation of duty (SoD) to ensure privilege safety. In this paper, how privilege would support SoD is studied by analyzing the issue of implicit authorization. The source of privilege is first discussed, and the definition of privilege is decomposed into restriction rules and execution rules. The execution rules explain the effects of privilege precisely, which are ignored by most access control models. Then by logically deducing rules, authorization is further deduced, which indicates that there is implicit authorization in privilege mechanisms. Implicit authorization may cause violation of SoD constraints, so all implicit authorizations are displayed in an authorization deduction graph. By exploring the properties and the mechanism requirements of SoD, the consistency between SoD constraints and the privilege mechanism can be ensured. Finally, the POSIX capability mechanism is taken as an example, and formalized into the BMPS model. Its deficiencies in supporting SoD are found and corrected, and a feasible security policy consistent with the SoD requirements is provided.
Violation of Static Mutual Exclusive Role Constraints in Dynamic Role Transition
Zhai Zhengde, Xu Zhen, and Feng Dengguo
2008, 45(4):  677-683. 
Asbtract ( 542 )   HTML ( 0)   PDF (408KB) ( 510 )  
Related Articles | Metrics
Secure interoperation is a crucial technique for cross-domain resource sharing and protection. In the IRBAC2000 model, Kapadia proposes role association and dynamic role translation, through which secure interoperation can be accomplished in a very flexible way. The fact that the model can cause violations of static mutual exclusive role (SMER) constraints is firstly discovered by Liao Junguo et al, the reason for which is also analyzed. A detection algorithm for SMER violations and prerequisite conditions for adding new role associations are also presented. In the paper, it is firstly made clear that Liao's assertion about the reason of constraint violations is only partial and thus violations can not be totally prohibited. It is also clarified that under the circumstance of given role associations the inappropriate user/role assignments in other domains are the real reason behind. Then the necessary and sufficient condition for SMER violation is proposed and a corresponding algorithm for violation detection is presented. Because both new role associations and new user/role assignments can cause SMER violation, prerequisite conditions for adding them are subsequently proposed, which can ensure that the SMER constraints are always enforced during the state transitions of the model.
MUIS: A New System of Multi-Domain Union of Information Sharing and Its Applications
Lin Leyu,, Liu Lei, Wang Shi, Zheng Yufei, Feng Qiangze, and Cao Cungen
2008, 45(4):  684-694. 
Asbtract ( 691 )   HTML ( 0)   PDF (830KB) ( 499 )  
Related Articles | Metrics
With the rapid development of network technology and the arrival of information period, information sharing has become a more and more important issue today in all enterprises. However, the structural heterogeneity and the geographical distribution of information make this problem more difficult and complex. In addition, the efficiency of query information from the Internet is too poor to satisfy the requirement. In this paper, a model of MUIS (multi-domain union of information sharing) is proposed to deal with this problem, which could be propitious to the information sharing and information retrieval across the multiple domains. The model can efficiently integrate and share heterogeneous databases geographically distributed and resided in different environments with the multi-agent cooperation. After summarizing the model of single-domain union of information sharing, the overall structure and major components of the model of MUIS, including standardization of heterogeneous databases, URI remote access, cross-domain query, inter-site query, and natural language interface are presented. The detailed definition of several kinds of agents and the multi-agent cooperation in this model are also presented. This system is implemented and applied in several areas such as agricultural information sharing and yellow pages sharing, and currently each MUIS system contains some databases from several websites.
DAG Scheduling for Synchronous Communication in the Network Computing Environment
Zhao Mingyu and Zhang Tianwen
2008, 45(4):  695-705. 
Asbtract ( 763 )   HTML ( 0)   PDF (774KB) ( 636 )  
Related Articles | Metrics
Although the advance in high-speed networks is rapid, parallel computing on the network computing environments is not so reliable as that performed on parallel machines. Therefore, many applications must rely on synchronous communication in the form of application-level handshaking. In general, application-level handshaking may enhance the reliability of executing parallel programs on the network computing environments, but also simplify the error handling for communication, maintaining easy programability. However, synchronization introduces a waiting time for the sender and an overhead to the already high cost of message-passing. Furthermore, synchronization may result in deadlock among communicating tasks. Most heuristics for scheduling address asynchronous communication DAG is proposed, but they are not suitable for the synchronous ones. Proposed in this paper is a novel algorithm for the synchronous scheduling problem, which is called parameters relation graph-based synchronous clustering (PRGSC). The PRGSC algorithm introduces a parameter relation graph (PRG), which represents relations of timing parameters of tasks and dynamically varies in the scheduling process, for the deadlock detection. In every scheduling step, PRGSC algorithm avoids the deadlock caused by the synchronous communication through examination of the PRG. Meanwhile, it could alleviate the impact of synchronous communicating delay on the makespan by simultaneously considering the ready tasks and scheduled tasks. Simulation shows that the PRGSC algorithm has better performance than the CASC algorithm which deals with the same type of problems.
Real-Time Dynamic Scheduling Algorithms for the Savings of Power Consumption and Fault Tolerance in Multi-Processor Computing Environment
Han Jianjun, Gan Lu, Ruan Youlin, Li Qinghua, and Abbas A.Essa
2008, 45(4):  706-715. 
Asbtract ( 538 )   HTML ( 0)   PDF (549KB) ( 480 )  
Related Articles | Metrics
Saving energy consumption of modern processors has recently become popular due to the fact that high power consumption increases heat dissipation, which leads to decreased reliability of systems. This paper aims to generate a task schedule that can fully exploit the dynamic scheduling support (as well as the underline voltage/frequency scaling capability) of the target machines such that the real-time constraints can be met with as minimum as possible energy consumption. The algorithm STFBA (shorted task first based algorithm) proposed utilizes the policy of shortest-task-first and other efficient techniques, such as shared slack reclamation, to save energy in homogeneous systems for independent task set and task set with relationship constraints, respectively. STFBA comprises static part and dynamic part to lower time complexity of algorithm. Furthermore, a dynamic fault-tolerance algorithm based on STFBA is proposed, which is combined with the policy of shortest-task-first and checkpoint, to tolerate transient faults during the executions of tasks, while meeting the requirement of timing constraints and reducing energy consumption efficiently. The feasibility condition of checkpoint placement and lower bound of static processing speed is given to reduce the fault-tolerance scheduling costs. Compared with the efficient algorithms presented so far, simulation results indicate that proposed algorithms have much better scheduling performance trade-off in terms of makespan and energy savings when the slack generated by tasks is sufficient.
An Improved Hestenes SVD Method and Its Parallel Computing and Application in Parallel Robot
Zhang Shihui, Kong Lingfu, and Feng Liang
2008, 45(4):  716-724. 
Asbtract ( 638 )   HTML ( 0)   PDF (506KB) ( 439 )  
Related Articles | Metrics
Singular value decomposition (SVD) of matrix is an important and familiar problem in maths science and engineering. Among many SVD methods, Hestenes method is widely used as it is suiting for parallel processing in particular. An improved Hestenes SVD method is proposed in this paper, which notably reduces the sweep numbers and orthogonalization numbers during the matrix singular value decomposition. It also facilitates and quickens the process of computing (generalized) inverse matrix. In addition, two kinds of parallel algorithms are studied for the improved Hestenes SVD method based on row and column division respectively, and then their performance and efficiency are analyzed. Influence coefficient plays an important role in the analysis of parallel robot's kinematics and dynamics, and the second-order influence coefficient matrix shouldn't be ignored especially in the condition of high speed. Aiming at the characteristics of parallel robot's increasing computing requirement, the experiments about the improved Hestenes SVD method and its parallel algorithm are done by computing the first-order and second-order influence coefficient matrix of 6-DOF parallel robot. Experiment results show that the proposed method can improve computing efficiency greatly, and be beneficial to parallel robot's kinematics, dynamics performance analysis and real time control based on lots of influence coefficient matrix computing. The proposed method also suits for many other engineering fields with similar matrix processing.
A Novel Discovery Method of Grid Resource Information Based on Peer-to-Peer System
Wang Yi, Zhang Yusen, and Liu Peng
2008, 45(4):  725-733. 
Asbtract ( 469 )   HTML ( 1)   PDF (469KB) ( 429 )  
Related Articles | Metrics
In grid environment, the discovery and query of correct resource in a great variety of resources is a key problem in grid computing. In this paper, based on structured peer-to-peer (P2P) system which can support indexing the data in the proper order, a novel method for discovery of grid resources is put forward. This method introduces an advanced database multiple dimensions indexing technology which is called pyramid to the P2P system. Multiple dimensions range query is supported perfectly by the P2P system by combining the database multiple dimensions indexing technology with the P2P system. This algorithm adopts symmetric structure pyramid technology of database domain which reduces the maintenance cost of grid resource management caused by the change of dynamic property. Through theory proving, on the premise that the number of dimension is big enough, the maintenance cost resulting from the dynamic property is inversely proportional to the number of dimension. This is indepent of the change size. At the same time, the strategy of load balance is taken into account to resolve the imbalance of peer-to-peer's node in this paper. Finally, the validity of range query and performance of routing to find the correct data is validated by simulation.
Transistor-Level Methodology on Power Optimization for CMOS Circuits
Luo Zuying and Pan Yuedou
2008, 45(4):  734-740. 
Asbtract ( 477 )   HTML ( 0)   PDF (379KB) ( 620 )  
Related Articles | Metrics
With IC technology scaling into nanometer regime, power consumption has become an equal important design constraint as performance. Owing to the shortage of efficient transistor-level delay simulator, previous low-power techniques have to optimize circuits on the gate level. Thanks to the fine granularity, transistor-level low-power design methods can reduce more static power than gate-level counterparts. Thus it is far more important to develop the transistor-level low-power design methodology for nanometer chips marked with high static power. Based on the transistor-level simulator developed, an efficient transistor-level optimization methodology consisting of two-step algorithms is proposed to reduce more static power. The former gate-space algorithm uses the clustering strategy to cut down algorithm complexity. The latter transistor-space algorithm employs fine granularity to pursue reducing more power consumption. Experiments show the following advantages: 1. the proposed methodology is so general that it can analyze and optimize heterogeneous circuit in which each transistor may have its own different V\-\{T0\}, channel width W, and channel length L, and 2. In the transistor-level W+V\-\{T0\}+L-sized optimization, the engine takes feasible running time (856.4s for C7552) to cut down 22.85%(average) and 43%(maximum) static power caused by gate-level optimum solution nearly without penalty of active power (average active power increases only 0.02%).
An Expandable Distributed RAID Storage Cluster System
Zhang Hongcan, Xue Wei, and Shu Jiwu
2008, 45(4):  741-746. 
Asbtract ( 789 )   HTML ( 0)   PDF (309KB) ( 544 )  
Related Articles | Metrics
An expandable, low price per storage unit, distributed RAID storage cluster system is presented to substitute the traditional hardware RAID array. It has three important features. First, it's a P2P infrastructure via network share storage model, with each data node in a completely equal status, and provides a great convenience in the expandability of distributed RAID system. Secondly, it sustains single I/O space(SIOS)feature by synchronizing RAID operations in user space, which is independent of the kernel-level RAID implementation, thus can support various advanced RAID technologies. Thirdly, it provides a virtual block device access interface, transparent to upper level file system and database system, thus can be directly used by application without any changes. Also, the system provides a node failure detection and recovery mechanism, which can effectively detect and replace failure nodes to guarantee data integrity and maintain system reliability. Experiments show that network shared storage model brings little impact on storage access bandwidth and the maximal sequential read bandwidth can reach 190 MBps, which is limited by the bandwidth of gigabit Ethernet. Also, experiments validate the reliability of the distributed RAID system. Under RAID6 configuration, one or two nodes failure will not interrupt data service, but one node failure leads to 15% bandwidth decline and two nodes failure leads to 18% bandwidth decline.