
    2010  Vol. 47  No. 4

    Abstract:
    The heterogeneous multi-core architecture of the Cell processor and its explicitly managed multi-level memory hierarchy pose programming and performance challenges to both programmers and applications. Most programming models for Cell/B.E.-based systems support bulk-data-transfer applications suited to stream processing well, but suffer performance degradation on applications whose memory access patterns are irregular or unpredictable. In this paper, by extending the memory access library and enhancing the independence of the Cell/B.E. co-processors, a co-processor-centric multilayer runtime library is proposed that supports both an MPI and a release-consistency-based Pthread programming model. The multilayer structure of the model and the flexible, extensible memory access library not only make the model more efficient and scalable but also boost the performance of irregular applications. In this model, the MPI programming interface allows large existing MPI applications to be ported to the Cell/B.E. processor easily and facilitates traditional parallel programming, while the release-consistency-based Pthread programming interface offers an efficient task runtime library both to MPI and to system-level users who need full control over the architecture. The experimental results show that the proposed multilayer runtime library is suitable for various applications and achieves better performance by using a profile-based optimization technique built into the memory access library.
    Abstract:
    To overcome the relatively low performance-cost ratio of secondary-storage-based checkpointing for cluster services, a shared-memory-based checkpointing mechanism for cluster services is presented in this paper. The idea is to base checkpointing on shared memory, so as to reduce the checkpointing and recovery latency compared with secondary-storage-based checkpointing. To lower the risk of using the non-persistent shared-memory storage, all checkpoint servers in the cluster are organized as a unidirectional ring. For each cluster service, the checkpoint data is stored both on the local checkpoint server and on its predecessor in the ring. A checkpoint management protocol is designed for the dual-stored checkpoint data to ensure update consistency of the checkpoints. A group membership protocol is presented to guarantee that all members of the ring hold a consistent group view, so that the checkpoint data is backed up correctly. The experimental results show that the shared-memory-based checkpointing mechanism achieves lower checkpointing and recovery latency. The group membership protocol needs only one round of communication to reach group-view consistency among all checkpoint servers, and hence incurs low communication overhead.
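    A minimal sketch of the dual-placement idea described in this abstract: each service's checkpoint is written to its local checkpoint server and to that server's predecessor on a unidirectional ring. The class and method names below are illustrative assumptions, not the paper's actual implementation.

        class CheckpointRing:
            def __init__(self, server_ids):
                # servers ordered around the unidirectional ring
                self.ring = list(server_ids)
                # in-memory stores, one dict per checkpoint server
                self.stores = {s: {} for s in server_ids}

            def predecessor(self, server):
                i = self.ring.index(server)
                return self.ring[(i - 1) % len(self.ring)]

            def save_checkpoint(self, service_id, local_server, data):
                # primary copy on the local checkpoint server ...
                self.stores[local_server][service_id] = data
                # ... and a backup copy on its predecessor in the ring
                self.stores[self.predecessor(local_server)][service_id] = data

            def recover(self, service_id, local_server):
                # prefer the local copy; fall back to the predecessor's copy
                store = self.stores[local_server]
                if service_id in store:
                    return store[service_id]
                return self.stores[self.predecessor(local_server)].get(service_id)

        ring = CheckpointRing(["ckpt0", "ckpt1", "ckpt2"])
        ring.save_checkpoint("svc-a", "ckpt1", {"seq": 42, "state": b"..."})
        print(ring.recover("svc-a", "ckpt1"))   # served from the local copy
        print(ring.recover("svc-a", "ckpt0"))   # ckpt0 is ckpt1's predecessor and holds the backup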
    Abstract:
    As emerging data-intensive applications receive more and more attention from researchers, duplicate elimination over large volumes of data in a shared-nothing environment has become a severe challenge. The authors propose an effective and adaptive data placement method that combines hash partitioning with histograms, together with the design of an asynchronous parallel query engine (APQE) for duplicate elimination. Hash partitioning divides the data into disjoint subsets to reduce data migration during duplicate elimination, while the histogram method keeps the data sizes on different nodes balanced. Furthermore, the adaptive approach rebalances the data sizes when data skew occurs. The parallel query engine develops the maximum degree of pipeline parallelism for large-scale data processing by employing coarse-grained pipelining, and the asynchronous method further eliminates the synchronization overhead of multi-node parallelism. APQE starts merging as soon as some of the database nodes return intermediate results, returns part of the final result as soon as the slowest node returns the relevant data, and then frees the memory space. Experimental results on DBroker, a production large-scale system, demonstrate that the combined data placement strategy and the adaptive method work well for duplicate elimination on the relevant attributes, and that the asynchronous parallel query engine brings a large performance improvement for duplicate elimination over large volumes of data in a cluster environment.
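    An illustrative sketch (not DBroker's code) of combining hash partitioning with a histogram as described above: the hash on the de-duplication key keeps duplicates on the same node, while the histogram tracks per-node sizes so newly seen hash buckets can be steered toward lighter nodes when skew appears. All names and the bucket count are assumptions.

        from collections import defaultdict
        import hashlib

        class HashHistogramPlacement:
            def __init__(self, nodes):
                self.nodes = list(nodes)
                self.histogram = defaultdict(int)      # node -> number of rows placed
                self.bucket_to_node = {}               # hash bucket -> node (sticky mapping)

            def _bucket(self, key, num_buckets=1024):
                h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
                return h % num_buckets

            def place(self, key):
                b = self._bucket(key)
                node = self.bucket_to_node.get(b)
                if node is None:
                    # assign a new bucket to the currently lightest node (histogram balancing)
                    node = min(self.nodes, key=lambda n: self.histogram[n])
                    self.bucket_to_node[b] = node
                self.histogram[node] += 1
                return node

        placement = HashHistogramPlacement(["node0", "node1", "node2"])
        for key in ["alice", "bob", "alice", "carol", "bob", "dave"]:
            print(key, "->", placement.place(key))    # duplicate keys land on the same node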
    Abstract:
    As the performance of high performance computers (HPC) keeps increasing and their hardware scale continues to grow, realizing highly reliable operation of the system is a great challenge in tera-scale and peta-scale HPC research and development. Starting from HPC's requirements for high-reliability technology, the authors give a comprehensive introduction to current reliability technologies in HPC hardware design, such as fault avoidance, static redundancy, dynamic redundancy and online replacement, in which static redundancy includes fault-masking technologies such as component redundancy, data-path redundancy and information redundancy, and dynamic redundancy includes reliability technologies such as fault detection and diagnosis, reconstruction and recovery. Combined with online replacement, redundancy technology can greatly improve system RAS (reliability, availability, serviceability). The specific application of these reliability technologies in typical IBM, HP and Cray systems is analyzed in detail. Finally, the future trend of reliability technology in peta-scale HPC is discussed, suggesting that in the development of peta-scale high performance computers much work should focus on the reliability design of multi-core processors and on all-round memory protection, and pointing out that blade architectures facilitate modular redundancy and online replacement of components.
    Abstract:
    Power analysis attacks have become one of the most serious threats to embedded security chips: they can break a chip much faster than exhaustive key search, and they constitute a new way of attacking secret keys. Power analysis attacks and their countermeasures have attracted much attention in recent years. This paper first studies the power models that can be used for power analysis attacks and their countermeasures, and points out that a high-performance, general-purpose power model for power analysis attacks remains an important open research problem. The authors then discuss various types of power analysis attacks and defense techniques, from simple power analysis and differential power analysis to high-order power analysis. The success rate of a high-order power analysis attack is higher than that of the others, but its computation time and complexity are also higher. Countermeasures at the algorithm level and the circuit level are also presented: algorithm-level countermeasures are flexible and easy to port, whereas circuit-level countermeasures are harder to implement but offer better protection. Through this discussion and comparison, the paper helps researchers design a suitable solution to defend their particular embedded security chips against various power analysis attacks, and also provides useful information about power analysis attacks. Finally, future research directions in power analysis attacks and defenses are presented.
    Abstract:
    Secret key exposure is a serious problem for the security of digital signatures. Unfortunately, for a regular digital signature, once the secret key is exposed, all previously produced signatures become invalid, because the verifier cannot tell whether a signature was produced before or after the key exposure. Therefore, how to deal with the problem of key exposure in signatures is very important. The forward secure threshold signature is an important distributed signature designed for this problem; it inherits the advantages of both forward secure signatures and threshold signatures. The secret key is renewed periodically through the shares held by the players, while the public key stays fixed over all time periods. This kind of signature makes it harder for an adversary to compromise the scheme: if the adversary cannot corrupt a quorum of players, he cannot forge any signature; if he can corrupt a quorum of players in a certain time period, he still cannot forge signatures for previous time periods. In 2007, Peng et al. proposed a forward secure threshold signature scheme from bilinear pairings. This paper analyzes the security of Peng et al.'s scheme, presents several attack techniques, and points out that the scheme is insecure. Some improvement methods are also given.
    Abstract:
    The security protocol model is the basis of security protocol analysis and verification. Existing modeling methods have shortcomings such as high complexity and poor reusability. A typed π calculus, the π^t calculus, is presented in this paper, together with its typing rules and evaluation rules, and it is proved that the π^t calculus is a safely typed system. The π^t calculus can be used to build formal models of security protocols and of their attackers. The NRL protocol is taken as an example to show how to build a security protocol model based on the π^t calculus. An attacker model for security protocols is also presented, and the attack capability of the π^t-calculus-based attacker model is proved to be the same as that of the Dolev-Yao attacker model, which guarantees the correctness of the results obtained from verifying security protocol models based on the π^t calculus. The π^t-calculus-based modeling approach gives a more detailed description of the semantics of protocol data and of the participants' knowledge. Compared with other methods, it has better usability and higher reusability and can provide more analysis support. The analysis results show that the method can detect flaws in a security protocol during its modeling process.
    Abstract:
    Protocol anomaly detection, a new anomaly detection technique, has great research value, and its combination with hidden Markov models (HMM) is still in its infancy. In order to investigate the capabilities of hidden Markov models in this area, a protocol anomaly detection model based on HMMs is given in this work. First, an overview of anomaly detection is presented, with emphasis on the issues of protocol anomaly detection. Then, a novel protocol anomaly detection model based on HMMs is proposed. The method filters incoming TCP traffic by destination port and quantizes the TCP flags into decimal numbers; these numbers are grouped by TCP connection into sequences that are used as inputs to the HMMs. Detection models based on HMMs that represent normal network behavior are trained with the Baum-Welch method. Finally, the model's correctness and effectiveness are demonstrated with the forward algorithm on the MIT Lincoln Laboratory 1999 DARPA intrusion detection evaluation data set. The forward algorithm is used to compute the probability of a connection, and a threshold K is designed to control the detection rate: by comparing the probability with the threshold K, the protocol anomaly detection model can decide whether the traffic is normal or contains some sort of anomaly. Experimental results show that the HMM-based model achieves higher detection rates on attacks than the Markov chain detection method.
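    A minimal sketch of the detection step described above: given an HMM (assumed here to have already been trained with Baum-Welch on normal traffic), the forward algorithm computes the likelihood of a quantized flag sequence for one TCP connection, and the connection is flagged as anomalous when the likelihood falls below a threshold K. All model parameters, the symbol alphabet and the threshold value below are illustrative assumptions.

        def forward_likelihood(obs, start_p, trans_p, emit_p):
            """Forward algorithm: P(obs | model) for a discrete-observation HMM."""
            n_states = len(start_p)
            alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n_states)]
            for o in obs[1:]:
                alpha = [
                    sum(alpha[sp] * trans_p[sp][s] for sp in range(n_states)) * emit_p[s][o]
                    for s in range(n_states)
                ]
            return sum(alpha)

        # Toy 2-state model over 4 quantized flag symbols (0..3).
        start = [0.6, 0.4]
        trans = [[0.7, 0.3],
                 [0.4, 0.6]]
        emit  = [[0.5, 0.3, 0.1, 0.1],
                 [0.1, 0.1, 0.4, 0.4]]

        K = 1e-4                                  # detection threshold
        connection = [0, 1, 0, 0, 1, 2]           # quantized TCP flags of one connection
        p = forward_likelihood(connection, start, trans, emit)
        print("likelihood =", p, "->", "normal" if p >= K else "anomalous")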
    Abstract:
    Due to the complexity of the Web environment and the topical multiplicity of Web page contents, it is quite difficult to retrieve all the Web pages relevant to a specific topic. An irrelevant Web page may link to a relevant one, so the crawler must traverse the irrelevant page to reach more relevant pages; this procedure is called tunneling. In this paper, research on tunneling techniques is presented, along with a correction to previous results. Tunneling is partitioned into grey tunneling and black tunneling. During crawling, in order to avoid the effect of a Web page that is irrelevant to the specific topic as a whole but partially relevant, a multi-topic page is divided into several blocks and the blocks are processed individually; this is grey tunneling. In black tunneling, each irrelevant page is assigned a depth value, derived from the relevance of its parent page, that determines whether the page should be kept, and the scope of the topical crawler can thereby be broadened. The experimental results show that the two tunneling methods achieve the expected effect, so the approaches are effective, robust and practicable.
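    A hedged sketch of the black-tunneling idea: when a fetched page is irrelevant, it is not discarded outright; instead it receives a remaining tunnel depth derived from the relevance of its parent page, and its out-links are followed only while that depth stays positive. The threshold, maximum depth and depth formula below are illustrative assumptions, not the paper's exact rules.

        RELEVANCE_THRESHOLD = 0.5
        MAX_TUNNEL_DEPTH = 3

        def tunnel_depth(parent_relevance, parent_depth):
            """Depth budget for an irrelevant page, based on its parent."""
            if parent_relevance >= RELEVANCE_THRESHOLD:
                # child of a relevant page: grant the full tunnel budget
                return MAX_TUNNEL_DEPTH
            # child of another irrelevant page: consume one unit of the budget
            return max(parent_depth - 1, 0)

        def should_expand(page_relevance, parent_relevance, parent_depth):
            """Decide whether to follow this page's out-links."""
            if page_relevance >= RELEVANCE_THRESHOLD:
                return True, MAX_TUNNEL_DEPTH
            depth = tunnel_depth(parent_relevance, parent_depth)
            return depth > 0, depth

        # An irrelevant page whose parent was relevant is still expanded.
        print(should_expand(page_relevance=0.2, parent_relevance=0.8, parent_depth=0))
        # An irrelevant page deep inside an irrelevant chain is dropped.
        print(should_expand(page_relevance=0.1, parent_relevance=0.2, parent_depth=1))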
    Abstract:
    Time synchronization and distance measurement are infrastructure for any distributed system, including wireless sensor networks. Time synchronization methods rely on some form of message exchange between nodes, and so does distance estimation. Although many researchers have investigated time synchronization or distance measurement, little work has been published on the relationship between them. Time synchronization is a prerequisite for conventional distance measurement and directly affects its accuracy in wireless sensor networks. In order to remove the time-synchronization restriction on distance measurement and expand the scope of distance measurement applications, the authors present a coordinated algorithm for time synchronization and distance measurement (CATSDM), derived by analyzing asynchronous paired ranging in reverse, which realizes not only time synchronization but also relative ranging and even absolute ranging. It combines the time-difference-of-arrival (TDOA) ranging mechanism with the synchronization principle of the network time protocol (NTP): TDOA corrects the clock skew, and NTP corrects the clock offset. Theoretical analysis and simulation results show that it outperforms TPSN (timing-sync protocol for sensor networks) and RBS (reference broadcast synchronization) in terms of robustness, accuracy and the number of message exchanges.
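    A minimal two-way (paired) exchange sketch showing why synchronization and ranging come from the same timestamps: with send/receive times T1..T4, the NTP-style clock offset and the one-way propagation delay (hence the distance) are both recovered. This only illustrates the coupling the paper exploits; it is not the CATSDM algorithm itself, which additionally corrects clock skew via TDOA. The propagation speed and timestamps are assumed values.

        SPEED = 3.0e8  # assumed signal propagation speed (m/s), e.g. radio

        def paired_exchange(t1, t2, t3, t4):
            """t1: A sends, t2: B receives, t3: B replies, t4: A receives."""
            offset = ((t2 - t1) - (t4 - t3)) / 2.0      # B's clock minus A's clock
            delay  = ((t2 - t1) + (t4 - t3)) / 2.0      # one-way propagation delay
            return offset, delay * SPEED                # (seconds, metres)

        # Example: B's clock runs 5 ms ahead, nodes are 300 m apart (1 microsecond one way).
        t1 = 0.000000
        t2 = 0.005001       # = t1 + 1 us propagation + 5 ms offset
        t3 = 0.006001       # B replies 1 ms later, on B's clock
        t4 = 0.001002       # = t3 - offset + 1 us propagation, on A's clock
        offset, distance = paired_exchange(t1, t2, t3, t4)
        print(f"offset = {offset*1e3:.3f} ms, distance = {distance:.1f} m")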
    Abstract:
    Unstructured P2P networks are a very popular kind of network application, with examples such as Gnutella and KaZaA, but their open nature makes them vulnerable to malicious peers, for example through uploading inauthentic files or spreading computer viruses. Hence, to mitigate the adverse behavior of unreliable or malicious peers in unstructured P2P networks, a reputation model based on social rules is proposed in this paper, together with the parameters of the reputation model and the methods for computing them. Based on the reputation model, a trust mechanism for unstructured P2P is designed, and the computation of reputation and satisfaction ratio in this trust mechanism is presented. A distributed storage mechanism for peers' evaluation records is designed, together with a corresponding search mechanism for reputation information. An algorithm for trust-based topology evolution is presented; its simulation shows that through topology evolution the high-reputation peers form the backbone of the unstructured P2P network, which effectively improves system stability. Resistance experiments on the proposed trust mechanism are carried out under static and dynamic network settings, and the results show that it can effectively resist attacks by malicious peers.
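    An illustrative reputation-update sketch, not the paper's social-rule model: a peer's reputation is an exponentially weighted blend of its history and the satisfaction ratio of recent transactions, and peers below a trust threshold are avoided when choosing download sources. The weight ALPHA, the threshold and the class layout are assumed for illustration.

        ALPHA = 0.8           # weight of historical reputation (assumed)
        TRUST_THRESHOLD = 0.4 # minimum reputation to be considered trustworthy (assumed)

        class Peer:
            def __init__(self, pid, reputation=0.5):
                self.pid = pid
                self.reputation = reputation

            def record_transaction(self, satisfaction):
                """satisfaction in [0, 1], e.g. fraction of authentic files received."""
                self.reputation = ALPHA * self.reputation + (1 - ALPHA) * satisfaction

        def choose_source(peers):
            trusted = [p for p in peers if p.reputation >= TRUST_THRESHOLD]
            return max(trusted, key=lambda p: p.reputation) if trusted else None

        good, bad = Peer("good"), Peer("bad")
        for _ in range(10):
            good.record_transaction(1.0)   # consistently authentic files
            bad.record_transaction(0.0)    # consistently inauthentic files
        print(good.reputation, bad.reputation, choose_source([good, bad]).pid)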
    Abstract:
    A new local search algorithm called IVNS (iterated variable neighborhood search) is proposed for the no-wait flowshop scheduling problem with setup times to minimize the total completion time. Three key factors are taken into consideration when designing local search algorithms like IVNS: the neighborhoods, the evaluation of neighboring solutions, and the strategies for escaping local optima. First, three new neighborhoods with larger sizes are introduced to increase the chance of finding high-quality solutions. The neighborhoods are based on job-block exchange and have sizes of O(n³) or O(n⁴), larger than the commonly used insertion and exchange neighborhoods. Second, an objective increment method is adopted to speed up the evaluation of neighboring solutions, so that two neighboring solutions can be compared in constant time and a neighborhood can be explored completely in time proportional to its size; the objective increment method also reduces the time complexity of the well-known NEH algorithm by one order. Finally, IVNS tries to escape local optima by switching between different neighborhoods as well as by restarting from a perturbation of the local optimum. IVNS is compared with the best known algorithms for the considered problem on 5400 benchmark instances under identical CPU times. Statistical analysis of the experimental results verifies that IVNS remarkably outperforms the reference algorithms in average solution quality.
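    A skeleton sketch of the iterated variable neighborhood search structure described above: cycle through a list of neighborhood moves, restart the cycle whenever a move improves the current solution, and perturb the incumbent when all neighborhoods are exhausted. The neighborhoods and objective below are toy placeholders, not the paper's block-exchange moves or the incremental no-wait flowshop evaluation.

        import random

        def ivns(initial, objective, neighborhoods, perturb, max_iters=1000):
            best = list(initial)
            best_cost = objective(best)
            current = list(best)
            for _ in range(max_iters):
                k = 0
                while k < len(neighborhoods):
                    candidate = neighborhoods[k](current)
                    if objective(candidate) < objective(current):
                        current = candidate            # improvement: restart the neighborhood cycle
                        k = 0
                    else:
                        k += 1                         # try the next neighborhood
                if objective(current) < best_cost:
                    best, best_cost = list(current), objective(current)
                current = perturb(best)                # escape the local optimum
            return best, best_cost

        # Toy usage: permutation problem where cost is total weighted position.
        def cost(perm): return sum(i * job for i, job in enumerate(perm))
        def swap(p):    p = list(p); i, j = random.sample(range(len(p)), 2); p[i], p[j] = p[j], p[i]; return p
        def insert(p):  p = list(p); i, j = random.sample(range(len(p)), 2); p.insert(j, p.pop(i)); return p
        def shake(p):   return swap(swap(p))

        print(ivns(list(range(8)), cost, [swap, insert], shake, max_iters=200))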
    Abstract:
    Approximate string matching is a basic problem in computer science and is widely used in many areas. The aim of this study is to improve the speed of approximate string matching. Filter algorithms for approximate string matching are discussed because they are suitable for large-scale text searching, and a novel filter algorithm based on match-region features is presented. First, a q-gram index is used to preprocess the text. Second, both the pattern and the text are logically divided into blocks of fixed size kq+1, new match-region features are extracted from the blocks, and the algorithm uses these new features to strengthen the fundamental q-gram filtration criterion. Finally, an improved method of choosing filtration regions, based on QUASAR's block addressing, is used for filtration. The experimental results demonstrate that the algorithm achieves higher matching speed than QUASAR's block addressing by improving filtration efficiency; in particular, its matching speed is much higher at low error rates. Experiments also reveal the relationship between the matching speed and the error rate of the new algorithm. These results suggest that the algorithm is useful in approximate string matching systems with low error rates, and that it is also powerful for long-pattern approximate string matching with a fixed edit distance k.
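    A hedged sketch of the basic q-gram filtration criterion the paper builds on (not its match-region features): a text region can contain an occurrence of a pattern of length m with at most k errors only if it shares at least m + 1 - (k + 1) * q of the pattern's q-grams, so regions below that count are filtered out before any expensive edit-distance verification. The fixed-window region scheme below is a simplification assumed for illustration.

        from collections import Counter

        def qgrams(s, q):
            return Counter(s[i:i + q] for i in range(len(s) - q + 1))

        def candidate_regions(text, pattern, k, q, window=None):
            m = len(pattern)
            window = window or 2 * m                      # coarse fixed-size regions
            threshold = m + 1 - (k + 1) * q               # q-gram lemma lower bound
            pat_grams = qgrams(pattern, q)
            survivors = []
            for start in range(0, max(len(text) - window + 1, 1), window // 2):
                region = text[start:start + window]
                shared = sum(min(c, pat_grams[g]) for g, c in qgrams(region, q).items())
                if shared >= threshold:
                    survivors.append((start, region))     # verify with edit distance later
            return survivors

        text = "the quick brown fox jumps over the lazy dog near the quick brwn fx"
        print(candidate_regions(text, pattern="quick brown", k=2, q=3))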
    Abstract:
    How to perform similarity retrieval of medical cases efficiently and effectively, which determines whether exact and plentiful candidate cases can be provided to doctors, has become a hot research topic in both the academic community and medical science. Although many retrieval strategies for improving query precision have been proposed, few of them address retrieval efficiency. Motivated by this, an index called the M²⁺-tree is proposed in this paper. The M²⁺-tree combines, within a single index structure, information from multiple metric spaces, such as text features from diagnostic reports and physical features from medical images. It divides the data space formed by the medical cases into multiple sub-spaces based on the metric spaces, and each sub-space is further divided into left and right twin parts based on a key vector. Furthermore, it exploits the non-overlapping division over the key vector and the filtering principle of the triangle inequality to speed up the similarity search of medical cases. In short, by using these two kinds of division over the metric space and the vector space, many unnecessary similarity computations are avoided, which improves the retrieval efficiency of medical cases dramatically. Experimental results show that the search performance of the M²⁺-tree is better than that of the M²-tree, a typical multi-feature index.
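    An illustrative sketch (not the M²⁺-tree itself) of the triangle-inequality filtering principle the index relies on: if the distance from a query q to a pivot p is known, any object o with |d(q, p) - d(p, o)| > r cannot lie within radius r of q, so its possibly expensive distance computation is skipped. The Euclidean distance and the data points are assumed for illustration.

        import math

        def euclidean(a, b):
            return math.dist(a, b)

        def range_search_with_pivot(query, pivot, objects, radius, dist=euclidean):
            d_qp = dist(query, pivot)
            results, skipped = [], 0
            for obj, d_po in objects:                 # d_po precomputed at index build time
                if abs(d_qp - d_po) > radius:
                    skipped += 1                      # pruned without computing dist(query, obj)
                    continue
                if dist(query, obj) <= radius:
                    results.append(obj)
            return results, skipped

        pivot = (0.0, 0.0)
        data = [(1.0, 1.0), (5.0, 5.0), (0.5, 0.2), (9.0, 1.0)]
        objects = [(o, euclidean(pivot, o)) for o in data]
        hits, pruned = range_search_with_pivot(query=(1.0, 0.5), pivot=pivot, objects=objects, radius=1.0)
        print("hits:", hits, "pruned without distance computation:", pruned)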
    Abstract:
    With the rapid development of computer hardware, researchers pay more and more attention to how to use computer resources efficiently and develop high performance software. Traditionally, researchers either hand-optimize software for a given machine or rely on compiler optimization. Both approaches have problems, including poor portability and the difficulty of keeping up with the development of hardware. In designing self-adapting numerical software (SANS) packages such as FFTW, ATLAS, PHiPAC and OSKI, auto-tuning is a very useful way to avoid the problems mentioned above; however, the search process always consumes too much time. In the experiments it is found that the current best performance changes slowly over the whole search process, in contrast to the performance measured at each run, and that the best performance found at the beginning of the search is equal or very close to the best performance found after the whole search. In order to search efficiently, a criterion Pt is proposed to evaluate the self-adapting search process, and a method for using this criterion to control the search and reduce the search time is given. The experimental results show that the proposed method obtains acceptable performance in a short time.
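    A hedged sketch of the early-stopping idea described above: because the best performance seen so far tends to plateau early, the search can be cut off once the incumbent has not improved beyond a small tolerance for a given number of consecutive candidates. The stopping rule below is an illustrative stand-in for the paper's criterion Pt, not its actual definition; the benchmark and parameters are assumed.

        import random

        def autotune(candidates, benchmark, patience=20, tolerance=0.01):
            """candidates: iterable of configurations; benchmark: config -> measured performance."""
            best_cfg, best_perf = None, float("-inf")
            since_improvement = 0
            for cfg in candidates:
                perf = benchmark(cfg)
                if best_cfg is None or perf > best_perf * (1.0 + tolerance):
                    best_cfg, best_perf = cfg, perf
                    since_improvement = 0
                else:
                    since_improvement += 1
                    if since_improvement >= patience:     # plateau detected: stop searching early
                        break
            return best_cfg, best_perf

        # Toy usage: "tune" a blocking factor whose simulated performance peaks at 64.
        def fake_benchmark(block):
            return 100 - abs(block - 64) * 0.5 + random.uniform(-1, 1)

        blocks = list(range(8, 257, 8))
        random.shuffle(blocks)
        print(autotune(blocks, fake_benchmark, patience=8))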
    Abstract:
    With the development of multi-core processors, scaling applications to a large number of processor cores is considered vitally important. Loops are one of the largest sources of parallelism in scientific programs. Taking load balance, scheduling overhead, synchronization overhead and other factors into consideration, the OpenMP API specifies four scheduling strategies: static, dynamic, guided, and runtime. In this paper an improved guided scheduling algorithm, named the new_guided strategy, is proposed and implemented in OMPi, an open source OpenMP compiler. The key idea is to distribute the first half of the workload to all threads by static scheduling and the second half by guided scheduling. Different scheduling strategies are also tested and analyzed with different loop types on modern multi-core architectures. The experimental data show that the default static strategy of OpenMP is the worst in most cases. For general loops and increasing loops, the performance of the dynamic, guided and new_guided strategies is almost the same. For decreasing loops, the dynamic and new_guided strategies are better than the standard guided strategy. For some extremely irregular loops, the dynamic strategy is much better than the guided strategy, and the performance of the new_guided strategy lies between them.
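    A sketch of how the new_guided strategy described above could carve a loop into chunks: the first half of the iterations is split evenly across threads (static), and the second half is handed out in guided fashion, i.e. each successive chunk is the remaining work divided by the thread count, down to a minimum chunk size. This reproduces the chunking idea only; it is not the OMPi implementation, and the minimum chunk size is an assumption.

        def new_guided_chunks(num_iters, num_threads, min_chunk=1):
            chunks = []
            half = num_iters // 2
            # static part: the first half split evenly among the threads
            base, extra = divmod(half, num_threads)
            for t in range(num_threads):
                size = base + (1 if t < extra else 0)
                if size:
                    chunks.append(("static", t, size))
            # guided part: decreasing chunks carved from the remaining iterations
            remaining = num_iters - half
            while remaining > 0:
                size = max(remaining // num_threads, min_chunk)
                size = min(size, remaining)
                chunks.append(("guided", None, size))   # grabbed dynamically by any idle thread
                remaining -= size
            return chunks

        for kind, thread, size in new_guided_chunks(num_iters=100, num_threads=4):
            print(kind, thread, size)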
    Abstract:
    In the design of an aspect-oriented software architecture, two kinds of elements may be involved: aspectual elements, which encapsulate crosscutting behaviors and features, and basic elements, which are traditional components or connectors. The two kinds of elements need to be woven together into an integrated model by specifying the location, time point and constraints of injection, which is essential for analyzing and verifying the overall behavior and quality attributes of the software architecture (SA). A weaving mechanism at the SA level, based on an aspect-oriented architecture description language named AC2-ADL, is proposed in this paper. The weaving mechanism consists of a set of weaving rules and a weaving process. Concretely, the rules comprise match rules, conflict detection rules and interweaving rules: the match rules search for injection locations in the SA model; the conflict detection rules determine whether there are temporal conflicts between crosscutting operations; and the interweaving rules add the computation within a crosscutting operation into the corresponding component. Moreover, the weaving process is defined on top of these weaving rules. Under the guidance of the weaving process, aspectual components designed independently in the unwoven stage are explicitly woven into components, yielding an SA model in the woven stage that contains only components and connectors and is easier to analyze and verify. Finally, the weaving mechanism is illustrated in detail through a case study.
    Abstract:
    Fault localization techniques help programmers find the locations and causes of faults and accelerate the debugging process. The relation between a fault and a failure is usually complicated, making it hard to deduce how the fault causes the failure. At present, analysis of variance is widely used in many related studies. In this paper, a Bayesian belief network (BBN) for fault reasoning is constructed based on the suspicious pattern; its nodes consist of the suspicious pattern and the callers of the methods that constitute the suspicious pattern. The construction algorithm of the BBN, the associated probabilities, and the formula for the conditional probability on each arc of the BBN are defined. A reasoning algorithm based on the BBN is proposed, through which the faulty module can be found and the probability of each module containing the fault can be calculated. An evaluation method is proposed, and experiments are executed to evaluate this fault localization technique. The data show that the technique achieves 0.761 accuracy and 0.737 recall on average, so it is very effective in fault localization and has high practical value.
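    A minimal, generic Bayes illustration (not the paper's BBN construction or its conditional-probability formulas): given a prior probability that each module is faulty and the probability of observing the suspicious pattern when that module is faulty, a posterior over modules follows from Bayes' rule, and the highest-posterior module is examined first. The module names and numbers are invented for illustration.

        def posterior_over_modules(priors, likelihoods):
            """priors, likelihoods: dicts module -> P(faulty), P(pattern | module faulty)."""
            joint = {m: priors[m] * likelihoods[m] for m in priors}
            total = sum(joint.values())
            return {m: p / total for m, p in joint.items()}

        priors = {"parser": 0.2, "optimizer": 0.5, "codegen": 0.3}
        likelihoods = {"parser": 0.1, "optimizer": 0.7, "codegen": 0.3}
        post = posterior_over_modules(priors, likelihoods)
        for module, p in sorted(post.items(), key=lambda kv: -kv[1]):
            print(f"{module}: {p:.2f}")        # inspect the highest-posterior module first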
    Abstract:
    Flash memory has the merits of non-volatility, solid-state construction, small size, light weight, shock resistance, high sequential access performance, high random read performance and low power consumption. Flash memory has already been widely applied in embedded systems, and various techniques have been proposed in that area; recently, with the continual increase in capacity and decrease in cost, there are strong motives in both industry and academia to apply flash memory in notebook computers and enterprise storage systems, which raises new challenges for building high performance, highly reliable and low power storage systems based on flash memory. In this paper, a comprehensive analysis and summary of the state of the art of flash-memory-related technologies is given in the hope of informing future work. First, the characteristics of flash memory are introduced and the role of flash memory in the storage system architecture is discussed. Then, several key techniques for the efficient exploitation of flash memory are reviewed in detail, including address-mapping techniques, garbage collection mechanisms, wear-leveling strategies, flash-aware buffer cache replacement strategies, flash-aware indexing data structures and flash-based transaction techniques. Finally, a panoramic view is provided and possible future research directions are given.
    Abstract:
    Because it possesses characteristics such as high concurrency, high scalability and high cost-effectiveness, cluster storage has become an important way to build large data centers. As such a system grows, its failure rate increases dramatically, and improving system reliability to ensure continuous data access has become a key issue for cluster storage. Cluster RAID extends traditional RAID technology to the cluster storage environment and provides an effective solution to the reliability problem. An analytic reliability model of a cluster RAID5 storage system based on the Markov model is proposed, for the purpose of quantitatively analyzing the impact of various parameters, such as the storage topology, the node failure rate, the disk failure rate and the rebuild speed, on the reliability of the cluster RAID system. The analysis shows that, firstly, a multi-tier cluster RAID5 has an advantage over a single-tier cluster RAID5 and is better suited to building a cluster RAID system; secondly, increasing the rebuild speed brings an almost proportional improvement in the reliability of the cluster RAID5 system; and thirdly, to maintain the same system reliability, a higher rebuild speed reduces the required node MTTF (mean time to failure): a tenfold improvement in rebuild speed can reduce the required node MTTF to as little as one-seventh of the original.
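    A back-of-the-envelope sketch of how rebuild speed enters a RAID5 reliability estimate. It uses the standard single-parity-group Markov approximation MTTDL ≈ MTTF² / (N · (N − 1) · MTTR), not the paper's multi-tier cluster RAID5 model; it only illustrates why a faster rebuild (smaller MTTR) improves reliability roughly in proportion. The MTTF, group size and rebuild times are assumed values.

        def raid5_mttdl_hours(n_disks, mttf_hours, rebuild_hours):
            """Mean time to data loss for one RAID5 group under independent failures."""
            return mttf_hours ** 2 / (n_disks * (n_disks - 1) * rebuild_hours)

        MTTF = 1_000_000.0          # per-disk MTTF in hours (illustrative)
        N = 8                       # disks per RAID5 group (illustrative)

        for rebuild in (24.0, 12.0, 2.4):   # hours to rebuild a failed disk
            mttdl = raid5_mttdl_hours(N, MTTF, rebuild)
            print(f"rebuild {rebuild:5.1f} h -> MTTDL {mttdl / 8760:.0f} years")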