ISSN 1000-1239 CN 11-1777/TP

Table of Content

15 January 2012, Volume 49 Issue 1
Summary of Research for Erasure Code in Storage System
Luo Xianghong and Shu Jiwu
2012, 49(1):  1-11. 
Asbtract ( 1201 )   HTML ( 15)   PDF (3434KB) ( 1722 )  
Related Articles | Metrics
With the development of massive storage system and its application in complex environments, there is a big challenge in the reliability of storage system. Erasure code is drawing more and more attention since it is the main technology for fault tolerance in storage systems. In this paper, we firstly introduce the current-status of some typical and popular erasure codes, then make careful comparison and analysis on current erasure codes with the important metrics that are used to evaluate them. Afterwards, we point out some shortages and improvement suggestions on fault tolerance, requirements for the number of disks, storage efficiency, encoding efficiency, updating efficiency and decoding efficiency for different erasure codes. What’ more, we discuss the different requirements on erasure code in disk array systems, P2P systems, distributed storage systems and archival storage systems. Finally, we indicate the unresolved problems in erasure code and their future trends. From the analysis, we found a lot of drawbacks on fault tolerance, storage efficiency and computation efficiency (including encoding efficiency, updating efficiency and decoding efficiency) for different erasure codes. It is an issue worthy of further study in a long period to make a balance on these factors and create new erasure code with higher fault tolerance, greater storage efficiency, and faster computation efficiency.
Research and Development on Key Techniques of Data Deduplication
Fu Yinjin, Xiao Nong, and Liu Fang
2012, 49(1):  12-20. 
Asbtract ( 725 )   HTML ( 5)   PDF (1133KB) ( 836 )  
Related Articles | Metrics
With the rapid growth of data capacity and the continuous improvement of data transfer rate requirement in enterprises, the needs of massive data storage capacity and high network bandwidth in data center become a grand challenge in networked storage area nowadays. Based on high data redundancy in application-specific datasets, data deduplication can reduce storage capacity needs greatly, improve the efficiency of network bandwidth and economize IT cost. Data deduplication technique is rapidly becoming a major research focus in recent years. Firstly, this paper introduces the concepts, categories and applications of data deduplication techniques, and describes the architecture along with the basic principle of data deduplication storage systems, meanwhile, contrasts them with traditional storage systems. Thereafter, it focuses on analyzing and summarizing current research status of key techniques on data deduplication, including data partition methods, I/O optimization techniques, high reliability data deployment strategies and system scalability. Finally, current research work about data deduplication is summarized, and future research directions are pointed out.
Survey of Design-for-Debug of VLSI
Qian Cheng, Shen Haihua, Chen Tianshi, and Chen Yunji,
2012, 49(1):  21-34. 
Asbtract ( 851 )   HTML ( 13)   PDF (1580KB) ( 745 )  
Related Articles | Metrics
Design-for-debug (DFD) has become an important feature of modern VLSI. On the one hand, traditional pre-silicon verification methods are not sufficient to enssure the quality of modern complex VLSI designs, thus employing DFD to facilitate post-silicon verification has attracted wide interests from both academia and industry; on the other hand, debugging parallel program is a world-wide difficult problem, which cries out for DFD hardware supports. In this paper, we analyze the existing structures of DFD comprehensively and introduce different fields of DFD for debugging hardware and software. These fields contain various kinds of DFD infrastructures, such as the DFD infrastructure for the pipe line of processor, the system-on-chips (SOC) and the networks on multi-cores processor. We also introduce the recent researches on how to design the DFD infrastructures with certain processor architecture and how to use the DFD infrastructures to solve the debug problems in these different fields. The topologic of the whole infrastructure, the hardware design of components, the methods of analyzing signals, the compressing and storing signals, the using of DFD infrastructure in debugging parallel program and post-silicon debugging are all introduced. Furthermore, some possible future directions of DFD research are given based on the state-of-the-art literatures and industrial requirements.
Accelerating Program Behavior Analysis with Dynamic Binary Translation
Zhao Tianlei, Tang Yuxing, Fu Guitao, Jia Xiaomin, Qi Shubo, and Zhang Minxuan
2012, 49(1):  35-43. 
Asbtract ( 570 )   HTML ( 2)   PDF (2370KB) ( 471 )  
Related Articles | Metrics
SimPoint methodology is one of the important methods for program runtime behavior analysis. BBV (basic block vector) profiles for SimPoint are usually gathered with functional simulators. However, BBV profile gathering is very time consuming since the speed of functional simulation is very slow. To solve this problem, this paper proposes using dynamic binary translation to accelerate BBV profile gathering. Firstly, a general framework on BBV profile gathering with dynamic binary translation, DBT-BBV, is presented. Then several optimization techniques for reducing the overhead of BBV gathering are discussed. Lastly, a highly efficient BBV profiler, QPoint, is designed and implemented based on the proposed DBT-BBV framework and optimization techniques. The performance and overhead of QPoint is evaluated with the SPEC2006 benchmarks. Compared with existing tools, QPoint has two advantages. Firstly, the performance of QPoint is much better, with a speed of up to 292 MIPS, on average 109 MIPS, on an ordinary PC. The overhead is less than 4%, which is the lowest among similar tools. Secondly, QPoint supports most architectures, including x86/x86_64, ARM, POWER, SPARC, MIPS, etc., and can be used to generate cross-platform BBV profiles. Experimental results show that dynamic binary translation technique can be well adapted to program behavior analysis acceleration.
A Best-Effort Hardware Transactional Memory Based on Dependency Graph
Zeng Kun and Yang Xuejun
2012, 49(1):  44-54. 
Asbtract ( 355 )   HTML ( 0)   PDF (1783KB) ( 466 )  
Related Articles | Metrics
In recent years, transactional memory has attracted much attention, which can greatly simplify concurrent accesses to the shared memory. Most hardware-based transactional memory systems employ concurrency control protocols that are based on conflict detection and resolution, in which if two transactions conflict with each other, one of them must be aborted and restarted to ensure the semantics of serializability. Based on the detailed analysis of “conflict”, we propose the concept of “weak conflict” to capture the situation in which two conflicting transactions may commit normally without violating the semantics of serializability. Furthermore, this paper proposes hardware transactional memory based on dependency graph, which allows transactions that are weak conflict with each other to commit. More specifically, the dependency graph is a graph which tracks the dependency relationship among active transactions. The existence of dependency circles indicates the violation of serializability. Instead of detecting conflicts, dependency-graph-based hardware transactional memory maintains the dependency graph of the active transactions and detects the dependency circles. If a dependency circle is detected, one of the transactions in the dependency circle should be aborted and restarted to ensure serializability. Simulation results show that dependency-graph-based transactional memory outperforms the transactional memory systems based on conflict detection and resolution.
A 2D-Cache Based Memory Bandwidth Optimization Method for H.264 Motion Compensation
Wang Wenxiang, Zhang Guangfei, and Shen Haihua,
2012, 49(1):  55-63. 
Asbtract ( 673 )   HTML ( 0)   PDF (2880KB) ( 498 )  
Related Articles | Metrics
Motion compensation(MC) consumes a lot of external memory bandwidth during H.264/AVC video decoding, which becomes a significant bottleneck of high definition(HD) H.264/AVC video codec design. Analytical results show that such significant bandwidth consumption comes from five parts: pixel data reload, address alignment, burst access, SDRAM page precharge/active and conflict memory accesses. In view of this, a 2D-cache based MC bandwidth optimization method is proposed. By exploiting pixel data reuse, such optimization method avoids large numbers of data reloading. Through the combination with SDRAM data mapping optimization, it integrates many short random accesses to some address aligned burst accesses, and at the same time reduces the page precharge/active frequency. In addition, a memory access group burst mode is proposed to reduce the bandwidth consumption caused by the conflict SDRAM access. Experimental results show that the new method reduces 82.9%-87.6% of the MC memory bandwidth, and it demonstrates a further reduction of 64%-87% of the bandwidth consumption compared with some existing high optimization efficiency methods. Provided the same amount of bandwidth reduction, the circuit area of the new method is reduced by 91% compared with the traditional cache architecture. The method proposed in this paper has been applied in a multimedia SoC chip.
High Efficient Memory Race Recording Scheme for Parallel Program Deterministic Replay Under Multi-Core Architecture
Liu Lei, Huang He, and Tang Zhimin
2012, 49(1):  64-75. 
Asbtract ( 546 )   HTML ( 0)   PDF (4293KB) ( 698 )  
Related Articles | Metrics
Current shared memory multi-core and multiprocessor systems are nondeterministic. When these systems execute a multithreaded application, even if supplied with the same input, they could produce a different output each time. It frustrates debugging and limits the ability to properly test multithreaded code, and is becoming a major stumbling block to the much-needed widespread adoption of parallel programming. The support for deterministic replay of multithreaded execution is greatly helpful in finding concurrency bugs. A memory race recording scheme, named Rainbow, is proposed. Its core idea is to make inter-thread communications fully deterministic. The unique feature of Rainbow is that it precisely sets up happens-before relationships between conflicting memory operations among different threads. By using effective, bloom-filter based, coherence history queue, Rainbow removes redundant happens-before relations implied in the already generated log and enables a compact log. Rainbow adds the modest hardware to the base multi-core processors, and the coherence protocol is unmodified. The analysis results show that Rainbow reduces the log size by 17% of a state-of-the-art scheme, and the records execution speed is similar to that of release consistency (RC) execution and replays at about 93% of its speed. The determinism can be provided with little performance cost using our architecture proposals on the state-of-the-art hardware, and the software-only approaches can be utilized on existing systems without problem.
InfiniBand-Based Multi-path Mesh/Torus Interconnection Network for Massively Parallel Systems
Xia Xiaoshuang, Liu Yi, Wang Yunbin, and Qian Depei,
2012, 49(1):  76-82. 
Asbtract ( 851 )   HTML ( 3)   PDF (2087KB) ( 406 )  
Related Articles | Metrics
In the field of designing high performance computing system, the efficient implementation of interconnection network is mandatory because it has momentous influence on the performance of massively parallel computing systems. As a high performance switched interconnection network standard, InfiniBand is widely used in MPP systems. Compared with fat-tree topology which is commonly used in InfiniBand networks, mesh/torus topology can achieve better performance and scalability. However, according to our study, there are some challenges to use the traditional mesh/torus topology in building interconnection network. In this paper, we formulate the problem of traditional network topologies and propose an InfiniBand-based multi-link mesh/torus interconnection network which provides higher bandwidth than traditional networks by using multiple links between switches. In order to fully use the provided multi-link, the corresponding routing scheme for the improved network is also proposed. Simulation results show that the proposed routin scheme achieves better performance and scalability than other routing schemes.
Revisiting Amdahl’s Law in the Hierarchical Chip Multicore Processors
Chen Shuming, Chen Shenggang, and Yin Yaming
2012, 49(1):  83-92. 
Asbtract ( 504 )   HTML ( 2)   PDF (2679KB) ( 381 )  
Related Articles | Metrics
Hierarchical chip multicore processors (HCMPs) can well support the memory reference and on-chip communications locality through supernodes, each of which consists of several tightly coupled processing cores, and thus efficiently reduce the data communications latency. This paper revisits the previous costperformance Amdahl model of the multicore processors, and make some extentions to account for the non uniform data communications latency of the HCMP architectures. Through those extentions, this paper investigates the relationship between the performance speedup and the size of the supernodes, which means the number of cores in a supernode in hierarchical chip multicore processors, and some important design rules are maintained. Simulation results reveal that to maintain a better Amdahl speedup, the HCMP architecture designers should carefully deal with the size of the supernode and the number of supernodes in an HCMP. Given the overall number of processing cores in an HCMP, the configuration of the supernode that makes the HCMP the optimal performance is with the intermediate number of middle-sized supernodes, and the optimal size of the supernode also varies with the overall cores in the HCMP. During the design of a specific hierarchical chip multicore processor, the proposed performance model can be utilized to help the designers make a better decision.
Program’s Performance Profiling Optimization for Guiding Static Cache Partitioning
Jia Yaocang, Wu Chenggang, and Zhang Zhaoqing
2012, 49(1):  93-102. 
Asbtract ( 532 )   HTML ( 2)   PDF (2793KB) ( 457 )  
Related Articles | Metrics
How to coordinate the core’s utilizing of cache resource is a key issue for shared cache multi-core processors. The current methods used in cache replacement may cause performance interference between the programs simultaneously running on the cores. Static cache partitioning techniques divide the shared cache into exclusive regions for programs to address the interference problem. In order to allocate the cache space with appropriate size to programs, performance profiling is needed to collect program’s performance under a variety of cache capacities, which has to run program multi-pass, one pass for each cache capacity. The enormous overhead of profiling prevents the static partitioning method from practical use. This paper presents a performance profiling optimization which needs only single-pass run. Phase analysis is used to identify program’s phases to eliminate redundant profiling for the same phases and unnecessary profiling in some cache capacities where program’s performance behavior is insensitive. Then the program’s overall performance under different capacities could be estimated with phases. Experimental results show that the overhead of this method is only 7%, while using it to guide partitioning can get 8% performance improvement than with no partitioning, and only 1% decline compared with the multi-pass profiling guided partitioning.
A New Analysis Model for Task Buffer of Pipeline Simulator Based on Queueing Network
Qiu Tie, Guo He, Feng Lin, Si Weisheng, and Liu Xiaoyan
2012, 49(1):  103-110. 
Asbtract ( 680 )   HTML ( 1)   PDF (2132KB) ( 629 )  
Related Articles | Metrics
Pipeline simulator of software is a key technology in software simulation of embedded microprocessors. A new analysis model for the pipeline simulator of the embedded SPARC-V8 microprocessor is proposed, and the associated analysis method for software simulation is also given. Specifically, the queueing network model with M/M/1/N queues is applied to analyze the task arrival and service blocking in the task buffer size of the pipeline simulator. To analyze the blocking phenomenon of pipeline stage, the “holding nodes” are added to the original model and hence obtain an equivalent model that is easy for blocking analysis. The evaluation indices of system performance are calculated by using an iterative algorithm with approximate calculation. The relationship curves between system throughput and task buffer size are established according to the system evaluation indices. The task buffer size values for each functional module for pipeline simulator are obtained by the change trend of curve. The actual buffer size of the pipeline simulator can be set by the calculated values from our model. The experiments show that the data obtained from the model are consistent with the actual operating data. Thus, the new model and the proposed analysis method have important guiding significance for optimizing the performance of the pipeline simulator.
An Alternating-Complementary Self-Recovering Method Based on Dual FSMs
Chen Xiumei, Liang Huaguo, Huang Zhengfeng, Wu Zhenni, and Cao Yuan
2012, 49(1):  111-117. 
Asbtract ( 546 )   HTML ( 3)   PDF (1554KB) ( 520 )  
Related Articles | Metrics
Deep-submicron technology allows billions of transistors on a single die, potentially running at gigahertz frequencies. According to the Semiconductor Industry Association projections, the number of transistors per chip and the local clock frequencies for high-performance microprocessors will continue to grow exponentially in the near future. Soft errors, caused mainly by cosmic rays and alpha particles from the chip packaging are affecting electronic systems in advanced CMOS technologies. Such reliability issues create tangible risk. Deep submicron process under transient faults caused by soft errors may become the important reasons for chip failure. In this paper, an alternating-complementary self-recovering method based on dual FSMs is proposed, by which the original FSM is decomposed into two sub-FSMs. The two sub-FSMs work by turns and they are indispensably complementary to each other. When an error occurs in one of the two sub-FSMs, a hardware rollback operation will be automatically performed using the correct state in the other sub-FSM. Thus, we can correct the soft error. The method is tested on MCNC91 benchmarks. Experimental results show that compared with published fault-tolerant methods, the average area overhead of the proposed methods is negligible, and the delay decreases dramatically, while it can mask 99.64% soft errors. Hence this method has certain advantages in the performance to tolerate soft error.
A Relaxed Co-Scheduling Method of Virtual CPUs on Xen Virtual Machines
Wang Kai, and Hou Zifeng,
2012, 49(1):  118-127. 
Asbtract ( 751 )   HTML ( 0)   PDF (1907KB) ( 560 )  
Related Articles | Metrics
In the environment of Xen virtual machine based on the SMP system with two or more physical CPU cores, the current virtual machine scheduling algorithms make all virtual CPUs be scheduled independently and do not consider the co-scheduling relationship of guest OS virtual CPUs, which leads to too much deviation of timer interruptions among virtual CPUs and so on, which causes the decline of guest OS stability. According to the problem, a more relaxed co-scheduling algorithm of virtual CPUs on Xen virtual machines than the relaxed co-scheduling one is given. It adopts a non-preemptive co-scheduling method to restrain relative running time of virtual CPUs between 0 and the upper limit to co-scheduling time detection T\-{max}. And it makes virtual CPUs co-scheduling more quickly and physical CPUs load more balanced by the method of optimized selection of virtual CPU and the method of dynamic migration of virtual CPUs in Credit. Meanwhile it reduces the whole virtualization systems overhead further. The experimental results show that the method ensures the stability of the virtualization system, and it improves the performance of virtual machines to a certain extent. The scheduling algorithm of virtual machines has the direct effect on the performance and stability, and it is a significant job to go into scheduling algorithms of virtual machines.
Inheritance, Subversion and Transcendence—Computational Photography
Xu Shukui, Zhang Jun, Tu Dan, and Li Guohui
2012, 49(1):  128-143. 
Asbtract ( 701 )   HTML ( 2)   PDF (10742KB) ( 458 )  
Related Articles | Metrics
Computational photography is an integrative technology which combines the computer and software technology with modern sensors and modern optics to create new imaging systems and image applications. In succession on the basis of existing photography, computational photography improves the components, workflow and even the principle of photography to break out of the limitation of traditional photography. It becomes a new revolution of photography. Computational photography is a cross-research area of many disciplines which can solve a problem through different perspectives and approaches. To fully understand this new area, computational photography is classified to computational scenes, computational optics, computational sensors and computational processing. In each subclass, some hot issues and typical projects are introduced. Finally, the existing circumstances of computational photography and trend in development are discussed.
Global Topology Based Image Stitching Using Hierarchical Triangulation
Zeng Dan, Chen Jian, Zhang Qi, and Shi Hao
2012, 49(1):  144-151. 
Asbtract ( 566 )   HTML ( 4)   PDF (3488KB) ( 421 )  
Related Articles | Metrics
Most image stitching algorithms adopt intensity or gradient for similarity measurement. Unfortunately they fail when the scene exhibits periodic contents or similar contents. In this puper, we propose a novel global topology based image stitching method. Firstly, gradient and ratios of RBG are used to describe and compare feature points. In order to reduce the omission of matched points, a threshold is set to reserve all m:n (m, n are positive integer) feature matching. Then, we compute two compatible triangulations of gradient and color matched points to compare the topology similarity. Finally, we incrementally add 1:1 matching points which are topology matched and remove problematic points. We demonstrate the illustrative results by comparing and contrasting our output with other methods. The presented algorithm gives superior results in all examples.
Cubic Bézier Triangular Patch with Shape Parameters
Liu Zhi, Tan Jieqing, and Chen Xiaoyan
2012, 49(1):  152-157. 
Asbtract ( 547 )   HTML ( 0)   PDF (1062KB) ( 540 )  
Related Articles | Metrics
The tensor product Bézier surfaces are successfully used by many commercial CAD systems to model complicated surfaces. But the theory requires that all data have a rectangular geometry. This is indeed the case for some surfaces (e.g. the trunk lid of a car), but not for others (e.g. interior car body panels). Every surface can be covered with a triangular network instead of rectangular networks. So in computer aided geometric design, Bézier triangular surfaces have now become one of the major tools in outer shape design. In order to control the shape of triangular surfaces in geometric modeling, a set of cubic polynomial basis functions with shape parameters are constructed in this paper, which are the extensions of the cubic Bernstein bases over the triangular domain. A polynomial surface with shape parameters over triangular domain is defined by using the basis functions. We then show that such basis and surfaces share the same properties as the Bernstein basis and the Bézier surfaces in polynomial spaces respectively. And the cubic Bézier triangular surface is its special case. Because of the adjustable shape parameters, modification or deformation of the surface is more flexible. The larger the parameters are, the more the surface approaches to the control net. By changing the value of the shape parameters, we can get surfaces with different shapes in invariable control net. Examples show that this technique is effective in CAGD.
Audio Authentication Based on Music Content Analysis
Wang Zhurong, Li Wei, Zhu Bilei, and Li Xiaoqiang
2012, 49(1):  158-166. 
Asbtract ( 624 )   HTML ( 0)   PDF (2496KB) ( 462 )  
Related Articles | Metrics
In almost all the existing audio authentication strategies, the smallest authentication entity is a segment with fixed duration, typically a frame whose size is consistent with both protection stage and verification stage. Such fixed-length segmentation has brought about at least two problems: Firstly, if the audio data to be authenticated has undergone some synchronization manipulations such as time-scale modification (TSM) or random cropping, the displacement between the frame sequences of the tested audio and its original may cause an unreliable authentication result; Secondly, authentication based on those fixed-length frames might not be friendly to users since in most cases a frame cannot satisfactorily cover a complete semantic entity of the audio data. In this paper, a novel music content authentication scheme based on note segmentation and fuzzy classification is proposed. We break the restriction of the traditional fixed-length segmentation strategies used in previous audio authentication methods and partition each track into a sequence of musical notes which correspond to minimum meaningful semantic elements of the music. Feature extraction and robust Hashing are performed on each note-based segment, thus settling the synchronization problem that remains a big challenge in most existing audio authentication algorithms. In the verification stage, three newly defined measures that characterize statistical and temporal properties of the distortion between the original and the dubious music, are exploited and further combined with fuzzy classification to make the final authentication decision. Moreover, tamped regions are also located for unauthentic music signals. Experimental results show that the proposed method features a high distinction between acceptable manipulations and malicious ones, and at the same time achieves satisfying performance in tamper localization.
A Computation Method of Path Diversity Based on AS Relationships
Zhang Weiguo, Yin Xia, and Wu Jianping
2012, 49(1):  167-173. 
Asbtract ( 371 )   HTML ( 0)   PDF (1265KB) ( 763 )  
Related Articles | Metrics
Autonomous system (AS) relationships play a critical role in data transferring and routing in the Internet. From the perspective of AS relationships, a concept of AS diversity is first proposed and the development trend of AS diversity in recent years is studied. The result shows that Internet AS diversity has significant growth trends and their distribution become more and more extensive in recent years. Moreover, AS diversity is combined with the traditional parallel-series reliability model, and then a new path diversity model based on AS relationships, named SPDSA (series-parallel path diversity system based on AS relationship) model, is presented. The relevant SPDSA measure is given at a later time. Finally, many virtual multihoming sites which depend on route views node data are simulated and dynamic experiments of AS path diversity are performed by SPDSA measure. Test results show that SPDSA measure is superior to traditional measures and it is especially effective to identify AS path diversity based on multi-homed network. Furthermore, the experiment results also show that multihoming can enhance AS path diversity in the Internet. Finally, the results indicate that though AS diversity has significant growth trends, AS path diversity reveals no major changes in recent years.
An Adaptive MAC Protocol for Wireless LANs Under Error-Prone Environment
Cheng Yanhong, Li Zhishu, and Chen Liangyin
2012, 49(1):  174-182. 
Asbtract ( 442 )   HTML ( 1)   PDF (1937KB) ( 456 )  
Related Articles | Metrics
The impact of conditional collision probability (p\-{cl}) on saturation throughput of 802.11 DCF (distributed coordination function) under error-prone channel is investigated through mathematical modeling. The study shows that under a certain network configuration in the basic CSMA/CA mode there exists an optimal p\-{cl} to maximize the throughput, and the optimal p\-{cl} is approximately independent of the number of active nodes, the bit error rate and the length of the packet payload (100—4000B). Based on the stated results, combining contention window and frame length adjustment methods, a channel adaptive protocol is proposed to enhance the performance of 802.11 DCF in case of error-prone environment. The contention window adjustment method adjusts the initial contention window to make p\-{cl} reach its optimum; moreover, the frame length adjustment scheme determines the optimal frame length according to channel condition. The analytical model is implemented based on NS-2 to evaluate the performance of the proposed protocol and the impact of channel condition. The results show that, as compared with legacy 802.11 DCF, the proposed protocol not only significantly improves the throughput in error-prone conditions, but also achieves the self-adapt character with the variation of channel condition and the node number.
A NoC Router with Dynamically Allocated Virtual-Output-Queueing
Zhu Honglei, Peng Yuanxi, Yin Yaming, and Chen Shenggang
2012, 49(1):  183-192. 
Asbtract ( 471 )   HTML ( 1)   PDF (3611KB) ( 555 )  
Related Articles | Metrics
The traditional router with virtual channel flow control alleviates the loss of link throughput and networks congestion by increasing virtual channels. However, it results in low utilization of buffers and excessive area cost of arbitration logic. And the router with dynamically allocated virtual channels can improve the buffer utilization and link throughput by organizing the buffers dynamically, but it can not avoid the rapid increase of the logic for flow control and arbitration. To improve the link throughput and buffer utilization, we propose a NoC router with dynamically allocated virtual-output-queues (DAVOQ) in order to obtain a good tradeoff between performance and overhead. In the router DAVOQ, the buffers are organized by rapid-linked lists and the logic area of arbitration, and pipeline is reduced and optimized based on lookahead routing. The simulation and synthesis results show that the router DAVOQ can reduce latency and improve throughput, yielding saving of standard cells area by 15.1% and saving of leakage power by 12.9% under 0.13 μm CMOS process in comparison with traditional router with virtual channels. And it can obtain considerable latency reduction with losing a little network throughput, and yield saving of standard cells area by 15.6% and saving of leakage power by 20.5% when compared with the router with dynamically allocated virtual channels.
Lifetime Optimizing Scheme of WSN
Sun Dayang, Liu Yanheng, Yang Dong, and Wang Aimin,
2012, 49(1):  193-201. 
Asbtract ( 410 )   HTML ( 2)   PDF (1877KB) ( 619 )  
Related Articles | Metrics
In this paper, a lifetime optimizing scheme is proposed to combine different applications, network service, network deployment and optimizing algorithms together. The deployment of network can be guided by this scheme through placing more nodes where higher service quality is needed. In this way, the wasted energy can be reduced efficiently. The statistic model is employed to unify different applications and optimizing algorithms in this scheme, thus application requirements and network service ability are quantified. To give a concrete scenario validating the feasibility of this scheme, an assessment method is presented based on the energy of the network nodes, with the upper bound of wasted energy being calculated. A quantization process of network density called density iterative process(DIP) is put forward subsequently to find out the distribution law of the network density which will provide wider space for lifetime optimizing. Theorems in this paper show that if the relaying function can be approximated to simple functions of initial relaying function and the network density, our quantization process DIP can get convergence. And furthermore, it is proved that DIP will get uniform convergence when the initial strength of serving request is not a constant function. Simulations also show that this process can get convergence and the solution can make the energy of different network regions burnt evenly and thus prolong the lifetime of WSN.
An Unsupervised Rough Cognition Algorithm for Salient Object Extraction
Li Zhongsheng, Li Renfa, and Cai Zesu
2012, 49(1):  202-209. 
Asbtract ( 443 )   HTML ( 0)   PDF (2956KB) ( 474 )  
Related Articles | Metrics
An unsupervised rough cognition algorithm based on granular computing for salient object extraction is proposed in this paper. Firstly, a granular computing model, as a way to simulate human thinking,is defined. Then, the following steps are done successively: 1)Partition the universe according to two concepts respectively, and filter topology equivalence classes with smaller scale; 2)Quantize the significance of topology equivalence classes with topology connectivity and topology distribution density; 3)Find the cut-off position in the significance sequence with the improved Fishers linear discriminant algorithm, and then obtain the candidate regions by removing non-significant topology equivalence classes; 4)Express the gradual changing pattern with dimensional scan gradient, which is used to do the local rough segmentation until candidate objects are available; merge the candidate objects that follow the gradual changing pattern and refresh the significant values of these candidate objects; 5)Run the Fishers linear discriminant algorithm again, and determine the final salient object according to position weight if more than one candidate object is left. Finally, the executing process of the proposed approach is validated by experiment step by step, and a comparative analysis with three recent methods is conducted, which shows the superiority of the proposed approach in terms of ability to approximate semantic of object and speed.
A Data Sealing Approach Based on Integrity Measurement Architecture
Shen Qingni, Du Hong, Wen Han, and Qing Sihan,
2012, 49(1):  210-216. 
Asbtract ( 593 )   HTML ( 0)   PDF (1142KB) ( 654 )  
Related Articles | Metrics
As an important capability of trusted computing platform, sealing can provide strong data storage security by combining data’s encryption with the platform configuration, by which data can only be unsealed under specific configurations. However, sealing operation is hard to use for the complexity of modern OS, the randomness of the loading order of the booting components, the frequently changing configuration, software update and patches. IMA (integrity measurement architecture) implemented in operating system could measure the dynamic configurations and extend them to the trust chain of the whole trusted platform, and then support the data sealing. Therefore, a new approach to data sealing based on IMA is proposed here, which seals data to a relatively fixed configuration in PCR0—PCR7 (Platform Configuration Register) and then applies a list policy (black list policy or white list policy) to the measurement list (ML) in IMA for the variable configuration in PCR10 to determine whether the unseal operation can be performed. Finally, a prototype system “TPM Master” implemented in Linux is given and its performance and security analysis are both evaluated. The results show that the proposed approach could solve the issue of the PCR value varying with the OS complexity and make updating process much more flexible by the list policy,without re-sealing the original data.