ISSN 1000-1239 CN 11-1777/TP

Table of Content

15 June 2006, Volume 43 Issue 6
Design and Performance Analysis of the Godson-2 Processor
Hu Weiwu, Zhang Fuxin, and Li Zusong
2006, 43(6):  959-966. 
Asbtract ( 445 )   HTML ( 1)   PDF (484KB) ( 565 )  
Related Articles | Metrics
In this paper, the design and the result of performance analysis of the Godson-2 processor are presented. The Godson-2 implements a 4-way superscalar pipelined architecture, contains two 64KB L1 caches for instruction and data, and supports up to 8MB off-chip L2 cache. To improve the pipeline efficiency, The Godson-2 utilizes out-of-order executing technologies such as advanced branch prediction unit, register renaming and dynamic scheduler, and dynamic memory access mechanism like non-blocking cache and load speculation. The Godson-2 is implemented on 0.18um CMOS technology, with a maximum frequency of 500MHz under normal voltage and consumes 3-5 watts power under that frequency. The Godson-2 can perform one billion double-precision floating-point operations per second (two billion for single-precision), and the overall performance is comparable to Intel Pentium III with similar frequency. Presently a full Linux distribution (Debian) is running well on the Godson-2 prototype machines, including important desktop applications such as Mozilla web browsers, media players and OpenOffice.
Functional Units Design in Godson-2 Processor
Zhang Ge, Qi Zichu, and Hu Weiwu
2006, 43(6):  967-973. 
Asbtract ( 518 )   HTML ( 0)   PDF (377KB) ( 708 )  
Related Articles | Metrics
The algorithm and its implementation of functional units are very vital for the performance of today's state of art general-purpose microprocessor design. An overview of the functional units design in Godson-2 processor is given and some details including architecture and physical design are described. Godson-2 has two fixed-point functional units: ALU1 and ALU2, and two floating-point units (FPU): FALU1 and FALU2. The MMX-like instructions are also implemented in Godson2 FPU. The FPU is IEEE-754 and MIPS compliant. The floating-point adder and multiplier have 4-cycle and 5-cycle latencies respectively, and the floating-point division has various 4~17 cycle latencies. The physical design based on the standard cell methodology with SMIC 0.18μm CMOS technology show that 2Gflops for single precision and 1Gflops for double precision performance are achieved with the speed of 500MHZ.
Function Verification of Godson-2 Processor
Zhang Heng, and Shen Haihua
2006, 43(6):  974-979. 
Asbtract ( 476 )   HTML ( 2)   PDF (311KB) ( 610 )  
Related Articles | Metrics
Developing a new leading edge Godson-2 processor is an immensely complicated undertaking. In the case of Godson-2 processor, the micro-architecture is significantly more complex than the previous Godson 1 processor and how to ensure the function correctness is a great challenge to verification participants. Simple architectural level tests are insufficient to gain confidence in the quality of the design. Detailed plan must be combined with a broad collection of methods and tools to ensure that design defects are detected as early as possible. Described in this paper are the verification flow and methodology in the Godson-2 processor design prior to tape out. Simulation is the dominant method in the design, and the state of art formal method are also used to verify some parts of the design.
Critical Techniques of System Optimization for Godson-2 Processor
Wu Ming, Zhang Fuxin, Lin Wei, Xu Xianchao, Yuan Nan, and Wang Jian
2006, 43(6):  980-986. 
Asbtract ( 477 )   HTML ( 2)   PDF (442KB) ( 513 )  
Related Articles | Metrics
As the interface between applications and processor, system software plays an important role in maintaining the stabilization of processor and improving the performance of applications. In this paper, ways to resolve cache synonyms in Godson-2 processor and methods to decrease performance loss resulting from TLB miss, such as augmenting the page size, software TLB and FAST_TLB_REFILL, are described. And a way of Uncache Accelerate for improving performance of video output is depicted. Experimental results show that these kinds of methods in system software can benefit the stabilization and performance of the computer system.
Research on High Performance General Purpose Microprocessor Architecture
Zhang Minxuan, Wang Yongwen, Xing Zuocheng, Deng Rangyu, Jiang Jiang, and Zhang Chengyi
2006, 43(6):  987-992. 
Asbtract ( 527 )   HTML ( 4)   PDF (318KB) ( 856 )  
Related Articles | Metrics
The X processor is an EPIC based high-performance general-purpose microprocessor. Eight-stage pipelines and OLSM execution model are designed, which overcomes the limitations of traditional EPIC execution model with little hardware overhead. A multi-branch predication structure is designed for parallel execution of multiple branch instructions. Predication is used to reduce branch instructions. Two-level cache memory is designed. DTD method is presented for low power design and speculation is used to hide memory latency. Finally, the future of high-performance microprocessors is prospected.
Research and Development of High Performance YHFT Digital Signal Processor
Chen Shuming, Li Zhentao, Wan Jianghua, Hu Dinglei, Guo Yang, Wang Dong, Hu Xiao, and Sun Shuwei
2006, 43(6):  993-1000. 
Asbtract ( 706 )   HTML ( 0)   PDF (455KB) ( 751 )  
Related Articles | Metrics
YHFT-DSP/700 is a high performance floating-point VLIW DSP developed in 2004. Its frequency is 238MHz and its peak performance is 1,428MFLOPS and 1,904MIPS. The architecture, design methodology and compiler techniques of YHFT-DSP/700 are presented. The latest simultaneous multi-threading DSP architecture called YHFT-DSP/SMT is proposed, which improves the system throughput by 40%. The architecture and development trends of the mainstream high performance DSPs are analyzed as well.
A Multiple-Scan-Chain Test Approach Based on Combinational Decompression Circuits
Dong Jie, Hu Yu, Han Yinhe, and Li Xiaowei
2006, 43(6):  1001-1007. 
Asbtract ( 393 )   HTML ( 1)   PDF (375KB) ( 676 )  
Related Articles | Metrics
An on-chip decompressor is an efficient method to reduce test cost in multiple scan chain designs. In this paper, a new technique is investigated to implement the decompressor by utilizing combinational circuits. The proposed architecture drives a large number of internal scan chains with far fewer external input pins, thus delivering significant reductions in test data volume. Based on the analysis of compatible relationships among scan slices, the number of external scan inputs can be minimized. The effectiveness and applicability of the proposed scheme are demonstrated by experimental results.
Multilayer Light-Gossip—An Efficient Search Algorithm in Peer-to-Peer Networks
Li Renfa, Yue Guangxue, and Zhou Zude
2006, 43(6):  1008-1018. 
Asbtract ( 485 )   HTML ( 1)   PDF (684KB) ( 446 )  
Related Articles | Metrics
In existing unstructured P2P systems, the application layer protocol simply uses flooding algorithm to route peer's querying and the lack of cache scheme, which is just implemented on application layer and doesn't use down-layer's information routing of Internet. So it has poor scalability and low efficiency. In the RLP2P network based on the architecture of layer-region, the routing space is divided into two layers, the on region layer and the in region layer, advantages of both the flooding distributed forward search and spanning tree algorithm are combined, and a new efficient search algorithm of multilayer light-gossip and routing strategy of beehive based on regular hexagon in region is proposed. First, it classifies the message of the network into diffusions based on region and in region, keeps the robust and validity searched for the networks with certain redundant messages, and makes the workload of locating service and all the network host-count of range query bring down to regions. Second, it adopts the forecast evaluation means realize to route messages that are divided into groups in advance, enable the route messages to forward automatically along a shortest path on time measurement. The simulations results show that multilayer light-gossip algorithm improve search efficiently greatly and lowers redundant messages so that it enables the whole comprehensive performance of the networks maintain a fine state and scalability under the environment of wide area.
An Energy Efficient and Location-Independent QoS Protocol for Wireless Sensor Networks
Mao Yingchi, Gong Haigang, Liu Ming, Chen Daoxu, and Xie Li,
2006, 43(6):  1019-1026. 
Asbtract ( 445 )   HTML ( 2)   PDF (480KB) ( 470 )  
Related Articles | Metrics
Wireless sensor networks often face the critical problem of maintaining the sufficient sensing coverage (QoS) at an application specific level while keeping a small number of nodes active at any time to save energy. To solve this problem, the relationship between the desired QoS requirement and the minimum number of active nodes is analyzed without the knowledge of location or directional information in the randomly deployed sensor networks. Based on the analytical results, an energy efficient and location-independent QoS (ELIQoS) protocol is proposed, which selects the minimum number of active nodes based on the nodes' energy without using any location information. Simulation and analysis study demonstrates that the ELIQoS protocol can not only reduce the network consumption and balance the energy dissipation among nodes, but can provide the desired QoS requirement effectively.
Analysis of Short-Term and Long-Term Forecast of Weighted Internet Traveling Diameter
Zhao Hai, Xu Ye, Su Weiji, Zhang Wenbo, and Zhang Xin
2006, 43(6):  1027-1035. 
Asbtract ( 391 )   HTML ( 0)   PDF (584KB) ( 387 )  
Related Articles | Metrics
Physical properties of Internet are discussed in this paper. Based on giant samples, weighted Internet is firstly defined, a physical property of weighted Internet-weighted traveling diameter-is defined. And then according to features of samples, a model is put foreword. The model includes two parts: one part is logistic model, which simulates the development of traveling diameter, and the other part is sine and cosine functions, which simulates the oscillation during the development of traveling diameter. Since the development of weighted traveling diameter is very complex, the model proposed in this paper is far from suitable while being used for a long-term forecast. To solve the problem of long-term forecast, correlation dimension of weighted traveling diameter is firstly calculated, which proves that the development of weighted Internet traveling diameter is a phenomenon of chaos and a strange attractor exists. Finally, based on correlation dimension and properties of chaos path near the strange attractor, a model of four-dimension function group is put foreword, which is comparatively suitable for a long-term forecast.
Multicast Scheduling in Buffered Crossbar Switches with Multiple Input Queues
Sun Shutao, He Simin, Zheng Yanfeng, and Gao Wen,
2006, 43(6):  1036-1043. 
Asbtract ( 523 )   HTML ( 2)   PDF (489KB) ( 408 )  
Related Articles | Metrics
The scheduling of multicast traffic in bufferless crossbar switches has been extensively investigated. However, all the proposed solutions are hardly practical for high capacity switches because of either poor performance or high complexity. A buffered crossbar switch with multiple input queues per input port for transferring multicast traffic is proposed. Under this architecture, the scheduler operates in three stages, namely cell assignment, input scheduling, and output scheduling. The scheduling algorithms with complexity from O(1) to higher are presented for different scheduling stages. Simulation results show that both the number of input queues and the size of crosspoint buffer can affect the throughput performance of a buffered crossbar under multicast traffic. However, under bursty multicast traffic, increasing the number of input queues gains more, no matter which algorithm is used, i.e. either HA-RR-RR with complexity O(1) or MMA-MRSF-LQF with higher complexity. This shows that the proposed scheme is more appropriate for high performance switches.
Application of an Improved PageRank in Web Crawler
Qin Zheng, Zhang Ling, and Li Na
2006, 43(6):  1044-1049. 
Asbtract ( 400 )   HTML ( 1)   PDF (366KB) ( 582 )  
Related Articles | Metrics
The PageRank algorithm is used in ranking Web pages. It estimates the pages' authority by taking into account the link structure of the Web. However, it assigns each outlink the same weight and is independent of topics, resulting in topic-drift. In this paper, an improved PageRank algorithm based on topical segments is proposed. This algorithm segments the Web page into blocks and passes the page's PageRank to outlinks in each block in proportion with the block's relativity to the given topic. Moreover, it regards the visited outlink as feedback to modify the block's relevance. The experiment in Web crawler shows that the new algorithm has better performance.
An Attribute-Based Extended Delegation Model
Ye Chunxiao, Wu Zhongfu, Fu Yunqing, Zhong Jiang, and Feng Yong
2006, 43(6):  1050-1057. 
Asbtract ( 386 )   HTML ( 1)   PDF (469KB) ( 420 )  
Related Articles | Metrics
To increase the security of delegation, an attribute-based delegation model called ABDM\-A is presented, which is an extension of current delegation models. Delegation constraint in ABDM\-A consists of both delegation attribute expression (DAE) and delegation prerequisite condition (CR). Delegatees must satisfy delegation constraint (especially DAE) when assigned to a delegation role. For a better flexibility, delegation attribute expression is divided into two types: permanent and temporary delegation attribute expressions. With temporary delegation attribute expression, the delegator can temporarily, not permanently, delegate high level permission to low level delegatees. ABDM\-A relieves the security management effort of the delegator and the system administrator in delegation and increases the security of delegation.
Detection of Secret Message in Spatial LSB Steganography Based on Contaminated Data Analysis
Liu Wenfen, Guan Wei, Cao Jia, and Zhang Weiming
2006, 43(6):  1058-1064. 
Asbtract ( 459 )   HTML ( 2)   PDF (410KB) ( 373 )  
Related Articles | Metrics
Information hiding technique is of great importance for network information security and how to search and detect secret messages transferred in network is crucial and practical to safeguard the national security. On Internet, a large number of steganographic programs use the least significant bit (LSB) embedding to hide message in digital images. Based on contaminated data analysis, a new steganalytic technique capable of a reliable detection of spatial LSB steganography is proposed and its mathematical model is built . This method can not only detect the existence of hidden messages embedded in images fast and reliably, but also estimate accurately the amount of hidden messages embedded by using a sequentially or randomly scattered algorithm.
Separation of Duty in Dynamic Role Translations Between Administrative Domains
Liao Junguo, Hong Fan, Zhu Xian, and Xiao Haijun
2006, 43(6):  1065-1070. 
Asbtract ( 503 )   HTML ( 0)   PDF (389KB) ( 530 )  
Related Articles | Metrics
Secure interaction and interoperability between two or more administrative domains is a major concern. Kapadia et al. proposed the IRBAC 2000 model, which can be used to accomplish flexibly dynamic inter-domain role translations. However, in the IRBAC 2000 model, separation of duties is not considered, which is one of three basic security principles supported by the RBAC model, and enforced by statically mutually exclusive role constraints. Therefore, in this paper, the scenarios where dynamic role translations violate statically mutually exclusive role constraints are analyzed in detail, an approach to check the security problem is provided, and a protective mechanism utilizing prerequisite conditions to enforce the security of the IRBAC 2000 model is proposed.
Enumerations and Counting of Orthomorphic Permutations
Ren Jinping and Lü Shuwang
2006, 43(6):  1071-1075. 
Asbtract ( 392 )   HTML ( 1)   PDF (260KB) ( 511 )  
Related Articles | Metrics
Orthomorphic permutations play important roles in the design of cryptosystem. Studying the properties, enumeration and counting of orthomorphic permutations is heavily significant to design and analysis of cryptosystem. The theory research of orthomorphic permutations has become a focal issue. Counting and enumeration of orthomorphic permutations are discussed in this paper. An enumeration algorithm is given using sum matrix. and all of n-tuples orthomorphic permutations can be listed with this enumeration method. This is the first enumeration method of orthomorphisms. The lower bound and upper bound of the n-tuples orthomorphic permutation's number is given with this enumeration method. They are the best bounds until now.
VLSI Design of a High-Speed RSA Crypto-Coprocessor with Reconfigurable Architecture
Fan Yibo, Zeng Xiaoyang, and Yu Yu
2006, 43(6):  1076-1082. 
Asbtract ( 719 )   HTML ( 0)   PDF (479KB) ( 447 )  
Related Articles | Metrics
Through analyzing and comparing the normal RSA algorithm to the modified modular exponentiation and Montgomery multiplication algorithm in the arithmetic level, a nested pipelined RSA crypto-processor architecture is presented. This architecture has high-speed and reconfigurable feature. Based on the architecture, an RSA crypto-processor with various speed & key size can be designed, and it is especially valuable for high speed, long key size applications. It is also suitable for low speed, long key size, and high security level applications, or configured as an IP core embedded in SoC platform. As an example, a high-speed 1024-bit RSA crypto-processor is implemented using 0.18μm CMOS technology. The simulation result indicates that the encryption rate of the crypto-processor is more than 5000 times per second at 150MHz clock frequency. The performance of the crypto-processor proposed is the best reported in the literature in China.
A Priority Mapping Algorithm Without Affecting the Schedulability of Tasks Set
Wang Baojin, and Li Mingshu,
2006, 43(6):  1083-1089. 
Asbtract ( 435 )   HTML ( 0)   PDF (391KB) ( 422 )  
Related Articles | Metrics
In practice, the schedulability of static priority scheduling may be reduced if priority levels of the system are insufficient. When the priority levels that a task set requires are more than the system can support, more than one task must be assigned the same priority. The priority mapping algorithms that have been used can augment the worst-case response time of task with higher priority, and may reduce the system schedulability. If keeping system schedulable, these algorithms may require more priority levels than the system can support. With preemption threshold scheduling model, the schedulability is improved as compared to both preemption and non-preemption scheduling models, and all tasks can be handled with the minimum number of event handling threads. But, the number of system priority levels that a thread needs does not be reduced. This paper presents a priority mapping algorithm called TSM (threshold segment mapping) and a thread implementation architecture with event-driven. It is shown that the TSM algorithm can maintain the strict order of tasks' priorities, and does not improve task's worst-case response time. The number of system priority levels used by the algorithm and architecture equals to that of threads. Simulations show that when maintaining system schedulability, the TSM algorithm uses less priority levels than existing priority mapping algorithms.
Performance Lossless Voltage Scheduling for Low Energy Software
Lei Ting, Li Xi, and Zhou Xuehai
2006, 43(6):  1090-1096. 
Asbtract ( 322 )   HTML ( 3)   PDF (391KB) ( 362 )  
Related Articles | Metrics
The high power consumption of a processor is becoming a critical problem for both battery-powered devices and high-performance computers. Recently, dynamic voltage and frequency scaling of the processor has been identified as one of the most effective ways to reduce software energy consumption. In this paper, the problem of performance lossless energy reduction for variable-voltage processors is introduced, and a compilation optimization strategy is discussed, which identifies voltage scaling opportunities to achieve energy savings without performance loss. A mixed integer linear programming model for the problem is proposed, which can sufficiently exploit the difference of voltage scaling characteristics of subtasks. Also presented are two heuristic algorithms for the mixed integer linear programming. Experiment results demonstrate the effectiveness of the strategy with processor energy savings up to 22.3% for the real programs, and the heuristic algorithm performs well in terms of solution.
Research on Generating Incremental ETL Processes Automatically
Zhang Xufeng, Sun Weiwei, Wang Wei, Feng Yahui, and Shi Baile
2006, 43(6):  1097-1103. 
Asbtract ( 613 )   HTML ( 1)   PDF (382KB) ( 525 )  
Related Articles | Metrics
ETL processes are used for collecting data from data sources to data warehouse. ETL processes can be separated into two portions: full ETL processes and increment ETL processes. A full ETL process can be designed easily but it can only deal full data. An incremental ETL process is used for loading only those data which are newly created in the data sources, but it is difficult to design manually. In this paper, using existing methods of incremental maintenance of materialized views for reference, an approach to generate an incremental ETL process automatically from a full ETL process is put forward. Existing researches are focused on the incremental maintenance of materialized views in such circumstances which involve the operators of selection, projection, join and aggregation but not the difference operators. Since difference operators are used frequently in an ETL process, incremental maintenance of materialized views defined with difference operators is also discussed in detail.
An Accelerating Chaos Evolution Algorithm of Bilateral Multi-Issue Automated Negotiation in MAS
Gao Jian and Zhang Wei
2006, 43(6):  1104-1108. 
Asbtract ( 462 )   HTML ( 0)   PDF (281KB) ( 471 )  
Related Articles | Metrics
Automated negotiation is a key problem in multi-agent systems (MAS). It means establishing contracts for working together between agents. Now, in many cases these contracts consist of multi-issue, which makes the negotiation more complex than dealing with just one-issue. In multi-issue negotiation, how to conduct automated negotiation quickly and high-efficiently has become an important problem that we must deal with in MAS. A bilateral multi-issue automated negotiation model (MN) is proposed, and then an accelerating chaos evolution algorithm for the bilateral multi-issue automated negotiation is given. In this method the chaos mechanism is introduced into evolution and then algorithm constantly shrinks the searching area to accelerate method. By this way, the algorithm avoids premature converging to local Nash point, and solves slow convergence of multi-issue negotiation and chaos evolution algorithm. Theory analysis and experiments indicate that this method can fast and more efficiently converge to the global optimal numerical value in the field at probability of 1.
Cognitive-Cooperation-Oriented Knowledge Flow Research
Dou Wanchun, Liu Xiping, and Cai Shijie
2006, 43(6):  1109-1114. 
Asbtract ( 403 )   HTML ( 1)   PDF (303KB) ( 586 )  
Related Articles | Metrics
For facilitating the discussion of cognitive cooperation based on group learning, a basic environment supporting cognition is proposed taking advantage of the ontology technology, and some basic concepts related to the environment are presented. Accordingly, an integrated environment supporting knowledge application is explored by ontology approach. Moreover, the knowledge flow is defined in formalization and the logical process engaged in cognitive cooperation is discussed under the integrated environment based on the flow relation and the theory of Markov decision processes. Finally, the conclusion and directions in future research are presented.
A Parallel Geometric Correction Algorithm Based on Dynamic Division-Point Computing
Ou Xinliang, Chen Songqiao, and Chang Zhiming
2006, 43(6):  1115-1121. 
Asbtract ( 454 )   HTML ( 0)   PDF (518KB) ( 440 )  
Related Articles | Metrics
Recently, parallelization of geometry correction on remote-sensing image has become an important research area. But there exist some problems in current parallel algorithms: They do not have the capability of load-balance, or the global computation is very large. What's more, local operation is time-consuming. Therefore, a parallel geometry correction algorithm based on dynamic division-point computing, called PIWA-DDC, is proposed. By means of the LogP model, it is deduced that PIWA-DDC has good scalability. The results from experiments on MPP show that this algorithm has good capability of load-balance and high efficiency of geometry aberration processing.
Disambiguation in a Modern Chinese General-Purpose Word Segmentation System
Luo Zhiyong, and Song Rou
2006, 43(6):  1122-1128. 
Asbtract ( 482 )   HTML ( 2)   PDF (379KB) ( 664 )  
Related Articles | Metrics
Disambiguation is one of the most important parts of segment systems in Chinese. A Chinese general-purpose word segmentation (GPWS) system demands higher capacity of disambiguation techniques particularly, because it has functions such as allowing users to create their own dictionaries dynamically and employing multiple user's dictionaries to word segmentation. Based on inspection of the distributions and characteristics of ambiguity fragments (especially overlapping ambiguity fragments) in large-scale real corpus, an improved forward maximum match algorithm for ambiguity fragment detection, as well as a practical “rules + exceptions” disambiguation strategy, are proposed in this paper. An exhaustive extraction has been made of the overlapping ambiguity sections (about 2.4 million occurrences) from a People's Daily corpus of 100 million characters (234MB approximately), and open-ended experiments on the above strategy randomly were carried out, which achieved accuracy average of 99%.
Sentences Optimum Selection for Multi-Document Summarization
Qin Bing, Liu Ting, Chen Shanglin, and Li Sheng
2006, 43(6):  1129-1134. 
Asbtract ( 345 )   HTML ( 4)   PDF (340KB) ( 685 )  
Related Articles | Metrics
An approach for sentence optimum selection based on sub-topics of multi-documents is proposed. Multi-documents can be clustered into sub-topics after sentence similarity calculation, which can be sorted by scoring. Then sentences from all sub-topics are selected in order to get maximum coverage ratio of effective words. Using this method, the information redundancy of each sub-topic and among sub-topics is reduced. The information coverage ratio of the summarization is better improved. The experiment shows that the result is satisfied.