ISSN 1000-1239 CN 11-1777/TP

Table of Contents

15 June 2005, Volume 42 Issue 6
Paper
Research on High Performance Computer Technology Based on InfiniBand
Xie Xianghui, Peng Longgen, Wu Zhibing, and Lu Deping
2005, 42(6):  905-912. 
Network performance is a key bottleneck that restricts the development of high performance computing technology: in both computing networks and storage networks, progress in communication lags behind that of the CPU. The InfiniBand interconnection architecture can close the performance gap between network and CPU and balance high performance computing systems between computation and communication. To develop high performance interconnection components for HPC, research on the InfiniBand specification began in 2000, and InfiniBand network products branded "SunWay" were completed in 2003. Discussed in this paper are the components, architecture, and applications of the high performance computing system based on InfiniBand technology, followed by the performance test results.
CC-NUMA Architecture Based IO System Design
Wu Jiqing, Liu Hengzhu, and Wang Haitao
2005, 42(6):  913-917. 
In the CC-NUMA architecture widely adopted for high performance computing, IO resources are distributed among the nodes and are managed and maintained separately. This kind of organization is subject to several latent problems. These problems are first analyzed, and then a new organization of IO resources in a CC-NUMA system is put forward, based on available IO bus and storage network techniques. In addition, the design details of the key modules are described. Finally, the design is verified on the system platform, and the outcome shows that the new IO system is efficient.
gDevice: A Protocol for the Grid-Enabling of the Computer Peripherals
Zhang Yuedong, Yang Yi, Fan Jianping, and Ma Jie
2005, 42(6):  918-923. 
The grid computer is one of the trends in future computer architectures, and grid-enabled components are the key elements of a grid computer system. The main characteristics of a grid-enabled component are grid entity, functional service, and intelligent interconnection, and the key issues in grid-enabling computer components include device description, interconnection, resource sharing and multiplexing, and security. The gDevice protocol is a protocol intended for grid-enabling computer peripherals. The protocol has been partly validated in a grid computer console system called the grid console.
SEA: A High-Performance Modular Long Integer Exponentiation Coprocessor
Zhao Xuemi, Lu Hongyi, Dai Kui, Tong Yuanman, and Wang Zhiying
2005, 42(6):  924-929. 
Modular exponentiation of long integers is the primary operation of several public-key algorithms and often the bottleneck of their implementation. A high-performance modular exponentiation coprocessor, SEA, is presented here, employing three novel techniques. First, a parallel binary modular exponentiation algorithm (PBME) is used to decrease the cycle count, and a high-radix Montgomery modular multiplication algorithm is modified into the radix-based high-radix Montgomery modular multiplication algorithm (RBHRMMM) to increase the frequency. Second, when mapping the algorithms to a systolic array, modular squaring and modular multiplication are computed alternately to hide the dependencies between iterations of the RBHRMMM algorithm, and a bypass is used to eliminate the dependencies in the PBME algorithm. Third, the multipliers are split first and the accumulations are then compressed as partial products to decrease the carry propagation delay in the critical path. SEA can perform a full 1024-bit modular exponentiation in 72738 cycles and is implemented with standard cells; its die area is 4.2×4.2 mm², which equals 739933 gates. SEA has been taped out successfully in 0.18μm 1P6M CMOS technology; its working frequency is 133MHz, its power is 962.26mW, and a 1024-bit RSA signature can be finished in 316.9μs.
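As an illustration of the underlying operation only (not the paper's PBME or RBHRMMM hardware algorithms), the following is a minimal software sketch of left-to-right binary (square-and-multiply) modular exponentiation on 64-bit operands; it assumes a compiler that provides __uint128_t, such as GCC or Clang, whereas the coprocessor works on 1024-bit operands with Montgomery multiplication instead of the % operator.

    #include <stdint.h>
    #include <stdio.h>

    /* Left-to-right binary (square-and-multiply) modular exponentiation. */
    static uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t m)
    {
        return (uint64_t)(((__uint128_t)a * b) % m);
    }

    uint64_t mod_exp(uint64_t base, uint64_t exp, uint64_t mod)
    {
        uint64_t result = 1 % mod;
        base %= mod;
        for (int i = 63; i >= 0; i--) {
            result = mul_mod(result, result, mod);      /* square for every bit  */
            if ((exp >> i) & 1)
                result = mul_mod(result, base, mod);    /* multiply if bit is set */
        }
        return result;
    }

    int main(void)
    {
        /* 561 is a Carmichael number, so 7^560 mod 561 = 1 */
        printf("%llu\n", (unsigned long long)mod_exp(7, 560, 561));
        return 0;
    }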
An Implementation of Reconfigurable Computing Accelerator Card Oriented Bioinformatics
Zhang Peiheng, Liu Xinchun, and Jiang Xianyang
2005, 42(6):  930-937. 
After the completion of human genome sequencing, biologists require greater processing and analysis power to handle huge volumes of gene data. Computing is a basic research method of bioinformatics, and many bioinformatics programs share common features such as huge data volume, relatively simple algorithms, few operation types, and many repeated processes, which shows that these programs are potentially parallelizable. When running on a general-purpose computer, these programs not only consume a lot of system resources but also require complex maintenance, and many of them still cannot produce a satisfying result within a limited time. A general algorithm-reconfigurable hardware accelerator architecture is presented, the principle of mapping the global Smith-Waterman algorithm to the hardware is discussed, and its possible applications in other fields are pointed out.
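For reference, a minimal software version of the Smith-Waterman recurrence with a linear gap penalty is sketched below; the scoring constants are illustrative, and the systolic hardware mapping discussed in the paper (typically one anti-diagonal of the matrix per cycle) is not reproduced.

    #include <stdio.h>
    #include <string.h>

    #define MAXN 256
    #define MATCH 2
    #define MISMATCH -1
    #define GAP -1

    static int max3(int a, int b, int c) { return a > b ? (a > c ? a : c) : (b > c ? b : c); }

    /* Smith-Waterman local alignment score for sequences up to MAXN long. */
    int smith_waterman(const char *a, const char *b)
    {
        int n = strlen(a), m = strlen(b);
        static int H[MAXN + 1][MAXN + 1];
        int best = 0;
        memset(H, 0, sizeof H);
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = H[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
                int up   = H[i - 1][j] + GAP;
                int left = H[i][j - 1] + GAP;
                int s = max3(diag, up, left);
                H[i][j] = s > 0 ? s : 0;            /* local alignment: floor at 0 */
                if (H[i][j] > best) best = H[i][j];
            }
        }
        return best;
    }

    int main(void)
    {
        printf("%d\n", smith_waterman("GATTACA", "GCATGCU"));
        return 0;
    }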
Parallel Modeling for Line Speed Approximate Content-Based Packet Classification
Li Xudong, Xu Yang, Li Jing, and Liu Bin
2005, 42(6):  938-944. 
A parallel and pipelined hardware scheme is proposed for approximate content-based packet classification, which scales to large rule sets and high network rates. With the employment of a configurable window unit, the error level of approximate matching can be flexibly adjusted. Furthermore, various kinds of approximate matching errors (insertion, deletion, substitution, transposition) can be detected with different structures of the rule combination unit. A probability model of packet matching is also proposed for a large-alphabet (Chinese character) environment, which shows that the hardware scheme is practicable.
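As a software reference for the four error types named above (not the proposed hardware scheme), the following sketch computes the restricted Damerau-Levenshtein (optimal string alignment) distance, which counts insertions, deletions, substitutions, and adjacent transpositions.

    #include <stdio.h>
    #include <string.h>

    #define MAXLEN 128

    static int min3(int a, int b, int c) { return a < b ? (a < c ? a : c) : (b < c ? b : c); }

    /* Restricted Damerau-Levenshtein distance for strings up to MAXLEN long. */
    int dl_distance(const char *a, const char *b)
    {
        int n = strlen(a), m = strlen(b);
        static int d[MAXLEN + 1][MAXLEN + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                d[i][j] = min3(d[i - 1][j] + 1,         /* deletion     */
                               d[i][j - 1] + 1,         /* insertion    */
                               d[i - 1][j - 1] + cost); /* substitution */
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
                    && d[i - 2][j - 2] + 1 < d[i][j])
                    d[i][j] = d[i - 2][j - 2] + 1;      /* adjacent transposition */
            }
        }
        return d[n][m];
    }

    int main(void)
    {
        printf("%d\n", dl_distance("classify", "calssify")); /* 1: one transposition */
        return 0;
    }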
A New Rendering Technology of GPU-Accelerated Radiosity
Hu Wei and Qin Kaihuai
2005, 42(6):  945-950. 
A new rendering technique for GPU-accelerated radiosity is presented in this paper. Exploiting the parallel computing power of current graphics hardware, the method implements the entire classical radiosity solution on the GPU without CPU participation. By using new OpenGL extensions to realize texture traversal, classification, and accumulation, the rendering results of the hemi-cube method can be used directly on the GPU. A new Jacobi iteration solver is also proposed based on matrix and vector representations on GPUs.
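For context, the Jacobi iteration for the classical radiosity system B = E + rho·F·B (emission E, reflectivities rho, form factors F) is sketched below as plain CPU code with toy data; the paper's contribution is evaluating this iteration entirely on the GPU through textures and OpenGL extensions, which is not shown here.

    #include <stdio.h>

    #define N 3        /* number of patches (toy example) */
    #define ITERS 50

    /* Jacobi iteration: B_i = E_i + rho_i * sum_j F_ij * B_j */
    void jacobi_radiosity(const double E[N], const double rho[N],
                          const double F[N][N], double B[N])
    {
        double Bnew[N];
        for (int i = 0; i < N; i++) B[i] = E[i];        /* start from emission */
        for (int it = 0; it < ITERS; it++) {
            for (int i = 0; i < N; i++) {
                double gather = 0.0;
                for (int j = 0; j < N; j++)
                    gather += F[i][j] * B[j];           /* incoming radiosity  */
                Bnew[i] = E[i] + rho[i] * gather;
            }
            for (int i = 0; i < N; i++) B[i] = Bnew[i];
        }
    }

    int main(void)
    {
        double E[N] = {1.0, 0.0, 0.0};                  /* one emitting patch  */
        double rho[N] = {0.5, 0.6, 0.7};                /* reflectivities      */
        double F[N][N] = {{0.0, 0.4, 0.3},              /* form factors (toy)  */
                          {0.4, 0.0, 0.3},
                          {0.3, 0.3, 0.0}};
        double B[N];
        jacobi_radiosity(E, rho, F, B);
        for (int i = 0; i < N; i++) printf("B[%d] = %.4f\n", i, B[i]);
        return 0;
    }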
A Hardware-Based PATRICIA Algorithm for Fixed-Length Match
Li Xin, Hu Mingzeng, and Ji Zhenzhou
2005, 42(6):  951-957. 
The PATRICIA algorithm has become a classic method for information retrieval, but PATRICIA insertion is time-consuming. By analyzing PATRICIA, it is discovered that not keeping the order of NBTs (next bits to test) in the PATRICIA trie can improve insertion performance and decrease hardware design complexity. A new PATRICIA algorithm for fixed-length matching is proposed, and it is proved to be an optimal binary-trie-based algorithm. An ASIC (application-specific integrated circuit) implementing the algorithm is built for the state table of stateful inspection. The theoretical and experimental results show that the algorithm works well for the state table application in gigabit networks.
Design of System Area Network Adapter
Yang Xiaojun, Zhang Peiheng, Miao Yanchao, Sun Ninghui, and Guo Lili
2005, 42(6):  958-964. 
An effective system area network (SAN) adapter is critical to achieving a high-performance cluster system. The design of a SAN adapter based on the Intel IOP310 I/O processor chipset, a general-purpose embedded system, is proposed in this paper. It is part of DCNet, the SAN of the Dawning 4000A cluster. In the adapter architecture, the memory bus is extended into a local bus for system peripheral interconnects, and a network interface unit (NIU) based on this local bus is implemented and embedded. These innovations not only compensate for the lack of a high-performance data channel in the embedded system, but also efficiently utilize the memory bus bandwidth and the DMA engine to reduce the latency of data transfer between the host and the network. Furthermore, the Intel IOP310 I/O processor chipset enables the adapter to offload communication protocol processing from the host CPU. The test results show that the adapter achieves communication performance competitive with Myrinet, SCI, and QsNet, and prove that designing a high-performance adapter based on an embedded system is feasible and effective.
Using Multi-Stage Switch Fabric in High Performance Router Design
Guan Jianbo, Sun Zhigang, and Lu Xicheng
2005, 42(6):  965-970. 
Traditional single-stage switch architectures cannot scale up well, so multi-stage architectures are widely considered in large-scale switching fabric designs. The topology and packet routing style of a multi-stage switching fabric heavily influence its performance. Based on a comparison of several popular k-ary n-cube structures used in MPP systems, it is argued that the 3D torus network is most suitable for implementing large switching fabrics. A novel routing algorithm, DMR, is then proposed. It achieves high throughput and high availability by balancing traffic load over multiple paths while maintaining packet order within a flow. The performance of the DMR routing algorithm is studied by simulation and compared with two other routing algorithms, e-cube routing and random routing. The results show that the performance of DMR is almost the same as that of random routing and much better than that of e-cube routing, while DMR, unlike random routing, maintains packet order within a flow.
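For comparison, the baseline e-cube (dimension-ordered) routing mentioned above is sketched below for a k-ary 3-D torus; DMR itself is not reproduced. The routine picks the next hop by correcting one dimension at a time, taking the shorter direction around each ring.

    #include <stdio.h>

    #define DIMS 3

    /* One next-hop decision of dimension-ordered (e-cube) routing on a k-ary
     * 3-D torus: fix the lowest dimension whose coordinate still differs. */
    int ecube_next_dim(const int cur[DIMS], const int dst[DIMS],
                       const int k[DIMS], int *step)
    {
        for (int d = 0; d < DIMS; d++) {
            if (cur[d] != dst[d]) {
                int fwd = (dst[d] - cur[d] + k[d]) % k[d];   /* hops going +1 */
                *step = (fwd <= k[d] - fwd) ? +1 : -1;       /* shorter way   */
                return d;
            }
        }
        return -1;                                           /* arrived        */
    }

    int main(void)
    {
        int k[DIMS]   = {8, 8, 8};
        int cur[DIMS] = {1, 6, 3};
        int dst[DIMS] = {1, 2, 5};
        int step, d;
        while ((d = ecube_next_dim(cur, dst, k, &step)) >= 0) {
            cur[d] = (cur[d] + step + k[d]) % k[d];
            printf("hop to (%d,%d,%d)\n", cur[0], cur[1], cur[2]);
        }
        return 0;
    }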
Parallel Communication Protocol Based on Smart NICs
Lin Ji, Zhou Xiaocheng, and Meng Dan
2005, 42(6):  971-978. 
As an important part of a cluster, the communication system is one of the most critical factors determining the performance of the whole cluster system. As the computing capability of a single node grows, the communication capability of the network needs to be improved correspondingly. An important way to enhance communication capability is to use multiple network cards to handle messages at the same time. In this paper, an implementation of parallel communication based on smart NICs is presented and evaluated with both communication benchmarks and applications. The experimental results show that both communication and application performance are better than those of parallel communication based on the RMA mechanism.
Fully Integrated Cluster Operating System: Phoenix
Meng Dan, Zhan Jianfeng, Wang Lei, Tu Bibo, and Zhang Zhihong
2005, 42(6):  979-986. 
This paper defines a complete layered architecture of cluster system software, named Phoenix, from an operating system perspective, comprising three layers: heterogeneous resources, the cluster operating system kernel, and the user environment. According to the core requirements of different user environments, the components of the cluster operating system kernel, and especially their interrelations, are described and defined. The scalability and fault tolerance of the cluster OS are guaranteed on the basis of an improved group structure. The Phoenix system has been installed on Dawning 4000A for system monitoring, system administration, and job management, and it presently supports the Linux, AIX, Windows, and Solaris operating systems.
Implementation of Checkpoint System Towards Large Scale Parallel Computing
Zhou Enqiang, Lu Yutong, and Shen Zhiyu
2005, 42(6):  987-992. 
As high-performance clusters continue to grow in size and popularity, fault tolerance and reliability are becoming limiting factors for parallel computing. Two bottlenecks, checkpointing protocol overhead and the storage cost of checkpoint images, limit the scalability of a checkpoint system, which is critical to large-scale clusters. To address these issues, the design of the C system is presented, which provides coordinated checkpointing based on dynamic virtual connections and distributed checkpoint image storage for MPI-based parallel applications. Full use is made of characteristics of parallel applications and of the local disks of the cluster to reduce the checkpointing cost of large-scale parallel jobs. The C system is suitable for large-scale clusters, and initial experimental results on the cluster testbed show that incorporating the mechanism has negligible performance impact.
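A minimal sketch of barrier-coordinated checkpointing to node-local disk is given below, assuming MPI; the state structure and file path are hypothetical placeholders, and the paper's dynamic-virtual-connection handling of in-flight messages and full process images is not shown.

    #include <mpi.h>
    #include <stdio.h>

    /* Illustrative application state; a real system captures the full process image. */
    struct app_state { int iteration; double residual; };

    /* All ranks reach a global coordination point, then each writes its state
     * to its node-local disk (distributed checkpoint image storage). */
    static void take_checkpoint(const struct app_state *s, int rank)
    {
        char path[64];
        MPI_Barrier(MPI_COMM_WORLD);             /* quiesce communication      */
        snprintf(path, sizeof path, "/tmp/ckpt.rank%d", rank);
        FILE *f = fopen(path, "wb");
        if (f) { fwrite(s, sizeof *s, 1, f); fclose(f); }
        MPI_Barrier(MPI_COMM_WORLD);             /* all images written          */
    }

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        struct app_state s = { .iteration = 100, .residual = 1e-6 };
        take_checkpoint(&s, rank);
        MPI_Finalize();
        return 0;
    }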
DCFT-Kernel: A Fault-Tolerant Cluster Middleware Based on Group Service
Huang Wei, Zhan Jianfeng, and Fan Jianpin
2005, 42(6):  993-999. 
High availability and fault tolerance are among the most important factors used for evaluating a cluster system, but as cluster systems grow ever larger, implementing system software for fault-tolerant management becomes a difficult technical problem. In this paper, a group services method is put forward to achieve high scalability and high availability when implementing fault-tolerant management software. The main idea of group services is to divide the cluster system into several small partitions and make every partition fault-tolerant, so that the whole system is fault-tolerant. Using group services together with a real-time event service, the fault-tolerant management software, named DCFT-Kernel, is implemented on the DAWNING-4000A cluster system. Emphasis is put on describing the group services technology, but an introduction to DCFT-Kernel and some performance evaluations are also provided.
LUNF—A Cluster Job Scheduling Strategy Using Characterization of Nodes' Failure
Wu Linping, Meng Dan, Liang Yi, Tu Bibo, and Wang Lei
2005, 42(6):  1000-1005. 
Owing to the outstanding scalability of cluster systems, the demand for high performance can be easily met by increasing the number of nodes. But as the scale of cluster systems expands, node failures become commonplace in such large-scale systems, and new ways are needed to accommodate them. As an important part of cluster operating system software, job scheduling is responsible for efficient resource management and reasonable job placement. Job scheduling in a cluster system is divided into two sub-processes: the job selection strategy and the node allocation policy. In this paper, the LUNF (longest uptime node first) node allocation policy is introduced, which uses a characterization of node failures. The simulation results show that the LUNF policy outperforms a random node allocation policy in system performance.
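A minimal sketch of the allocation rule implied by the policy's name is shown below: among idle nodes, choose those with the longest uptime first, on the assumption that long uptime correlates with a lower near-term failure probability. The node list is illustrative, and the paper's failure characterization and simulation setup are not reproduced.

    #include <stdio.h>
    #include <stdlib.h>

    struct node { int id; double uptime_hours; };

    /* Sort idle nodes by uptime, longest first. */
    static int by_uptime_desc(const void *a, const void *b)
    {
        const struct node *x = a, *y = b;
        return (y->uptime_hours > x->uptime_hours) - (y->uptime_hours < x->uptime_hours);
    }

    /* LUNF: allocate the `need` longest-up nodes out of the idle set. */
    void lunf_allocate(struct node *idle, int n_idle, int need, int *chosen)
    {
        qsort(idle, n_idle, sizeof *idle, by_uptime_desc);
        for (int i = 0; i < need && i < n_idle; i++)
            chosen[i] = idle[i].id;
    }

    int main(void)
    {
        struct node idle[] = {{1, 12.0}, {2, 300.5}, {3, 75.2}, {4, 190.0}};
        int chosen[2];
        lunf_allocate(idle, 4, 2, chosen);
        printf("allocate nodes %d and %d\n", chosen[0], chosen[1]);  /* 2 and 4 */
        return 0;
    }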
High Performance and High Availability MySQL Server Cluster Based on Active TCP Connection Replication
Shao Zhiyuan, Jin Hai, and Tang Xiaohui
2005, 42(6):  1006-1012. 
In this paper, a scheme is presented that builds a high-performance and highly available database cluster by using actively replicated TCP connections. The scheme puts forward a technique that actively replicates a set of TCP connections by converting them into atomic multicasts and distributing read-only requests to different processing units. The technique is applied to MySQL database clusters and results in high performance and high availability. Experiments conducted on the prototype system exhibit high performance on read-only database queries with little sacrifice on the update operations of the cluster.
Implementation of Grid Router of Dawning 4000A
Yang Weibing, Sun Ninghui, Chen Mingyu, and Sun Xiaojuan
2005, 42(6):  1013-1018. 
The evolution of grid technology has driven changes in the architecture and application environment of cluster systems. These changes introduce new system-level issues to be addressed, including secure, controllable, and efficient service access. Traditional access control systems cannot satisfy the requirements of using high performance computers in a grid environment because of their weak authentication and coarse-grained access control. The grid router component of Dawning 4000A discussed in this paper provides a solution to these problems in an application-independent manner, offering strong PKI- and grid-key-based authentication and fine-grained service access protection.
Metadata Consistency in DCFS2
Xiong Jin, Fan Zhihua, Ma Jie, Tang Rongfeng, Li Hui, and Meng Dan
2005, 42(6):  1019-1027. 
With the increasing performance, capacity, and scalability required of cluster file systems, the multi-metadata-server structure is the trend for future cluster file systems. Distributed metadata processing over multiple metadata servers is an important but difficult issue. To obtain high metadata performance and scalability, DCFS2, a cluster file system, distributes metadata processing over multiple metadata servers. Moreover, DCFS2 solves the metadata consistency problem of distributed metadata processing with a distributed logging technique and a simplified two-phase commit protocol. Performance results show that DCFS2's metadata processing policy based on distributed logging delivers high I/O performance and that the file system can quickly recover from a metadata server failure.
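For context, a textbook two-phase commit coordinator is sketched below with the metadata servers simulated by local stub functions; the paper describes its protocol as simplified and integrated with distributed logging, so this is background only, not DCFS2's actual implementation.

    #include <stdio.h>

    #define SERVERS 3
    enum vote { VOTE_ABORT = 0, VOTE_COMMIT = 1 };

    static enum vote prepare(int server)            /* phase 1: ask to prepare      */
    {
        printf("PREPARE -> mds%d\n", server);
        return VOTE_COMMIT;                         /* simulated reply              */
    }

    static void decide(int server, int commit)      /* phase 2: broadcast decision  */
    {
        printf("%s -> mds%d\n", commit ? "COMMIT " : "ABORT  ", server);
    }

    int two_phase_commit(void)
    {
        int commit = 1;
        for (int s = 0; s < SERVERS; s++)           /* collect all votes first      */
            if (prepare(s) != VOTE_COMMIT) commit = 0;
        /* a real coordinator logs the decision durably before phase 2 */
        for (int s = 0; s < SERVERS; s++)
            decide(s, commit);
        return commit;
    }

    int main(void)
    {
        printf("transaction %s\n", two_phase_commit() ? "committed" : "aborted");
        return 0;
    }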
BWFS: A Distributed File System with Large Capacity, High Throughput and High Scalability
Yang Dezhi, Huang Hua, Zhang Jiangang, and Xu Lu
2005, 42(6):  1028-1033. 
With the increasing requirements of applications and developments in computer science, research on networked storage systems (NSS) has become a hot spot of I/O subsystem research, and distributed file systems, as one of the core components of an NSS, deserve close attention. Based on a study of existing research results, the BlueWhale File System (BWFS) was designed by NRCHPC at the Institute of Computing Technology, Chinese Academy of Sciences; it gives the BW1K NSS large capacity, high throughput, and high scalability. In this paper, the architecture of BWFS and its major characteristics are described, and the test results of the BW1K NSS are used to verify these characteristics.
Distributed Layered Resource Management Model in Blue Whale Distributed File System
Huang Hua, Zhang Jiangang, and Xu Lu
2005, 42(6):  1034-1038. 
To manage massive storage efficiently, the Blue Whale distributed file system discards the traditional central resource management model and adopts a distributed layered resource management model. This model supports multiple storage nodes and a cluster of metadata servers. Out-of-band data transfer alleviates performance bottlenecks, enables the metadata server cluster to handle metadata concurrently and efficiently, and also provides load balancing in the system. Theoretical analysis and test results show that this model offers superior capability and scalability in various circumstances.
Research on Performance Bounds of Networked RAID Storage Systems
Cui Baojiang, Liu Jun, Wang Gang, and Liu Jing
2005, 42(6):  1039-1046. 
Previous work on performance evaluation of networked storage systems has been mostly qualitative, and quantitative analytical methods and models are still limited. A quantitative analytical model based on CQN-FC (closed queueing networks with finite capacity) is presented according to the data flow of distributed networked software RAID (dns-RAID). To cope with the state-space explosion problem of the CQN-FC solution, a novel approximate performance bounds analysis (APBA) method is proposed, which has lower computational complexity than other approximate analytical methods in the literature. Experimental results show that the performance bounds of dns-RAID calculated by the APBA method on the CQN-FC model reflect the actual throughput and I/O response time bounds under light load, heavy load, and overload, and can also give the maximal system load.
The Optimization for Molecular 3D-Structure Comparison Method and Its Parallel Implementation of Vectors Deployment
Lang Xianyu, Niu Beifang, Shen Bin, Lu Zhonghua, and Chi Xuebin
2005, 42(6):  1047-1052. 
The molecular similarity index quantitatively describes the similarity between two molecules, but obtaining the globally optimal index is a complicated problem that has puzzled scientists. No matter what method is used to search for the best superposition of two molecules, such as grid-based integration or an efficient iterative technique, initial relative positions of the molecules must be selected. Because of the limitations of random search, which has usually been used, a great many relative positions must be tested. In this paper, the experimental method of uniform design is applied to deploy the initial relative positions of the molecules regularly in 3D space, which makes them representative and evenly distributed and ensures that the global optimum is reached with a small, fixed number of initial positions. A parallel implementation that deploys these positions over Np processors greatly reduces the running time while giving the same final result as the serial program. Uniform design combined with parallel computation searches for the best superposition between two molecules rapidly and with high probability.
Research on the High Performance Algorithms of Dawning 4000H Bioinformatics Specific Machine
Feng Shengzhong, Tan Guangming, Xu Lin, Sun Ninghui, and Xu Zhiwei
2005, 42(6):  1053-1058. 
The bioinformatics-specific supercomputer is a key project supported by the Knowledge Innovation Program of CAS. The challenge is to accelerate the important core algorithms of bioinformatics processing, based on modern computer system architecture and reconfigurable computing chips, and to reach improvements of tens of times. In this paper, three typical algorithms, BLAST, dynamic programming, and Zuker RNA secondary structure prediction, are optimized with I/O latency hiding, fine-grained parallelism, and pipelined parallelism, and C simulators are developed to evaluate the performance of these algorithms. The results show that performance can be improved efficiently by the special-purpose machine.
Implementation of Phase Domain Decomposition Parallel Algorithm of Three-Dimensional Variational Data Assimilation
Zhang Weimin, Zhu Xiaoqian, and Zhao Jun
2005, 42(6):  1059-1064. 
The principles of 3DVAR (three-dimensional variational) data assimilation of meteorological observations are introduced, and parallel strategies for 3DVAR systems, such as phase domain decomposition, load balancing, and message communication, are studied. Based on the SPMD programming model and a message passing interface, the 3DVAR parallel program is designed and implemented, and an analysis of the results is finally presented.
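For reference, the standard 3DVAR analysis is obtained by minimizing the following cost function (textbook formulation; the paper's exact notation is not reproduced here), where x_b is the background state, y the observation vector, H the observation operator, and B and R the background and observation error covariance matrices:

    J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
                  + \tfrac{1}{2}\bigl(H(\mathbf{x})-\mathbf{y}\bigr)^{\mathrm{T}}\mathbf{R}^{-1}\bigl(H(\mathbf{x})-\mathbf{y}\bigr)

In a typical implementation, J and its gradient are evaluated repeatedly during the minimization, which is why the domain decomposition and communication strategies studied in the paper dominate parallel performance.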
Tuning Pipeline Granularity Based on Dynamic Profiling Framework
Ma Lin, Chen Li, and Feng Xiaobing
2005, 42(6):  1065-1072. 
Pipelining is a useful parallelization technique for loops with cross-processor data dependences, and the pipeline granularity is the key to matching computation time with communication time and obtaining good pipeline performance. Loop strip-mining and loop interchange are good methods for finding the optimal pipeline granularity. The amount of computation between communication operations on each node is called the pipeline granularity or block size. Many factors determine the optimal pipeline granularity, such as the access pattern of the application, the program size, the total number of computing nodes, the computing ability and memory architecture of each node, the performance of the communication network, the communication mode, and the overhead of the runtime library. It is hard to estimate the block computation time with a static scheme, while a purely runtime scheme incurs extra overhead and may lose optimization opportunities. An approach is presented and implemented that computes the pipeline granularity by dynamic profiling and a cost model that accounts for the cache locality introduced by the loop transformations. How to decrease the profiling time while guaranteeing the precision of the cost model is also considered. The experimental results show that the pipeline granularity obtained by the dynamic profiling framework adapts well and speeds up the execution of pipelined loops.
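A generic pipelined (wavefront) loop with a strip-mined inner dimension is sketched below using MPI; the block size B is the pipeline granularity that the paper's profiling framework tunes, and the computation kernel is a placeholder. Smaller B fills the pipeline faster, while larger B amortizes message overhead, which is exactly the trade-off being optimized.

    #include <mpi.h>
    #include <string.h>

    #define NCOLS 1024
    #define MYROWS 64
    #define B 128                 /* pipeline granularity (block size) */

    /* Each rank owns a band of rows that depends on the last row of the
     * previous rank; the column loop is strip-mined into blocks of B. */
    void pipelined_sweep(double a[MYROWS][NCOLS], int rank, int nprocs)
    {
        double halo[NCOLS];
        memset(halo, 0, sizeof halo);
        for (int j0 = 0; j0 < NCOLS; j0 += B) {
            if (rank > 0)                               /* wait for upstream strip  */
                MPI_Recv(&halo[j0], B, MPI_DOUBLE, rank - 1, j0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (int i = 0; i < MYROWS; i++)
                for (int j = j0; j < j0 + B; j++) {
                    double up = (i == 0) ? halo[j] : a[i - 1][j];
                    a[i][j] = 0.5 * (a[i][j] + up);     /* carries the dependence   */
                }
            if (rank < nprocs - 1)                      /* release downstream strip */
                MPI_Send(&a[MYROWS - 1][j0], B, MPI_DOUBLE, rank + 1, j0,
                         MPI_COMM_WORLD);
        }
    }

    int main(int argc, char **argv)
    {
        static double a[MYROWS][NCOLS];                 /* static: off the stack    */
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        pipelined_sweep(a, rank, nprocs);
        MPI_Finalize();
        return 0;
    }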
Evaluation and Test for Scalability of Numerical Parallel Computation
Chi Lihua, Liu Jie, and Hu Qingfeng
2005, 42(6):  1073-1078. 
In this paper, the problems of current scalability models are analyzed, and to meet the requirements of practical evaluation and testing, a practical scalability metric based on iso-average-computation-load is proposed to provide a quantitative measure of scalability. In this metric, the definitions of scalable speedup and scalability differ from those of existing metrics. Using this metric, a practical method for measuring scalable speedup and scalability is obtained, and, combined with curve fitting or a parallel computing time model, the scalability of parallel systems can be predicted.
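As background only (these are the conventional definitions, not the paper's redefined iso-average-computation-load metric), speedup and efficiency are usually measured as

    S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p} = \frac{T(1)}{p\,T(p)}

where T(p) is the execution time on p processors; a system is commonly called scalable if E(p) can be held roughly constant as p grows by increasing the workload. The paper modifies these definitions to make them measurable in practical tests.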
Performance Analysis of NPB Benchmark on Domestic Tera-Scale Cluster Systems
Yuan Wei, Zhang Yunquan, Sun Jiachang, and Li Yucheng
2005, 42(6):  1079-1084. 
In this paper, NPB benchmarking is performed on three domestic tera-scale cluster systems, with emphasis on the performance characteristics and trends of tera-scale parallel computing on systems with thousands of processors. The effects of different system configurations (processor, interconnection network, etc.) on the final NPB performance are analyzed, and it is found that the programs in the NPB suite achieve their best performance on different clusters. Further analysis indicates that the scalability of the NPB programs reaches hundreds of processors but not thousands. Most of the NPB programs can only exploit around 10% of the system peak performance, so the scalability of cluster systems and real application performance on tera-scale cluster systems need further improvement. For the manufacture of tera-scale cluster systems with thousands of processors, the performance of collective communication and fine-grained message passing needs further improvement.
Implementation of Parallel Computing for Electromagnetic Scattering on Linux Cluster
Han Minghua, Peng Yuxing, Li Sikun, and Chen Fujie
2005, 42(6):  1085-1088. 
The demand for computing in computational electromagnetics (CEM) is continuously increasing in industry, especially in military applications. Given in this paper are the results obtained with the parallel electric field simulation program MLFMA (multilevel fast multipole algorithm) on a Linux cluster of PCs connected via Gigabit Ethernet for large-scale scattering problems. Thanks to the high performance of the CPUs and interconnection technology, the results obtained on clusters of PCs are comparable to those obtained on expensive multiprocessor machines. A practical EM scattering engineering problem has been calculated.
Binary Compatibility Test and Performance Evaluation of Some Commerce Software on Dawning 4000A
Li Genguo and Li Lijun
2005, 42(6):  1089-1091. 
Several major commercial software packages for engineering simulation, such as Nastran, Ansys, LS-Dyna, and Fluent, are tested on the Dawning 4000A supercomputer. All the tested commercial packages can be installed and used on the Dawning 4000A, and their parallel capabilities perform very well.