ISSN 1000-1239 CN 11-1777/TP

Table of Contents

15 June 2014, Volume 51 Issue 6
A Many-Core Processor Resource Allocation Scheme for Packet Processing
Luo Zhangqi, Huang Kun, Zhang Dafang, Guan Hongtao, and Xie Gaogang
2014, 51(6):  1159-1166. 
Single-core processors can no longer keep up with growing traffic demands. Many-core processors are recognized as a promising approach to packet forwarding in routers because of their powerful parallel processing. However, how to utilize the many cores of such processors to improve a system's parallel performance remains a problem. Many-core packet processing systems use a pipeline architecture in which each stage has a different execution time and requires a different number of CPU cores. The recently proposed EQUI scheme allocates an equal number of cores to each job. However, this wastes core resources and limits packet-processing performance. To address this issue, this paper proposes an optimized core allocation scheme, which first partitions the packet-processing task into multiple jobs and then allocates to each job a number of cores proportional to its execution time. Experimental results show that the new scheme improves the packet forwarding rate by 20% compared with EQUI.
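The proportional allocation step can be sketched in a few lines. This is a hypothetical reconstruction of the idea only, not the authors' code; the function name and the largest-remainder rounding are our own choices.

```python
def allocate_cores(exec_times, total_cores):
    """Allocate cores to pipeline stages in proportion to each stage's
    execution time (largest-remainder rounding). Assumes there are at
    least as many cores as stages."""
    total_time = sum(exec_times)
    ideal = [t / total_time * total_cores for t in exec_times]  # fractional shares
    alloc = [max(1, int(x)) for x in ideal]                     # floor, >= 1 core each
    while sum(alloc) < total_cores:                             # hand out leftovers
        j = max(range(len(ideal)), key=lambda k: ideal[k] - alloc[k])
        alloc[j] += 1
    return alloc
```

For example, stages with execution times [30, 10, 10] on a 10-core processor would receive 6, 2, and 2 cores respectively, instead of the equal split EQUI would produce.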
A Data Gathering MAC Protocol Based on State Translation and Grouping for WSN
Li Zhetao, Wang Zhiqiang, Zhu Gengming, and Li Renfa
2014, 51(6):  1167-1175. 
To decrease delivery latency, increase throughput, and lower traffic load, this paper presents MDD-MAC (multi-divided deliver MAC), a new synchronous MAC protocol with low delivery latency and low traffic load. MDD-MAC minimizes collisions and overhearing by dividing the data transmission cycle into two time slots, a data sending slot and a data receiving slot. To achieve low latency and energy efficiency, MDD-MAC introduces an efficient group delivery mechanism that exploits the topology information of the network. We evaluate MDD-MAC in OMNeT++ and compare it with DW-MAC and TMAC, two earlier energy-efficient MAC protocols, under cross-network, random-network, and grid-network scenarios. In all experiments MDD-MAC significantly outperforms both. For example, in a cross-network scenario with 43 nodes, MDD-MAC's delivery latency and maximum buffer load are 67.59% and 89.16% lower than DW-MAC's, and 72.49% and 89.80% lower than TMAC's, respectively. Notably, the average number of collisions is 8 for DW-MAC and 34 for TMAC, while for MDD-MAC it is almost zero.
A Hybrid Checkpointing Strategy for Mobile Ad Hoc Networks
Liao Guoqiong, Xiong Anjin, Di Guoqiang, Wan Changxuan, and Xia Jiali
2014, 51(6):  1176-1184. 
Considering the features of mobile Ad Hoc networks, such as the lack of center nodes, multi-hop routing, and limited resources, a hybrid checkpointing strategy combining synchronous and asynchronous checkpointing is proposed for clustering-based Ad Hoc networks: checkpoints within a cluster are kept synchronous, while checkpoints in different clusters are taken independently. First, a hybrid checkpointing model and its correctness criteria are discussed. Then, elimination rules for the different kinds of checkpoints are derived from intra-cluster and inter-cluster checkpoint dependency graphs. Finally, the checkpointing and rollback-recovery algorithms are given, and the correctness of recovery is proved. The proposed strategy not only avoids the resource waste caused by cascading rollback among processes in the same cluster, but also avoids excessive message transmission among processes in different clusters, thereby greatly reducing wireless communication delay. Experimental results show that, compared with purely synchronous and purely asynchronous checkpointing, the hybrid strategy is a flexible trade-off that accounts for the various resource constraints of Ad Hoc networks, offering advantages such as short recovery time and low dependence on cluster heads.
Power Control and Vertical Handoff Based on Evolutionary Game Theory
Dong Rongsheng, Sun Dongdong, Guo Yunchuan, and Liu Jianming
2014, 51(6):  1185-1198. 
Formal models of power control and vertical handoff in radio resource management are constructed based on evolutionary game theory. A payoff function for power control is designed based on a pricing mechanism. Following the 3GPP classification of radio communication services, the handoff decision process is divided into four levels, which reduces handoff complexity. A cost function for target networks is defined. Network parameters are divided into two categories, cost and income parameters, and normalized, so that parameters of heterogeneous networks can be compared fairly. It is proved that a unique evolutionarily stable strategy exists in both the power control game and the vertical handoff game. A power control algorithm and a vertical handoff scheme based on evolutionary game theory are then presented. Simulation results show that the power control algorithm reduces the number of hidden terminals and improves network capacity, and that the vertical handoff scheme not only reduces handoff frequency while increasing the accuracy of network selection, but also balances the benefits between service providers and users.
A Fragmentable Admission Control Algorithm for Resource Reservation
Wu Libing, Dang Ping, Nie Lei, He Yanxiang, and Li Fei
2014, 51(6):  1199-1205. 
Admission control algorithms have a direct impact on the overall performance of advance reservation mechanisms in distributed computing. Building on the strengths and weaknesses of existing flexible admission control algorithms, a fragmentable admission control algorithm for resource reservation is proposed. When a fixed resource reservation cannot be accommodated, the algorithm reserves resources in fragments, subject to a maximum fragment interval; and when a fragment contains slots whose remaining resources fall short of the requested amount, it allows setting aside the minimum available amount. Experiments against three existing flexible admission control algorithms (shortening the duration while increasing the reserved bandwidth; reducing the reserved bandwidth while extending the duration; and shifting the start time) show that the fragmentable algorithm effectively reduces resource fragmentation and achieves a better acceptance rate and effective resource utilization.
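A minimal sketch of the fragment-selection idea, under our own simplifying assumptions (a slotted timeline with a uniform per-slot demand); the function name and parameters are illustrative, not from the paper:

```python
def fragmented_reserve(capacity, amount, slots_needed, max_gap):
    """Pick `slots_needed` time slots whose remaining capacity covers
    `amount`, allowing gaps of at most `max_gap` slots between consecutive
    picks; returns the chosen slot indices, or None if no fragment chain
    satisfies the gap constraint."""
    chosen = []
    for i, free in enumerate(capacity):
        if free >= amount:
            if chosen and i - chosen[-1] - 1 > max_gap:
                chosen = [i]          # gap too large: start a new fragment chain
            else:
                chosen.append(i)
            if len(chosen) == slots_needed:
                return chosen
    return None
```

With capacities [5, 1, 5, 5, 1, 5] and a per-slot demand of 3, a request for 3 slots succeeds with a gap tolerance of 1 but fails with a gap tolerance of 0, mirroring how fragmentation raises the acceptance rate.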
Research on Timed-Release Encryption
Yuan Ke, Liu Zheli, Jia Chunfu, Ma Haoyu, and Lü Shuwang
2014, 51(6):  1206-1220. 
TRE (timed-release encryption) is a cryptographic primitive in which the sender encrypts a message so that no one, not even the designated receiver, can decrypt it until a future release time pre-set by the sender. Extensions give TRE further time-related properties, such as early decryption and decryption within a time interval. Since many practical applications are time-sensitive, such as sealed-bid auctions, mortgage payments, on-line examinations, and electronic confidential archives, TRE is considered a valuable cryptographic tool. By surveying existing TRE schemes and analyzing their characteristics, we give a formal definition of TRE and of its security goals. On top of that, we introduce three fundamental frameworks of TRE along with their underlying mathematical problems, and describe some typical constructions. We comprehensively analyze the security goals of TRE (specifically message confidentiality and message unforgeability), as well as their security bounds under the adaptive chosen-plaintext and adaptive chosen-ciphertext attack models. Finally, we study applications of TRE; in particular, we propose preconditions and generic schemes for combining TRE with other cryptographic mechanisms, and construct a concrete public-key timed-release searchable encryption scheme that combines TRE with public-key encryption with keyword search. Future research directions in TRE are also discussed.
Efficient Proxy Re-encryption with Keyword Search Scheme
Guo Lifeng and Lu Bo
2014, 51(6):  1221-1228. 
The concept of proxy re-encryption with keyword search (PRES) was introduced by Shao et al. in 2010, who constructed a bidirectional PRES scheme in the random oracle model and left open the problem of designing an efficient PRES scheme in the standard model. In this paper, we give the definition and security model of PRES with a designated tester (dPRES) and present an efficient dPRES scheme proven secure against chosen-keyword attack and chosen-ciphertext attack in the adaptive corruption model without resorting to random oracles. Our dPRES scheme offers three advantages: first, a user needs no secure channel to transmit a keyword trapdoor to the designated tester; second, the scheme resists off-line keyword guessing attacks; third, whereas Shao et al.'s PRES requires strongly unforgeable one-time signatures and is therefore less efficient, our scheme achieves security against adaptive chosen-keyword attacks without attaching such signatures and is thus more efficient. Proxy re-encryption with keyword search has practical applications in areas such as distributed file systems and email forwarding.
Asynchronous Checkpoint/Restart Based on Memory Buffer
Yi Huizhan, Wang Feng, Zuo Ke, Yang Canqun, Du Yunfei, and Ma Yaqing
2014, 51(6):  1229-1239. 
As high-performance computer systems acquire ever larger amounts of computing and memory resources, their reliability problem worsens. Checkpoint/restart is one of the most representative fault-tolerance techniques. Because the performance of parallel file systems has grown much more slowly than the sizes of memories and checkpoint files, the traditional checkpoint/restart technique seriously degrades application performance. Exploiting the abundant computing and memory resources, we present an asynchronous checkpoint/restart technique based on memory buffering. We divide the traditional technique into two steps: first, checkpoint files are saved in parallel to the local memory of each computing node; second, a helper task on each node copies the checkpoint files from local memory to the parallel file system. The helper task and the computing tasks on each node run asynchronously and in parallel. Owing to the high bandwidth of local memory and the concurrent execution of the helper task and the computing tasks, the asynchronous technique significantly reduces checkpoint/restart overhead, a conclusion verified by both simulation and real-application experiments.
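The two-step scheme can be illustrated with a memory buffer drained by a helper thread. This sketch is ours, not the paper's implementation: a Python queue stands in for node-local memory and a plain file stands in for the parallel file system.

```python
import os
import queue
import tempfile
import threading

def flusher(buf, path):
    """Helper task: drain buffered checkpoints from memory to the (slow)
    file system while the computation keeps running."""
    while True:
        data = buf.get()
        if data is None:               # sentinel: no more checkpoints
            break
        with open(path, "wb") as f:    # stands in for the parallel file system
            f.write(data)

# Step 1 of the two-step scheme: the computing task "checkpoints" with a
# fast in-memory enqueue, then immediately resumes computing; the helper
# thread overlaps the slow flush with further computation.
buf = queue.Queue()
path = os.path.join(tempfile.mkdtemp(), "ckpt.bin")
helper = threading.Thread(target=flusher, args=(buf, path))
helper.start()
buf.put(b"application state at step 42")   # fast local-memory save
# ... computation continues here, overlapped with the flush ...
buf.put(None)
helper.join()
```

The key property is that the enqueue returns at memory bandwidth, so the computing task blocks only for the in-memory copy, not for the file-system write.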
A Double-Helix Structure Genetic Algorithm for Task Scheduling on Heterogeneous Computing Systems
Xu Yuming, Zhu Ningbo, Ouyang Aijia, and Li Kenli
2014, 51(6):  1240-1252. 
The task scheduling problem is known to be NP-complete, and heuristic optimization approaches have been proposed to obtain suboptimal solutions, but they are unlikely to produce consistent results across a wide range of problems. In this paper, inspired by the DNA double-helix structure, a double-helix structure genetic algorithm (DHSGA) for task scheduling on heterogeneous computing systems is proposed. Its basic idea is to exploit the advantages of heuristic algorithms while avoiding their drawbacks. According to the data dependencies of a DAG task graph, a heuristic is used to control the crossover and mutation operations of the GA so as to change the main-chain structure of a chromosome reasonably, allowing DHSGA to generate a better task scheduling priority queue. Then, imitating complementary base pairing, the heuristic HEFT algorithm improves effectiveness and convergence speed by mapping one chromosomal chain (the task set) onto the other (the processor set). Software simulation over a large set of randomly generated graphs, as well as graphs of real-world problems with various characteristics, shows that the proposed DHSGA significantly reduces the makespan on distributed heterogeneous computing systems compared with list scheduling algorithms; the average makespan reduction is about 10.1%.
An Efficient Tile Assembly Model for Maximum Clique Problem
Zhou Xu, Zhou Yantao, Ouyang Aijia, and Li Kenli
2014, 51(6):  1253-1262. 
The tile self-assembly model has attracted enormous attention from scientists because of characteristics such as nanoscale operation, self-assembly, and programmability. However, the exponential explosion of the solution space has become a great obstacle to its further development. Hence, a novel efficient tile self-assembly model for the maximum clique problem is proposed in this paper. The new model is composed of a TileDual subsystem, an initial configuration subsystem, and a detecting subsystem. To reduce the solution space, the concept of the TileDual molecule and a heuristic strategy are introduced into the TileDual subsystem. Based on these three subsystems, a new algorithm for the maximum clique problem is proposed. Compared with previously proposed tile self-assembly models for the maximum clique problem, the solution space of our model shrinks from 2^n0 to between 1.712^n and 2^n, and the success rate improves from 0.5^n0 to between 0.5^n and 0.57^n, where n0 and n are the numbers of vertices of the graphs G0 and G, respectively. Thus the proposed model not only retains the advantages of the tile self-assembly model, but also needs a much smaller solution space and attains a much higher success rate.
Instruction Level Parallel Optimizing for Scientific Computing Application
Luo Hongbing, Zhang Xiaoxia, Wang Wei, and Wu Linping
2014, 51(6):  1263-1269. 
Achieving a high fraction of peak performance on supercomputers is difficult for real scientific computing applications. An application must be optimized to exploit the characteristics of the architecture, such as inter-node communication, the intra-node interconnect, the memory hierarchy, and the architecture of a single processor core. On a cluster composed of Intel Xeon multi-core processors, we explore how to improve the instruction-level parallel efficiency of a scientific application on a single processor core. Taking an Euler program built on the J adaptive structured meshes applications infrastructure (JASMIN) as an example, we identify the application's performance hotspots with performance analysis tools, analyze the performance monitoring data to locate bottlenecks, and tailor the code to the characteristics of the single-core architecture. After a few iterations the performance of the Euler program improves greatly: the execution time of its Gas1dapproxy module is shortened by 60%-62%, and the total execution time is shortened by 21% and 34% for a 2D and a 3D physical model, respectively. The results show that pipeline efficiency is one of the key factors in achieving high performance for scientific computing applications. It can be improved by reducing the degree of dependence in computational code and by decreasing the number of long-latency operations, for example by replacing division with multiplication.
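The division-to-multiplication rewrite mentioned above looks like this in miniature. This is illustrative only; the actual optimization is applied to compiled numerical loops, where the latency gap between divide and multiply matters far more than it does in Python.

```python
def scale_div(values, d):
    """Naive loop: one floating-point division per element; divisions are
    long-latency operations that stall the pipeline."""
    return [v / d for v in values]

def scale_mul(values, d):
    """Hoist the division out of the loop and multiply by the reciprocal,
    shortening the dependent-operation chain inside the loop body."""
    inv = 1.0 / d
    return [v * inv for v in values]
```

Both functions compute the same result (up to rounding when the reciprocal is inexact); the second does one division total instead of one per iteration.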
Performance Optimization for Short Job Execution in Hadoop MapReduce
Gu Rong, Yan Jinshuang, Yang Xiaoliang, Yuan Chunfeng, and Huang Yihua
2014, 51(6):  1270-1280. 
Hadoop MapReduce is a widely used parallel computing framework for solving data-intensive problems. Nowadays, owing to its capability of processing large-scale data, Hadoop MapReduce has also been adopted in many query applications. To be able to process large-scale datasets, the fundamental design of standard Hadoop places more emphasis on data throughput than on job execution performance, which limits performance when Hadoop MapReduce is used to execute short jobs. This paper proposes several optimizations to improve the execution performance of MapReduce jobs, especially short ones. We make three major optimizations: 1) reduce the time spent in the initialization and termination stages of a job by optimizing its setup and cleanup tasks; 2) change the assignment of the first batch of tasks from a pull model to a push model; 3) replace the heartbeat-based communication mechanism with an instant-message mechanism for event notifications between the JobTracker and TaskTrackers. We adopt a typical MapReduce-based parallel query application, BLAST, to evaluate the effects of our optimizations. Experimental results show that job execution on our improved version of Hadoop is about 23% faster on average than on standard Hadoop across different types of BLAST MapReduce jobs.
Divisible Loads Scheduling Using Concurrent Sending Data on Multi-Core Cluster
Zhong Cheng, Cai Dexia, and Yang Feng
2014, 51(6):  1281-1294. 
By overlapping computation and communication, a multi-round divisible-load scheduling model is proposed for a heterogeneous cluster in which each node has multiple multi-core processors with a shared L3 cache and the master node runs multiple processes to send data concurrently to all slave nodes. The model adapts to each node's computation speed, communication rate, and memory capacity; dynamically computes the number of scheduling rounds and the size of the data block assigned to each node in each round so as to balance load among nodes; and reduces communication overhead, shortening the schedule length. According to the usable capacities of the L1, L2, and L3 caches of each node, a multi-level cache partitioning model for the load block received in each node's main memory is presented to balance load among the multi-core processors and among the processing cores. Applying the multi-round load scheduling model and the multi-level data allocation method, communication- and cache-efficient parallel algorithms for the k-selection problem are designed on such heterogeneous clusters. The execution performance of the proposed parallel algorithm is evaluated on a Sugon TC5000A cluster under different modes of sending data from the master node to the slave nodes, different utilization rates of the L3, L2, and L1 caches, and different numbers of running threads.
The experimental results show that the parallel k-selection algorithm performs better when the master node sends data to the slave nodes concurrently than when it sends data serially in each round; that the utilization rates of the L3 and L2 caches affect performance markedly; and that execution time is shortest when each core runs 3 threads with the optimized combination of L3, L2, and L1 cache utilization rates.
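The per-round, speed-proportional allocation might be sketched as follows. This is our simplification, not the paper's model: a single capacity bound per node, with any capped-off remainder deferred to the next round rather than redistributed.

```python
def round_allocation(load, speeds, mem_caps):
    """Split one round's load among slave nodes in proportion to their
    computing speeds, capping each share at the node's memory capacity;
    returns the per-node shares and the leftover deferred to the next round."""
    total_speed = sum(speeds)
    shares = [min(load * s / total_speed, cap)
              for s, cap in zip(speeds, mem_caps)]
    leftover = load - sum(shares)
    return shares, leftover
```

For instance, 100 units of load split between a node three times faster than its peer, but limited to 60 units of memory, yields shares of 60 and 25 with 15 units carried into the next round.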
Design and Implementation of FPGA Verification Platform for Multi-core Processor
Zhu Ying, Chen Cheng, Xu Xiaohong, and Li Yanzhe
2014, 51(6):  1295-1303. 
As the design of high-performance microprocessors becomes more and more complicated, hardware-software co-verification based on an FPGA (field programmable gate array) prototyping platform is usually performed before tape-out, aiming to shorten the verification period and reduce manufacturing risk. With the development of multi-core microprocessors, implementing an FPGA prototyping verification platform becomes increasingly complex. This paper first describes how to design and implement an FPGA prototyping verification platform for a high-performance multi-core microprocessor. It then details the construction of the platform, including the FPGA partitioning strategy, time-division multiplexed communication, and the implementation of I/O interfaces. The platform is built from several mother boards and daughter boards, so its scale can be changed to suit different phases of chip verification. It is scalable and flexible, and contributes substantially to functional verification and performance evaluation of the target design. The success of the FPGA verification effort is underscored by the first prototype chips, which boot an operating system and run a large number of application programs successfully. Finally, the paper analyzes the application prospects of this FPGA prototyping verification platform.
A Coordinate Descent Algorithm for Solving Capped-L1 Regularization Problems
Wang Yujun, Gao Qiankun, Zhang Xian, and Tao Qing
2014, 51(6):  1304-1312. 
L1 regularization plays a crucial role in sparse learning, and higher accuracy can be achieved with capped-L1 regularization. However, capped-L1 leads to a non-convex optimization problem, which is currently solved by a multi-stage convex relaxation algorithm (MSCR). Since a convex optimization problem must be solved exactly at each stage, MSCR incurs unavoidable extra computational cost. To circumvent this drawback, this paper presents a new non-convex coordinate descent algorithm (Non-convex CD) for solving capped-L1 regularization problems, in which each stage of MSCR is replaced by a single step of regular coordinate descent. This effectively reduces the computational cost. Theoretical analysis establishes convergence of the proposed algorithm. For common Lasso problems, experiments on large-scale text databases demonstrate that Non-convex CD uses even less CPU time than regular coordinate descent while keeping almost the same accuracy as MSCR. To further illustrate the performance of Non-convex CD in real applications, we also consider an image deblurring example.
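A single coordinate-descent sweep under a capped-L1 penalty can be sketched as below. This is our illustrative reading of the idea, not the authors' exact algorithm: the cap theta simply switches off the shrinkage on coordinates whose magnitude already exceeds it, since min(|w_j|, theta) is flat past the cap.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    return z - t if z > t else z + t if z < -t else 0.0

def capped_l1_cd_sweep(X, y, w, lam, theta):
    """One coordinate-descent sweep for
    (1/2n)||y - Xw||^2 + lam * sum_j min(|w_j|, theta):
    coordinates beyond the cap theta receive no shrinkage."""
    n = len(y)
    for j in range(len(w)):
        # partial residual leaving out feature j
        r = [y[i] - sum(X[i][k] * w[k] for k in range(len(w)) if k != j)
             for i in range(n)]
        rho = sum(X[i][j] * r[i] for i in range(n)) / n
        norm = sum(X[i][j] ** 2 for i in range(n)) / n
        t = lam if abs(w[j]) < theta else 0.0   # capped-L1 penalty weight
        w[j] = soft_threshold(rho, t) / norm
    return w
```

Iterating such sweeps, re-deriving the per-coordinate penalty weight each time, is the single-step-per-stage structure the abstract contrasts with solving each MSCR stage exactly.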
Dynamic Multi-Objective Optimization Algorithm Based on Ecological Strategy
Zhang Shiwen, Li Zhiyong, Chen Shaomiao, and Li Renfa
2014, 51(6):  1313-1330. 
Dynamic multi-objective optimization problems (DMOP) are problems whose objective functions, constraints, or parameters change over time. They are important and challenging tasks in real-world optimization, and traditional evolutionary algorithms generally have difficulty tracking their Pareto fronts. Targeting the characteristics of dynamic multi-objective problems, this paper proposes a novel co-evolutionary algorithm, ESDMO (dynamic multi-objective optimization algorithm based on ecological strategy), built on ecological strategies and a new self-detecting environmental change operator. Ecological strategies are essential for individuals to adapt to a changing environment and gain competitive advantage. The proposed method adopts an evolutionary computing model that combines a co-evolution mechanism with a reinforcement learning strategy, inspired by the ecological strategies of predator and prey populations. A self-detecting environmental change operator is defined to measure environmental change in the algorithm, whereby different populations adopt different ecological strategies to cope with it. Experiments on several typical dynamic multi-objective problems show that the proposed algorithm achieves better diversity, uniformity, and convergence, demonstrating its effectiveness for solving DMOP.
Robust Activation Function of Extreme Learning Machine and Linear Dimensionality Reduction in High-Dimensional Data
Feng Lin, Liu Shenglan, Zhang Jing, and Wang Huibing
2014, 51(6):  1331-1340. 
Extreme learning machine (ELM), with the advantages of fast training and high classification accuracy, has been widely and successfully used in practical applications such as face recognition. However, ELM is often severely affected by noise and outliers in high-dimensional real-world datasets, which reduces its accuracy. This can be attributed to two causes: 1) the high dimensionality of the input samples; and 2) improper selection of the activation function. Both cause the outputs of the activation function to approach zero, which degrades the performance of ELM. For the first problem, we propose a robust linear dimensionality reduction method, RAF-global embedding (RAF-GE), to preprocess high-dimensional data before classifying it with ELM. For the second, we analyze different activation functions in depth and propose a robust activation function (RAF) that keeps the activation outputs away from zero, thereby improving the performance of both RAF-GE and ELM. Experimental results show that the face recognition method in this paper generally outperforms comparable methods using other activation functions.
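ELM's one-shot training, which the abstract takes as its starting point, can be sketched as follows. This is a generic textbook-style ELM with a sigmoid activation, written by us for illustration; RAF and RAF-GE themselves are not reproduced here.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Tiny Gauss-Jordan solver with partial pivoting for the normal equations."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * v for x, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def elm_train(X, y, hidden=4, reg=1e-6, seed=1):
    """ELM in one shot: random input weights and biases, a sigmoid hidden
    layer H, and output weights beta = (H^T H + reg*I)^{-1} H^T y.
    No iterative training, which is the source of ELM's speed."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    H = [[sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + bi)
          for w, bi in zip(W, b)] for x in X]
    HtH = [[sum(h[a] * h[c] for h in H) + (reg if a == c else 0.0)
            for c in range(hidden)] for a in range(hidden)]
    Hty = [sum(h[a] * yi for h, yi in zip(H, y)) for a in range(hidden)]
    return W, b, solve(HtH, Hty)

def elm_predict(model, x):
    W, b, beta = model
    return sum(bj * sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + bi)
               for w, bi, bj in zip(W, b, beta))
```

The failure mode the paper targets is visible in this structure: if the random projections drive the sigmoid inputs far negative, the entries of H approach zero and the least-squares fit degenerates.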
The Realization and Application of Symbolic Computation Modules in SGARP
Zhang Jingzhong, Zhang Chuanjun, Zheng Huan, Rao Yongsheng, and Zou Yu
2014, 51(6):  1341-1351. 
SGARP (sustainable geometry automated reasoning platform) enables users to add or change the knowledge involved in automated geometry theorem proving, such as geometry objects, predicates, theorems, and reasoning rules, so that a variety of rule-based machine proving methods and human-machine interactive reasoning methods can be developed. To further enhance the automated reasoning power of SGARP and expand its scope of application, this paper proposes a convenient way of implementing symbolic computation modules in SGARP. Based on the embedded symbolic computation modules, the mass point method and the analytic method are successfully added to SGARP. The mass point method can prove Hilbert intersection point statements, and the analytic method can assist in proving all kinds of difficult geometric theorems, such as the famous Thebault's Theorem. We test the two embedded methods on 180 geometry problems from TGTP (thousands of geometric problems for geometric theorem provers), a Web-based library of geometry problems. All 180 problems are proved within a reasonable time with readable machine proofs, showing that the upgraded SGARP can better satisfy users' requirements for learning and developing automated geometry reasoning.
Extraction of Gene/Protein Names Involved in Each Stage of Spermatogenesis Based on Literature Mining
Zhu Jun, Yin Jianping, Zhao Zhiheng, Zhu En, and Ban Rongjun
2014, 51(6):  1352-1358. 
Spermatogenesis is an important bioprocess in the lifetime of male mammals and has a profound effect on reproduction. Abnormal spermatogenesis is a major cause of male infertility, yet treatments remain limited. Characterizing the genes/proteins involved in spermatogenesis is fundamental to understanding the mechanisms underlying this biological process and to developing treatments for its disorders. However, most crucial information about spermatogenesis-related genes/proteins is scattered across a vast number of research articles, so manually curating these genes/proteins is a time-consuming task. This paper proposes a novel strategy to automatically extract, by literature mining, the names of spermatogenesis-related genes/proteins that function in different stages of spermatogenesis. First, it compares the performance of three different algorithms on different terms and applies an SVM classifier, trained on a manually prepared dataset, to classify spermatogenesis-related texts into three classes corresponding to the three stages of spermatogenesis. Then, integrating expert knowledge and grammar rules, it recognizes and extracts the gene/protein names of each spermatogenesis stage with high confidence. Finally, a manually curated test dataset is used to evaluate the strategy, which achieves an accuracy of 71.9%, verifying the reliability of the proposed method and demonstrating its practical value.
Entity Ranking Based on Wikipedia for Related Entity Finding
Zhang Junsan, Qu Youli, Shui Yidong, and Tian Shengfeng
2014, 51(6):  1359-1372. 
Entity ranking is a very important step in related entity finding (REF). Although much work has been done on entity ranking based on Wikipedia for REF, several issues remain: the semi-automatic acquisition of the target type, the coarse granularity of the target type, the binary judgment of entity-type relevancy, and the neglect of stop words when calculating entity-relation relevancy. This paper designs a framework that ranks entities by combining three scores (entity relevancy, entity-type relevancy, and entity-relation relevancy) and selects the best combination method by comparing experimental results. A novel approach is proposed to calculate entity-type relevancy: it automatically acquires the fine-grained target type and the discriminative rules of its hyponym Wikipedia categories through inductive learning, and computes entity-type relevancy by counting the categories that satisfy the discriminative rules. This paper also proposes a "cut stop words to rebuild relations" approach to calculate the entity-relation relevancy between a candidate entity and the source entity. Experimental results demonstrate that the proposed approaches effectively improve entity-ranking results and reduce computation time.
A Pareto-Based Data Placement Strategy in Database as a Service Model
Zhang Tiantian, Cui Lizhen, and Xu Meng
2014, 51(6):  1373-1382. 
DaaS (database as a service), a new data provisioning pattern, has been widely adopted. With the arrival of the big data era, data volumes are increasing dramatically and the data placement problem in the DaaS model is becoming ever more important. How data are placed according to their differing performance requirements has a major effect on improving quality of service, enhancing user experience, and reducing service cost. For service providers, however, improving service quality and reducing service cost are contradictory goals: higher-performance servers cannot be bought at lower cost, making it difficult to offer better service quality cheaply. This paper models the DaaS data placement problem as a DaaS data placement graph and, applying the idea of Pareto optimality, puts forward a DaaS data placement strategy based on performance-cost equilibrium, comprising an initial solution generation stage, a multi-objective optimization stage, and a final solution generation stage. The strategy enables DaaS providers to deliver better service quality at lower cost. Experimental comparisons with random and greedy strategies show that the proposed strategy effectively reduces resource cost while providing better service quality, achieving a trade-off between service performance and resource cost.
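The Pareto-optimal filtering at the heart of the multi-objective stage can be illustrated in a few lines; the two-objective (cost, response time) tuple layout is our assumption for the sketch.

```python
def pareto_front(placements):
    """Keep only the non-dominated placements in (cost, response_time):
    one placement dominates another if it is no worse in both objectives
    and strictly better in at least one (lower is better for both)."""
    return [s for s in placements
            if not any(o != s and o[0] <= s[0] and o[1] <= s[1]
                       for o in placements)]
```

For example, among candidate placements (1, 5), (2, 3), (3, 4), and (4, 1), the placement (3, 4) is dominated by (2, 3) and is dropped; the remaining three form the performance-cost trade-off curve from which a final solution is chosen.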