ISSN 1000-1239 CN 11-1777/TP

Table of Content

15 March 2014, Volume 51 Issue 3
Combination TDoA Localization Algorithms for Far-Field Sound Source Based on Short Base-Line Sensor Network
Cui Xunxue, Lu Songsheng, Chen Yunfei, Gao Haomin, and Yi Ting
2014, 51(3):  465-478. 
Asbtract ( 627 )   HTML ( 1)   PDF (5147KB) ( 583 )  
Related Articles | Metrics
For the localization problem of far-field sound source, the TDoA positioning scheme is proposed by using short base-line sensor network in the paper. Usually there is a kind of localization algorithms which do not need an estimated initial point; on the other hand, there is another kind of algorithms which depend on an initial estimation. In the paper the two kinds of methods are combined, and several combinational localization algorithms are developed. The main idea is that the first kind of localization algorithms are applied to finish a coarse positioning process and output an initial estimation location, which is inputted as the initial value of the second kind of algorithms with high positioning precision to take an advantage and supply a gap from them. The circular error probability of combinational localization algorithms is proposed to demonstrate the feasibility of the performance evaluation criterion, which is based on Monte-Carlo method and a simple case of geometry deployment. The circular error probability of combinational algorithms is numerically analyzed, and it is concluded that the combination of spherical interpolation and least squared equation error has an optimal performance. The box-plot character is experimented about this combination algorithm. According to the simulation experiments of different distances and standard deviation of sound range difference, and some sound source data obtained from the actual field scene, the conclusion of the paper has been verified.
Completely-Competitive-Equilibrium-Based Double Spectrum Auction Mechanism
Huang He, Sun Yu'e, Chen Zhili,, Xu Hongli,, Xing Kai,, and Chen Guoliang
2014, 51(3):  479-490. 
Asbtract ( 551 )   HTML ( 1)   PDF (3093KB) ( 761 )  
Related Articles | Metrics
Due to spectrum scarcity and its inefficient usage, many emerging services going wireless are in shortage of spectrum resources. Auction has been widely used for resource allocation in many fields. Spectrum auction is deemed as a promising solution to relieve the conflict between scarce spectrum resource supply and ever-increasing demand, which could achieve the optimal of the spectrum reallocation through the market competition. Existing researches on spectrum auction mainly focus on spectrum spatial reuse and spectrum utilization. Nevertheless, their clearing price often deviates from actual value of spectrum because of too small market scale, and the blindness of buyers' bid factors are not taken into consideration. To solve the above problem, a completely-competitive-equilibrium-based double spectrum auction mechanism (ComDSA) is proposed. Firstly, multi-player game between players can be modeled as double-person game between person and nature. And then, the probability of market types and Harsanyi transformation are introduced in ComDSA to transform the problem into complete information game. Finally, the clearing price achieves the completely competitive equilibrium through multi-round bidding according to continuous bidding model. Solid theoretical analysis and extensive simulation study illustrate the improvement of spectrum reuse and transaction rate on the basis of completely competitive equilibrium.
Spectrum Migration Scheme Based on Interval Mamdani Fuzzy Inference in Cognitive Radio Networks
Wang Zhendong, Wang Huiqiang, Feng Guangsheng, Lü Hongwu, and Chen Xiaoming
2014, 51(3):  491-501. 
Asbtract ( 600 )   HTML ( 2)   PDF (3491KB) ( 444 )  
Related Articles | Metrics
The existing opportunistic spectrum access technologies can improve spectrum resource utilization significantly, but they still have some shortcomings in guaranteeing QoS of secondary users and effective utilization probability of spectrum resource in cognitive radio networks. In order to improve QoS of secondary users in cognitive radio networks, and promote the real performance of cognitive radio networks, a new concept named spectrum migration is introduced, and a novel spectrum migration scheme based on interval Mamdani fuzzy inference is put forward. In the spectrum migration scheme, unnecessary treated data and system load can be reduced by applying pre-decision method. In addition, spectrum occupation rate and link maintenance rate of licensed spectrum are comprehensively considered as spectrum migration factors. Then, spectrum migration degree can be calculated by fuzzy inference to guide secondary users to the optimal spectrum holes. To shorten inference time, interval Mamdani fuzzy inference is proposed based on Mamdani fuzzy inference, and the twice judgments are utilized to reduce the complexity of the spectrum migration process. Simulation results show that the scheme can decrease forced termination probability, service retransmission probability and spectrum migration times of secondary users' service transmission, maintain a higher system throughput and improve the effective utilization probability of cognitive radio networks spectrum resource at the same time.
CM-MAC: A Cluster-Based Multi-Channel MAC Protocol for VANET
He Peng, Yan Baoping, Li Zhi, and Sun Limin
2014, 51(3):  502-510. 
Asbtract ( 890 )   HTML ( 0)   PDF (2576KB) ( 779 )  
Related Articles | Metrics
VANET (vehicular ad hoc network) is a new kind of MANET (mobile ad hoc network) for ITS (intelligent transportation system). With the development of vehicles and mobile ad hoc network technology, VANET has become an emerging field of research, which is envisioned to provide a variety of safety applications (such as collision avoidance and warning systems) that require high reliability and bounded delay and non-safety applications (such as Internet access and electronic commerce) that are throughput sensitive. This paper proposes a stable clustering algorithm according to VANET features. On the basis of this algorithm, according to DSRC (dedicated short range communication) allocation of CCH (control channel) and SCH (service channel), considering the wireless communication interference between vehicles and the QoS requirements for different applications, the paper also proposes a cluster-based multi-channel hybrid MAC protocol. The non-competition based TDMA scheme is taken for intra-cluster communications and the competition based CSMA/CA scheme is taken for inter-cluster communications. The different SCHs are allocated for neighbor clusters. Such a concept is attractive as it can be combined with cluster-based routing strategies. So the overhead introduced by the clustering process is beneficial not only on one layer, but for the whole communication process. Simulation experiments show that the protocol is superior to the current MAC protocol in the delay requirement of safety applications and the throughput of non-safety applications.
Time Synchronization Method of Wireless Network for Factory Automation
Yang Yutuo, Liang Wei, Zhang Xiaoling, and Liu Shuai
2014, 51(3):  511-518. 
Asbtract ( 816 )   HTML ( 0)   PDF (1964KB) ( 659 )  
Related Articles | Metrics
As one of the key technologies in wireless networks, time synchronization is the supporting technology of factory automation wireless network that guarantees high reliability and high real-time. The high rate and simply effective topology of IEEE 802.11 make it become the best protocol for wireless network of factory automation. This paper first analyzes the importance of clock synchronization precision for high reliability and high real-time in factory automation and proves that high time synchronization precision is the important goal for wireless network in factory automation. Then it designs and realizes the high precision time synchronization method of wireless network for factory automation based on TDMA mechanism on commercial IEEE 802.11 hardware. The high-precision time synchronization method includes three algorithms that are clock skew measurement, clock skew prediction, and clock skew compensation, which are interlocked in one cycle of superframe of TDMA mechanism for high precision of time synchronization. Experiments show that through measurement, prediction and compensation, the time synchronization precision is improved obviously relative to the traditional two-way time synchronization algorithm. The high reliability and strong adaptability of this method adapt to the factory environment, and the high precision of time synchronization is helpful to realize the TDMA timeslot schedule in microsecond of wireless network for factory automation.
Fast Cross-Domain Classification Method for Large Multisources/Small Target Domains
Gu Xin, and Wang Shitong
2014, 51(3):  519-535. 
Asbtract ( 708 )   HTML ( 2)   PDF (5338KB) ( 529 )  
Related Articles | Metrics
Most of current cross-domain classifiers are proposed for single source and single target domains and basically based on the assumption that there is a balance between these two domains. However, this assumption is often violated in the real world. When these classifiers are applied to imbalanced domains, their classification performance and robustness to noise will heavily degrade. For example, Baysian classifier depends heavily on the estimation of the sample distributions of source and target domains. When large source domain but only a small target domain are available, the classification accuracy of this classifier will degrade a lot. In order to address this imbalanced issue and use abundant data in the source domain to do an effective transfer learning between small target domain and multisource domains, a novel fast cross-domain classification method called IMCCL for “small-target+multisource” datasets is proposed here. The proposed method IMCCL is rooted at logistic regression model and MAP. Accordingly, the proposed IMCCL is integrated together with the latest advance—CDdual algorithm—to develop its fast version IMCCL-CDual for “small-target+large-multisource” domains. This fast classification method is also theoretically analyzed. Our experimental results on artificial and real datasets indicate the effectiveness of the proposed method IMCCL-CDual in classification accuracy, the classification speed, robustness and domain adaption.
Research on Sensitivity Analysis for Dynamic Bayesian Networks
Yao Hongliang, Zhang Yiming, Li Junzhao, and Wang Hao
2014, 51(3):  536-547. 
Asbtract ( 759 )   HTML ( 0)   PDF (2539KB) ( 679 )  
Related Articles | Metrics
Sensitivity analysis is an important method for researching the characteristic of complex system. The present algorithms of dynamic sensitivity analysis are used to deal with some special kinds of dynamic Bayesian network and their computation complexity is very high. Therefore, in order to handle the sensitivity analysis on general dynamic Bayesian network effectively, a new algorithm is presented based on junction-tree algorithm (DSA_JT). The function relations between parameters and conditions probability distribution of object node are established by message-propagation on junction-tree. DSA_JT can lower exponential time and reduce calculation significantly. But its computation complexity is still high. In order to further improve the computational performance, DSA_BK is introduced based on DSA_JT algorithm. DSA_BK introduces the factor idea with the product of the probability of the subsystems to approximate the probability of the whole system. DSA_BK reduces the size of the root by the local marginalisation of interface and updating the joint probability distribution of model. Compared with DSA_JT, DSA_BK can lower the computational exponential times further and significantly improves the calculation efficiency, and the error is bound to be proved. Then, based on abstracting the process of the two algorithms, the computation formulas of dynamic sensitivity function are proved, and it is shown that the two algorithms can effectively deal with sensitivity analysis of general dynamic Bayesian network. Finally, the effectiveness is proved in the experiment on the network of the Shanghai Stock.
A Least Square Actor-Critic Approach for Continuous Action Space
Zhu Fei, Liu Quan, Fu Qiming, and Fu Yuchen
2014, 51(3):  548-558. 
Asbtract ( 736 )   HTML ( 0)   PDF (2063KB) ( 617 )  
Related Articles | Metrics
The research of the reinforcement learning problem with continuous action space is one of the most challenging and difficult concerns for the time being. Conventional reinforcement learning algorithms are usually aimed at solving the problems of the small scale and discrete action space. For the problems with continuous actions space, most approaches tend to discretize the continuous space by taking advantage of prior information, and then try to find out the optimal solution. However, in many practical applications, action spaces are usually continuous, and moreover little prior information is available for discretizing the action space appropriately. In order to solve this problem, we hereby put forward a least square actor-critic algorithm (LSAC) for continuous action space, which takes advantage of approximate function to represent value function and policy respectively; and uses online least square method to obtain the parameters of approximate value function and approximate policy, where approximate value function is considered as the critic part to guide the solution of the parameter of approximate policy. We applied LSAC to solve the cart pole balancing problem and the mountain car problem which are characterized by continuous action space, and then compared the results with those returned by two classic algorithms, Cacla (continuous actor-critic learning automaton) algorithm and eNAC (episodic natural actor-critic) algorithm. The experimental results show that LSAC can solve the continuous action space problem well and has better executing performance.
Semantics Social Network Community Detection Algorithm Based on Topic Comprehensive Factor Analysis
Yang Jing, Xin Yu, and Xie Zhiqiang
2014, 51(3):  559-569. 
Asbtract ( 638 )   HTML ( 1)   PDF (3752KB) ( 712 )  
Related Articles | Metrics
Aiming at the problem that general social network community detection algorithm could only detect the community with sole relationship, which couldn't represent the semantics similarity of real social community and couldn't resolve the problem of SSN community detection with multiple semantic topic, we propose an SSN (semantics social network) detection algorithm based on topic comprehensive factor analysis. This algorithm firstly defines the multivariate semantics information as topic, takes multivariate TCF (topic comprehensive factor) as the measurement of topics, and the difference of topic density as polymerization direction, and establishes the initial community structure. Secondly, we establish the cost function with the goal of minimizing the semantics similarity inside the communities and maximizing the semantics similarity between different communities, when some boundary nodes change the community. Thirdly, the SAOP (simulated annealing optimization policy) is established based on the initial community and cost function, which takes the value of cost function as parameter when the boundary nodes change. We could optimize the initial community structure globally and achieve the semantics community detection with multivariate. Finally, the effectiveness of SSN community detection is proved by a serial of simulations.
Properties and Distributed Tableaux Reasoning Algorithm for D3L(ccy)
Zhao Xiaofei, Tian Dongping, Zhang Wenbo, and Shi Zhongzhi
2014, 51(3):  570-579. 
Asbtract ( 661 )   HTML ( 1)   PDF (1659KB) ( 457 )  
Related Articles | Metrics
As the distributed extension of traditional dynamic description logics(DDL), distributed dynamic description logics(D3L) enables reasoning with multiple heterogeneous DDL ontologies interconnected by directional semantic mapping. D3L captures the idea of importing and reusing knowledge between several ontologies. This idea combines well with the basic assumption of the distributed and dynamic systems such as the semantic Web and the heterogeneous information integration system. We find in the case when more than two ontologies are involved and bridge rules form chains, knowledge does not always propagate along chains of bridge rules even if we would expect it. Inspired by package-based description logics, we propose D3L(ccy) by imposing so called compositional consistency condition on domain relations in D3L interpretations. Under this semantics knowledge propagates along chains of bridge rules correctly. We research the properties and distributed Tableaux reasoning algorithm for D3L(ccy) systematically. We prove that D3L(ccy) satisfies the monotonicity property(D3L(ccy) is a monotonic logic), the directionality property(the effect of bridge rules is directional) and the restrained inconsistency propagation property(if some of the local ontologies is inconsistent, it does not necessarily pollute the whole distributed system). Furthermore we provide a distributed Tableaux reasoning algorithm which is sound and complete for deciding satisfiability of concepts in D3L(ccy). Compared with original one, the extended D3L provides more reasonable logic foundation for distributed and dynamic systems such as the Semantic Web and the information integration system.
An ε Constrained Biogeography-Based Optimization with Dynamic Migration
Bi Xiaojun, Wang Jue, Li Bo, and Li Jicheng
2014, 51(3):  580-589. 
Asbtract ( 494 )   HTML ( 0)   PDF (1260KB) ( 612 )  
Related Articles | Metrics
A new ε constrained biogeography-based optimization with dynamic migration, εBBO-dm, is proposed to solve constrained optimization problems. In the proposed algorithm, the ε constrained method is utilized to handle the constraints. According to the constraint violation of the colony, the ε level is set to utilize the useful information for the better infeasible individual sufficiently and to improve the search efficiency for the feasible space. Simultaneously, based on the feature of ε constrained method, a new ordering rule based on ε constrained is used to obtain the immigration rate and the emigration rate, which can dynamically balance the relation between the feasible individuals and the infeasible individuals. Additionally, a new dynamic migration strategy is shown to enhance the search ability of the migration mechanism. Eventually, with the purpose of improving the precision of convergence, the piecewise logistic chaotic map is introduced to improve the variation mechanism. Numerical experiments on 13 well-known benchmark test functions show that εBBO-dm is competitive with other optimization algorithms on the accuracy and the speed of convergence, especially when being applied to solve complex single-objective COPs.
Locality Preserving Twin Support Vector Machines
Hua Xiaopeng, and Ding Shifei,
2014, 51(3):  590-597. 
Asbtract ( 636 )   HTML ( 1)   PDF (1655KB) ( 676 )  
Related Articles | Metrics
For classification problems, support vector machine (SVM) achieves state-of-the-art performance in many real applications. A guarantee of its performance superiority is from the maximization of between-class margin. However, SVM solution does not take into consideration the class distribution and may result in a non-robust solution. Recently, multiple surface support vector machine (MSSVM), as an extension of traditional SVM, has been one of the hot research topics in the field of pattern recognition. Unfortunately, many known MSSVM classification algorithms have not considered the underlying local geometric structure and the descriminant information fully. Therefore, a locality preserving twin support vector machine (LPTSVM) is presented in this paper by introducing the basic theories of the locality preserving projections (LPP) into the MSSVM. This method inherits the characteristic of MSSVM for dealing with the XOR problem, fully considers the local geometric structure between samples and shows the local underlying discriminant information. The linear case, the small sample size case and the nonlinear case of the LPTSVM are discussed in this paper. The LPTSVM optimization problem in the small sample size case is solved by using dimensionality reduction through principal component analysis (PCA) and the problem in the nonlinear case is transformed into an equivalent linear LPTSVM problem under empirical kernel mapping (EKM) method. Experimental results on the artificial and real datasets indicate the effectiveness of the LPTSVM method.
New Worst-Case Upper Bounds for X2SAT
Zhou Junping, Jiang Yunhui, and Yin Minghao
2014, 51(3):  598-605. 
Asbtract ( 488 )   HTML ( 0)   PDF (696KB) ( 518 )  
Related Articles | Metrics
The problem of X2SAT is a generalized problem of XSAT. Given a conjunctive normal function, this problem asks whether there exists a satisfying assignment for the input formula, such that exactly two literals in each clause are true. The rigorous theoretical analysis of the algorithms for solving X2SAT problem is proposed in the literature. An algorithm X2SAT-N based on Davis-Putnam-Logemann-Loveland (DPLL) is first presented to solve the X2SAT problem. In the algorithm X2SAT-N, a simplification process is first called to simplify the input formula. After that, several strategies are used to cut the branches on different kinds of clauses. It can be proved that the worst-case upper bound of algorithm X2SAT-N for solving X2SAT should be O(1.4203n), which improves the previous fastest X2SAT algorithm of O(1.4511n) up to now. Here n denotes the number of variables in a formula. Additionally, an algorithm called as X2SAT-L for solving X2SAT problem is also presented. This algorithm is also based on DPLL and using similar simplification process. We prove that the worst-case upper bound of this algorithm is O(1.3643l), where l is the length of the formula. Within our knowledge, this algorithm is the best algorithm for X2SAT with the parameter of the length of the formula.
Optimization of Range Queries and Analysis for MapReduce Systems
Zhao Hui, Yang Shuqiang, Chen Zhikun, Yin Hong, and Jin Songchang
2014, 51(3):  606-617. 
Asbtract ( 562 )   HTML ( 0)   PDF (2944KB) ( 562 )  
Related Articles | Metrics
Recently, MapReduce parallel computing paradigm has gained extensive attention from industry and academia. MapReduce works well in Google, Yahoo! and Facebook on massive data processing. However, MapReduce-based systems originally were used to manage massive un-structured and semi-structured data, such as inverted indexing, Web page ranking, log analyzing etc. They ignored the optimizing of structured data, such as the brute-force scanning, which is inefficient for some common workloads in structured data management, such as select, filter etc. For this problem, we introdue a global indexing technology, which has been widely used in database, aiming to optimizing queries and analysis in a range of the overall dataset. Global index will help reduce redundant map tasks, resulting in decreasing the cost of I/O and scheduling. Finally, we evaluate the effect of our framework by four data selection ratios which are 80%, 50%, 30% and 10% under different cluster sizes. We find that the response time has 5x improvement at most, I/O cost improves 10x at most and cost of scheduling improves 11x at most.
An Adaptive Tasks Scheduling Method Based on the Ability of Node in Hadoop Cluster
Zheng Xiaowei, Xiang Ming, Zhang Dawei, and Liu Qingkun
2014, 51(3):  618-626. 
Asbtract ( 650 )   HTML ( 0)   PDF (2067KB) ( 605 )  
Related Articles | Metrics
Scheduling algorithm has played an important role in improving Hadoop cluster performance. However, the phenomenon of uneven load distribution exists in current Hadoop inherent task-level scheduling methods. In order to maintain the load status of each node in the cluster, we focus on the analysis and research about the implementation capacity of cluster nodes. An adaptive tasks scheduling method based on the node capability is proposed in this paper. According to the node history and the current load status, the method takes node performance, task characteristics and node failure rate as the parameters to calculate node executive ability. On this basis, the different amount of tasks are assigned to cluster nodes. Thus, joined nodes adaptively adjust the amount of running tasks to make the node suited for different tasks better. Finally experimental results indicate that the proposed adaptive tasks scheduling method can make the total task completion time being reduced significantly, and moreover, the load on each node gets more balanced and the node resource utilization is more reasonable.
Top-k Query Processing of Reverse Skyline in Metric Space
Zhang Bin, Jiang Tao, Gao Yunjun, and Yue Guangxue
2014, 51(3):  627-636. 
Asbtract ( 704 )   HTML ( 3)   PDF (2494KB) ( 518 )  
Related Articles | Metrics
Unlike the traditional metric skyline query, this paper studies a novel skyline query in metric space, called metric top-k reverse skyline (MkRS), which executes the skyline computation from a reverse perspective. Given a query object q and a monotonic preference function f, MkRS returns k subsets which contain exactly m objects of the input dataset P such that each subset G has q in its metric skyline. When evaluating the query, we need to perform exhaustive search of selecting m objects from n objects of dataset P and metric skylines for each combination. These computations are costly due to the extremely large search space. We first present STS (sort and threshold skyline) algorithm to speed-up the computation, which exploits the sorting machinery so that only a part of subsets needs examining and STS can early stop the computation. Then, we reuse the information produced during index accessing which reduces more than 80% I/O accesses of STS and propose the customized algorithm rSTS based on STS. The experimental results show that our proposed algorithms are effective and efficient.
An Automatic Performance Tuning Framework for FFT on Heterogenous Platforms
Li Yan, and Zhang Yunquan
2014, 51(3):  637-649. 
Asbtract ( 930 )   HTML ( 1)   PDF (5185KB) ( 686 )  
Related Articles | Metrics
The fast Fourier transform (FFT) is an important computational kernel in scientific and engineering computation which has broad applicability, especially in the field of signal processing, image processing and solving partial differential equation. In this paper, we propose an automatic performance tuning framework, called MPFFT (massively parallel FFT), which is well-suited to heterogeneous platforms such as GPU (graphic processing unit) and APU (accelerated processing unit). We employ two-stage adaptation methodology in two levels, namely installation time and runtime. At installation time, there is a code generator that could automatically generate FFT codelet for arbitrary size called by GPU kernel. The code generator could also generate high optimized code for GPU kernel according to auto-tuning techniques at runtime. Experimental results demonstrate that MPFFT substantially outperforms the clAmdFft library both on AMD GPU and APU. For 1D, 2D and 3D FFT, the average speedup of MPFFT compared with clAmdFft 1.6 achieves up to 3.45, 15.20, 4.47 on AMD APU A-360 and 1.75, 3.01, 1.69 on AMD HD7970. It also achieves comparable performance as the CUFFT library on NVIDIA GPU, and the overall performance is within 93% of CUFFT 4.1 on Tesla C2050, and the maximum speedup is 1.28.
Community Service Selection Algorithm for Network Simulation Task
Sun Liyang, Li Yang, Lin Jianning, Mao Shaojie, and Liu Zhong
2014, 51(3):  650-660. 
Asbtract ( 432 )   HTML ( 0)   PDF (3846KB) ( 549 )  
Related Articles | Metrics
Net-centric simulation (NCS) is a new simulation mode, which provides a brand new method to support distributed simulation in WAN. To meet different users' requirements, building a new simulation task community (STC) dynamically by integrating distributed various services on network is the key problem of NCS. A simulation task community service selection algorithm (STCSSA) is proposed. It aims at the building of STC and adopts an improved particle swarm optimization (PSO) which overcomes the shortcomings of traditional PSO. The essence of the algorithm is that the problem of STC construction is transformed into a multi-objective service selection optimization with QoS constraints. Firstly, QoS model of STC service and the evaluation of STC composition process is described. Then, STCSSA is introduced in terms of the function steps, including a dynamic inertia weight strategy and an alternative method of mutation. Finally, several experiments are conducted to test STCSSA. On the one hand, STCSSA is tested with typical functions to prove the proposed algorithm can not only improve the convergence speed of service options, but also avoid the algorithm going into a local optimum. On the other hand, practical application is presented in detail. STCSSA is utilized to help a military STC search simulation services. Compared with some other PSO algorithms, STCSSA can effectively support large-scaled STC construction in NCS environment.
Correlated Software Prediction for Indirect Branch in Dynamic Translation Systems
Jia Ning, Yang Chun, Tong Dong, and Wang Keyi
2014, 51(3):  661-671. 
Asbtract ( 449 )   HTML ( 0)   PDF (3522KB) ( 417 )  
Related Articles | Metrics
Dynamic translation system should perform an address translation for each execution of indirect branch instructions, so handling indirect branch becomes a major performance overhead of the system. The translation systems without hardware support always use software prediction to reduce the overhead of address translation, but the low prediction accuracy restricts the performance improvement. This paper analyzes the performance bottleneck of software prediction, and proposes a novel prediction mechanism called low-overhead correlated software prediction (LOCSP), which can significantly improve the prediction accuracy using branch correlation. LOCSP uses code replicas to distinguish the different branch occasions of an indirect branch instruction. By making different control flows execute different code replicas, and deploying individual software prediction chains for each replica, LOCSP realizes correlated prediction without any increase in dynamic instruction count. Meanwhile, LOCSP classifies the indirect branch instructions by dynamic profiling, and only applies correlated prediction on hot and hard-to-predict instructions, further minimizing prediction overhead. The experiment shows that, compared with software prediction, LOCSP can improve the average prediction accuracy from 58.9% to 82.2%, thus reduces the performance overhead by 19.3% on average, up to 41.9%, while only increases static code size by 2.4% on average. Furthermore, LOCSP can cooperate with other optimization techniques of handling indirect branch.
Data Object Scale Aware Rank-Level Memory Allocation
Zhong Qi, Wang Jing, Guan Xuetao, Huang Tao, and Wang Keyi
2014, 51(3):  672-680. 
Asbtract ( 439 )   HTML ( 1)   PDF (3470KB) ( 440 )  
Related Articles | Metrics
The main memory is organized as bank/rank/channel structure, which can be used to improve performance by exploiting parallelism and locality. The previous works have employed sub-ranking techniques to add more bank resource, or guided the bank partition among parallel running processes for isolating the memory interference. However, these methods ignore the interference problem when the memory system involves multiple ranks. In this paper, through an analysis on data layout, we find that program's data is inclined to cluster into a single rank because of the limited working set. This phenomenon results in the underutilized memory resource and system performance. We propose DSRA (data object scale aware rank-level memory allocation), which provides a software-only way to deal with this problem. Based on the cost of interference among objects, DSRA puts them into different ranks to avoid cluster. Meanwhile, with the information extracted by compiler and operating system, it requires no modification of application and underlying hardware. Measurement shows that DSRA, implementing in the Linux 2.6.32 kernel and running on two different types of processors, improves the performance of memory intensive NAS benchmark and SPEC CPU2000 by up to 16%(6.8% on average), with little effect on the cache miss rate.
A Cache Locking and Direct Cache Access Based Network Processing Optimization Method
Su Wen,, Zhang Longbing,, Gao Xiang,, and Su Menghao,
2014, 51(3):  681-690. 
Asbtract ( 815 )   HTML ( 0)   PDF (2736KB) ( 457 )  
Related Articles | Metrics
As network speed continues to grow, new challenges of network processing are emerging. Although many innovated solutions have been proposed in recent years, based on the analysis of the memory accessing trace and program locality in network processing, we point out that there are still defects in current processor network subsystem designs. Moreover, we find that the interaction and context switch between network processing and local programs are bottlenecks of network performance promotion, which have not been paid enough attention before. Motivated by the studies, a hardware and software co-design solution for network optimization is proposed, which includes improved direct cache access scheme, cache locking for system software, related interconnection architecture and the coherence protocol. The experiment shows that based on the proposed system, the peak TCP bandwidth is increased about 48%, while the UDP package loss rate is decreased by 40% under heavy pressure, and the network latency is decreased by more than 10%. Especially, the network bandwidth is improved about 44% when network processing benchmark executes with SPEC2000 programs in parallel. Also we discuss the collaboration scheme among the proposed solution and other main stream network optimization technologies, as well as the basic rules for the collaboration of multiple network optimization techniques.