ISSN 1000-1239 CN 11-1777/TP

Table of Content

01 June 2016, Volume 53 Issue 6
Two-Stage Synchronization Based Thread Block Compaction Scheduling Method of GPGPU
Zhang Jun, He Yanxiang, Shen Fanfan, Jiang Nan, Li Qing’an
2016, 53(6):  1173-1185.  doi:10.7544/issn1000-1239.2016.20150114
Asbtract ( 956 )   HTML ( 0)   PDF (4900KB) ( 653 )  
Related Articles | Metrics
The application of general purpose graphics processing unit (GPGPU) has become increasingly extensive in the general purpose computing fields facing high performance computing and high throughput. The powerful computing capability of GPGPU comes from single instruction multiple data (SIMD) execution model it takes. Currently, it has become the main stream for GPGPU to implement the efficient execution of the computing tasks via massive high parallel threads. However the parallel computing capability is affected during dealing with the branch divergent control flow as different branch path is processed sequentially. In this paper, we propose TSTBC (two-stage synchronization based thread block compaction scheduling) method based on analyzing the previously proposed thread block compaction scheduling methods in inefficient dealing with divergent branches. This method analyzes the effectiveness of thread block compaction and reconstruction via taking the use of the adequacy decision logic of thread block compaction and decreases the number of inefficient thread block compaction. The simulation experiment results show that the effectiveness of thread block compaction and reconstruction is improved to some extent relative to the other same type of methods, and the destruction on data locality inside the thread group and the on-chip level-one data cache miss rate can be reduced effectively. The performance of the whole system is increased by 1927% over the baseline architecture.
Dynamic Scheduling of Dual-Criticality Distributed Functionalities on Heterogeneous Systems
Liu Liangjiao, Xie Guoqi, Li Renfa, Yang Liu, Liu Yan
2016, 53(6):  1186-1201.  doi:10.7544/issn1000-1239.2016.20150175
Asbtract ( 786 )   HTML ( 0)   PDF (5939KB) ( 609 )  
Related Articles | Metrics
Heterogeneous distributed systems are mixed-criticality systems consisting of multiple functionalities with different criticality levels. A distributed functionality contains multiple precedence-constrained tasks. Mixed-criticality scheduling of heterogeneous distributed systems faces severe conflicts between performance and time constraints. Improving the overall performance of systems while still meeting the deadlines of higher-criticality functionalities, and making a reasonable tradeoff between performance and timing are major optimization problems. The F_DDHEFT(fairness of dynamic dual-criticality heterogeneous earliest finish time) algorithm is to improve the performance of systems. The C_DDHEFT (criticality of dynamic dual-criticality heterogeneous earliest finish time) algorithm is to meet the deadlines of higher-criticality functionalities. The D_DDHEFT (deadline-span of dynamic dual-criticality heterogeneous earliest finish time) algorithm is to allow the lower-criticality functionalities to be processed positively for better overall performance while still meeting the deadlines of higher-criticality functionalities, such that a reasonable tradeoff between performance and timing is made. Both example and extensive experimental evaluation demonstrate significant improvement of the D_DDHEFT algorithm.
DLPF: A Parallel Deep Learning Programming Framework Based on Heterogeneous Architecture
Wang Yueqing, Dou Yong, Lü Qi, Li Baofeng, Li Teng
2016, 53(6):  1202-1210.  doi:10.7544/issn1000-1239.2016.20150147
Asbtract ( 1119 )   HTML ( 4)   PDF (4335KB) ( 1095 )  
Related Articles | Metrics
Deep learning plays an important role in machine learning field, and it has been widely used in various applications. The prospect of research and applications of deep learning are huge. However, deep learning also faces several challenges. Firstly, there are many tools in deep learning field, but these tools are not convenient to use for non-expert users because the installation and usage of them are really complex. Secondly, the diversity of deep learning is limited because the flexibility of existing deep learning models is not enough. Furthermore, the training time of deep learning is so long that the optimal hyper-parameters combination cannot be found in a short time. To solve these problems, we design a deep learning programming framework based on heterogeneous architecture in this paper. The programming framework establishes a unified module library which can be used to build a deep model through the visual interface conveniently. Besides, the framework also accelerates the basic modules on heterogeneous platform, and makes the speed of searching optimal hyper-parameters combination be faster. Experimental results show that the programming framework can construct deep models flexibly, and more importantly, it can achieve comparative classification results and better timing performance for a variety of applications. In addition, the framework can search optimal hyper-parameters efficiently and make us infer the relationship of all hyper-parameters.
A Global Hierarchical Adaptive Routing Mechanism in Many-Core Processor Network-on-Chip
Zhang Yang, Wang Da, Ye Xiaochun, Zhu Yatao, Fan Dongrui, Li Hongliang, Xie Xianghui
2016, 53(6):  1211-1220.  doi:10.7544/issn1000-1239.2016.20150149
Asbtract ( 929 )   HTML ( 1)   PDF (3494KB) ( 561 )  
Related Articles | Metrics
Accompanied by the arrival of the era of big data, data center has been becoming an infrastructure in human life.Many-core processor provides a highly parallel capability to solve applications in data center such as sorting and searching efficiently. For the purpose to utilize the parallelism of many-core processor, routing algorithm in interconnection network turns into one of the most important issues in many-core system. Mesh and ring are the most employed topological structures in many-core processor for their features of easy implementation and high scalability. Depending on the scope of congestion information, routing algorithms in mesh and ring can be divided into oblivious routing, local adaptive routing, regional adaptive routing and global adaptive routing. The oblivious routing algorithm applied in the mesh architecture affects the load-balance of the network which is reflected in reducing through-put and high packet latency. Current local adaptive routing and regional adaptive routing both suffer from short-sightedness and are not suitable for large scale mesh structure. And prior global adaptive routings are limited by the slow computation of global route. We propose a novel global hierarchical adaptive routing mechanism, which is comprised of a global congestion information propagation network Roof-Mesh and a global hierarchical adaptive routing algorithm GHARA. Roof-Mesh provides a platform to share global congestion information in a hierarchical way among all nodes on the network. Depending on the information supplied by Roof-Mesh, GHARA reduces the procedure of routing by hierarchically computing from large region perspective to neighbor nodes. The result of experiment shows that GHARA performs better than other regional and global adaptive routings.
A Dataflow Cache Processor Frontend Design
Liu Bingtao, Wang Da, Ye Xiaochun, Zhang Hao, Fan Dongrui, Zhang Zhimin
2016, 53(6):  1221-1237.  doi:10.7544/issn1000-1239.2016.20150317
Asbtract ( 761 )   HTML ( 3)   PDF (5802KB) ( 773 )  
Related Articles | Metrics
In order to exploit both thread-level parallelism (TLP) and instruction-level parallelism (ILP) of programs, dynamic multi-core technique can reconfigure multiple small cores to a more powerful virtual core. Usually a virtual core is weaker than a native core with equivalent chip resource. One important reason is that the fetch, decode and rename frontend stages are hard to cooperate after reconfiguration because of their serialized processing nature. To solve this problem, we propose a new frontend design called the dataflow cache with a corresponding vector renaming (VR) mechanism. By caching and reusing the data dependencies and other information of the instruction basicblock, the dataflow cache exploits the dataflow locality of programs. Firstly, the processor core can exploit better instruction-level parallelism and lower branch misprediction penalty with dataflow cache; Secondly, the virtual core in dynamic multi-core can solve its frontend problem by using dataflow cache to bypass the traditional frontend stages. By experimenting on the SPEC CPU2006 programs, we prove that dataflow cache can cover 90% of the dynamic instructions with limited cost. Then, we analyze the performance effect of adding the dataflow cache to pipeline. At last, experiments show that with a frontend of 4-instruction wide and an instruction window of 512-entry, the performance of the virtual core with dataflow cache is improved up to 9.4% in average with a 28% maximum for some programs.
A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence
Zhu Suxia, Chen Deyun, Ji Zhenzhou, Sun Guanglu, Zhang Hao
2016, 53(6):  1238-1248.  doi:10.7544/issn1000-1239.2016.20150100
Asbtract ( 819 )   HTML ( 1)   PDF (2922KB) ( 486 )  
Related Articles | Metrics
Memory race record-replay is an important technology to resolve the nondeterminism of multi-core programs. Because of high hardware overhead, the existing memory race recorders based on point-to-point logging approach are difficult to be applied to the practical modern chip multiprocessors. In order to reduce the hardware overhead of point-to-point logging approach, a novel memory race recording algorithm implemented in concurrent logging strategy for chip multiprocessors adopting snoop-based cache coherence protocol is proposed. This algorithm records the current execution points of all threads concurrently when detecting a memory conflict. It extends the point-to-point memory race relationship between two threads to all threads in recording phase, reducing hardware overhead significantly. It also uses distributed logging mechanism to record memory races to reduce bandwidth overhead effectively in the premise of not increasing the memory race log. When replaying, this algorithm uses a simplified producer-consumer model and introduces a counting semaphore for each processor core to ensure deterministic replay, improving replay speed and reducing coherence bandwidth overhead. The simulation results on 8-core chip multiprocessor (CMP) system show that this concurrent recording algorithm based on point-to-point logging approach adds about 171B hardware for each processor, and records about 2.3B log per thousand memory instructions and adds less than 1.5% additional interconnection bandwidth overhead.
Workload Analysis for Typical GPU Programs Using CUPTI Interface
Zheng Zhen, Zhai Jidong, Li Yan, Chen Wenguang
2016, 53(6):  1249-1262.  doi:10.7544/issn1000-1239.2016.20148354
Asbtract ( 988 )   HTML ( 3)   PDF (6942KB) ( 687 )  
Related Articles | Metrics
GPU-based high performance computers have become an important trend in the area of high performance computing. However, developing efficient parallel programs on current GPU devices is very complex because of the complex memory hierarchy and thread hierarchy. To address this problem, we summarize five kinds of key metrics that reflect the performance of programs according to the hardware and software architecture. Then we design and implement a performance analysis tool based on underlying CUPTI interfaces provided by NVIDIA, which can collect key metrics automatically without modifying the source code. The tool can analyze the performance behaviors of GPU programs effectively with very little impact on the execution of programs. Finally, we analyze 17 programs in Rodinia benchmark, which is a famous benchmark for GPU programs, and a real application using our tool. By analyzing the value of key metrics, we find the performance bottlenecks of each program and map the bottlenecks back to source code. These analysis results can be used to guide the optimization of CUDA programs and GPU architecture. Result shows that most bottlenecks come from inefficient memory access, and include unreasonable global memory and shared memory access pattern, and low concurrency for these programs. We summarize the common reasons for typical performance bottlenecks and give some high-level suggestions for developing efficient GPU programs.
Accelerator Support in YARN Cluster
Li Qin, Zhu Yanchao, Liu Yi, Qian Depei
2016, 53(6):  1263-1270.  doi:10.7544/issn1000-1239.2016.20148351
Asbtract ( 901 )   HTML ( 3)   PDF (2880KB) ( 601 )  
Related Articles | Metrics
Accelerators, such as GPU and Intel MIC, are widely used in scientific computing and image processing, and have strong potentials in big data processing and HPC based on cloud platform. YARN is a new generation of Hadoop distributed computing framework. Its adoption of computing resources is only limited to CPU, lacking of support for accelerators. This paper adds the support to nodes with accelerators to YARN to solve the problem. By analyzing the problem of supporting heterogeneous node, there are three identified difficulties which should be solved to introduce hybrid/heterogeneous to YARN. The first one is how to manage and schedule the added accelerator resources in the cluster; the second one is how to collect the status of accelerators to the master node for management; the third one is how to address the contention issue among multiple accelerator tasks concurrently running on the same node. In order to solve the above problems, the following design tasks have been carried out. Resource encapsulation which bundles neighbor nodes into one resource encapsulation is designed to solve the first problem. Management functions which collect the real-time accelerators status from working nodes are designed on the master node to solve the second problem. Accelerator task pipeline which splits accelerator tasks into three parts and executes them in parallel is designed on the nodes with accelerators to solve the third problem. Our scheme is verified with a cluster consisting of 4 nodes with GPU, and the workload testing the cluster includes LU, QR and Cholesky decomposition from the third party benchmark MAGMA, and the program performes feature extraction and clustering upon 50000 images. The results prove the effectiveness of the scheme presented.
Dynamic Task Scheduling Model and Fault-Tolerant via Queuing Theory
He Wangquan, Wei Di, Quan Jianxiao, Wu Wei, Qi Fengbin
2016, 53(6):  1271-1280.  doi:10.7544/issn1000-1239.2016.20148445
Asbtract ( 1095 )   HTML ( 2)   PDF (2600KB) ( 1263 )  
Related Articles | Metrics
The design of efficient dynamic task scheduling and fault-tolerant mechanism is an issue of crucial importance in high-performance computing field. Most existing methods, however, can hardly achieve good scalability on large-scale system. In this paper, we propose a scalable dynamic task scheduling model via N-level queuing theory, which dramatically reduces the programming burden by providing programmer with concise parallel programming framework. On one hand, we utilize the Poisson process theory to analyze the average wait time of tasks, and then decide the task layers according to threshold. On the other hand, we reduce the fault tolerance overhead using region-aware light-weight degradation model. Experimental results with Micro Benchmark on Bluelight system with 32768 cores show that our method achieves good scalability when the tasks take 3.4s on average and the overhead is just 7.2% of traditional model. Running on 16384 cores, pharmacological application DOCK achieves performance improvement by 34.3% with our scheduling. Moreover, the results of DOCK show our fault-tolerant model achieves 3.75%~5.13% performance improvements over traditional mechanism.
A Data Caching Scheme Based on Node Classification in Named Data Networking
Huang Sheng, Teng Mingnian, Wu Zhen, Xu Jianghua, Ji Ruijun
2016, 53(6):  1281-1291.  doi:10.7544/issn1000-1239.2016.20148097
Asbtract ( 796 )   HTML ( 0)   PDF (3490KB) ( 548 )  
Related Articles | Metrics
Compared with the traditional Internet, in-networking caching is one of the most distinguishable features in named data networking (NDN). In NDN, a node caches every passing data packet as a default model. The caching scheme generates a large number of redundant data in in-networking. As a consequence, the networking cache resource is wasted seriously. To solve the problem, a caching scheme based on node classification (BNC) is proposed firstly in this paper. Based on different node positions, the nodes that data packet passes through are divided into two types: “edge” type and “core” type. When data packet passes through the “core” type nodes, by considering location and data popularity distribution at different nodes, it is cached in a node which is beneficial to other nodes. When the data packet passes through the “edge” nodes, a node is selected through data popularity to be beneficial to the client. The simulation results show that the proposed scheme can efficiently improve the in-network hit ratio and reduce the delay and hops of getting the data packet.
Design and Implementation of New SAN Architecture Based on Controller
Lu Yifei, Zhang Zhenwei, Tao Jun, Tang Ling
2016, 53(6):  1292-1305.  doi:10.7544/issn1000-1239.2016.20150025
Asbtract ( 701 )   HTML ( 2)   PDF (4393KB) ( 540 )  
Related Articles | Metrics
With the rapid development of Internet technology and the explosive growth of data, the storage system designed by tightly coupled hardware and software is limiting the development of storage technology severely, and unable to meet the fast changing needs in the mobile Internet and big data era increasingly. Software defined network (SDN) as new network architecture is more suitable for the development of the next-generation data centers. This paper proposes a new controller-based storage area network (SAN)—CSN architecture using the idea of SDN technology. CSN decoupling protocol control plane of FC switch from data plane deploys protocol control plane and distributed functions in the controller. At first, we introduce the architecture of CSN and discuss the specific design of controller for CSN in this paper. Then, the implement of communication and overall mechanism are described in detail. After that, CSN-based on-demand bandwidth-available first routing protocol is introduced and discussed. Finally, we verify the feasibility through the actual development environment. As a result of several experiments, server can establish a connection with the storage more quickly in CSN, and furthermore, CSN has better throughput and faster convergence, as well as more reliability and scalability. In addition, stress testing is conducted in the central controller for CPU, memory and bandwidth.
A Cooperative Game Based Data Center Backbone Network Bandwidth Allocation Policy
Meng Fei, Lan Julong, Hu Yuxiang
2016, 53(6):  1306-1313.  doi:10.7544/issn1000-1239.2016.20148400
Asbtract ( 709 )   HTML ( 0)   PDF (2041KB) ( 614 )  
Related Articles | Metrics
Currently, traffic engineering is typically deployed to improve the utilization of data centers (DC) backbone networks, which usually belongs to the same online service providers. Although the efficiency is remarkable, the bandwidth allocation fairness of different aggregate flow isn’t considered. Hence, the QoS guarantee is restricted. Because the bandwidth resource is expensive and packet loss is typically thought unacceptable, the bandwidth utilization should be maximized, at the same time, the QoS guarantee of different flow should be improved. In this paper, the problem of contending the share bandwidth is modeled as a cooperative game, and different aggregate flow compets the share bandwidth and maximizes the overall bandwidth resource utilization simultaneously, and the optimal bandwidth allocation policy, called cooperation game based bandwidth allocation (CGBA), is obtained through searching the Nash bargaining solution (NBS) of the game and balancing the tradeoff between minimum bandwidth guarantee and bandwidth allocation fairness. Simulation on a Mininet testbed shows that the proposed policy can effectively guarantee minimum bandwidth of each aggregate flow while ensuring the allocation fairness, compared with three other classical bandwidth allocation policies.
A Modeling Framework with Population Dynamics for Content Pollution Proliferation in P2P IPTV System
Wang Haizhou, Chen Xingshu, Du Min, Wang Wenxian
2016, 53(6):  1314-1324.  doi:10.7544/issn1000-1239.2016.20150066
Asbtract ( 880 )   HTML ( 0)   PDF (3003KB) ( 429 )  
Related Articles | Metrics
With the growing maturity of peer-to-peer (P2P) technology, IPTV (Internet protocol television) applications based on it have gained great commercial success, which has received more and more attentions from both global industry and academia. However, the characteristics of distribution, anonymity, rapid propagation, and large-scale users in P2P IPTV systems make them more likely to be vulnerable targets. In this paper, the procedure of content pollution attack in P2P IPTV systems is presented firstly, and then the various user behaviors under the pollution attack are analyzed. With that knowledge, a modeling framework with population dynamics of content pollution in P2P IPTV systems is proposed. Different from the existing content pollution propagation models, this model considers the impact of user behaviors in the content pollution attack of real world P2P IPTV systems, such as continuous watching behavior, arrival behavior, departure behavior. Theoretical analysis and simulation experiments show that the proposed model is a feasible and efficient tool to analyze the content pollution propagation in real world P2P IPTV systems, especially for the dynamic evolution of participating users and the user departure behavior.
Capture-Aware Bayesian Tag Estimation for Dense RFID Tags Environment
Wang Yang, Wu Haifeng, Zeng Yu
2016, 53(6):  1325-1331.  doi:10.7544/issn1000-1239.2016.20148405
Asbtract ( 537 )   HTML ( 0)   PDF (1168KB) ( 567 )  
Related Articles | Metrics
Dynamic framed slotted Aloha algorithm is one kind of commonly used passive radio frequency identification (RFID) tag anti-collision algorithms. In the algorithm, the frame length requires dynamical set to ensure high identification efficiency. Generally, the settings of the frame length are associated with the number of tags and the probability of capture effect. Traditional estimation algorithms can estimate the number of tags and the probability of capture effect, but the number of tags is greater than an initial frame length when it is in dense RFID tags environment, and the estimation errors will increase significantly. In order to solve the problem that the conventional algorithms can not be applied to dense RFID tags environment, capture-aware Bayesian tag estimation is proposed in the paper, and the settings of optimal frame length with non-isometric slots are given. From the experimental results, the proposed algorithms have significantly lower estimation errors than traditional algorithms in dense RFID tags environment. And the identification efficiency got by setting the frame length according to the estimation results is also higher than that of traditional algorithms.
An Intergrated Processing Platform for Traffic Sensor Data Based on Cloud Architecture
Zhao Zhuofeng, Ding Weilong, Han Yanbo
2016, 53(6):  1332-1341.  doi:10.7544/issn1000-1239.2016.20150458
Asbtract ( 868 )   HTML ( 2)   PDF (2084KB) ( 568 )  
Related Articles | Metrics
With the continuous expansion of the scope of traffic sensor networks, traffic sensor data becomes widely available and is continuously being produced. Traffic sensor data gathered by large amounts of sensors shows the massive, continuous, streaming and spatio-temporal characteristics compared with traditional traffic data. How to provide intergrated support for multi-source, massive and continuous traffic sensor data processing is becoming one key issue of the implementation of diversified traffic applications. However, due to the absence of support for spatio-temporal traffic sensor data, it is difficult to develop corresponding applications and optimize the data transfer among different nodes in currenent distributed computing platforms. In this paper, we propose a traffic domain-specific processing model based on spatio-temporal data object. The spatio-temporal data object is treated as the first-class object in the distributed processing model. According to the model, we implement an intergrated processing platform for traffic sensor data based on the share-nothing architecture of cloud computing, which is designed to combine spatio-temporal data partition, pipelined parallel processing and stream computing to support traffic sensor data processing in a scalable architecture with real-time guarantee. Applications of the platform in real project and experiments based on real traffice sensor data show that our platform excels in performance and extensibility compared with traditional traffic sensor data processing system.
An Uncompleted Information Game Based Resources Allocation Model for Cloud Computing
Yuan Ying, Wang Cuirong, Wang Cong, Ren Tingting, Liu Bingyu2
2016, 53(6):  1342-1351.  doi:10.7544/issn1000-1239.2016.20150062
Asbtract ( 756 )   HTML ( 0)   PDF (2945KB) ( 607 )  
Related Articles | Metrics
Considering the competing characteristics under multi-tenant environment in cloud computing and aiming to improve the profit of both the resource supply and demand sides, an uncompleted information game based cloud resource allocation model is proposed. Firstly, a hidden Markov model (HMM) is introduced to predict the current bid of service providers based on the historical resource demand. Then a game model for dynamical pricing is established base on the predicted bid value. It can motivate service providers to choose the optimal bidding strategy in accordance with overall interests and so as to achieve maximum benefits. Finally, a resource allocation model on the basis of unit prices of different types of resources is put forward to guarantee optimal gains for infrastructure provider (INP). The allocation model can support synchronous allocation for both multi service providers and various resources. Simulation results show that, in the pricing game model, the predicted price is close to the actual transaction price which is lower than the actual valuation, so it can guarantee the profit of service providers; the resource allocation model can simultaneously increase infrastructure provider’s revenue.
Incremental Dynamic Social Network Anonymity Technology
Guo Caihua, Wang Bin, Zhu Huaijie, Yang Xiaochun
2016, 53(6):  1352-1364.  doi:10.7544/issn1000-1239.2016.20140695
Asbtract ( 817 )   HTML ( 0)   PDF (3816KB) ( 521 )  
Related Articles | Metrics
With the rapid development and popularity of social networks, how to protect privacy in social networks has become a hot topic in the realm of data privacy. However, in most of existing anonymity technologies in recent years, the social network is abstracted into a simple graph. In fact, there are a lot of incremental changes of social networks in real life and a simple graph can not reflect incremental society network well, so abstracting the social network into the incremental sequences becomes more realistic. Meanwhile, in real life most of the network contains weight information, that is, a lot of social networks are weighted graphs. Compared with the simple graph, weighted graphs carry more information of the social network and will leak more privacy. In this paper, incremental dynamic social network is abstracted into a weighted graph incremental sequence. In order to anonymize weighted graph incremental sequence, we propose a weighted graph incremental sequence k-anonymous privacy protection model, and design a baseline anonymity algorithm (called WLKA) based on weight list and HVKA algorithm based on hypergraph, which prevents the attacks from node point labels and weight packages. Finally, through a lot of tests on real data sets, it proves that WLKA can guarantee the validity of the privacy protection on weighted graph incremental sequence. Compared with WLKA, HVKA is better to retain the original structure properties and improves the availability of weight information, but it also reduces the time cost of anonymous process.
Message Dissemination Based on Interest Matching for Opportunistic Social Networks
Zhang Yushu, Wang Huiqiang, Feng Guangsheng, Lü Hongwu
2016, 53(6):  1365-1375.  doi:10.7544/issn1000-1239.2016.20148412
Asbtract ( 774 )   HTML ( 3)   PDF (3084KB) ( 618 )  
Related Articles | Metrics
Opportunistic social networks can provide message dissemination service through encounters created by nodes’ movement in intermittently connected or completely disconnected wireless networks, but there are still some shortcomings in message dissemination efficiency and user experience. In order to promote the performance of dissemination system and improve the user experience, an opportunistic social network message dissemination mechanism based on node interest matching is proposed. A kind of opportunistic social networks with hybrid architecture is introduced to avoid problems such as incomplete information of network topology and low computing ability of nodes. Considering the fact that node behaviors and interests can both affect the performance of dissemination, the analytical method designed takes both of them as reference factors, and an improved co-clustering algorithm for data with complex relationship is proposed. In addition, the interest matching rate for nodes is used to show what percentage of buffer is occupied by the useful messages to user, and a novel message dissemination strategy is proposed to raise the rate. Simulation results show that the proposed scheme can both improve the capability of opportunistic social networks in terms of delivery rate, latency and buffer usage, and promote the performance of message dissemination systems in terms of efficiency, coverage rate and interest matching rate.
Semi-Supervised Local Expansion Method for Overlapping Community Detection
Chen Junyu, Zhou Gang, Nan Yu, Zeng Qi
2016, 53(6):  1376-1388.  doi:10.7544/issn1000-1239.2016.20148339
Asbtract ( 914 )   HTML ( 1)   PDF (3221KB) ( 742 )  
Related Articles | Metrics
Overlapping community detection has become one of the hottest research issues in the field of network science, as well as attracted the attention of researchers with different backgrounds. A novel semi-supervised local expansion method (SLEM) is proposed for detecting overlapping communities more effectively in real world networks. The proposed method makes use of not only the topology information of the network but also the attribute information of partial vertices. Inspired by the idea of semi-supervised clustering with constraints in the field of machine learning, SLEM starts from utilizing the attribute information of partial vertices to get pairwise constraints which can be used to modify the topology structure of the original network. Afterward, a vertex degree centrality-based seeding method is proposed for selecting seeds as initial communities. Then these seeds expand into local communities by a greedy strategy, after which partial connected close-knit communities are formed. Finally, similarities between different communities are computed on the basis of a community distance measurement, and then near-duplicated communities are combined. Taking more advantage of network information than traditional unsupervised community detection methods, SLEM can produce communities with higher structure quality. Experimental results on both synthetic benchmark networks and real world networks show that SLEM can achieve better effect than the state-of-the-art local expansion methods on the networks of different sparsity degrees.
A Collaborative Filtering Recommendation Algorithm Based on Multiple Social Trusts
Wang Ruiqin, Jiang Yunliang, Li Yixiao, Lou Jungang
2016, 53(6):  1389-1399.  doi:10.7544/issn1000-1239.2016.20150307
Asbtract ( 1130 )   HTML ( 3)   PDF (2115KB) ( 828 )  
Related Articles | Metrics
Collaborative filtering (CF) is one of the most successful recommendation technologies in the personalized recommendation systems. It can recommend products or information for target user according to the preference information of similar users. However the traditional collaborative filtering algorithms have the disadvantages of low recommendation efficiency and weak capacity of attack-resistance. In order to solve the above problems, a novel collaborative filtering algorithm based on social trusts is proposed. Firstly, referring to the trust generation principle in social psychology, a social trust computation method based on multiple trust elements is presented. In social networking environment, trust elements mainly include credibility, reliability, intimacy and self-orientation. Then specific methods of identifying, extraction and quantification of the trust elements are studied in depth. Finally, the trustworthy neighbors of target user are selected in accordance with the social trust, so as to make trust-based collaborative recommendation. Using the FilmTrust and Epinions as test data sets, the performance of the novel algorithm is compared with that of the traditional CF and the-state-of-art methods, as well as the CF based on single trust element. Experimental results show that compared with the other methods, the proposed algorithm not only improves the recommendation precision and recall, but also has powerful attack-resistance capacity.
EDDPC: An Efficient Distributed Density Peaks Clustering Algorithm
Gong Shufeng Zhang Yanfeng
2016, 53(6):  1400-1409.  doi:10.7544/issn1000-1239.2016.20150616
Asbtract ( 1144 )   HTML ( 3)   PDF (2497KB) ( 702 )  
Related Articles | Metrics
Clustering is a commonly used method for data relationship analytics in data mining. The clustering algorithm divides a set of objects into several groups (clusters), and the data objects in the same group are more similar to each other than to those in other groups. Density peaks clustering is a recently proposed clustering algorithm published in Science magazine, which performs clustering in terms of each data object’s ρ value and δ value. It exhibits its superiority over the other traditional clustering algorithms in interactivity, non-iterative process, and non-assumption on data distribution. However, computing each data object’s ρ and δ value requires to measure distance between any pair of objects with high computational cost of O(N\+2). This limits the practicability of this algorithm when clustering high-volume and high-dimensional data set. In order to improve efficiency and scalability, we propose an efficient distributed density peaks clustering algorithm—EDDPC, which leverages Voronoi diagram and careful data replicationfiltering to reduce huge amount of useless distance measurement cost and data shuffle cost. Our results show that our EDDPC algorithm can improve the performance significantly (up to 40x) compared with naive MapReduce implementation.
Multi-Objective Constrained Differential Evolution Using Generalized Opposition-Based Learning
Wei Wenhong, Wang Jiahai, Tao Ming, Yuan Huaqiang
2016, 53(6):  1410-1421.  doi:10.7544/issn1000-1239.2016.20150806
Asbtract ( 1037 )   HTML ( 10)   PDF (2991KB) ( 723 )  
Related Articles | Metrics
Differential evolution is a simple and efficient evolution algorithm to deal with nonlinear and complex optimization problems. Generalized opposition-based learning (GOBL) often guides population to evolve in the evolutionary computing. However, real-world problems are inherently constrained optimization problems often with multiple conflicting objectives. Hence, for resolving multi-objective constrained optimization problems, this paper proposes a constrained differential evolution algorithm using generalized opposition-based learning. In this algorithm, firstly, the transformed population is generated using general opposition-based learning in the population initialization. Secondly, the better individuals that are selected from the initial population and the transformed population using non-dominated sorting, crowding distance sorting and constraint handling techniques compose the new initial population. Lastly, based on a jumping probability, the transformed population is calculated again after generating new populations, and the fittest individuals that are selected from the union of the current population and the transformed population compose new population using the same techniques. The solution can be evolved toward Pareto front slowly according to the generalized opposition-based learning, so that the best solutions set can be found. The proposed algorithm is tested in multi-objective benchmark problems and compared with NSGA-Ⅱ, MOEA/D and other multi-objective evolution algorithms. The experimental results show that our algorithm is able to improve convergence speed and generate solutions which approximate to the best optimal Pareto front.