ISSN 1000-1239 CN 11-1777/TP

Table of Contents

01 February 2015, Volume 52 Issue 2
Big Data Privacy Management
Meng Xiaofeng, Zhang Xiaojian
2015, 52(2):  265-281.  doi:10.7544/issn1000-1239.2015.20140073
With the rapid development of information and network technologies, big data has become a hot topic in both academic and industrial research and is regarded as a new revolution in information technology. However, it brings not only significant economic and social benefits but also great risks and challenges for individual privacy protection and data security. Privacy is currently considered one of the greatest problems in many big data applications. This paper analyzes and summarizes the categories of data generated by big data, the properties and types of privacy according to their different causes, and the technological, legal, and regulatory challenges in managing privacy, and it describes the differences among the current technologies that address those challenges. Finally, the paper provides an active framework for managing big data privacy in response to actual privacy problems. Under this framework, we illustrate some challenges for privacy-preserving technologies on big data.
Functional Dependencies Discovering in Distributed Big Data
Li Weibang, Li Zhanhuai, Chen Qun, Jiang Tao, Liu Hailong, Pan Wei
2015, 52(2):  282-294.  doi:10.7544/issn1000-1239.2015.20140229
Discovering functional dependencies (FDs) from relational databases is an important database analysis technique with a wide range of applications in knowledge discovery, database semantics analysis, data quality assessment, and database design. Existing FD discovery algorithms mainly target centralized data and are suitable only for small data sizes. Discovering FDs in distributed databases is far more challenging, especially with big data. In this paper, we propose a novel approach for discovering FDs in distributed big data. First, we execute an FD discovery algorithm in parallel on each node and prune the candidate set of FDs based on the local results. Second, we group the candidate FDs according to the features of their left-hand sides, execute the FD discovery algorithm on each candidate group in parallel, and eventually obtain all FDs. We analyze the number of candidate FDs with regard to different groups, and we take data shipment and load balance into account when discovering FDs. Experiments on real-world big datasets demonstrate that our approach is more efficient than previous discovery methods.
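The local pruning step in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names (fd_holds, prune_candidates, fd_holds_globally) and the list-of-dicts data layout are assumptions made for illustration. It relies only on the fact that an FD violated inside a single node's fragment is also violated on the whole relation, so such candidates can be dropped without shipping any data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds on rows.

    rows is a list of dicts; lhs and rhs are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


def prune_candidates(fragments, candidates):
    """Keep only the candidate FDs that hold on every local fragment.

    Survivors are not yet confirmed: they still need a global check,
    because a violation may involve rows stored on different nodes.
    """
    return [(lhs, rhs) for lhs, rhs in candidates
            if all(fd_holds(frag, lhs, rhs) for frag in fragments)]


def fd_holds_globally(fragments, lhs, rhs):
    """Naive global check that simply merges all fragments (hypothetical helper)."""
    merged = [row for frag in fragments for row in frag]
    return fd_holds(merged, lhs, rhs)
```

A candidate such as zip -> city can pass the local check on every node and still fail globally when two nodes store different cities for the same zip code, which is why a second, grouped discovery phase is needed after pruning.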
Automatically Discovering of Inconsistency Among Cross-Source Data Based on Web Big Data
Yu Wei, Li Shijun, Yang Sha, Hu Yahui, Liu Jing, Ding Yonggang, Wang Qian
2015, 52(2):  295-308.  doi:10.7544/issn1000-1239.2015.20140224
Data inconsistency is a pervasive phenomenon on the Web that has gravely affected the quality of Web information. Current research on data inconsistency focuses mainly on traditional database applications, and consistency research on diverse, complicated, rapidly changing, and abundant Web big data is lacking. Considering multi-source heterogeneous Web data and the 5V features of big data, we present a unified data extraction algorithm and a Web object data model based on three aspects: website structure, characteristic data, and knowledge rules. We study and classify the features of data inconsistency, and establish an inconsistency classifier model, an inconsistency constraint mechanism, and an inconsistency inference algebra computing system. Then, based on this theoretical system for cross-source Web data consistency, we study methods for automatically discovering inconsistent Web data via constraint rule detection and statistical deviation analysis. Combining the characteristics of the two methods, we propose an algorithm for automatically discovering inconsistent Web data that uses hierarchical probabilistic judgment on the Hadoop MapReduce architecture. The framework is applied to big data from multiple B2C e-commerce sites on the Hadoop platform and compared with a traditional architecture and other methods. The experimental results demonstrate the accuracy and efficiency of the method.
Theme-Aware Task Assignment in Crowd Computing on Big Data
Zhang Xiaohang, Li Guoliang, Feng Jianhua
2015, 52(2):  309-317.  doi:10.7544/issn1000-1239.2015.20140267
Big data poses tremendous challenges to the traditional computing model because of its inherent characteristics: large volume, high velocity, high variety, and low value density. On the one hand, large volume and high velocity require techniques for massive data computation and analysis; on the other hand, high variety and low value density make big data computing tasks highly dependent on complex cognitive reasoning. To address the coexisting challenges of massive data analysis and complex cognitive reasoning, crowd computing based on human-machine collaboration is an effective way to solve big data problems. In crowd computing, task assignment is one of the basic problems. However, current crowdsourcing platforms cannot support active task assignment, which iteratively assigns tasks to appropriate workers based on the workers’ knowledge background. To address this problem, we propose an iterative theme-aware task assignment framework and deploy it on existing crowdsourcing platforms. The framework includes two components. The first is task modeling, which models the tasks as a graph where vertices are tasks and edges are task relationships. The second is the iterative task assignment algorithm, which identifies the themes of the workers from their historical records, computes the workers’ accuracy on different themes, and assigns the tasks to the appropriate workers. Various experiments validate the effectiveness of our method.
Distributed Stream Processing: A Survey
Cui Xingcan, Yu Xiaohui, Liu Yang, Lü Zhaoyang
2015, 52(2):  318-332.  doi:10.7544/issn1000-1239.2015.20140268
The rapid growth of computing and networking technologies, along with increasingly rich ways of acquiring data, has brought forth a large array of applications that require real-time processing of massive, high-velocity data. As the processing of such data often exceeds the capacity of existing technologies, a class of approaches following the distributed stream processing paradigm has appeared. In this survey, we first review the application background of distributed stream processing and discuss how the technology has evolved to its current form. We then contrast it with other big data processing technologies to help readers better understand the characteristics of distributed stream processing. We provide an in-depth discussion of the main issues involved in distributed stream processing, such as data models, system models, storage management, semantic guarantees, load control, and fault tolerance, pointing out the pros and cons of existing solutions. This is followed by a systematic comparison of several popular distributed stream processing platforms, including S4, Storm, and Spark Streaming. Finally, we present a few typical applications of distributed stream processing and discuss possible directions for future research in this area.
Big Data Analysis and Data Velocity
Chen Shimin
2015, 52(2):  333-342.  doi:10.7544/issn1000-1239.2015.20140302
Big data poses three main challenges to the underlying data management systems: volume (a huge amount of data), velocity (high speed of data generation, data acquisition, and data updates), and variety (a large number of data types and data formats). In this paper, we focus on understanding the significance of velocity and discussing how to face the challenge of velocity in the context of big data analysis systems. We compare the requirements of velocity in transaction processing, data stream, and data analysis systems. Then we describe two of our recent research studies with an emphasis on the role of data velocity in big data analysis systems: 1) MaSM, supporting online data updates in data warehouse systems; 2) LogKV, supporting high-throughput data ingestion and efficient time-window based joins in an event log processing system. Comparing the two studies, we find that storing incoming data updates is only the minimum requirement. We should consider velocity as an integral part of the data acquisition and analysis life cycle. It is important to analyze the characteristics of the desired big data analysis operations, and then to optimize data organization and data distribution schemes for incoming data updates so as to maintain or even improve the efficiency of big data analysis.
A Survey on PCMBased Big Data Storage and Management
Wu Zhangling, Jin Peiquan,Yue Lihua, Meng Xiaofeng
2015, 52(2):  343-361.  doi:10.7544/issn1000-1239.2015.20140116
Big data has become a hot topic in both academia and industry. However, due to the limitations of current computer system architectures, big data management faces many new challenges with respect to performance, energy, and other factors. Recently, a new kind of storage medium called phase change memory (PCM) has introduced new opportunities for advancing computer architectures and big data management, owing to its non-volatility, byte-addressability, high read speed, and low energy consumption. As a non-volatile storage medium, PCM shares some features of DRAM, such as byte-addressability and high read/write performance, and can thus be regarded as a cross-layer storage medium for redesigning current storage architectures to realize high-performance storage. In this paper, we summarize the features of PCM and present a survey of PCM-based data management. We discuss the related advances from two aspects, namely PCM used as secondary storage and PCM used as main memory. We also introduce current studies on the applications of PCM in various areas. Finally, we propose some future research directions for PCM-based data management to provide useful references for big data storage and management on new storage architectures.
A GPU-Accelerated Highly Compact and Encoding Based Database System
Luo Xinyuan, Chen Gang, Wu Sai
2015, 52(2):  362-376.  doi:10.7544/issn1000-1239.2015.20140254
In the big data era, business applications generate huge volumes of data, making it extremely challenging to store and manage those data. One solution adopted in previous database systems is to employ encoding techniques, which can effectively reduce the size of the data and consequently improve query performance. However, existing encoding approaches still cannot achieve a good tradeoff among compression ratio, import time, and query performance. To address this problem, we propose a new encoding-based database system, HEGA-STORE, which adopts a hybrid row-oriented and column-oriented storage model. In HEGA-STORE, we design a GPU-assisted encoding scheme that combines rule-based encoding with conventional compression algorithms. By exploiting the computation power of the GPU, we improve the performance of the encoding and decoding algorithms. To evaluate its performance, HEGA-STORE is deployed at Netease to support log analysis. We compare HEGA-STORE with other database systems, and the results show that HEGA-STORE provides better performance for data import and query processing. It is a highly compact encoding database for big data applications.
An Energy Efficient Algorithm for Big Data Processing in Heterogeneous Cluster
Ding Youwei, Qin Xiaolin, Liu Liang, Wang Taochun
2015, 52(2):  377-390.  doi:10.7544/issn1000-1239.2015.20140126
It is reported that the electricity cost of operating a cluster may well exceed its acquisition cost, and processing big data requires a large-scale cluster running for a long period. Energy-efficient processing of big data is therefore essential for data owners and users, and it is also a great challenge for energy use and environmental protection. Existing methods power down some nodes to reduce energy consumption or develop new data storage strategies for the cluster. However, much energy is still wasted even when a minimal number of nodes is used to process the task, and new storage strategies are unsuitable for already deployed clusters because of the extra cost of data transformation. In this paper, we propose a novel algorithm, MinBalance, to process I/O-intensive big data tasks energy-efficiently in a heterogeneous cluster. The algorithm is divided into two steps: node selection and workload balancing. In the first step, four greedy policies are used to select the proper nodes, taking the heterogeneity of the cluster into account. In the second step, the workloads of the selected nodes are balanced to avoid the energy wasted by waiting. MinBalance is a universal algorithm and is not affected by the data storage strategy. Experimental results indicate that, for large datasets, MinBalance achieves over 60% energy reduction compared with the traditional method of powering down some of the nodes.
Survey on Large-Scale Graph Pattern Matching
Yu Jing, Liu Yanbing, Zhang Yu, Liu Mengya, Tan Jianlong, Guo Li
2015, 52(2):  391-409.  doi:10.7544/issn1000-1239.2015.20140188
In the big data age, close affinities exist among the great amount of multi-modal data. As a popular data model for representing the relations among different data, the graph has been widely used in fields such as social network analysis, social security, and biological information. Fast and accurate search over large-scale graphs is a fundamental problem in graph analysis. In this paper, we survey recent developments in graph pattern matching techniques for graph search from the application perspective. Graph pattern matching techniques are roughly classified into several categories according to the properties of the graphs and the requirements of the applications. We focus on introducing and analyzing exact pattern matching, including non-index matching and index-based matching, together with their key techniques, representative algorithms, and performance evaluation. We summarize the state-of-the-art applications, challenging issues, and research trends for graph pattern matching.
Survey of Sign Prediction Algorithms in Signed Social Networks
Lan Mengwei, Li Cuiping, Wang Shaoqing, Zhao Kankan, Lin Zhixia, Zou Benyou, Chen Hong
2015, 52(2):  410-422.  doi:10.7544/issn1000-1239.2015.20140210
According to their underlying meaning, the edges in some networks can be divided into positive and negative relationships. When we mark these positive and negative edges with plus and minus signs respectively, a signed network is formed. Signed networks are widespread in sociology, information science, biology, and other fields, and they have become a research hotspot. Research on the sign prediction problem in signed social networks is valuable for personalized recommendation, abnormal node identification, and user clustering in social networks. This paper focuses on predicting positive and negative links in signed social networks and describes the current research status and latest developments at home and abroad. First, we introduce social structural balance theory and status theory. Then we classify several sign prediction algorithms into two categories according to their main ideas: matrix-based algorithms and classification-based algorithms. We introduce the basic ideas of these sign prediction algorithms in detail, and then compare and analyze them from multiple perspectives such as speed, accuracy, and scalability. Finally, we summarize some regularities and challenges in sign prediction and discuss possible development directions for research on signed social networks.
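To make structural balance theory concrete, here is a minimal sketch of the standard balance rule: a signed triangle is balanced when the product of its three edge signs is positive. The predictor built on top of it is a simple illustrative vote over common neighbors, not one of the surveyed matrix-based or classification-based algorithms; the function names and the frozenset edge encoding are assumptions.

```python
def is_balanced(s_ab, s_bc, s_ca):
    """A signed triangle is balanced when the product of its edge signs is positive."""
    return s_ab * s_bc * s_ca > 0


def predict_sign(signs, u, v):
    """Predict the sign of the edge (u, v) by voting over common neighbors.

    signs maps frozenset({x, y}) to +1 or -1. Each common neighbor w votes
    for the sign that would make the triangle (u, v, w) balanced, which is
    s(u, w) * s(w, v). Returns +1, -1, or 0 when the vote is tied or u and v
    have no common neighbor.
    """
    nodes = {x for edge in signs for x in edge}
    vote = 0
    for w in nodes - {u, v}:
        uw, wv = frozenset((u, w)), frozenset((w, v))
        if uw in signs and wv in signs:
            vote += signs[uw] * signs[wv]
    return (vote > 0) - (vote < 0)
```

Under this rule, "the friend of my friend is my friend" and "the enemy of my enemy is my friend" both produce positive votes, while a friend's enemy produces a negative one.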
Multiple Sources Fusion for Link Prediction via Low-Rank and Sparse Matrix Decomposition
Liu Ye, Zhu Weiheng, Pan Yan, Yin Jian
2015, 52(2):  423-436.  doi:10.7544/issn1000-1239.2015.20140221
In recent years, link prediction has become a popular research topic in link mining for social networks and other complex networks. In link prediction, multiple additional sources of information are usually available to improve the prediction of link probabilities in the network. Among all the sources, the major source usually plays the most significant role in prediction. It is therefore important to design a robust algorithm that makes full use of all the sources and balances the major source against the additional sources to obtain better link prediction results. Meanwhile, traditional unsupervised algorithms based on topological calculation are among the most useful methods for computing link prediction scores. In these methods, the most important step is to construct a precise input seed matrix. However, many real-world network datasets are noisy, which decreases the accuracy of most link prediction methods. In this paper, we propose a novel method that takes advantage of both the leading seed source matrix and the additional sources. The seed source matrix is combined with the other sources to construct a matrix with lower noise and a more precise structure than the seed matrix. The new matrix is then used as the input to a traditional unsupervised topological algorithm. Experimental results show that the proposed method achieves better link prediction performance on several kinds of real-world multi-source datasets.
Event Propagation Analysis on Microblog
Zhu Xiang, Jia Yan, Nie Yuanping, Qu Ming
2015, 52(2):  437-444.  doi:10.7544/issn1000-1239.2015.20140187
Event propagation analysis is one of the main research issues in social network analysis. A hotspot breaks out and spreads through a social network, making a great impact in a short period of time. Moreover, it is easier to create and spread a hotspot in a social network than in traditional media, so information diffusion can harm public security and property if exploited by criminals. Traditional influence propagation analysis methods can analyze only a single microblog (or tweet), which limits event propagation analysis in social networks. In this paper, we review existing propagation models such as the independent cascade model and the linear threshold model. We then introduce some basic definitions of influence propagation analysis in social networks and propose a method that combines user deduplication, spammer detection, and probabilistic reading, based on the existing independent cascade model. The main idea is to deduplicate users within an event composed of several key microblogs (or tweets) and build an event propagation graph; we then remove spammers from that graph and analyze influence propagation using the probabilistic reading model. This provides a novel method for event propagation analysis. Finally, experimental results demonstrate the correctness and effectiveness of the method.
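For readers unfamiliar with the independent cascade model mentioned here, the following is a minimal sketch of one simulation run of the standard model with a single uniform activation probability. It does not include the paper's user deduplication, spammer detection, or probabilistic reading steps; the function name, the adjacency-list encoding, and the uniform probability are assumptions.

```python
import random


def independent_cascade(graph, seeds, prob, rng=None):
    """Simulate one run of the independent cascade model.

    graph maps each node to a list of out-neighbors. Every newly activated
    node gets a single chance to activate each still-inactive neighbor,
    succeeding with probability prob. Returns the set of activated nodes.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active
```

Because each run is random, the expected spread of a seed set is usually estimated by averaging the size of the returned set over many runs, for example with a seeded random.Random instance passed as rng for reproducibility.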
An Algorithm of Mining TOP-K High Utility Patterns Without Generating Candidates
Wang Le, Feng Lin, Wang Shui
2015, 52(2):  445-455.  doi:10.7544/issn1000-1239.2015.20131184
Mining TOP-K high utility patterns from a dataset is an extension of frequent pattern mining; it aims to mine the patterns whose utilities are higher than a user-specified minimum utility threshold. It is currently a research topic in data mining. Existing algorithms for mining TOP-K high utility patterns generate candidate itemsets during mining and need multiple scans of the dataset; this hinders their runtime and memory performance, especially when the dataset is large or contains many long transaction itemsets. To address this issue, we propose a tree structure called HUP-Tree (high utility pattern tree) to maintain transaction itemsets and their utility values, and we present an algorithm named TOPKHUP (TOP-K high utility pattern) that mines TOP-K high utility patterns without generating candidates. HUP-Tree ensures efficient retrieval of the utility value of each pattern without additional scans of the dataset, so the performance of the algorithm is effectively improved. Seven classical real and synthetic datasets are used in the experiments, and the results show that the proposed algorithm significantly outperforms state-of-the-art algorithms in both runtime and memory usage, and that it is more stable as the value of K changes.
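As a reference point for what is being mined, the sketch below computes the K highest-utility itemsets by brute force. It is deliberately naive and is not HUP-Tree or TOPKHUP: it enumerates every subset of every transaction, which is exactly the cost such algorithms try to avoid. It assumes the usual definition of utility (purchased quantity times unit profit, summed over the transactions that contain the whole itemset); the function name and data layout are illustrative.

```python
import heapq
from collections import Counter
from itertools import combinations


def top_k_high_utility(transactions, profit, k):
    """Return the k itemsets with the highest total utility, by brute force.

    transactions is a list of dicts mapping item -> purchased quantity;
    profit maps item -> unit profit. Each result is (itemset_tuple, utility).
    """
    utility = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                # Utility of the itemset within this transaction.
                utility[itemset] += sum(transaction[i] * profit[i] for i in itemset)
    return heapq.nlargest(k, utility.items(), key=lambda kv: kv[1])
```

With unit profits a=5, b=2, c=1 and the transactions {a:1, b:2}, {a:2, c:3}, {b:1, c:1}, the two best itemsets are (a) with utility 15 and (a, c) with utility 13. Note that utility is not monotone in itemset size, which is what makes pruning harder than in frequent pattern mining.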
Open Web Knowledge Aided Information Search and Data Mining
Wang Yuanzhuo, Jia Yantao, Liu Dawei, Jin Xiaolong, Cheng Xueqi
2015, 52(2):  456-474.  doi:10.7544/issn1000-1239.2015.20131342
Network big data refers to the massive data that is generated through the interaction and fusion of the ternary human-machine-thing universe in cyberspace and is available on the Internet. It has a few typical features: it is multi-sourced, heterogeneous, interactive, bursty, and noisy. It consists mainly of unstructured data and is highly time-sensitive. Network big data implicitly contains a tremendous amount of highly interconnected knowledge. Building large-scale knowledge bases oriented to the open Web is an effective means of obtaining rich knowledge from network big data. This paper compares the mainstream domestic and international open Web knowledge bases. We specifically analyze the core techniques and methods for constructing open Web knowledge bases, fusing multi-sourced knowledge, and updating the knowledge bases. Furthermore, we summarize the research status and main issues of information search, data mining, and system applications based on open Web knowledge bases from different aspects, including user intention understanding, query extension, semantic Q&A, clue mining, relationship referencing, and prediction of relationships and attributes. Finally, we look into the development trends and main challenges of open Web knowledge bases.
A Collaborative Filtering Recommendation Method for UCL in Broadcast-Storage Network
Gu Liang, Yang Peng, Luo Junzhou
2015, 52(2):  475-486.  doi:10.7544/issn1000-1239.2015.20131418
Problems such as bandwidth congestion and content redundancy exist in the sharing of information resources. The Broadcast-Storage network has a particular advantage in solving these problems because physical broadcast gives it the unique ability to deliver from one sender to an unlimited number of receivers. The uniform content label (UCL) is used to express the needs of users and to help users understand the information resources in the Broadcast-Storage environment. Because the number of UCLs is large, guiding users to their preferred UCLs efficiently is important. To address this problem, this paper proposes a unifying collaborative filtering method with popularity and timing (UCF-PT) for UCL recommendation. First, a pair of thresholds is set to estimate the sparsity of users and UCLs in the dataset and to determine the weights of users and UCLs in recommendation. Then UCF-PT predicts the ratings of users on UCLs based on the weights and generates a recommendation list. Moreover, the method makes popular and new UCLs more likely to be recommended by considering UCL popularity and applying exponential decay in recommendation. Experiments show that, compared with traditional recommendation methods, the proposed method has better recommendation accuracy and ensures the popularity and novelty of the recommended UCLs. It is therefore more suitable for recommending UCLs in the Broadcast-Storage environment.
Naming Game on Multi-Community Network
Guo Dongwei, Meng Xiangyan, Liu Miao, Hou Caifang
2015, 52(2):  487-498.  doi:10.7544/issn1000-1239.2015.20131465
We propose a new naming game model to imitate the process by which humans cognize and name a new object. Agents cognize an object through the different weights of its various names. An increase or decrease in a name’s weight expresses that the memory of the name is strengthened or forgotten in the human brain, and deleting names with low weights models limited memory. When our naming game is played on a single community, the evolution converges asymptotically to global consensus. The process of naming a new object is explained qualitatively by analyzing the total number of names, the number of different names, and the average success rate. Optimal values of the deleting threshold and the attenuation parameter induce the fastest convergence of the population, but very strong influences inhibit the convergence process. There is a linear relationship between the two parameters that favors rapid convergence. This paper also proposes a multi-community network model, composed of several communities, to simulate the evolution of different languages in various countries. When the game is played on the multi-community network model, the number of names at convergence may equal the number of communities. The stability of the converged names is related to community strength and average degree but not to community size. Stability analysis of differential equations is used to explain the numerical results. Agents inside a community hold one name, whereas agents between communities hold several names; the latter resemble multilingual speakers and can communicate across communities.
A Semantic Overlapping Community Detecting Algorithm in Social Networks Based on Random Walk
Xin Yu, Yang Jing, Xie Zhiqiang
2015, 52(2):  499-511.  doi:10.7544/issn1000-1239.2015.20131246
Since the semantic social network (SSN) is a new kind of complex network, community detection in SSNs is a new line of investigation related to traditional community detection research. To solve this problem, a method for detecting overlapping community structure in semantic social networks is proposed based on a random walk strategy. The algorithm establishes the semantic space using the latent Dirichlet allocation (LDA) method. First, a quantization mapping is completed by which the semantic information in nodes is mapped into the semantic space. Second, the semantic influence model and the weighted adjacency matrix of the SSN are established, with the entropy of nodes in the SSN as the semantic information proportion and the distribution ratio of nodes as the adjacency weight. Third, an improved random walk strategy for detecting community structure in overlapping SSNs is proposed, with the distribution ratio of nodes as a parameter, and a semantic modularity model is proposed by which the community structure of an SSN can be measured. Finally, the efficiency and feasibility of the proposed algorithm and the semantic modularity are verified by experimental analysis.
Burst Topic Detection Oriented Large-Scale Microblogs Streams
Shen Guowei, Yang Wu, Wang Wei, Yu Miao
2015, 52(2):  512-521.  doi:10.7544/issn1000-1239.2015.20131336
In microblogs, emergent events spread quickly and have tremendous influence, and bursts of public opinion are of wide concern to governments and enterprises. Existing burst topic detection methods consider only one type of entity, such as words or tags. However, Chinese microblogs contain not only new or colloquial words but also pictures and links, whose burst patterns are difficult to detect. To tackle this problem, we propose a real-time burst topic detection framework for multi-type entities. Unlike existing methods, our method does not require Chinese word segmentation; instead, it generates new words at the end. In this framework, the window size is adjusted dynamically based on the microblog stream. To measure the burst weight of an entity, the spread influence of the entity is calculated. Moreover, a high-order co-clustering algorithm based on non-negative matrix decomposition is used to cluster the two types of entities, messages and users, simultaneously. While detecting a burst topic, we also obtain the related messages and participating users, which can be used to analyze the cause of the burst topic. Experiments on a large Sina Weibo dataset show that our algorithm achieves higher accuracy and earlier detection of burst topics than existing algorithms.
Growth Law of User Characteristics in Microblog
Yuan Weiguo, Liu Yun
2015, 52(2):  522-532.  doi:10.7544/issn1000-1239.2015.20131273
Based on actual data crawled from Sina Microblog, this paper analyzes the growth law of three user characteristics: the numbers of followers, friends, and statuses. They all increase linearly with time, and their growth rates roughly obey a power-law distribution. These characteristics are found to follow mainly sustainable and explosive growth patterns. Moreover, users with the explosive growth pattern can be divided into four main categories: early-stage, middle-stage, later-stage, and step-stage growth patterns. The number of users in each growth pattern is counted using the K-means clustering algorithm based on vector cosine similarity. The growth patterns of user characteristics are observed by cluster analysis of the actual time series, which are grouped by different sorting methods and initial scales. It is observed that users with higher growth rates mainly follow the explosive growth pattern, and users with higher initial numbers tend to follow the sustainable growth pattern. Finally, based on an analysis of the explosive growth process of the number of followers, the relationships between the growth of the numbers of retweets and comments are compared, and reasons for the explosive growth of users are proposed.
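The clustering step described above can be sketched as K-means with cosine similarity as the closeness measure. Cosine similarity compares the shape of two growth curves rather than their magnitude, which is one plausible reason to prefer it for grouping growth patterns. The function names, the per-period increment vectors, and the caller-supplied initial centroids are assumptions; the paper's exact procedure is not given in the abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_kmeans(series, centroids, iterations=10):
    """Cluster series by cosine similarity, starting from the given centroids.

    Returns a list of cluster indices, one per series.
    """
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: pick the most similar centroid for each series.
        labels = [max(range(len(centroids)),
                      key=lambda j: cosine_similarity(s, centroids[j]))
                  for s in series]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [s for s, label in zip(series, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

For example, follower increments such as [9, 1, 0, 0] and [8, 2, 0, 0] (an early burst) end up in one cluster, while [0, 0, 1, 9] and [0, 0, 2, 8] (a late burst) end up in another, regardless of how large the accounts are.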
Influence Maximization Based on Information Preference
Guo Jingfeng, Lü Jiaguo
2015, 52(2):  533-541.  doi:10.7544/issn1000-1239.2015.20131311
Empirical research shows that individuals in real social networks have different preferences for information with different themes, and these preferences play an important role in information diffusion. Influence maximization is the fundamental problem of finding a subset of influential individuals in a social network such that targeting them initially (e.g., to adopt a new product) maximizes the spread of influence (further adoptions of the new product). Most previous work on influence maximization does not take users’ preferences for information themes into account, which greatly reduces the accuracy of the results. To further improve the efficiency and performance of influence maximization, we propose a two-stage algorithm, L_GAUP. In the first stage, we extract a sub-graph based on the nodes’ preferences for the information theme; compared with the other nodes in the network, the nodes in the sub-graph have higher preference values for the given theme. In the second stage, we use a greedy strategy to find the top-k influential nodes in the sub-graph. In the experiments, we run L_GAUP, GAUP, and CELF on a real-world dataset from Douban. On three metrics (runtime, IS, and ISST), the experimental results show that L_GAUP greatly outperforms the benchmark algorithm GAUP.
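The two-stage structure described above can be sketched as follows. This is not L_GAUP itself: the preference threshold, the function names, and especially the spread estimate are assumptions. Influence spread is replaced by plain graph reachability so that the example is deterministic; a real implementation would estimate spread under a diffusion model such as independent cascade.

```python
def reachable(graph, seeds):
    """Nodes reachable from seeds; a deterministic stand-in for influence spread."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        node = stack.pop()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def two_stage_greedy(graph, preference, threshold, k):
    """Pick up to k seed nodes from the preference-filtered sub-graph."""
    # Stage 1: keep only nodes whose preference for the theme meets the threshold.
    keep = {n for n, p in preference.items() if p >= threshold}
    sub = {n: [m for m in graph.get(n, []) if m in keep] for n in keep}
    # Stage 2: greedily add the node with the largest marginal gain in spread.
    seeds = []
    for _ in range(min(k, len(sub))):
        covered = reachable(sub, seeds)
        best = max((n for n in sorted(sub) if n not in seeds),
                   key=lambda n: len(reachable(sub, seeds + [n])) - len(covered))
        seeds.append(best)
    return seeds
```

Filtering first shrinks the graph that the expensive greedy stage has to search, and it can change the answer: a well-connected node with little interest in the theme is excluded before seed selection begins.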
Survey of Contextual Computing
Li Weiping, Wang Wusheng, Mo Tong, Zhang Zhichao, Chu Weijie, Wu Zhonghai
2015, 52(2):  542-552.  doi:10.7544/issn1000-1239.2015.20131266
As an emerging computing mode, contextual computing has been drawing more and more attention in both the academic and industrial communities. With the continuous evolution and maturing of related technologies, such as the Internet of things, cloud computing, big data, and social computing, contextual computing is growing at a rapid pace. Contextual computing is a computing mode that determines the services required by particular users by acquiring and analyzing their context information, and actively provides the corresponding context-aware services. This new computing mode brings great comfort and convenience to users’ work and life. In this paper, we present the background of contextual computing together with its key concepts, including context information, contextual computing, context-awareness, context-aware systems, and context-aware services. We summarize the important research areas, such as context data acquisition, context modelling, context reasoning, active service provision, context-aware middleware, and information security and privacy, as well as the relevant technologies. Finally, based on the general architecture of contextual computing proposed in this paper, we highlight the topics on which further research in contextual computing will focus.