ISSN 1000-1239 CN 11-1777/TP

Table of Content

01 November 2016, Volume 53 Issue 11
Security Analysis and Enhancement of Third-Party Android Push Service
Lu Yemian, Li Yifu, Ying Lingyun, Gu Yacong, Su Purui, Feng Dengguo
2016, 53(11):  2431-2445.  doi:10.7544/issn1000-1239.2016.20150528
Asbtract ( 1044 )   HTML ( 5)   PDF (2832KB) ( 636 )  
Related Articles | Metrics
Push service is becoming a basic service for smartphone applications. Many companies, including official and third parties, have released their push services. In order to reduce resource cost, some third-party push services share push channels among applications running on the same device and using the same push service, which means that the background push component of one application acts as the push data distribution center for other applications. Due to the lack of considering security attributes such as confidentiality and integrity, the distribution part faces a variety of attacks. In this work we analyze the security issues in the data distribution part of third-party push services on Android. We design a corresponding attack model and implement attacks including eavesdropping, data tampering, forgery and replay attacks. During our experiments, it shows that most of the third-party Android push services using shared channels are subject to these attacks. It may cause some security hazards such as user privacy leakage and phishing attack. To mitigate the above threats, we propose SecPush which is a security enhancement scheme for Android push service. SecPush secures data distribution by introducing encryption and HMAC algorithm. Experimental results show that SecPush can effectively protect push data against eavesdropping, data tampering, forgery and replay attacks.
Authentication and Key Agreement Protocol for Multi-Server Architecture
Wan Tao, Liu Zunxiong, Ma Jianfeng
2016, 53(11):  2446-2453.  doi:10.7544/issn1000-1239.2016.20150107
Asbtract ( 770 )   HTML ( 2)   PDF (876KB) ( 685 )  
Related Articles | Metrics
With the rapid growth of Internet applications, the architecture of server providing resources to be accessed over the network often consists of many different servers. Authentication and key agreement protocol play an important role to authenticate remote users for multi-server architecture. In recent years, several authentication and key agreement protocols for multi-server architecture have been developed. Single registration is the most important feature in a multi-server architecture which may help users take desired services without repeating registration to each service provider. Employing a dynamic ID for each login may efficiently preserve privacy. Recently, Chuang et al. presented an anonymous multi-server authenticated key agreement scheme based on trust computing using smart cards and biometrics. They claimed that their protocol not only supported multi-server environments but also achieved many security requirements. A cryptanalysis on Chuang et al.’s scheme shows that their scheme cannot provide the anonymity and is vulnerable to server masquerade attack and smart card loss attack. To overcome these security flaws, an improved protocol is proposed by choosing different secret parameters for each application server. This protocol can be proved to be secure against server masquerade attack, smart card loss attack, impersonation attack, eavesdropping attack, replay attack and so on. Besides, the improved protocol maintains the feature of simple operation.
A Privacy Protection Mechanism for Dynamic Data Based on Partition-Confusion
Zhang Honglei, Shi Yuliang, Zhang Shidong, Zhou Zhongmin, Cui Lizhen
2016, 53(11):  2454-2464.  doi:10.7544/issn1000-1239.2016.20150553
Asbtract ( 720 )   HTML ( 1)   PDF (2687KB) ( 549 )  
Related Articles | Metrics
Under the cloud computing environment, the privacy protection in the plaintext state can be realized, by the partition-confusion-based privacy protection mechanism which effectively combines tenants personalized privacy protection requirements and application performance. However, as the multi-tenant applications continue to run, on the one hand, the insertion, deletion, modification and other business operations of the tenant data can affect the distribution of the underlying data storage, making the relationships between the chunks in a significant risk of leakage due to the uneven data distribution; on the other hand, the attacker can still analyze a part of private information by the operation log of every chunk and the snapshot of the corresponding data in the local time. In response to these challenges, the present paper proposes a dynamic data privacy protection mechanism for partition confusion on the basis of the tripartite security interaction model. This mechanism can cache the data newly inserted and modified by a trusted third party and then group and upload it under the proper conditions; retaining key fragmentation in the deletion operation can ensure the privacy of the deleted and remained data; the falsifying data collection mechanism can achieve lower consumption of resources storage and optimize the application performance. The experimental result proves that the dynamic data privacy protection mechanism proposed in this paper has better feasibility and practicality.
A White-Box-Cryptography-Based Scheme for the Secure Chip of DCAS Terminal
Xu Tao, Wu Chuankun, Zhang Weiming
2016, 53(11):  2465-2474.  doi:10.7544/issn1000-1239.2016.20150546
Asbtract ( 697 )   HTML ( 1)   PDF (3114KB) ( 386 )  
Related Articles | Metrics
In the technical specification of downloadable conditional access system (DCAS) issued by the State Administration of Radio, Film and Television of China (SARFT) in 2012, all cryptographic operations in a terminal are built into a secure chip and protected with hardware-based security technologies. Too much protected black-box contents in the secure chip, however, will lower the universality and flexibility of the chip, and add the cost of research and development. Thus, an improved scheme for the secure chip of DCAS terminal is proposed, which is based on white-box cryptography. The main idea is to replace the key ladder inside the chip by a software-based white-box decryption module outside the chip and an external encoding inside the chip. An algorithm of generating external encoding is put forward, which is executed in the secure chip and based on the protected secret key and the external input parameters. The decryption and authentication processes in the terminal are redesigned. Compared with the original scheme in the DCAS technical specification, the improved scheme not only overcomes the aforementioned deficiencies, but also provides two extra benefits: the decryption algorithm can be renewed while the service key is being downloaded from the network; the new authentication process can verify the legitimacy as well as the uniqueness of a DCAS terminal.
An Efficient 1-out-of-n Oblivious Transfer Protocol with Full Simulation
Wei Xiaochao, Jiang Han, Zhao Chuan
2016, 53(11):  2475-2481.  doi:10.7544/issn1000-1239.2016.20150505
Asbtract ( 1125 )   HTML ( 4)   PDF (837KB) ( 478 )  
Related Articles | Metrics
Oblivious transfer (OT) is an important basic cryptographic tool, which can be used in the constructions of many other cryptographic protocols, such as secure multi-party computation (SMPC) protocols, private information retrieval (PIR) and so on. The 1-out-of-n oblivious transfer (OT\+1\-n) setting involves two parties, the sender S and the receiver R. More specificly, the sender has n values and the receiver wants to obtain only one value from them. At the same time, the receiver’s choice is unknown to the sender and the receiver gets no extra information about the other values he doesn’t choose. In this paper, we firstly propose an efficient OT\+1\-n protocol based on the decisional Diffie-Hellman (DDH) hard problem assumption with full simulation in the standard malicious model. The full simulation means that the protocol can be simulated when the receiver and the sender are corrupted respectively under the ideal/real simulation paradigm, and also this is the highest security level in the standard stand-alone model. The idea behind the protocol mainly benefits from the dual-mode cryptosystem and the combination of zero-knowledge proof of knowledge (ZKPOK) of Diffie-Hellman tuples. The protocol has constant number of interactive complexity, and the computation and communication complexity is just liner of n.
An Identity-Based Authenticated Key Exchange Protocol from RLWE
Zhao Xiufeng, Gao Haiying, Wang Ailan
2016, 53(11):  2482-2490.  doi:10.7544/issn1000-1239.2016.20150547
Asbtract ( 1076 )   HTML ( 3)   PDF (910KB) ( 506 )  
Related Articles | Metrics
Key exchange protocol allows two or more users to compute share session key via exchange information in the open communication channel, and uses the session key to finish cryptography tasks, such as secure communication and authentication. Recently, it becomes a hotspot research question that how to design authenticated key exchange protocol with lattice-based one-way function. Several lattice-based two-party authenticated key exchange protocols have been proposed. However, how to extend them to the identity-based cryptography background still remains open question. In this paper, an identity-based authenticated key exchange protocol from the learning with errors (LWE) problem over cyclotomic ring is proposed. The protocol generates master key by ring LWE (RLWE) sample algorithm, and further extracts the users’ secret key, and computes key materials which derive the share session key via exchanging Diffie-Hellman ephemeral key. The protocol introduces error item, uses encoding bases of ideal lattice as the tool for analyzing error tolerance, and makes reasonable suggests for parameters setting. The protocol achieves provable AKE secure and PKG forward secure in the ID-BJM model. Furthermore, the session key is also secure even if both long private keys are leaked or both ephemeral private key are leaked or A’s ephemeral key and B’s long private key are leaked.
Reputation-Based Defense Scheme Against Pollution Attacks on Network Coding
Wang Tiefeng, Cai Ying, Zhang Yujie
2016, 53(11):  2491-2499.  doi:10.7544/issn1000-1239.2016.20150502
Asbtract ( 642 )   HTML ( 0)   PDF (1620KB) ( 571 )  
Related Articles | Metrics
Network coding is to apply innovative error-correction coding techniques in the network layer to improve network performance in both wired and wireless networks. It has been theoretically shown and experimentally demonstrated that if it is properly applied, it can significantly improve end-to-end network throughput, and hence has attracted tremendous attention in the last fifteen years. Unfortunately, this technique also has some serious drawbacks. One of the major problems is its vulnerability to pollution attacks, where malicious nodes can inject corrupted packets to mess up with the decoding process. To deal with this serious problem, many schemes have been proposed in the literature, but most of them are centralized in the sense that a trusted central authority may be required. In this paper, we propose a novel distributed defense scheme based on some reputation mechanism by taking advantage of node mobility. The fundamental idea is to apply an effective reputation mechanism to locate potential malicious nodes whenever suspected polluted packets are detected. We have conducted extensive comparison studies of our proposed scheme and the existing ones, and demonstrated that the proposed scheme can achieve high successful packet delivery ratio by effectively locating and isolating the malicious nodes, even when there exist multiple malicious nodes in the network.
MapReduce-Based Network Property Verification Technique for OpenFlow Network
Liu Yi, Lei Cheng, Zhang Hongqi, Yang Yingjie
2016, 53(11):  2500-2511.  doi:10.7544/issn1000-1239.2016.20150521
Asbtract ( 673 )   HTML ( 0)   PDF (2577KB) ( 568 )  
Related Articles | Metrics
Aimed at the problem of configuration errors of flow tables resulting from automatic change of data-plane state by software in OpenFlow network, a MapReduce-based network property verification technique is proposed. Firstly, by exploiting the separation of logic control from data forwarding in OpenFlow network, a novel technical framework providing non-real-time and real-time verification is designed. Further, on the basis of the advantage of parallel computing in MapReduce, a non-real-time verification algorithm is presented, which can verify network properties in parallel in two phases. In Map phase, it slices network into equivalence classes. In Reduce phase, it builds network forwarding graph with switch port predicates and conducts network reachability analysis. Meanwhile, with the help of atomic predicates, it can not only eliminate the redundancy of the set of switch port predicates, but also convert highly computation-intensive operations on predicates to those on sets of integers, speeding up computation of network reachability further. Based on it, a real-time verification algorithm is proposed. According to different network update events, it applies different changes to the results of non-real-time verification in order to incrementally verify properties. Finally, theoretical analysis and experimental results show the low time and storage overhead of the proposed technique. Additionally, its effect on the time of building TCP connection is also analyzed.
Survey of MPTCP-Based Multipath Transmission Optimization
Xue Kaiping, Chen Ke, Ni Dan, Zhang Hong, Hong Peilin
2016, 53(11):  2512-2529.  doi:10.7544/issn1000-1239.2016.20150589
Asbtract ( 1787 )   HTML ( 15)   PDF (3260KB) ( 870 )  
Related Articles | Metrics
The newly developed networks hope to bring some specific features, such as wide coverage, high bandwidth, and provide varieties of media services. Fortunately, multiple access techniques are at there to help. Nowadays,terminals are always equipped with multiple interfaces to adapt to the heterogeneous networks. Thus, there are more than one paths existing between two endpoints. MPTCP (multipath TCP) is proposed by IETF MPTCP working group, in addition to compatible with traditional TCP, which utilizes multiple paths to transmit data. It improves the efficiency of bandwidth usage and the resilience of the connection. What’s more, it can adaptively move data from more congested path to less congested path. In recent years, many researches have been carried out to gradually make MPTCP from concept to practice. This paper introduces the technical details of MPTCP standard, and expounds the present situation of the related aspects, including simulation and actual deployment, out-of-order control, coupled congestion control, energy consumption maintenance, security, and other aspects such as path selection, mobility and QoS, the data center scenario, et al. Finally, this paper gives a summary and expectation.
A Multi-Level Fault Tolerance Mechanism for Disk-Resident Pregel-Like Systems
Bi Yahui, Jiang Suyang, Wang Zhigang, Leng Fangling, Bao Yubin, Yu Ge, Qian Ling
2016, 53(11):  2530-2541.  doi:10.7544/issn1000-1239.2016.20150619
Asbtract ( 621 )   HTML ( 0)   PDF (3162KB) ( 309 )  
Related Articles | Metrics
The BSP-based distributed frameworks, such as Pregel, are becoming a powerful tool for handling large-scale graphs, especially for applications with iterative computing frequently. Distributed systems can guarantee a flexible processing capacity by adding computing nodes, however, they also increase the probability of failures. Therefore, an efficient fault-tolerance mechanism is essential. Existing work mainly focuses on the checkpoint policy, including backup and recovery. The former usually backups all graph data, which leads to the cost of writing redundant data since some data are static during iterations. The latter always loads backup data from remote machines to recovery iterations, ignoring the usage of data in the local disk in special scenarios, which incurs network costs. It proposes a multi-level fault tolerant mechanism, which distinguishes failures into computing task failures and node failures, and then designs different strategies for backup and recovery. For the latter, considering that the volume of data involved in computation varies with iterations, a complete backup policy and an adaptive log-based policy are presented to reduce the cost of writing redundant data. After that, at the stages of recovery, we utilize the local graph data and the remote message data to handle the recovery for task failures, but the remote data are used for node failures. Finally, extensive experiments on real datasets validate the efficiency of our solutions.
Caption Generation from Product Image Based on Tag Refinement and Syntactic Tree
Zhang Hongbin, Ji Donghong, Yin Lan, Ren Yafeng, Niu Zhengyu
2016, 53(11):  2542-2555.  doi:10.7544/issn1000-1239.2016.20150906
Asbtract ( 878 )   HTML ( 0)   PDF (2820KB) ( 429 )  
Related Articles | Metrics
Automatic caption generation from product image is an interesting and challenging research task of image annotation. However, noisy words interference and inaccurate syntactic structures are the key problems that affect the research heavily. For the first problem, a novel idea of tag refinement (TR) is presented: absolute rank (AR) feature is applied to strengthen the key words weights. The process is called the first tag refinement. The semantic correlation score of each word is calculated in turn and the words that have the tightest semantic correlations with images content are summarized for caption generation. The process is called the second tag refinement. A novel natural language generation (NLG) algorithm named word sequence blocks building (WSBB) is designed accordingly to generate N gram word sequences. For the second problem, a novel idea of syntactic tree (ST) is presented: a complete syntactic tree is constructed recursively based on the N gram word sequences and predefined syntactic subtrees. Finally, sentence is generated by traversing all leaf nodes of the syntactic tree. Experimental results show both the tag refinement and the syntactic tree help to improve the annotation performance. More importantly, not only the semantic information compatibility but also the syntactic mode compatibility of the generated sentence is better retained simultaneously. Moreover, the sentence contains abundant semantic information as well as coherent syntactic structure.
Algorithm of Computing Minimal Hitting Set Based on the Structural Feature of SE-Tree
Liu Siguang, Ouyang Dantong, Wang Yiyuan, Jia Fengyu, Zhang Liming
2016, 53(11):  2556-2566.  doi:10.7544/issn1000-1239.2016.20150396
Asbtract ( 642 )   HTML ( 2)   PDF (3268KB) ( 335 )  
Related Articles | Metrics
During the process of computing minimal hitting set (MHS) by SE-Tree, it will generate many redundant nodes that cannot be pruned by current SE-Tree based algorithms, which affects the efficiency of these algorithms, i.e., the higher the ratio of redundant nodes is, the greater likely the impact of algorithms has. In this paper, firstly we introduce the definition of redundant nodes by analyzing the characteristic of leaf-node in SE-Tree and the redundant nodes in solution space in existent algorithms. Secondly, on the basis of analyzing the structural feature of SE-Tree and the theory that the subset of non-hitting set is non-hitting set, we propose the concept of assistant pruning tree. Specially, the decision nodes are added into the assistant pruning tree, which can largely reduce the visit of non-solution space. Furthermore, in order to decrease the visit of non-solution space when computing larger problem as much as possible, the algorithm of computing minimal hitting set combining with multi-level assistant pruning tree is proposed. Finally, we set a reasonable termination condition to make our algorithm stop without losing correct solution as early as possible, and then prove its correctness. Results show that the proposed algorithm is easily implemented and efficient. Moreover, our algorithm significantly outperforms a state-of-the-art minimal hitting set algorithm Boolean, even up to one order of magnitude for some instances.
Kernel Sparse Representation Classification with Group Weighted Constraints
Zheng Jianwei, Yang Ping, Wang Wanliang, Bai Cong
2016, 53(11):  2567-2582.  doi:10.7544/issn1000-1239.2016.20150743
Asbtract ( 682 )   HTML ( 0)   PDF (4779KB) ( 461 )  
Related Articles | Metrics
A new classification method called KWGSC (kernel weighted group sparse representation classifier) is proposed for pattern recognition. KWGSC integrates both group sparsity and data locality in the kernel feature space rather than in the original feature space. KWGSC can learn more discriminating sparse representation coefficients for classification. The iteratively update solution of the l\-2,p-norm minimization problem for KWGSC is also presented. There are several appealing aspects associated with KWGSC. Firstly, by mapping the data into the kernel feature space, the so-called norm normalization problem that may be encountered when directly applying sparse representation to non-normalized data classification tasks will be naturally alleviated. Secondly, the label of a query sample can be inferred more precisely by using of distance constraints and reconstruction constraints in together. Thirdly, the l\-2,p regularization (where p∈(0,1]) is introduced to adjust the sparsity of collaborative mechanism for better performance. Numeric example shows that KWGSC is able to perfectly classify data with different normalization strategy, while conventional linear representation algorithms fail completely. Comprehensive experiments on widely used public databases also show that KWGSC is a robust discriminative classifier with excellent performance, being outperforming other state-of-the-art approaches.
The DNA Self-Assembly Computing Model for Solving Perfect Matching Problem of Bipartite Graph
Lan Wenfei, Xing Zhibao, Huang Jun, Qiang Xiaoli
2016, 53(11):  2583-2593.  doi:10.7544/issn1000-1239.2016.20150312
Asbtract ( 623 )   HTML ( 1)   PDF (3782KB) ( 532 )  
Related Articles | Metrics
Biological systems are far more complex and robust than systems we can engineer today, and Because of its advantages of stability and specificity, DNA molecules have been used for the construction of nano-scale structures. With the development of DNA molecule self-assembly strategy, lots of programmable DNA tiles are constructed and used for solving NP problems. The tile assembly model, a formal model of crystal growth, is a highly distributed parallel model of natures self-assembly with the traits of highly distributed parallel, massive storage density and low power consumption. In the system, a function can be computed by deterministic assembly and identified two important quantities: the number of tile types and the assembly speed of the computation. Here a DNA self-assembly model for perfect matching problem of bipartite graph is demonstrated, and a 10-vertices bipartite graph is used as an example to illustrate this model. The computation is nondeterministic and each parallel assembly is executed in time linear in the input. The result shows that the successful solutions can be found among the many parallel assemblies, and it requires only a constant number of different tile types: 14.
Multiobjective Clustering Algorithm with Fuzzy Centroids for Categorical Data
Zhou Zhiping, Zhu Shuwei, Zhang Daowen
2016, 53(11):  2594-2606.  doi:10.7544/issn1000-1239.2016.20150467
Asbtract ( 818 )   HTML ( 0)   PDF (3152KB) ( 476 )  
Related Articles | Metrics
It has been shown that most traditional clustering algorithms for categorical data that only optimize a single criteria suffer from some limitations, thus a novel multiobjective fuzzy clustering is proposed, which simultaneously considers within-cluster and between-cluster information. The lately reported algorithms are all based on K-modes, and the more accurate algorithm fuzzy centroids is utilized as the base algorithm to design the proposed method. Fuzzy membership is used as chromosome that is different from traditional genetic based hybrid algorithms, and a set of optimal clustering solutions can be produced by optimizing two conflicting objectives simultaneously. Meanwhile, a termination criterion in advance which can reduce unnecessary computing cost is used to judge whether the algorithm is steady or not. To further improve the efficiency of the proposed method, fuzzy centroids can be calculated using a subset of the dataset, and then the membership matrix can be calculated by these centroids to obtain the final clustering result. The experimental results of 10 datasets show that the clustering accuracy and stability of the proposed algorithm is better than the state of art multiobjective algorithm, and also the computing efficiency is improved to a large extern.
Analysis of Concept Drifting and Uncertainty in an Information Table
Deng Dayong, Miao Duoqian, Huang Houkuan
2016, 53(11):  2607-2612.  doi:10.7544/issn1000-1239.2016.20150803
Asbtract ( 669 )   HTML ( 2)   PDF (658KB) ( 363 )  
Related Articles | Metrics
Concept drifting detection is one of hot topics in data stream mining, and analysis of uncertainty is dominant in rough set theory. Combined with the ideas of data stream, concept drifting, rough sets and F-rough sets, a lot of concepts such as concept drifting of upper approximation, concept drifting of lower approximation, concept coupling of upper approximation and concept coupling of lower approximation etc are defined. The change of concepts in an information system is analyzed with these definitions. With the positive region, integral concept drifting, integral concept coupling are defined. The analysis and measurement for the change of concept uncertainty are conducted. From the view of epistemology, the concept of cognition convergence is defined from the ways of idealism and realism. It provides heuristic information for realizing the world of human beings from the viewpoints of granular computing and rough sets.
Under-Sampling Method Based on Sample Weight for Imbalanced Data
Xiong Bingyan, Wang Guoyin, Deng Weibin
2016, 53(11):  2613-2622.  doi:10.7544/issn1000-1239.2016.20150593
Asbtract ( 1032 )   HTML ( 8)   PDF (1155KB) ( 784 )  
Related Articles | Metrics
Imbalanced data exists widely in the real world, and its classification is a hot topic in data mining and machine learning. Under-sampling is a widely used method in dealing imbalanced data set and its main idea is choosing a subset of majority class to make the data set balanced. However, some useful majority class information may be lost. In order to solve the problem, an under-sampling method based on sample weight for imbalance problem is proposed, named as KAcBag (K-means AdaCost bagging). In this method, sample weight is introduced to reveal the area where the sample is located. Firstly, according to the sample scale, a weight is made for each sample and is modified after clustering the data set. The samples which have less weight in the center of majority class. Then some samples are drawn from majority class in accordance with the sample weight. In the procedure, the samples in the center of majority class can be selected easily. The sampled majority class samples and all the minority class samples are combined as the training data set for a component classifier. After that, we can get several decision tree sub-classifiers. Finally, the prediction model is constructed based on the accuracy of each sub-classifiers. Experimental tests on nineteen UCI data sets and telecom user data show that KAcBag can make the selected samples have more representativeness. Based on that, this method can improve the the classification performance of minority class and reduce the scale of the problem.
Uncorrelated Locality Preserving Discriminant Analysis Based on Bionics
Ning Xin, Li Weijun , Li Haoguang, Liu Wenjie
2016, 53(11):  2623-2629.  doi:10.7544/issn1000-1239.2016.20150630
Asbtract ( 655 )   HTML ( 0)   PDF (1283KB) ( 340 )  
Related Articles | Metrics
Imagery thinking model is an essential way of thinking for human being. It cognizes the regularity of things through various human senses, and then extracts the representative features. Therefore, using the method of imagery thinking to extract the essential characteristics of things is in conformity with the law of human cognition. According to the problem of feature extraction in face recognition technology, we propose an uncorrelated space locality preserving discriminant analysis algorithm—BULPDA based on the theory of unsupervised discriminant projection and image cognitive law. On the basis of the characteristics of human image cognitive, the proposed algorithm first builds a new construction method of similarity coefficient. Then, it applies uncorrelated space concepts to ensure the non-relevance of vector space. Finally, it gives the solution of the proposed algorithm based on singular value decomposition. The algorithm presents a new idea of feature extraction. The experimental results on the standard face database show that the proposed algorithm is better than the traditional preserving projection algorithms.
A Method for Social Network User Identity Feature Recognition
Hu Kaixian, Liang Ying, Xu Hongbo, Bi Xiaodi, Zuo Yao
2016, 53(11):  2630-2644.  doi:10.7544/issn1000-1239.2016.20150219
Asbtract ( 1014 )   HTML ( 3)   PDF (5372KB) ( 559 )  
Related Articles | Metrics
Social network is an important part of modern information society. The anonymity of social network users brings a series of problems concerning social security. This paper presents a method to recognize social network user identity feature by location-based social network (LBSN) and social relationships, and combine the results of those two to infer social network user true identity. The method of geo-location uses approximation weight which is calculated by computing full match weight and basic match weight based on Chinese segmentation and bi-word segmentation to evaluate the possibility that the entity is where the user studies or works, and the method uses entity name aggregation algorithm to optimize the result of approximation weight calculation. According to the observation that friend relationship between users on social network tends to indicate a certain same identity features or a share of common interests, the method of social relationships uses majority voting scheme to count users friends identity features to infer user address, entity information and interests. Based on microblog data, we conduct experiments on two samples which cover 1 000 users and 10 000 users respectively and involve a total of more than 2.5 million users relationships. Results shows that our method has a high rate of precision and recall. Compared with the existing methods, our method focuses on individual user identity feature and is valuable in practice.
Multi-User Location Proximity Prediction in Offline Ephemeral Social Networks
Liao Guoqiong, Wang Tingli, Deng Kun, Wan Changxuan
2016, 53(11):  2645-2653.  doi:10.7544/issn1000-1239.2016.20150388
Asbtract ( 569 )   HTML ( 0)   PDF (2307KB) ( 484 )  
Related Articles | Metrics
Offline ephemeral social network (OffESN) is defined as a new kind of offline social networks created at a specific location for a specific purpose temporally, and lasting for a short period of time. With the popularity of mobile intelligent terminals and the development of short distance communication technologies such as Bluetooth and RFID, the OffESN is receiving more and more attentions from industry and academic communities. Location proximity relations are encounter relations of the users in the OffESN. Aiming to the characteristics such as dynamic change and short duration time, this paper intends to study the problem of multi-user location proximity in the OffESN. First of all, the paper puts forward relevant concepts in the OffESN and defines the problem to be solved. Then, it designs the overall framework of multi-user location proximity prediction, including network segments collection, overlay networks construction, network filter and maximal close subgraphs discovery. Based on the framework and the splitting idea, the paper suggests a maximal close subgraph discovery algorithm for predicting multi-user location proximity. The algorithm uses weighted edge betweenness (WEB) as the basis of splitting, and uses the aggregate density as the termination condition of spitting, which can effectively solve the problem that both numbers of location proximity relations and the users in each location proximity are uncertain. Finally, the experiments on two real datasets verify the feasibility and efficiency of the suggested prediction strategy.
Collaboration Algorithm in Social Networks Based on Tasks with Partial Relation
Liu Yong, Han Xue, Li Jinbao, Ren Qianqian, Wang Nan
2016, 53(11):  2654-2665.  doi:10.7544/issn1000-1239.2016.20150617
Asbtract ( 572 )   HTML ( 2)   PDF (2898KB) ( 477 )  
Related Articles | Metrics
The collaboration problem in social networks has attracted lots of interests among the data mining community. Previous work focused on finding a team with the lowest communication cost to complete all tasks in a project. However, tasks in the realistic projects usually have partial ordering relations. Existing methods cannot deal with partial ordering relations, and thus are not capable of obtaining an effective team. In this paper, we study how to complete the tasks with partial ordering relations effectively, and propose a novel collaboration problem in social networks, named CSN-TPR (collaboration problem in social networks based on tasks with partial ordering relations). Specifically, we investigate how to select an appropriate team in social networks to complete the tasks with partial ordering relations so as to minimize the total cost which is composed of communication cost, time cost and budget cost. We firstly prove that CSN-TPR is NP-hard, and then adopt hill-climbing method, branch and bound strategy and dynamic programming method to propose an approximation algorithm called HillClimbingTF_BBS. HillClimbingTF_BBS can not only acquire an effective team, but also obtain the task allocation of each team member and the start time of each task. The experimental results on real data show that HillClimbingTF_BBS can solve CSN-TPR effectively and efficiently.