ISSN 1000-1239 CN 11-1777/TP

Table of Content

15 December 2013, Volume 50 Issue 12
Social Computing in the Era of Big Data: Opportunities and Challenges
Meng Xiaofeng, Li Yong, and Jonathan J. H. Zhu
2013, 50(12):  2483-2491.  doi:10.7544/issn1000-1239.2013.20130890
Asbtract ( 2430 )   HTML ( 20)   PDF (1674KB) ( 2033 )  
Related Articles | Metrics
With the rapid development of information technology, especially sweeping progress in the Internet of things, cloud computing, social networks and social media, the era of big data is coming. As a data-intensive science, social computing is an emerging thing that leverages the capacity to collect and analyze data with an unprecedented breadth, depth and scale. It represents a new computing paradigm and an interdisciplinary research and application field. A broad comprehension of major topics involved in social computing is important for both scholars and practitioners. In this paper, we give a brief survey of the various research fields in social computing. We present key concepts and analyze state-of-the-art of the field. The article not only sheds insights on social computing, but also affords conduit for future research in the field. Social computing has two distinct trends: One is on the social science issues, such as computational social science, computational sociology, social network analysis, etc; The other is on the use of computational techniques, such as social use, hedonic use and generative use. Finally some new challenges ahead are summarized, including interdisciplinary cooperation and training, big data sharing for scientific data mashups, and privacy protect.
Structure Inference and Prediction in the Co-Evolution of Social Networks
Wang Li, Cheng Suqi, Shen Huawei), and Cheng Xueqi
2013, 50(12):  2492-2503. 
Asbtract ( 975 )   HTML ( 2)   PDF (2018KB) ( 914 )  
Related Articles | Metrics
The co-evolution theory of relationship structure and interaction structure is an important issue in social networks. From this theory we can understand the change of relationship structure, evolution of interaction topology and the mutual influence between them, furthermore it can help us predict public sentiment and then control the network development. Because there are numerous interactions that could be observed and some real and latent relationships always cannot be gotten, inferring hidden relationship structure and predicting future relationships or actions become hot topics, which are also one way to uncover the principle of co-evolution. We summarize the research work of structure inference and prediction on social networks in this paper. We firstly give the definitions of co-evolution, structure inference and prediction, and analyze the relationship between them. Then we introduce and analyze the key technologies of structure inference and prediction. At last some challenges are given and future research topics are presented.
Compressing Online Social Networks by Calibrating Structure Redundancy
Yang Hailu, Zhang Jianpei, and Yang Jing
2013, 50(12):  2504-2519. 
Asbtract ( 428 )   HTML ( 1)   PDF (4190KB) ( 442 )  
Related Articles | Metrics
Online social networks are growing in unprecedented speed and emerge strong evolutionary characteristic which is caused by emerging new services such as social network and mobile Internet. Network compression is a new technique for representing a large-scale network by a concise network that can capture the structural and attributive information of the original large network. Network compression plays an important role in data storage and the visualization of networks. The previous network compression algorithm leads to exorbitant costs by comparing the difference between the original network and the compression one repeatedly when optimizing the compression loss. Furthermore, it can only deal with static network but not the evolutionary ones. To solve these problems, this paper proposes an efficient algorithm to deal with the evolutionary network compression issue. A function to measure structure merging contribution based on localized decision and its quick-adjusting algorithm are first designed, its time complexity of initial network compression is controlled between O(n) and O(mn). Then a dynamic calibrating algorithm for evolutionary network compression is proposed, which takes advantage of the changes of the topological structure in each snapshot to adjust the previous compression expression, and avoids redundantly compressing while achieving high efficiency. Finally, competitive experiments on real datasets demonstrate the effectiveness and efficiency of the proposed algorithm.
Density-Based Link Clustering Algorithm for Overlapping Community Detection
Zhu Mu, Meng Fanrong, and Zhou Yong
2013, 50(12):  2520-2530. 
Asbtract ( 1236 )   HTML ( 7)   PDF (3265KB) ( 1108 )  
Related Articles | Metrics
For detecting overlapping communities efficiently and effectively in various real-world social networks, we propose a novel density-based link clustering algorithm called DBLINK. The proposed algorithm firstly partitions the edge set of the network into disjoint link communities, which will be then transformed into the final node communities. The overlapping nodes will be linked with the edges that are assigned into different link communities. Furthermore, for obtaining the overlapping community structure with high quality and without excessive overlap, DBLINK utilizes the density-based algorithm as the clustering method for the edge set, which has the ability of identifying the isolated edges that are not satisfied with certain conditions and assigning them into no-link community. An empirical evaluation of the method using both synthetic and real datasets demonstrates that DBLINK not only has satisfying time efficiency, but also plays better performance than the state-of-the-art methods at the community detection quality aspect.
Social Awareness Computation Methods for Location Based Services
Guo Chi, Fang Yuan, Liu Jingnan, and Wan Yi
2013, 50(12):  2531-2542. 
Asbtract ( 820 )   HTML ( 1)   PDF (3305KB) ( 680 )  
Related Articles | Metrics
Positioning technologies, especially outdoor positioning, have been extensively developed recently. There has been massive location and user data which can reflect abundant social information. With the information, location based services can be intelligent and personalized. Thus the technology for location based awareness computation is needed urgently. Social awareness computation in location based services is a kind of computational technology. This technology utilizes the mass position devices deployed in human social life space to achieve following goals: 1) Analyze and recognize the behaviors of social individuals; 2) Analyze the characteristics and laws of social community interaction; 3) Guide individual social behavior; and 4) Support community interaction, communication, and collaboration. Social awareness computation is the key of transferring location based service from simple positioning service to social computation pattern. This paper focuses on these methods, and thus addresses three issues: First of all, what is location based social awareness and its framework; Secondly, what is the relationship between social properties of locations and human behaviors; And at last, what are the common awareness and data mining methods in real analysis and system application, especially in big data analysis of locations. This paper gives details from four parts: social semantic awareness of locations, location based relationship awareness of users, mobility awareness of users, and location based social characteristic awareness of users. This paper systematically classifies and generalizes those methods from the two aspects: computation models and evaluation methods.
Path Planning for Mobile Robots Based on Social Group Search Algorithm
Feng Xiang, Ma Meiyi, Shi Yin, and Yu Huiqun
2013, 50(12):  2543-2553. 
Asbtract ( 568 )   HTML ( 8)   PDF (2731KB) ( 639 )  
Related Articles | Metrics
With the rapid development of technology, “A robot in every home” will come true in the near future. Path planning for mobile robots, as an important object of robotics, has drawn amount of attentions. Inspired by the strict hierarchy and division of responsibilities by social group of animals, social group search algorithm is proposed in this paper. Individuals are classified into four groups with different search strategies according to leader-follower model. When the individuals are searching for optimal object, head leader is the best candidate and searching for better position along with leaders, which are the subordinates with well performance. Followers find the proper position by following the leaders and searching better one around them. Meantime, dispersed numbers are the also-runs which perform the worst and should be replaced by new-borns. Leaders and followers are responsible for searching optimal position, while dispersed numbers ensure the algorithm beyond the local optimum. Furthermore, crossover and mutation, as well as elimination mechanisms are introduced to our algorithm to enlarge the search scope. As a result, comparing with genetic algorithm and particle swarm optimizer, the possibility of premature and local optimum is reduced. The convergence is verified mathematically and experimentally. Via numerous simulations and comparison with other classical algorithms, the characters of high efficiency and effectiveness for path planning problem are illustrated, which are especially of great significance for the further research in robotic navigation.
Sentiment Computing of Web Financial Information Based on the Part-of-Speech Tagging and Dependency Parsing
Wan Changxuan, Jiang Tengjiao, Zhong Minjuan, and Bian Hairong
2013, 50(12):  2554-2569. 
Asbtract ( 910 )   HTML ( 0)   PDF (2287KB) ( 874 )  
Related Articles | Metrics
Sentiment orientation analysis has attracted a great deal of attention recently due to many practical applications and challenging research problems. Traditional text sentiment analysis method based on bag of words model does not take into account the syntactic structure of the sentence, while the method of text sentiment analysis based on dependency parsing model tries to solve this problem. At present, the existed methods based on dependency parsing mainly focus on the author’s observation and lead to be subjectivity and arbitrariness when selecting dependency pair. Therefore, this paper firstly finds out 4 kinds of parts of speech which could affect sentence sentiment, such as adjectives, verbs, adverbs and nouns, from the original polarity, modified polarity and dynamic polarity of emotion words. Secondly, we analyze the effects of 24 kinds of dependency pair on the sentence sentiment computation and select 8 kinds of dependency pair according to the part of speech and Chinese modification understanding. Thirdly, 6 kinds of sentiment computing rules are designed from the combination of the part of speech of the above 8 kinds of dependency pair and then the sentiment computation method based on two binary tree is proposed, which includs the construction method and the sentiment computation method. Finally, the experiments are carried out on the Web financial information and the results prove the effectiveness of this method.
Semi-Supervised Sentiment Classification Based on Sentiment Feature Clustering
Li Suke and Jiang Yanbing
2013, 50(12):  2570-2577. 
Asbtract ( 937 )   HTML ( 0)   PDF (1416KB) ( 946 )  
Related Articles | Metrics
Sentiment classification for text is an important aspect of opinion mining. This paper proposes a semi-supervised sentiment classification method based on sentiment feature clustering. The method only requires a small number of labeled training data instances. Firstly, the method extracts common text features and sentiment features. Common text features can be used to train the first sentiment classifier. Then the spectral clustering-based algorithm is employed to map sentiment features into extended features. The extended features and common text features are combined together to form the second sentiment classifier. The two classifiers select instances from the unlabeled dataset into the training dataset to train the final sentiment classifier. Experimental results show that the proposed method can reach higher sentiment classification accuracy than both the self-learning SVM-based method and the co-training SVM-based method.
An Emotion Contagion Simulation Model for Crowd Events
Liu Zhen, Jin Wei, Huang Peng, and Chai Yanjie
2013, 50(12):  2578-2589. 
Asbtract ( 868 )   HTML ( 0)   PDF (6356KB) ( 724 )  
Related Articles | Metrics
With the advancement of the process of urbanization in China, making unconventional emergent plan for dense crowd has important practical meanings. In a crowd event, the behavior of crowd is different from usual crowd movement, which can easily lead to a panic crowd trample. Emotion contagion among individuals is considered insufficiently in current crowd simulation methods; there is little research on extreme consequences for human crowd. A train station is selected as the example of study, and individual behavior with emotion is modeled in intelligent agent method. The details of calculation methods of emotion contagion are presented depending on whether there are administrators in the crowd. A prototype system of crowd emotion contagion is built, the 3D scene simulations for emotion contagion, collision avoidance and fall behavior are realized. The experimental results show that the negative emotion of crowd is related to not only the initial emotion distribution, but also the numbers and locations of administrators, so controlling the crowd negative emotion contagion is an important method of preventing crowd events. Users can adopt the model to reveal the emotion contagion in visualization manner, and the model can be further extended to other simulations of crowd scenes.
Computational Modeling of a Social Phenomenon: Evolution of Cultural Identity and Cultural Territory
Yun Jian, Liu Xiangdong, and Liu Yongkui
2013, 50(12):  2590-2602. 
Asbtract ( 669 )   HTML ( 0)   PDF (4322KB) ( 641 )  
Related Articles | Metrics
Cultural identity is one of the core problems in cultural field. Cultural identity problem refers to the phenomenon of cultural erosion and cultural integration appearing on the cultural territory inconsistent with geological boundaries. This phenomenon is caused by the evolution of the sense of cultural identity. Before the social computing methods are introduced in, this problem is a typically social science problem which can’t be applied to computing experiments. In this paper, we apply social computing methods to this problem. The experimental research of it is brought into a risk-free space. Using the artificial social modeling techniques based on the multi-agents, we have identified several computing factors. These computing factors can characterize the cross-cultural communication and the internal stability of the cultural environment. The factors such as cultural resist-digestion inertia factor, cultural identity status factor, geological distance factor among different populations and the psychological and behavior characteristics factors of “hate-to-leave” are included. Based on these factors, the computational model is given to describe the phenomenon of cultural erosion, cultural integration and cultural protection. We make a lot of social computing experiments and the results show that all the selected computing factors affect the evolution of cultural identity and cultural territory in different ways, and the evolution law can be derived from this computational model. In future, this model can play a role in the scheme design of cultural protection and effectiveness evaluation of solutions.
A Two-Step Utility Query Recommendation Method Based on Absorbing Random Walk
Zhu Xiaofei, Guo Jiafeng, Cheng Xueqi, and Lan Yanyan
2013, 50(12):  2603-2611. 
Asbtract ( 651 )   HTML ( 2)   PDF (1632KB) ( 509 )  
Related Articles | Metrics
Search engine has become an essential way for satisfying users’ daily information needs, however, formulating a proper query for search is difficult for users. To alleviate users’ search burden, query recommendation has been proposed and considered as a prominent ingredient of modern search engines. Traditional recommendation approaches have paid great attention to recommend relevant queries, which attempt to find alternative queries with close search intent to the original query. However, the ultimate goal of query recommendation is to assist users to accomplish their search task successfully, while not just find relevant queries in spite of they can sometimes produce useful search results. To better match user search objective in the real world, a more straight way of query recommendation is to recommend users high utility query, i.e., queries that can better satisfy users’ information needs. In this paper, we propose a two-step utility query recommendation method based on absorbing random walk, which can infer query’s utility by simultaneously modeling both users’ reformulation behaviors and click behaviors. Extensively experiments are conducted on a real query log, and the results show that this method significantly outperforms five baseline methods under the evaluation metric query relevant ratio (QRR) and mean relevant document (MRD).
Recognition and Retrieval of Time-sensitive Question in Chinese QA System
Hou Yongshuai, Zhang Yaoyun, Wang Xiaolong, Chen Qingcai, Wang Yuliang, and Hu Baotian
2013, 50(12):  2612-2620. 
Asbtract ( 906 )   HTML ( 4)   PDF (1632KB) ( 2010 )  
Related Articles | Metrics
Currently, question-answering(Q&A) systems such as Baidu Zhidao, SoSo WenWen, etc., have been able to find out questions semantically relevant to most queries. However, for questions with time constraint, the performance of searching results is much worse than that of the queries without such constraint. To solve this problem, an automatical recognition and retrieval method for time-sensitive questions are proposed. At first, time-sensitive questions is recognized by using classification algorithms; next, time-range of the time-sensitive question is resolved; finally, the question search results are filtered by resolved time-range. To recognize time-sensitive questions, lexical, syntactic and semantic features are extracted; machine learning methods including the decision-tree, naveBayes and SVM are employed; and AdaBoost algorithm is also adopted to solve the corpus imbalance issue. A resolving method is proposed to calculate question time-range. Based on those, a prototype system of question retrieval is used for validation, which is built from question and answer pairs of financial domain collected from Web. Experimental results show that, by using the C5.0 decision tree algorithm, the precision of time-sensitive questions recognition reaches 0.901; the mean average precision(MAP) of the retrieval result for time-sensitive questions is enhanced 0.039 2 compared with SoSo WenWen, and is enhanced 0.195 6 compared with Baidu Zhidao, increasing by 74.24% and 197.58% respectively. The average system response time of the question retrieval prototype system is 0.628 7 s.
Advertiser Status Modeling in Sponsored Search
Jiang Changhao, Zhang Min, Gao Bin, Liu Yiqun, and Ma Shaoping,
2013, 50(12):  2621-2628. 
Asbtract ( 416 )   HTML ( 0)   PDF (1673KB) ( 478 )  
Related Articles | Metrics
Sponsored search is a successful business model currently on the Internet. It has become the main income source of search engine companies and has offered great opportunities for advertisers. Search engine, advertiser and user are the three main components in sponsored search. Search engine offers technology and advertisement service, advertiser offers advertisement contents while user views and clicks advertisements. Related technologies of search engine and user behavior have been studied and developed by many researchers. However, advertisers especially the status of advertisers has not been studied well in the area. Based on such situation, we conducted deep analysis on the status of advertisers. Technically, such status is demonstrated by the impression number, the click number and the cost of the advertiser’s advertisements. Then hidden Markov model is utilized in demonstrating such sequential status of advertisers. The focus is on introducing methodologies of machine learning and data mining into advertiser modeling. At the same time, we get high precision when predicting advertiser status with our model, which convinces us that such method is suitable.
Research on Case Index BCS-Tree and Its Constructing Method
Fan Haixiong, Liu Fuxian, and Xia Lu
2013, 50(12):  2629-2641. 
Asbtract ( 392 )   HTML ( 0)   PDF (5016KB) ( 454 )  
Related Articles | Metrics
Aiming at the existing problems in case index study, a new method, called BCS-Tree, is proposed. Firstly, the GRC algorithm is improved in self-adaptive way, which can solve deficiencies of being seriously affected by initial values and just applying to convex cluster based on existing cluster methods. Then the handling ability of MBR method for the nonlinearity and no normality data is enhanced by integrating KICA with MBR. After analyzing existing method, the dual reference point selection method is designed. Moreover, the BCS-Tree constructing method is presented based on the improved GRC algorithm and dual reference point clustering splitting. Finally, based on comprehensive analyzing the possible distributing relation between query point and case data, the BCS-Tree query algorithm is designed. Furthermore, the BCS-Tree and query algorithm are analyzed by theory derivation and instance verification. The result proves that the BCS-Tree index constructing method presented in this paper is of better robustness and applicability, and BCS-Tree together with the query algorithm is of better searching efficiency. For case-based reasoning and case index research domain, the BCS-Tree provides effective method supporting and new research thoughts.
Classification in Networked Data Based on the Probability Generative Model
Wang Zhenwen, Xiao Weidong, and Tan Wentang
2013, 50(12):  2642-2650. 
Asbtract ( 667 )   HTML ( 2)   PDF (2131KB) ( 458 )  
Related Articles | Metrics
Classification in networked data, which classify entities based on their relationship information, is an important research issue of the data mining field. The previous methods usually assign a class to a node based on the classes of its neighbor nodes. These methods have high performance of classification in the networks with high. However, there are many networks with low homophily in the real world. In the networks with low homophily, there are a majority of connected nodes whose classes are different from each other. The previous methods cannot assign the correct classes to the nodes in such networks. Therefore, a novel method of classification in networked data is proposed in this paper. The main idea of the proposed method is to build a new generative model for networks, in which the edges of networks are observed variables and the classes of the nodes whose classes are unknown are latent variables. The values of latent variables can be calculated by fitting the generative model to the network. Consequently, the classes of the nodes whose classes are unknown are obtained. Experimental results on the real datasets show that the proposed method can provide better performance than the previous methods in the networks with low homophily.
Co-residency Detection Scheme based on Shared Cache in the Cloud
Yu Si, Gui Xiaolin, Zhang Xuejun, Lin Jiancai,, and Wang Junfei,
2013, 50(12):  2651-2660. 
Asbtract ( 854 )   HTML ( 0)   PDF (2553KB) ( 465 )  
Related Articles | Metrics
Cloud computing, an emerging computing and service paradigm, where the computing and storage capabilities are outsourced on demand, offers the advanced capabilities of sharing and multi-tenancy. However, it also introduces a range of new vulnerabilities, such as side channel attacks. Malicious users can extract sensitive information from other users covertly via side channel attack, which breaks the isolation between the co-resident virtual machines (VMs). In the existing works, interferences introduced by other co-resident VMs are not considered sufficiently. However, they are realistic in the multi-tenancy cloud. Based on the existing results, we propose the co-residency detection scheme via cache-based side channel attacks in the virtual computing environment, considering the interferences of the VMs. In the scheme, we investigate the use of expectation and entropy to describe the cache load characteristics relating to the location of victim VM. Then, the algorithm based on clustering technique is used to extract the cache load characteristics, and the VMs co-residency detection rules are proposed to complete detection. The experimental results show that the scheme can obtain the load profile efficiently and accurately, and realize co-residency detection with high true detection rate. It further demonstrates that side channel attack is a significant security challenge faced by cloud computing.
Generation of Test Cases Based on Symbolic Execution and LTL Formula Rewriting
Chen Donghuo, and Liu Quan,
2013, 50(12):  2661-2675. 
Asbtract ( 833 )   HTML ( 0)   PDF (2853KB) ( 472 )  
Related Articles | Metrics
It is a breakthrough to use model checking technique to automatically generate test cases. For infinite states systems with input/output domains defined on unbounded and abstract types, the function of general model checking technique is very restricted in generating test cases. This paper presents the idea of auto-generation of test cases based on symbolic execution and LTL temporal formula rewriting method. The method proceeds with building the symbolic representation of program execution model, such that it can avoid explicitly building the model of infinite states systems with the enumeration of value of input and output or state explosion problem. Then temporal formula (test purposes) rewriting is applied to the symbolic execution model of program to generate complex constraint requirements according to the counterexample patterns related to test purposes and the suitable SMT(satisfiability modulo theory) solver is called for generating test cases. At the level of syntax, IOSTS and LTL formulae are used to specify interactive behaviors of systems under testing and test purposes, respectively. Therefore, the approach has the advantages of modeling the abstract behaviors and data dependence as well as defining all kinds of test requirements concisely and naturally in a rigorous means. Further, the paper designs the algorithm for LTL formula rewriting engine, which computes the necessary condition satisfied by test cases related to the targeted test purposes. At last, a case study is conducted for showing the usefulness of the method.
Corecursive Operations with Parameters and the Associated Calculational Laws
Su Jindian and Yu Shanshan
2013, 50(12):  2676-2690. 
Asbtract ( 474 )   HTML ( 0)   PDF (1780KB) ( 403 )  
Related Articles | Metrics
The unfold defined on coinductive datatypes is not able to describe those coinductive operations with parameters. To solve the problem, we prove that the final coalgebras for finitely extended polymonial functors on cartesian closed category are strongly final under the conditions of fixed or accumulating parameters, which yields the definitions of strong coinductive datatypes and coinductive operations with fixed or accumulating parameters-named punfold and aunfold respectively. We extend the researches of Pardo on strong inductive datatypes and recursions with parameters, named pfold and afold, to coinductive datatypes. This unfold can directly carry extra parameters as the inputs of calculations or store temporal values produced by programs, which as a result avoids using higher-order functions. We discuss the properties and the associated calculational laws for punfold and aunfold in a categorical and abstract sense, and present their implementations by the functional programming language Haskell. We also point out their applications in the fields of reasoning, transformations and optimizations of programs.
Algebraic Properties of Probabilistic Finite State Automata
Xie Zhengwei, Zhai Ying, Deng Peimin, and Yi Zhong
2013, 50(12):  2691-2698. 
Asbtract ( 724 )   HTML ( 1)   PDF (696KB) ( 554 )  
Related Articles | Metrics
Probabilistic finite state automata(PFA)is widely used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked:computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition, and machine translation are some of them. In this paper, algebraic properties of PFA are studied by using algebraic tools such as matrices, homomorphisms, isomorphisms, congruences, and so on. Firstly, the concept of congruences of two strings over input alphabet is defined, and some equivalent characterizations of congruences of two strings are given by probabilistic transition matrices.Secondly, notions of homomorphisms and isomorphisms of PFA are introduced, and homomorphism theorem of PFA is shown. It is also proved that two PFA are isomorphic if and only if their probabilistic transition matrices can be transformed into each other by the first kind of elementary transformation of rows and columns. The concept of products and sums of PFA is defined, and the homomorphism relationships between products and sums of PFA are investigated. Finally, the concept of commutativity of fuzzy finite state machines is introduced for PFA, and several equivalent characterizations of commutativity of PFA are obtained as well as the sufficient and necessary conditions of the commutativity of products and sums of PFA by probabilistic transition matrices.
Fast Building Method of Advanced AC Automaton
Fan Hongbo and Yao Nianmin
2013, 50(12):  2699-2706. 
Asbtract ( 583 )   HTML ( 2)   PDF (1397KB) ( 473 )  
Related Articles | Metrics
Advanced AC (AAC) is the most widely used multi-patterns string matching algorithm. The building cost of AAC automaton is high for large-scale matching. In this article, the automaton building method of DFA (a basic single pattern string matching algorithm) is improved and it is expanded into multi-patterns matching field. Thus, an automaton for multi-pattern string matching called Set DFA is proposed and it is proved to be same as AAC automaton. The building method of Set DFA can be described in two steps: 1)Building trie and setting all undefined trans of the initial state point to the initial state. 2)Accessing all states of the trie in level-order, and setting all undefined trans. Let the string along the path from the initial state to the accessed state be u\-1u\-2…u\-k, and let the trans character of the undefined trans be c, this undefined trans should be set to point to the result state after the string of u\-2…u\-kc being inputted from the initial state on existing automaton. This building method can be also improved to liner time building. The experimental results indicate that building Set DFA is about two times faster than building AAC automaton, because AC failure function is not used in buliding Set DFA and each state in Set DFA just is accessed once after being generated.