ISSN 1000-1239 CN 11-1777/TP

Table of Contents

15 November 2009, Volume 46 Issue 11
Survey of Naming and Addressing Architecture Based on Locator/Identifier Split
Tu Rui, Su Jinshu, and Peng Wei
2009, 46(11):  1777-1786. 
With the development of the Internet, some drawbacks of the original TCP/IP architecture regarding mobility, scalability and security have gradually been exposed. One of the most important reasons is the semantic overloading of the IP address: the current IP address has dual semantic functions, indicating both a network node's routing locator and its endpoint identifier. This "IP overload" also limits the development of several new technologies, including multi-homing and traffic engineering. To tackle this problem, many researchers believe it is necessary to build a clean-slate design of the naming and addressing architecture for the future Internet. The root cause of the above drawbacks lies in the fact that there is no accurate, unique and permanent identifier to describe a network node. The IAB (Internet Architecture Board) announced that, in order to resolve the "IP overload", two name spaces should be introduced to denote a network node's locator and identifier separately, which is called "locator/identifier split". In recent years, many new techniques and approaches have emerged based on this principle. The authors analyze some challenging problems in the research of network architectures based on "locator/identifier split", and give a review of related work. Comments on some representative research proposals are also presented. Finally, some future research directions and further discussion are listed.
Uplink Resource Allocation in Wireless MIMO Systems
Pang Di, Zhou Jihua, Hu Jinlong, Dong Jiangtao, and Shi Jinglin
2009, 46(11):  1787-1796. 
Multiple input multiple output (MIMO) technologies can improve spectrum efficiency and are considered one of the core technologies in future wireless communication systems. In a wireless MIMO system subject to co-channel interference (CCI), resource allocation for scheduling services with multiple QoS requirements is a challenging problem. The CCI suppression, bandwidth and slot allocation problems in the uplink of a MIMO system are studied in this paper. Taking the improvement of system throughput as the optimization target, and QoS requirements and system fairness as constraints, the authors propose a practical SDMA-based greedy resource allocation (SGRA) algorithm. Based on interference management, CCI can be efficiently suppressed and the complicated problem with multiple constraints can be decomposed. A two-phase heuristic calculation and search is then carried out in SGRA. In the first phase, greedy resource allocation, primarily involving uplink scheduling and subchannel allocation, is performed in the time-frequency domain. In the second phase, the resource allocation is extended to the space-time-frequency domain. SGRA has low complexity and is applicable to practical wireless communication systems. Simulation results show that compared with conventional algorithms, SGRA can improve system throughput and better guarantee the delay and minimum data rate requirements of real-time services, while also giving consideration to system fairness.
A Fast Algorithm for Inferring AS-Level Path of Internet Topology
Yang Guoqiang and Dou Wenhua
2009, 46(11):  1797-1802. 
Discovering the AS-level path between two end-points is valuable for a wide range of network research, such as network diagnosis, routing behavior analysis and protocol optimization. Most existing techniques require direct access to the source, which is difficult given the uncooperative nature of the Internet. In most cases, the routing path between two ASs is the shortest policy path between them. The recently available AS-level connectivity information and AS relationship inference algorithms provide a way to infer the shortest policy path between any two end-points. Thus it is feasible to determine the AS-level path between two end-points by inferring the shortest policy path between them. The time complexity of the existing algorithm for inferring AS-level paths based on this method is O(n^3), where n is the number of nodes in the graph. Based on the same method, an efficient algorithm for inferring the AS-level path of the Internet topology is proposed in this paper, which is grounded on the breadth-first algorithm for calculating shortest paths in an undirected graph. The time complexity of the algorithm is O(nm), where n and m are the numbers of nodes and edges in the graph. Experimental results show that compared with the existing algorithm, this algorithm is much faster and achieves comparable accuracy.
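The complexity claim above can be made concrete with a minimal sketch: breadth-first search answers one shortest-path query in O(n + m) time, which is the per-source cost behind the O(nm) bound. This is only an illustrative Python sketch over a plain undirected adjacency-list graph with made-up AS numbers; the paper's actual algorithm additionally enforces AS routing policies (e.g. valley-free paths), which plain BFS does not.

```python
from collections import deque

def bfs_shortest_path(adj, src, dst):
    """Return one shortest path from src to dst in an undirected graph.

    adj maps each node to an iterable of its neighbors. Runs in O(n + m)
    time per query; the paper's algorithm adds policy constraints on top
    of this traversal.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None  # dst unreachable from src

# Toy AS-level topology (AS numbers invented for illustration).
topology = {
    1: [2, 3],
    2: [1, 4],
    3: [1, 4],
    4: [2, 3, 5],
    5: [4],
}
print(bfs_shortest_path(topology, 1, 5))  # → [1, 2, 4, 5]
```

Each BFS costs O(n + m), so answering one query per source over the whole graph yields the O(nm) total that the paper contrasts with the earlier O(n^3) approach.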
Data Allocation Algorithms in P2P Streaming
Li Zeping, Lu Xianliang, Nie Xiaowen, and Li Lin
2009, 46(11):  1803-1813. 
The recently emerging P2P technologies have huge potential for resource usage and system scalability. Providing P2P-based media streaming service, an important application over the Internet, has attracted a lot of research interest. In P2P media streaming with the pattern of multiple senders and a single receiver, it is still a challenge to optimally allocate streaming rate and media data among multiple senders. To cope with this problem, a new solution is proposed. Firstly, the authors model the optimal rate allocation problem as a non-linear optimization problem by applying queueing theory, and derive the optimal rate allocation formula that computes the optimal solution. Then, a new optimal rate allocation algorithm (ORAA) based on the formula is proposed, and the optimality of its solution is proved. Finally, based on the ORAA algorithm, a dynamic rate allocation algorithm (DRAA) is proposed, which can dynamically adapt to network fluctuations and optimally allocate streaming rate and media data among multiple senders. Because of its short running time, the DRAA algorithm can be used in real time. Extensive simulation results using NS2 show that the proposed DRAA algorithm effectively reduces calculation and communication overheads, and achieves better performance than related works under different parameters.
Recommendation-Based Grid Resource Matching Algorithm
Lin Yuan, Luo Siwei, and Yang Liner
2009, 46(11):  1814-1820. 
Focusing on the problem of applying for and matching resources among large numbers of users and computing resources in a grid environment, a recommendation-based grid resource matching algorithm is presented. Many existing grid resource matching and scheduling algorithms have to search and compare every grid computing resource node without considering the features of grid resources or users' behaviors, whereas recommendation systems, widely used in e-commerce, can address both problems well. The recommendation mechanism preprocesses user and resource information by translating the features of grid resources into the eigenvectors of items in a recommendation system, and by setting up a satisfaction grading system based on the history records of the resource application process, which reflect users' behaviors through the frequency with which users compute on resource nodes. The authors then improve an SVD-based (singular value decomposition) collaborative filtering algorithm that recommends resource sets to users by computing the resource features that best approximate the users' behavior feature matrix. In particular, the grid resource matching algorithm can mine latent features from the given data, efficiently overcome the extreme sparsity of user satisfaction grade data, and make use of feedback information from resource scheduling. The problem of matching a mass of resources is thus solved from a new perspective.
A Resource Allocating Algorithm in Grid Workflow Based on Critical Regions Reliability
Yu Jiong, Tian Guozhong, Cao Yuanda, and Sun Xianhe
2009, 46(11):  1821-1829. 
Many workflow applications have timing constraints such that each processing of a workflow needs to be finished within its deadline. There has been some work on improving the performance of time-constrained workflow processing. Previous work mainly considered meeting the execution time requests of the critical path tasks, or of all tasks both on and off the critical path. Few of them, however, have taken into account the fact that successful execution of a workflow within its deadline is also affected by the "normal state" and "abnormal state" of grid resources occurring in successive turns, and by the relative difference in execution time between tasks on the critical path and tasks on the non-critical path. To solve these problems, new definitions such as critical region and reliability of a critical region are given, and a new resource allocating algorithm is proposed in terms of a finite-state continuous-time Markov process, selecting the resource combination scheme with the lowest expenditure under a given credit level of resource reliability in the DAG-based workflow. Compared with previous algorithms, this method is much more efficient in resource allocation, with almost no degradation in the successful grid workflow execution rate. Simulations show the validity of the new algorithm.
Continuous Versioning-Based Auditable File System
Huang Rongrong, Shu Jiwu, Chen Kang, and Xiao Da
2009, 46(11):  1830-1838. 
With more and more recent federal, state and local legislation mandating the retention and accessibility of electronic records and audit information, the security audit of digital data is becoming increasingly important. The key requirement of a digital audit is to generate verifiable audit trails on changes to electronic records. Current systems for compliance with digital audit legislation fail to provide security and trustworthiness of audit trails in the presence of a powerful insider adversary. A continuous versioning-based auditable file system, CV-AFS, is presented. All changes to data are recorded, and the system constructs a data history through continuous versioning. A trusted audit agent is introduced to generate the corresponding audit trails. At a later time, an auditor may verify the version history of a file against the audit trails, and thus important data can be protected against insider attacks. The overhead of generating audit trails is reduced through the use of an incremental and parallelizable hash construction. The authors have implemented a prototype of CV-AFS in the ext3cow versioning file system on Linux and evaluated its performance. The Postmark benchmark shows that the time overhead of CV-AFS is reduced by 43.5% compared with the traditional serial hash construction.
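The idea of a verifiable audit trail over a version history can be sketched with a simple hash chain: each version's digest folds in the previous digest, so altering any past version changes every later digest. This is a hedged illustration only, with hypothetical function names; it shows the incremental (append costs one hash) but serial case, whereas CV-AFS uses an incremental and parallelizable construction.

```python
import hashlib

def extend_trail(prev_digest: bytes, version_data: bytes) -> bytes:
    """Extend the audit trail by one file version.

    Chaining each version's content with the previous digest yields a
    tamper-evident trail: appending is one hash operation, not a rehash
    of the whole history.
    """
    return hashlib.sha256(prev_digest + version_data).digest()

def trail_for(versions):
    """Compute the audit digest over an ordered list of version contents."""
    digest = b"\x00" * 32  # genesis value; a real system would anchor/sign this
    for data in versions:
        digest = extend_trail(digest, data)
    return digest

history = [b"v1: draft", b"v2: reviewed", b"v3: final"]
honest = trail_for(history)
tampered = trail_for([b"v1: DRAFT", b"v2: reviewed", b"v3: final"])
assert honest != tampered  # an insider's edit to any past version is detectable
```

An auditor holding only the latest digest can recompute the chain from the recorded version history and detect any retroactive modification.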
Automated Testing of the Trusted Platform Module
Zhan Jing and Zhang Huanguo
2009, 46(11):  1839-1846. 
Trusted computing is a new paradigm for improving client security on today's general architecture platforms, and it uses a hardware chip, called the trusted platform module (TPM), as a key component to achieve this goal. As there are already many related products on the market, conformance testing is very necessary. However, traditional testing methods and experience cannot express the testing requirements in a form that allows product evaluation to be processed precisely and automatically by machines. A conformance testing model based on state machine theory is proposed as a solid foundation for testing correctness. Furthermore, related specification-based testing strategies are proposed to deal with the state space explosion problem, a major obstacle to the practicality of state-machine-based testing methods. An automated testing tool is also designed and implemented to improve efficiency and accuracy. The tool can generate test suites automatically from a database and customize testing sequences from the state graph according to the user's requirements. It provides process visualization and produces conformance testing results that can be used in future security evaluation of trusted computing platform products.
Identification of Anomalous Traffic Clusters for Network-Wide Anomaly Analysis
Yang Yahui and Du Keming
2009, 46(11):  1847-1853. 
In the field of network security management, a number of recent studies have been dedicated to network-wide anomaly detection, but little attention has been paid to further identifying the anomalous traffic clusters involved in an anomaly. Automatic identification of anomalous traffic clusters helps ISPs analyze and locate network anomalies for network and security management. The authors propose a method to detect and identify anomalous traffic clusters based on filtered NetFlow data. The problems to be solved are described and defined formally; the Trie-based solution for detecting heavy hitters in a multi-dimensional tree is adapted and improved; practical and flexible methods are proposed to calculate the threshold used for detecting specific heavy hitters and the splitting value used for guiding the construction of trees, improving the accuracy of the algorithm; the operation of trimming off branches of the trees is integrated with the reconstruction of traffic volume to decrease the size of the trees and thus improve the efficiency of searching for heavy hitters; and methods to identify anomalous traffic clusters based on specific heavy hitters are presented. Experiments show that the proposed methods are feasible for network-wide anomaly diagnosis.
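The heavy-hitter idea underlying the method can be sketched in a toy, one-dimensional form: aggregate flow volumes up a source-prefix hierarchy and report every prefix whose total crosses a threshold. This stand-in (with invented addresses, a fixed threshold, and a single dimension) only illustrates the notion of hierarchical heavy hitters; the paper operates on multi-dimensional tries over NetFlow keys with adaptively computed thresholds and splitting values.

```python
from collections import Counter

def heavy_hitter_prefixes(flows, threshold):
    """Report source /8, /16 and /24 prefixes whose total volume
    meets the threshold.

    flows is a list of (dotted_ip, volume) pairs; volumes are summed
    at each level of the prefix hierarchy.
    """
    totals = Counter()
    for src_ip, volume in flows:
        octets = src_ip.split(".")
        for depth in (1, 2, 3):  # /8, /16, /24 aggregation levels
            prefix = ".".join(octets[:depth])
            totals[prefix] += volume
    return {p: v for p, v in totals.items() if v >= threshold}

# Invented NetFlow-like records: (source IP, byte volume).
flows = [("10.0.1.5", 400), ("10.0.1.9", 300), ("10.0.2.1", 200),
         ("192.168.0.1", 150)]
print(heavy_hitter_prefixes(flows, 500))  # → {'10': 900, '10.0': 900, '10.0.1': 700}
```

A real implementation would build the trie incrementally and prune sub-threshold branches as it goes, rather than materializing every prefix as this sketch does.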
Verifying Network Security Policy Based on Features
Tang Chenghua and Yu Shunzheng
2009, 46(11):  1854-1861. 
The integrity, validity and consistency of a security policy have important impacts on the safety performance of network information systems. To solve the difficult problem of verifying security policy effectively, a feature-based dynamic verification model and algorithm for network security policy are proposed. Firstly, the related concepts and the method of constructing the integrity of the security policy are given. Secondly, security domain, protection factor, sensitive factor and safety factor are introduced on the basis of structural integrity, and an assessment model for the validity of the security policy is built. The relationship among defense means, application targets, and information security attribute characteristics is analyzed, the protection factor and sensitivity factor are established, and the value of the security policy safety factor is then obtained in order to assess the validity of the security policy. Lastly, a consistency detection algorithm is put forward according to the relationship of these features by introducing the associated logo set. It is particularly suitable for knowledge accumulation situations and real-time consistency detection requirements. Experimental results show that the assessment model can effectively reflect the safety performance of the security policy, and the detection algorithm has higher efficiency, which provides a new solution for verifying network security policy.
Fingerprint-Based Random Sequence Generator
Zhu Hegui, Zhang Xiangde, Yang Lianping, and Tang Qingsong
2009, 46(11):  1862-1867. 
In this paper, fingerprint characteristics are not used for identification and recognition as usual. On the contrary, the randomness of fingerprint characteristics, including uniqueness, unpredictability, and external noise such as lighting and trauma, is used to develop a new kind of random sequence generator: the fingerprint-based random sequence generator, which combines cryptography with biometric characteristics in a novel way. To guarantee that the fingerprint-based random sequence generator works well, the auto-correlation function method and BDS statistical tests are first used to verify that there is neither linear nor nonlinear correlation in the fingerprint-based random sequence; consequently, the fingerprint-based random sequence obtained in this paper is truly random. Furthermore, to verify that the fingerprint-based random sequence generator is sound and cryptographically secure, the generated sequences are tested at a high level of significance by a variety of modern standard statistical tests, all drawn from the Federal Information Processing Standards (FIPS) 140-1 tests and the National Institute of Standards and Technology (NIST) Special Publication 800-22 tests recommended by NIST. In a word, these results show that there are wide application fields for the fingerprint-based random sequence generator, which not only extends the possibilities of obtaining truly random sequences through biological systems, but also provides random sequences in a simple and portable way.
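One of the FIPS 140-1 tests mentioned above, the monobit test, is simple enough to sketch: over a 20,000-bit stream, the count of ones must lie strictly between 9654 and 10346. The sketch below feeds it a seeded pseudo-random stream purely as a stand-in; the paper's bit source is the fingerprint images themselves.

```python
import random

def monobit_pass(bits):
    """FIPS 140-1 monobit test: for a 20,000-bit stream, the number of
    ones must lie strictly between 9654 and 10346."""
    assert len(bits) == 20000, "the test is defined over 20,000 bits"
    ones = sum(bits)
    return 9654 < ones < 10346

# Stand-in bit source (seeded PRNG); the paper derives bits from fingerprints.
rng = random.Random(42)
stream = [rng.randint(0, 1) for _ in range(20000)]
print(monobit_pass(stream))

# Degenerate streams fail, as expected:
print(monobit_pass([0] * 20000))  # → False
```

The other FIPS 140-1 tests (poker, runs, long run) follow the same pattern: a fixed-length sample, a statistic, and an acceptance interval.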
Comparison of Executable Objects Based on Signatures and Properties
Fu Jianming, Qiao Wei, and Gao Debin
2009, 46(11):  1868-1876. 
Comparison of executable objects is widely used in software copyright protection, malware family analysis, update-pattern analysis for anomaly detection, and software patch analysis. Traditional comparison methods cannot meet the requirements of these applications in terms of speed and accuracy. A unary structural signature of a function, based on the adjacency matrix of its CFG, and a unary instruction signature are designed to account for the instructions in a function. According to instruction codes and operands, strong/medium/weak signatures over instruction sequences are designed to make instruction comparison easy, and the weak signature can handle instruction reordering, outperforming small primes product (SPP). Three kinds of properties are appended to partition all objects into more groups. Comparison methods for functions and basic code blocks are then presented using the above signatures and properties, and matching policies using both statistical weights and largest-exclusive matching are exploited to decrease false matches. Furthermore, hashes of signatures and properties are used to speed up matching. Finally, a prototype tool, PEDiff, is implemented using the above methods. Experimental results demonstrate that the method achieves better matching speed and richer analysis results.
Improved Algorithms for Weighted 3D-Matching
Feng Qilong, Wang Jianxin, and Chen Jianer
2009, 46(11):  1877-1884. 
The matching problem is an important class of NP-hard problems, with great applications in many fields such as scheduling and code optimization. This paper mainly focuses on the weighted 3D-Matching problem, and gives further analysis of the structure of the problem. In order to solve the weighted 3D-Matching problem more efficiently, it is converted into the weighted 3D-Matching augmentation problem, which is to construct a maximum weighted (k+1)-matching based on a maximum weighted k-matching. The authors first provide further theoretical study of the structure of the weighted 3D-Matching augmentation problem and present some special properties of the problem: for a given maximum weighted k-matching of an instance, there must exist two columns in this maximum weighted k-matching such that at least 2k/3 elements of those two columns are contained in the corresponding columns of a maximum weighted (k+1)-matching. On the basis of these special properties, and by using color-coding and dynamic programming techniques, an improved parameterized algorithm with time complexity O*(4.82^{3k}) is given. This results in an improved parameterized algorithm for the weighted 3D-Matching problem, which significantly improves on the previous best result of O*(5.47^{3k}).
A Detection Algorithm for Rain-Affected Moving Objects
Xu Jing, Liu Peng, Liu Jiafeng, and Tang Xianglong
2009, 46(11):  1885-1892. 
The degradation effects caused by raindrops are complex, which brings difficulties to image processing algorithms and object detection. It is necessary to distinguish objects from fast-moving raindrops and from the blur effects caused by rain streaks when capturing videos in the rain. An algorithm is proposed to detect moving objects in rain-affected videos obtained by an outdoor vision system. The chromatic property is used to construct the imaging model of raindrops, and an imaging equation that includes the chromatic information of the R, G and B channels is presented. The proposed imaging model is a measurement describing the intensity changes of pixels affected by rain streaks. This imaging equation can explain the limitations of existing models, which can only process certain special kinds of raindrops, and the imaging model is able to overcome those limitations. Thereafter, a method based on this imaging model is proposed to detect rain-affected moving objects. Finally, experimental results show that the imaging model can describe the imaging properties of raindrops, and that it has stronger robustness and better performance for outdoor vision systems than traditional methods.
Gesture Synthesis Based on Virtual Newscaster for Broadcast TV News Programs
Yan Qingcong, Chen Yiqiang, and Liu Junfa
2009, 46(11):  1893-1899. 
Developing an avatar display system based on Chinese sign language synthesis technology for TV news programs is a novel task. However, the majority of existing systems working on Chinese sign language synthesis concentrate on the mapping system that automatically converts Chinese text into signs the deaf can understand well, without implementing such a display system. In this paper, an improved context-sensitive algorithm for gesture animation is presented. It can effectively utilize the differences between consecutive key frames and make the avatar's gesture motion more natural and smooth than traditional algorithms. The authors also propose a motion retargeting algorithm based on statistics and rules for the avatar's gesture animation. It uses statistics-based rules, and considers the differences in skeleton size and the features of gesture motion between virtual models that share a similar skeleton topology. It can improve the accuracy of gesture motion data when transferring the standard avatar's gesture motion data to a new model, without requiring people to rectify the standard model's gesture motion data using specialized software. Finally, by extending the forms of basic gesture word representation and adopting the alpha blending technique, a display system based on Chinese sign language synthesis technology for TV news programs has been effectively implemented.
A Buffer Based Extensional Model for Topological Relation and Its Application
Wang Shengsheng, Liu Jie, Xie Qi, and Liu Dayou
2009, 46(11):  1900-1906. 
Qualitative spatial reasoning (QSR) is promising for applications in artificial intelligence and other fields. Much research work has been done on single spatial relation aspects, while little research has focused on the integration of two or more aspects. This does not accord with real-world applications, where several aspects are usually involved together. Since different aspects of space are often dependent, more elaborate formalisms that combine different types of information need to be established. Research on combining topology and distance is not yet sufficient, and a model that is tractable in its basic relations and easy to implement in GIS is lacking. An extensional topological relation model, BERCC, is proposed based on RCC theories. Its main idea is to improve expressive power by using the topological relations of buffers, so some distance information is included in the model. The weak composition table of BERCC is deduced. The basic relations of BERCC are proved to be tractable, and the tractable subset of BERCC including basic and full relations is given. A constraint satisfaction reasoning algorithm for BERCC is implemented. Finally, an experimental system is developed with the above theories and methods, and the correctness and practicability of the model and the algorithm are validated by the system.
Structural Risk Minimization Principle Based on Complex Random Samples
Ha Minghu, Tian Jingfeng, and Zhang Zhiming
2009, 46(11):  1907-1916. 
Statistical learning theory is commonly regarded as a sound framework that handles a variety of learning problems in the presence of small data samples. However, statistical learning theory is based on real random samples (real-number-valued random variables) and as such is not ready to deal with statistical learning problems involving complex random samples (complex-number-valued random variables), which can be encountered in real-world scenarios. The structural risk minimization principle is one of the kernel contents of statistical learning theory, and is the theoretical foundation for establishing the support vector machine. Based on the above, the structural risk minimization principle of statistical learning theory based on complex random samples is explored. Firstly, the definitions of the annealed entropy, growth function and VC dimension of a set of complex measurable functions are proposed and some important properties are proved. Secondly, the bounds on the rate of uniform convergence of the learning process based on complex random samples are constructed. Finally, the structural risk minimization principle for complex random samples is proposed, the consistency of this principle is proven, and asymptotic bounds on the rate of convergence are derived. These investigations help lay the essential theoretical foundations for establishing support vector machines based on complex random samples.
Action Reasoning Independent of Designer
Zhou Shengming, Wang Ju, and Jiang Yuncheng
2009, 46(11):  1917-1924. 
Action and action reasoning are a basic part of human activities: people must execute actions to complete tasks, and similarly a robot needs to execute actions to accomplish a task. A high-level intelligent robot is required to be able to sense the external environment and reason correctly about actions independently. In situation calculus action theory, expressing action reasoning with sensing actions and knowledge requires an external designer to write out background axioms, sensing results, and the related knowledge changes for the agent. This is a kind of action reasoning that depends on the designer. In this paper, situation calculus action theory is extended in a proper way: a representation of sensors is added to the formal language of the action theory, and the agent's new knowledge is produced from the results of sensor applications. A platform is provided in which the following issues can be expressed formally: the robot senses its external environment; the information obtained by the robot's sensors is converted into the robot's knowledge automatically; and the robot reasons about actions independently of the designer. In this way, the "black box" process of sensing actions becomes clear; the robot's knowledge is linked to the results of its sensors; the knowledge fluent is regarded as a dynamic knowledge base; and the robot can update the knowledge base by executing sensing actions. Furthermore, the robot can plan and execute actions independently of the designer.
An Improved Working Set Selection Strategy for Sequential Minimal Optimization Algorithm
Zeng Zhiqiang, Wu Qun, Liao Beishui, and Zhu Shunzhi
2009, 46(11):  1925-1933. 
Working set selection is an important step in sequential minimal optimization (SMO) type methods for training support vector machines (SVM). However, the feasible direction strategy for selecting the working set may degrade the performance of the kernel cache maintained in standard SMO. In this paper, an improved working set selection strategy for SMO is presented to handle such difficulties, based on the decrease of the objective function corresponding to second-order information. The new strategy takes into consideration both the number of iterations and the kernel cache performance related to working set selection in order to improve the efficiency of the kernel cache, which reduces the number of kernel evaluations of the algorithm as a whole. As a result, the training efficiency of the new method improves greatly compared with the original version. Moreover, SMO with the new working set selection strategy is guaranteed to converge to an optimal solution in theory. Experiments on well-known data sets show that the proposed method is remarkably faster than standard SMO: the more complex the kernel, the higher the dimensional space, and the relatively smaller the cache, the greater the improvement.
An SVM Active Learning Algorithm and Its Application in Obstacle Detection
Han Guang, Zhao Chunxia, and Hu Xuelei
2009, 46(11):  1934-1941. 
Obstacle detection is one of the tasks that must be solved for intelligent robots perceiving unstructured, complicated environments. Large amounts of training data are usually necessary to achieve satisfactory generalization, and obtaining these training data is relatively easy, while manually labeling them is an expensive and tedious process. Research addressing this problem is still very limited. Active learning is therefore introduced to obstacle detection here. Aiming at the problems and limitations of applying general active learning algorithms, two strategies are used to improve the general SVM active learning algorithm: dynamic clustering is used to select the most representative samples, and the SVM hyperplane location is tuned according to the difference between the experts' labels and the current SVM classification results. On this basis, a new SVM active learning algorithm, KSVMactive, is proposed. Experiments are carried out on a real wilderness environment image database. Experimental results demonstrate that very good detection results are obtained using the KSVMactive algorithm with only 81 samples, which shows that it can significantly reduce the workload of labeling data, and that its convergence is better than that of other active learning algorithms.
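The core idea of SVM active learning, querying labels only for the samples the current classifier is least sure about, can be sketched as plain uncertainty sampling: pick the unlabeled points with the smallest |f(x)|, i.e. those closest to the separating hyperplane. This is a hedged toy version; the decision function f below is an invented linear stand-in rather than a trained SVM, and KSVMactive additionally uses dynamic clustering and hyperplane adjustment.

```python
def select_queries(decision_value, unlabeled, k):
    """Uncertainty sampling: return the k unlabeled samples closest to
    the separating hyperplane, i.e. with the smallest |f(x)|."""
    return sorted(unlabeled, key=lambda x: abs(decision_value(x)))[:k]

# Invented stand-in linear decision function f(x) = 0.8*x1 - 0.6*x2 - 0.1.
f = lambda x: 0.8 * x[0] - 0.6 * x[1] - 0.1

pool = [(0.2, 0.1), (1.0, 1.0), (0.9, 0.0), (0.0, 1.0)]
print(select_queries(f, pool, 2))  # the two most ambiguous samples
```

Only the selected samples are sent to the human expert for labeling, which is where the reduction in labeling workload comes from.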
Semi-supervised Learning of Promoter Sequences Based on EM Algorithm
Wang Lihong, Zhao Xianjia, and Wu Shuanhu
2009, 46(11):  1942-1948. 
Eukaryotic promoter prediction is one of the most important problems in DNA sequence analysis. A promoter is a short subsequence preceding a transcriptional start site (TSS) in a DNA sequence, so predicting the position of a promoter approximately locates the TSS and is helpful to biological experiments. Most existing prediction algorithms are based on search strategies such as search by signal, search by content, or search by CpG island, and their performance is still limited by low sensitivity and high false-positive rates. Promoter classification based on Markov chains has been shown to be effective for promoter prediction, with parameters such as transition probabilities estimated statistically from labeled samples. In this paper, semi-supervised learning is introduced into promoter sequence analysis to improve classification accuracy using a combination of labeled and unlabeled sequences, and maximum likelihood estimation formulas for the transition probabilities are derived. In the simulation experiments, each long genomic sequence is truncated into short segments, which are mixed with the labeled data and classified according to the estimated probabilities. Comparison with some known prediction algorithms shows that semi-supervised learning of promoter sequences based on the EM algorithm is efficient when the number of labeled samples is small, and its F1 value is higher than that of prediction based on labeled samples alone.
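The semi-supervised scheme the abstract describes can be sketched as two first-order Markov chains (promoter vs. non-promoter) whose transition probabilities are re-estimated by EM from the labeled counts plus fractional counts of the unlabeled sequences. This is our own toy implementation under uniform class priors and Laplace smoothing, not the paper's code:

```python
import numpy as np

IDX = {b: i for i, b in enumerate("acgt")}

def trans_counts(seq):
    """4x4 matrix of dinucleotide transition counts in seq."""
    c = np.zeros((4, 4))
    for a, b in zip(seq, seq[1:]):
        c[IDX[a], IDX[b]] += 1.0
    return c

def log_likelihood(seq, T):
    """Log-likelihood of seq under transition matrix T (initial state ignored)."""
    return sum(np.log(T[IDX[a], IDX[b]]) for a, b in zip(seq, seq[1:]))

def em_markov(labeled, unlabeled, n_iter=20):
    """labeled: list of (sequence, class) with class in {0, 1};
    unlabeled: list of sequences. Returns the two transition matrices."""
    fixed = [np.ones((4, 4)), np.ones((4, 4))]        # Laplace smoothing
    for s, k in labeled:
        fixed[k] += trans_counts(s)
    T = [f / f.sum(axis=1, keepdims=True) for f in fixed]
    uc = [trans_counts(s) for s in unlabeled]
    for _ in range(n_iter):
        # E-step: posterior class membership of each unlabeled sequence
        # (uniform class priors assumed for simplicity).
        post = []
        for s in unlabeled:
            ll = np.array([log_likelihood(s, T[k]) for k in (0, 1)])
            p = np.exp(ll - ll.max())
            post.append(p / p.sum())
        # M-step: labeled counts plus fractional unlabeled counts.
        for k in (0, 1):
            c = fixed[k] + sum(w[k] * cnt for w, cnt in zip(post, uc))
            T[k] = c / c.sum(axis=1, keepdims=True)
    return T
```

A new segment is then assigned to whichever class gives it the higher log-likelihood.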
Principal Component Linear Regression Analysis on Performance of Applications
Li Shengmei, Cheng Buqi, Gao Xingyu, Qiao Lin, and Tang Zhizhong
2009, 46(11):  1949-1955. 
The factors influencing application performance are various, and their degrees of influence differ. Analyzing and distinguishing the influence of each factor can guide architects in architecture design and help programmers with optimization. However, distinguishing these influences is not easy, because the factors may be correlated with one another. In this paper, a principal component linear regression model of the performance of the SPEC CPU2006 integer benchmarks is set up. Cycles per instruction (CPI) represents application performance, and the performance events monitored by the performance monitoring unit (PMU) represent the influencing factors. Principal component analysis is applied to eliminate the linear correlation among the performance events, and then a linear regression model is built with CPI as the dependent variable and the principal components as the independent variables. The model quantitatively analyzes the influence on CPI of performance events such as L1 data cache misses, L2 cache misses, DTLB misses, branch mispredictions, micro-fusion, and memory disambiguation events. The model is validated by t tests and F tests, with a goodness of fit over 90%; its average relative prediction error is 15%. The results show quantitatively how L1 and L2 cache misses dominate the performance of the applications.
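The modeling pipeline, standardizing the event counts, extracting principal components, and regressing CPI on them, can be sketched roughly as follows. This is our illustration; the event columns, the number of retained components, and the helper name are placeholders, not the paper's setup:

```python
import numpy as np

def pc_regression(X, y, k):
    """Regress y (CPI) on the top-k principal components of X (PMU events).

    Returns the coefficients in PC space, the equivalent per-event
    weights on the standardized scale, and the goodness of fit R^2.
    """
    Z = (X - X.mean(0)) / X.std(0)          # standardize each event column
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Z @ Vt[:k].T                        # principal-component scores
    A = np.column_stack([np.ones(len(y)), P])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    w = Vt[:k].T @ beta[1:]                 # fold back to per-event weights
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return beta, w, r2
```

Because the principal components are mutually uncorrelated, the regression is well conditioned even when two raw event counters move almost in lockstep.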
Reductive Data Cube Based on Formal Concept Analysis
Shi Zhibin and Huang Houkuan
2009, 46(11):  1956-1962. 
A comparative examination of the data cube lattice and the formal concept lattice shows that both are based on order structure; furthermore, equivalent character groups in formal concept analysis (FCA) and cover equivalence classes of cells in a data cube induce the same partitions of the data. This paper introduces the theory of FCA and concept lattices into data cube research and first proposes the aggregate concept lattice (ACL). The ACL has the same structure as the original concept lattice; it reduces the data cube at the same ratio as the quotient cube while containing all the aggregations of the cube. Since the ACL is still complex, a further reduced structure called the reductive aggregate concept lattice (RACL) is proposed, which contains only the non-object concepts rather than all the concepts. The RACL combined with the base table is still a fully pre-computed cube and achieves a higher reduction ratio. An efficient method for answering queries over the ACL and RACL is also proposed. Experiments are presented on both synthetic and real-world data sets. The theoretical and experimental results show that the RACL saves more space than previous structures while keeping queries efficient.
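The FCA machinery underlying the ACL rests on the two derivation operators: a formal concept is a pair (A, B) of an object set and an attribute set with A' = B and B' = A. A minimal sketch over a toy formal context (a mapping from objects to their attribute sets; the context and all names are ours, and the cube-specific construction is not modeled):

```python
def intent(objs, ctx):
    """A': attributes common to every object in objs."""
    sets = [ctx[o] for o in objs]
    return set.intersection(*sets) if sets else {a for s in ctx.values() for a in s}

def extent(attrs, ctx):
    """B': objects possessing every attribute in attrs."""
    return {o for o, s in ctx.items() if attrs <= s}

def concept_of(objs, ctx):
    """Smallest formal concept whose extent contains objs: (A'', A')."""
    b = intent(objs, ctx)
    return extent(b, ctx), b
```

The double application A'' is a closure operator, which is exactly why concepts partition the context into the equivalence classes the abstract compares with cover equivalence classes of cube cells.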