ISSN 1000-1239 CN 11-1777/TP

Table of Content

15 April 2013, Volume 50 Issue 4
A Log Anomaly Detection Algorithm for Debugging Based on Grammar-Based Codes
Wang Nan, Han Jizhong, and Fang Jinyun
2013, 50(4):  677-685. 
Asbtract ( 694 )   HTML ( 2)   PDF (2255KB) ( 691 )  
Related Articles | Metrics
Debugging non-deterministic bugs has long been an important research area in software development. In recent years, with the rapid emerging of large cloud computing systems and the development of record replay debugging, the key of such debugging problem becomes mining anomaly information from text console logs andor execution flow logs. Anomaly detection algorithms can therefore be used in this area. However, although many approaches have been proposed, traditional anomaly detection algorithms are designed for detecting network attacking and not suitable for the new problems. One important reason is the Markov assumption on which many traditional anomaly detection methods are based. Markov-based methods are sensitive to harshly trashing in event transitions. In contrast, the new problems in system diagnosing require the abilities of detecting semantic misbehaviors. Experiment results show the powerless of Markov-based methods on those problems. This paper presents a novel anomaly detection algorithm which is based on grammar-based codes. Different from previous approaches, our algorithm is a non-Markov approach. It doesnt rely on statistic modeling, probability modeling or machine learning. Its principle is simple, and the algorithm is easy to implement. The new algorithm is tested on both generated sequences and real logs, and all tests results are positive. Compared with traditional methods, it is more sensitive to semantic misbehaviors.
Fault Localization Based on Kolmogorov-Smirnov Testing Model
Ye Gang, Yu Dan, Li Zhongwen, Li Xianjun, Yin Jie, Lü Jianghua, and Ma Shilong
2013, 50(4):  686-699. 
Asbtract ( 703 )   HTML ( 4)   PDF (4658KB) ( 509 )  
Related Articles | Metrics
Software debugging is time-consuming and is often a bottleneck in the software development process. Techniques that can reduce the time required to locate faults can have a significant impact on the cost and quality of software development and maintenance. Among these techniques, the methods based on predicate evaluations have been shown to be promising for fault localization. Many existing statistical fault localization techniques based on predicate compare feature spectra of successful and failed runs. Some of these approaches test the similarity of the feature spectra through parametric hypothesis testing models based on the central limit theorem. However, our finding shows that precondition for the central limit theorem and assumption on feature spectra forming normal distributions are not well-supported by empirical data. This paper proposes a non-parametric approach, the Kolmogorov-Smirnov test, to measure the similarity of the feature spectra of successful and failed runs. We also compare our approach with SOBER (a method based on the parametric hypothesis testing model). The empirical results on the Siemens suite and large programs show that our approach can outperform SOBER, especially on small samples and non-normal distributions. If a predicate is never evaluated in a run, SOBER sets its evaluation bias to 0.5. In this paper, we also investigate the effectiveness of this setting for fault localization. The empirical results on the Siemens suite show that for the method based on Kolmogorov-Smirnov test, the performance with the setting of 0.5 is better than that without the setting for fault localization.
A Modeling and Verification Method of CPS Based on Differential-Algebraic Dynamic Logic
Chen Qiaoqiao, Li Bixin, and Ji Shunhui
2013, 50(4):  700-710. 
Asbtract ( 542 )   HTML ( 0)   PDF (1769KB) ( 840 )  
Related Articles | Metrics
With the wide use of cyber physical systems in industrial control, intelligent transportation system, intelligent medical and other areas, safety has been the core problem of the recent cyber physical systems theory and application research. In this paper, a safety verification method based on differential-algebraic dynamic logic is proposed. The method firstly transforms HybridUML model to differential-algebraic program, then realizes the specification of system safety using differential-algebraic program, and finally the cyber physical systems safety is verified according to differential-algebraic program reasoning rules. Through the case study of airplane collision avoidance system, it is shown that the method can verify the validity of the collision avoidance manoeuver effectively and guarantee the safety of airplane collision avoidance system.
Frame Refinement: Combining Frame-Based Software Development with Stepwise Refinement
Zhou Jingang, Zhao Dazhe,, Xu Li, and Liu Jiren,
2013, 50(4):  711-721. 
Asbtract ( 443 )   HTML ( 0)   PDF (2029KB) ( 422 )  
Related Articles | Metrics
Frame-based software development and stepwise refinement (SWR) are two powerful paradigms to implement systematic software development, e.g., software product lines. However, these two approaches are developed independently with different assumptions and their relationship hasnt been fully explored. To fill this gap and leverage the strengths of both approaches in a unified software development process, we explore their relation, the necessity of their combination, and some interesting issues like “alternative combination strategies”, “unification of directives from different paradigms”, etc, when performing integration or fusion of these two approaches. To support our idea, we integrate SWR technology into frames design and propose the Frame++ approach which is built on XVCL and supports frame refinement with AHEAD(XAK). By integrating SWR into frames, we can (re)organize frames into frames and their associated refinements according to features (as well as taking into account the elimination of clones) and make them easy to evolve and reuse. At the same time, the generic refinements inherited from frames make it easy to deal with fine granularity variability features. We use a person maintenance module product line to illustrate our approach. Such a flexible approach with the feature-oriented perspective can improve separation of concerns in frames design and facilitate systematic software development.
An Efficient Representation of Network Flow Record
Zhang Guangxing, Qiu Feng, Xie Gaogang, and Tong Hongxia
2013, 50(4):  722-730. 
Asbtract ( 820 )   HTML ( 1)   PDF (1924KB) ( 563 )  
Related Articles | Metrics
Complexity and diversity of traffic are constantly increasing, and the demand for continuous detailed network traffic monitoring is growing. The NetFlow method has its characteristics of low efficient data representation, which causes large transmission bandwidth consuming and storage explosion. And aggregated NetFlow records lose lots of information, which result in being unable to do traffic analysis refinedly. To find an efficient network traffic information representation method is of important significance to satisfy the new demand for network monitor. This paper proposes a new flow of information representation—TABSI(target-and-application-based statistical interpretation). This method cycles per unit time monitoring of the objects in various different applications, using the flow statistics, flow number statistics and the distribution as the number of the basic description for the traffic information unit, and periodically exports the information to describe the flow characteristics of the link. Theoretical analysis shows that the TABSI can give a good description of network behavior, and the TABSI contain more information content compared with the polymer-based NetFlow, and the effectiveness of historical data is better. The results of running on real network show that the TABSI method has low data transmission, more efficient analysis performance, and large reduction in storage space.
A Method of Extracting Heavy-Hitter Flows Efficiently
Wang Fengyu, Guo Shanqing, Li Liangxiong, and Yun Xiaochun
2013, 50(4):  731-740. 
Asbtract ( 471 )   HTML ( 0)   PDF (2078KB) ( 615 )  
Related Articles | Metrics
Along with the continuous improvement of network bandwidth, identifying heavy-hitter flows on-line is more significant for some network application, such as congestion controlling, anomaly detecting and so on. A novel algorithm FEFS (flow extracting by frequency & size) is proposed to extract heavy-hitter flows online. Through online identification and elimination of small flows, the information of heavy-hitter flows is stored and updated in the limited high-speed memory, so heavy-hitter flows can be extracted rapidly and accurately. FEFS locates the flows with low update frequency using LRU (least recently used) mechanism, and furthermore it labels the relatively small flows in storage space with a flow size factor s and an adaptive modulating factor M. Taking into account both the recent update frequency and the cumulative number of packets, FEFS can identify heavy-hitter flows precisely online. Both LRU policy and size factors have taken advantage of the heavy-tail distribution characteristics of flow size, and therefore heavy-hitter flows can be handled with very low computing and storage costs. The simulation results show that in the limited storage conditions, average relative error rate of FEFS is significantly lower than that of the classic multi-stage filter algorithm, while the average packet processing time is also shorter than that of the multi-stage filter algorithm.
Degree Correlation and Its Features of Spreading on Networks
Ge Xin, Zhao Hai, and Zhang Jun
2013, 50(4):  741-749. 
Asbtract ( 573 )   HTML ( 3)   PDF (3876KB) ( 564 )  
Related Articles | Metrics
There are multiple correlations in the connection of real-world networks, which have significant impact on topology, dynamical behavior of network, etc. Aiming at degree correlation, we propose a maximum weighted matching algorithm based on certain networks or degree sequence in order to construct networks with maximum and minimum degree correlation coefficient. And we analyze the relationship between network structure and degree correlation coefficient. Then we study the influence of mixing pattern on virus spreading such as spreading speed, threshold, and stable infected ratio, based on the networks with continuous correlation coefficient. Results show that disassortative network accelerates virus spreading while spread speed is more sensitive to assortative networks. Besides, study from the angle of immunization strategy indicates that the strategies that aim at nodes with higher degree are more efficient for disassortative networks. While in real condition, the immunization should be comprehensively considered according to the effective infected ratio, correlation coefficient, objective of immunization and so on.
An Algorithm for Spectrum Allocation in Multi-Interface Vehicular Ad Hoc Networks
Peng Xin, Wang Dong, Li Renfa, Zeng Fanzi, and Fu Bin
2013, 50(4):  750-757. 
Asbtract ( 463 )   HTML ( 0)   PDF (2281KB) ( 410 )  
Related Articles | Metrics
Elevating the capability of simultaneous transmitting is one of the key issues in VANETs studies. If equipped with multiple wireless interfaces, each vehicle can simultaneously transmit over different channels, which has positive impact on the extension of communication window and the promotion of network throughput. Challenging research issues in multi-interface vehicular ad hoc networks are the spectrum allocation and the routing problems. Considering that the negative effects of the topological dynamic on spectrum allocation in VANETs, an algorithm of spectrum allocation based on clustering is presented, which could be utilized in multi-interface VANETs. Firstly, the network is clustered according to the velocity of vehicles. And then an off-line spectrum allocation scheme is applied to the inter-cluster links, thus eliminating the impact of the topological dynamic. Considering that the velocities of the vehicles from same cluster are similar, the topology of intra-cluster is stable. It is helpful to the application of dynamic spectrum allocation technology. Simulation results demonstrate that our method can effectively promote network throughput and perform well on dynamic topology.
A Data Forwarding Algorithm for Periodical Links in Space Delay-Tolerant Network
Du Xuehui, Wang Yadi, Chen Xingyuan, and Wang Zhen
2013, 50(4):  758-766. 
Asbtract ( 562 )   HTML ( 1)   PDF (2149KB) ( 583 )  
Related Articles | Metrics
The link in space delay-tolerant network is inter-contacted, which makes it impossible to form a end-to-end path. So data transmission mechanism by end-to-end based on TCP/IP can not be adjusted to space delay-tolerant network. In the space delay-tolerant network, there are large number of satellite nodes, which are shortly and periodical connected, and run very fast in periodical movement. They consist of the core link in space-to-ground and space-to-space. Toward the periodical contact links in space delay-tolerant network, the periodical contact time of a satellite to land and contact time of links between satellites are computed based on the analysis of moving regulation of satellites. The data forwarding algorithm based on contact vector between nodes is designed. This algorithm solves the problem of data effective forwarding in periodical links of space network in effect. The simulation results show that this algorithm has better performance on delivery successful rate and transmission delay in periodical links.
A Lighting Normalization Approach Exploiting Face Symmetry
Han Hu, Shan Shiguang, Chen Xilin, and Gao Wen,
2013, 50(4):  767-775. 
Asbtract ( 788 )   HTML ( 4)   PDF (1999KB) ( 681 )  
Related Articles | Metrics
Lighting normalization is a kind of widely used approach for achieving illumination invariant face recognition. Lighting normalization approaches try to regularize various lighting conditions in different face images into ideal illumination before face recognition. However, many existing methods perform lighting normalization by treating face images as natural images, and neglect the particular properties of faces, e.g. face symmetry. As a result, for the face images with side lighting, many existing methods cannot recover the facial features in shadow regions. To resolve this problem, in this paper, a novel lighting normalization approach exploiting face symmetry priori is proposed for illumination invariant face recognition. In the proposed approach, lighting normalization for a shadow region is performed by referring to the face structure information of a symmetric non-shadow region. The symmetry priori of face structure is modeled via an energy minimization framework. In addition, a shadow-free reliability map is further proposed to simplify the original bivariate optimization problem into a univariate one in order to reduce the computation cost. Experiments on face images with synthetic and real shadows show that the proposed lighting normalization approach is effective in recovering facial features in shadow regions of a face, and also robust to face misalignment and asymmetric face geometric normalization.
Wavelet-Domain Motion Estimation with Initial Search Point Prediction Using Low Bit-Depth Pixels
Song Chuanming, Tan Ru, Wang Xianghai, and Liu Dan
2013, 50(4):  776-786. 
Asbtract ( 390 )   HTML ( 0)   PDF (2467KB) ( 406 )  
Related Articles | Metrics
This paper proposes a fast wavelet-domain motion estimation algorithm with initial search point prediction using pixels of two-bit depth. Firstly, we formalize the procedure of bit-depth transformation by two successive steps, namely interval partitioning and interval mapping. A non-uniform quantization method is employed to compute three coarse thresholds. Then they are refined by using a membership function so as to obtain a video representation whose pixels are two-bit depth. Secondly, we design a non-uniform search point pattern and subsequently put forward an initial search point prediction algorithm based on two-bit depth presentation. Finally, centered on the predicted initial search point, the modified low-band-shift-based scalable video motion estimation (MLBSSME) is accepted to search motion vectors in a size-reduced window. Experimental results illustrate that the proposed algorithm can always achieve high motion estimation precision for video sequences with various characteristics. Our algorithm separately gains 0.41 dB and 1.43 dB higher average peak signal-to-noise ratio (PSNR) than the low-band-shift and band-to-band motion estimations, but 0.07 dB lower than full search (FS) in spatial domain. Nevertheless, the computational complexity of our algorithm is about 4.66% of FS, 4.62% of the low-band-shift motion estimation, and 22.70% of the band-to- band motion estimation.
A Practical and Efficient Contrast Enhancement Method
Xie Wenda, Qiu Junhong, and Wang Lei
2013, 50(4):  787-799. 
Asbtract ( 1487 )   HTML ( 6)   PDF (3821KB) ( 775 )  
Related Articles | Metrics
The principle of image enhancement is based on increasing the contrast between adjacent pixels, which enables enabling viewers to visually perceive images with more details in the textures and edges. Many contrast enhancement methods have been proposed to improve the quality of images and most of them are based on histogram equalization (HE). This paper proposes a contrast enhancement technique modified by previous study to keep the brightness of the enhanced image. The implementation of the modified version is also simplified. The proposed method is not only suitable for the requirement of video images, the adjustment for clear display made by viewers, but also simplified for the real world. In the hardware implementation, the proposed method can reach 43.6 MSps(million sample per second) which is 3.23 times faster than the demand of SDTV NTSC 720×480i specification and 1.61 times faster than the demand of EDTV NTSC 720×480p specification. Moreover, for evaluating the efficiency of the related techniques by more fairly and complete observations, the paper makes comprehensive evaluations based on several subjective and objective evaluation metrics for various famous contrast enhancement techniques. The several important issues about the efficiency characters are also included in this paper.
Image Copy Detection Scheme Against Cropping and Translation Attacks
Zhang Zhijie, Shen Tian, and Zou Jianhua
2013, 50(4):  800-807. 
Asbtract ( 521 )   HTML ( 0)   PDF (2744KB) ( 439 )  
Related Articles | Metrics
To effectively detect copyrighted images suffering from copy attacks is one of the hotspots and difficulties in the image copy detection research. Currently, most of copy detection algorithms can successfully resist the common attacks, such as noise, scale, rotation and so on, but they are quite fragile to cropping and translation attacks. In this paper, an image copy detection algorithm based on SIFT features and ordinal measure of the full DCT coefficients is proposed. The algorithm firstly extracts the same content regions from target image and query image based on matched SIFT features; Secondly, the ordinal measures of the low frequency DCT coefficients in the same content regions are used as the image signatures; Finally the image signatures are used to detect the image copies. The matched SIFT features can locate copy regions accurately, so the proposed algorithm can resist cropping and translation attacks. Meanwhile, the ordinal measure of full DCT coefficients is introduced to resist common attacks. Moreover, the flipping compensation strategy used in this algorithm can resist the horizontal and vertical flipping attacks. The experimental results show that the algorithm has higher precision and recall rate than the existing methods in terms of cropping and translation attacks.
Contrast Transfer Function Correction Model Based on Sine and Gaussian Modulation and Spline Interpolation
Wang Gongming, Zhang Fa, Fan Liya, Sun Fei, and Liu Zhiyong
2013, 50(4):  808-814. 
Asbtract ( 770 )   HTML ( 1)   PDF (2275KB) ( 530 )  
Related Articles | Metrics
In the course of 3D reconstruction for protein molecules based on single-particle cryoelectron microscopy, CTF correction has played an important role on obtaining the high resolution molecular structure. However, the traditional CTF correction model has some problems. Firstly, the reconstruction result would be amplified incorrectly in low frequency and instable in high frequency, because CTF value is very small in the two frequency regions. Secondly, the linear interpolation would affect the precision of adjusted data, because it could not match the sinusoidal shape of CTF curve. Thus, the above two issues are studied and a new CTF correction model is proposed. On the one hand, the sine and Gaussian modulation are applied in the low-frequency and high-frequency phase of CTF curve respectively. Thus, the shape of CTF curve is enhanced. On the other hand, the cubic spline interpolation is used for computing the CTF value in the fractional Fourier radius. The accuracy of the data acquisition is improved subsequently. Thereby, the resolution of reconstruction molecular structure is increased. Finally, the reconstruction experiment is carried out with the cryo-EM images of HBV. The experimental results indicate that the structure from our model has more detailed feature than the one from the old model. The resolution of reconstructed strcuture with our method is 7.8 angstrom, which is better than 8.06 angstrom of the one with traditional CTF model.
Adaptive Fast Image Segmentation Model Based on Local Feature
Ge Qi, Wei Zhihui, Xiao Liang, and Zhang Jun
2013, 50(4):  815-822. 
Asbtract ( 455 )   HTML ( 1)   PDF (2367KB) ( 433 )  
Related Articles | Metrics
Region-based active contour model such as Chan-Vese model is able to handle the blurry boundary and complex topological structures in images segmentation. However, based on the intensity-homogeneous distribution, the effect on segmentation in the images with intensity inhomogeneity is not fine. Textures are fine scale-details, usually with some periodicity nature, and they cannot be detected by intensity information. Aiming at these problems, an adaptive fast image segmentation model based on local features is proposed. On the one hand, two kinds of region data terms are designed for detecting cartoon and texture parts respectively. The local statistic information is extracted in the adaptive patch to solve the over-segmentation induced by the intensity inhomogeneities. On the other hand, the texture feature information calculated in the adaptive patch acts to compute the Kullback-Leibler distance for detecting the texture part. Our model is solved by the split Bregman method for efficiency. Experiments are carried on both medical and texture images to compare our approach with some competitors, demonstrating the precision and efficiency of our the model.
A Fitting Method of Central Catadioptric Line Images
Duan Huixian and Li Guangyao
2013, 50(4):  823-833. 
Asbtract ( 376 )   HTML ( 0)   PDF (4887KB) ( 353 )  
Related Articles | Metrics
Under central catadioptric camera, the projection of a line in space is a conic curve. Due to the partial occlusion, only a small arc of the conic curve is visible on the image plane, thus it is very difficult to correctly estimate the line image from its visible part, which affects the calibration accuracy of the camera. What’s more, the existing method in the literatures cannot still solve this problem effectively. In addition, according to the image formation of central catadioptric camera, we find that if the antipodal points of image points on the visible arc are known, the fitting accuracy can be improved greatly. In this paper, we propose a new catadioptric line fitting method, which is suitable to all kinds of central catadioptric cameras. Firstly, a new relationship between a pair of antipodal image points and the camera principal point are derived. Next, based on the relationship, a new objection function is established to estimate the line images. Finally, these estimated line images are used to calibrate camera intrinsic parameters to evaluate the performance of the fitting method. Experimental results on both simulated and real image data have demonstrated the effectiveness of the fitting method. That is, our method not only has good robustness but also can improve the fitting accuracy of central catadioptric line images and the calibration accuracy of the camera intrinsic parameters.
Automatic Knowledge Extraction from Chinese Natural Language Documents
Che Haiyan, Feng Tie, Zhang Jiachen, Chen Wei, and Li Dali
2013, 50(4):  834-842. 
Asbtract ( 1022 )   HTML ( 5)   PDF (1213KB) ( 1014 )  
Related Articles | Metrics
Automatic knowledge extraction method can recognize and extract the factual knowledge on matching the ontology from the Web documents automatically. These factual knowledge can not only be used to implement knowledge-based services but also provide necessary semantic content to enable the realization of the vision of Semantic Web. However, it is very difficult to deal with the natural language documents, especially the Chinese natural language documents. This paper proposes a new knowledge extraction method (AKE) based on Semantic Web theory and Chinese natural language processing (NLP) technologies. This method uses aggregated knowledge concept to depict N-ary relation knowledge in ontology and can automatically extract not only the explicit but also the implicit simple and N-ary complex factual knowledge from Chinese natural language documents without using the large scale linguistics databases and synonym table. Experimental results show that this method is better than other similar methods.
A Scalable Parallel Reinforcement Learning Method Based on Intelligent Scheduling
Liu Quan, Fu Qiming, Yang Xudong, Jing Ling, Li Jin, and Li Jiao
2013, 50(4):  843-851. 
Asbtract ( 795 )   HTML ( 1)   PDF (2083KB) ( 640 )  
Related Articles | Metrics
Aiming at the “curse of dimensionality” problem of reinforcement learning in large state space or continuous state space, a scalable reinforcement learning method, IS-SRL method, is proposed on the basis of divide-and-conquer strategy, and its convergence is proved. In this method, the learning problem with large state space or continuous state space is divided into smaller subproblems so that each subproblem can be learned independently in memory. After a cycle of learning, next subproblem will be swapped in to continue the learning process. Information exchanges between the subproblems during the process of swap so that the learning process will converge to optima eventually. The order of subproblems’ executing significantly affects the efficiency of learning. Therefore, we propose an efficient scheduling algorithm which takes advantage of the distribution of value function’s backup in reinforcement learning and the idea of weighting the priorities of multiple scheduling strategies. This scheduling algorithm ensures that computation is focused on regions of the problem space which are expected to be maximally productive. To expedite the learning process, a parallel scheduling architecture, which can flexibly allocate learning tasks between learning agents, is proposed. A new method, IS-SPRL, is obtained after we blended the proposed architecture into the IS-SRL method. The experimental results show that learning based on this scheduling architecture has faster convergence speed and good scalability.
Frequent Closed Episode Mining Based on Minimal and Non-Overlapping Occurrences
Zhu Huisheng, Wang Wei, and Shi Baile
2013, 50(4):  852-860. 
Asbtract ( 647 )   HTML ( 3)   PDF (3277KB) ( 468 )  
Related Articles | Metrics
Mining frequent closed episodes from an event sequence is an important task. The existing research work is based on the support definition of minimal occurrences and the breadth-first search strategy, which unavoidably leads to the issues such as over-counting the occurrences of an episode and generating a huge number of candidate episodes. In this paper, a novel algorithm FCEMiner is proposed to mine frequent closed episodes from an event sequence, which employs the support definition of both minimal and non-overlapping occurrences and the depth-first search strategy. Moreover, FCEMiner utilizes the non-closed unanimity of special forward extension to skip redundant closure checking and narrow down the search space of frequent closed episodes. Both theoretical study and experimental evaluation confirm that FCEMiner is able to effectively discover frequent closed episodes from an event sequence.
A Universal Ensemble Learning Algorithm
Fu Zhongliang
2013, 50(4):  861-872. 
Asbtract ( 705 )   HTML ( 2)   PDF (1198KB) ( 554 )  
Related Articles | Metrics
The construction of ensemble learning algorithms is one of the important contents in machine learning area. The weak learning theorem proves that the weak learning algorithm is equal to the strong one essentially, but how to construct a good ensemble learning algorithm is still a problem to be studied. Freund and Schapire’s AdaBoost boosting algorithm, and Schapire and Singer’s real AdaBoost boosting algorithm partially solved this problem. A concept of learning error is defined. Based on it, aiming at the minimization of learning error, a universal ensemble learning algorithm is put forward. By using it, the learning error can decrease while increasing simple predictions. This universal ensemble learning algorithm can solve almost all classification problems, such as the multi-class classification problem, the cost-sensitive classification problem, the imbalanced classification problem, the multi-label classification problem, the fuzzy classification problem, etc. The universal ensemble learning algorithm can unify a series of AdaBoost algorithms and has generalized AdaBoost algorithm also. It is put forward that the simple prediction in all algorithms above can be constructed by single characteristics based on samples for the generalization ability of the ensemble prediction. Theoretical analysis and experimental conclusion all show that this universal ensemble learning algorithm can get any small learning error, and it is not easy to cause over-learning phenomenon.
Supervised Learning of an Automatic Noisy Semantic Unit Filter for Multi-Document Summarization
Gong Shu, Qu Youli, and Tian Shengfeng
2013, 50(4):  873-882. 
Asbtract ( 543 )   HTML ( 5)   PDF (3373KB) ( 412 )  
Related Articles | Metrics
The target of multi-document summarization is a document set containing many noises. Most of the state-of-art summarization systems use fixed threshold-based noise filter with a manually selected threshold to filter out low frequency units. But according to the observation in experiments, the best threshold varies according to different document sets, summarization algorithms and text representations. These mean that a fixed threshold-based noise filter cannot achieve good robustness in different summarization settings which will lead to an unstable noise filtering efficiency. Therefore, a supervised learning method to generate automatic noise filter is proposed. Based on the labels extracted automatically from human written summaries and a set of selected features which can be used for different types of semantic units, a semantic unit classifier is trained to compose the automatic noise filter, which can be used for different types of semantic unit generated by different text representation methods, and can automatically filter out noisy semantic units at the preprocessing stage of multi-document summarization systems. Experiments show the robustness of the automatic noise filter generated by the supervised learning method under different document sets, summarization algorithms and text representations, and also show the improvements in the speed and summary quality of each summarization algorithms benefited from noise filtering.
Connected Bit Minwise Hashing
Yuan Xinpan, Long Jun, Zhang Zuping, Luo Yueyi, Zhang Hao, and Gui Weihua
2013, 50(4):  883-890. 
Asbtract ( 1251 )   HTML ( 0)   PDF (2656KB) ( 462 )  
Related Articles | Metrics
Minwise Hashing has become a standard technique for estimating the similarity of the collection (e.g., resemblance) with applications in information retrieval. While traditional Minwise hashing methods store each hashed value using 64 bits, storing only the lowest b bits of each (Minwise) hashed value (e.g., b=1 or 2). The b-bit Minwise hashing algorithm can gain substantial advantages in terms of computational efficiency and storage space. Based on the b-bit Minwise hashing theory, a connected bit Minwise hashing algorithm is proposed. The unbiased estimator of the resemblance and storage factor of connected bit Minwise hashing are provided theoretically. It could be theoretically proved that the efficiency of similarity estimation is improved by the connected bit Minwise hashing algorithm since the number of comparisons is greatly reduced without significant loss of accuracy. Several key parameters (e.g., precision, recall and efficiency) are analyzed, and the availability of several estimators for connected bit Minwise hashing is analyzed. Theoretical analysis and experimental results demonstrate the effectiveness of this method.
A Prediction Model of 3D Menu Performance on Mobile Phone
Xu Ronglong, and Liu Zhengjie
2013, 50(4):  891-899. 
Asbtract ( 491 )   HTML ( 0)   PDF (1186KB) ( 495 )  
Related Articles | Metrics
Menus are key interaction components for mobile phone interface. As mobile phone functions increase, the contradiction between growing menu scale and smaller screen is getting more and more serious. 3D interface technology can extend interface information capacity. Therefore, introducing 3D menu into mobile phone is a subject worth studying, and for this, the evaluation problem of mobile phone 3D menu user performance need to be solved. Compared with empirical methods, model predicting methods can make researchers and designers evaluate the user interface quickly and at lower cost. Therefore, based on Fitts law and Hick-Hyman law, a quantitative model predicting mobile phone 3D menu performance is built and validated by experiment in this paper. There are two phases in the experiment. The goal of the first phase is to calculate coefficients of the model. They can be applied to the data analysis of the second phase experiment. The goal of the second phase is to test the validity of the model. Comparative results show that the data from experiments dovetails nicely with the data predicted by the model, and is most highly related to experiment data, compared with other relative models.