• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
Advanced Search

2007  Vol. 44  No. 7

Abstract:
In this paper, the Gabor transform is combined with the hierarchical histogram to extract facial expressional features, which could hierarchically represent the change of texture in local area, and thus capture the intrinsic facial features involved in facial expression analysis. In addition, Gabor transform is relatively less sensitive to the change of lighting conditions and can thus tolerates certain rotation and deformation of images. All these properties make this representation scheme quite robust in various conditions and more powerful than traditional representation scheme using 1-D Gabor coefficients. With the histogram feature used, a weak classifier which uses vectors as input and multi-class real values as output is designed to classify facial expression. This weak classifier has been embedded into the multi-class real AdaBoost algorithm to meet the request of multi-class classification of facial expression. The resulted method (named MVBoost) directly assigns a multi-class label to every input image. Thus it needs not to train many two-class classifiers to meet the request of multi-class classification. The training process and classification process are both simplified. Experiments conducted on common database show the efficiency and effectiveness of the proposed technique. It is expected that this new technique will make contribution to fast and robust facial expression recognition.
Abstract:
In practice, feature information such as colors, textures, surface normals, etc. is necessary to represent 3D models besides geometry and topology information of triangle meshes. However, with the rapid improvement of 3D capture equipments, the accuracy of 3D models increases and results in the large volume of 3D data. Most traditional methods could not deal with the feature of mesh models such as boundary edges and textures during simplification. So, in order to decrease the data volume of 3D textured models in digital museum and retain significant feature synchronously, a new algorithm of texture-mesh based on triangle collapse is presented. By the concept of geometry and texture boundary, the original meshes are divided to boundary, corner, interior or colory triangles according to different boundary conditions. In this way, the quadric and error criterion is improved, and constraint strategies are adopted to preserve the feature of geometry boundaries and texture attributes during the simplification process. In addition, a method combining octree-based compression with progressive meshes is given to provide a continuous multi-resolution representation of 3D models. The proposed method is successfully applied to progressive transmission and presentation system of 3D models over the Internet.
Abstract:
Starting from the viewpoint of maximum a posteriori (MAP) and MRF theory, a generalized variational functional model for image restoration is established in this paper. In this model, a hybrid image regularization term and image fidelity term are included. For image fidelity term, the distribution of noise is treated as the generalized Gaussian density, and thus the shape parameter is estimated by a maximum likelihood method to automatically choose the suitable L\+p norm as the image fidelity criteria. Assuming that the gradient of images is a member of ε-contaminated normal distributions, an image prior model in the form of total variational integral and Dirichlet integral is proposed using the robust estimation method. Due to the convexity of the proposed energy functional, the existence of the minimizing solution of such functional is discussed. Finally a weighted gradient descent flow is developed for image de-noising with an iterative algorithm based on semi-point scheme. Experimental results show that the model can automatically distinguish the statistical distribution of noise and has good performance in image restoration, including Gaussian noise and impulse noise pollution. Compared with other variation methods, the performance analysis and evaluation is made by calculating the peak of signal noise ratio (PSNR) and peak of edge preservation ability (PEPA).
Abstract:
It is important to improve the precision of the reconstructed surfaces from volume data. The marching cubes method and its precision are analyzed in detail. It is found that the error brought by the marching cubes method can reach to 1.5 pixels, which is a serious problem for reconstructing the small and thin objects, such as human blood vessels. A new method with more precision is presented. The new method distinguishes the pixels inside, across or outside of the boundary by a threshold, and determines the position of the boundary point according to the values of the three adjacent pixels (inside, across and outside), which is different from the marching cubes. The new method also modifies the grid topology of the MC method, considering the position change of boundary points. Theoretical analysis shows that the new method can find the accurate boundary points when the boundary is a straight line in a pixel. The precision and overhead of the new method are discussed. Finally examples of CT data show the contrast between the new method and the MC method.
Abstract:
Recently, manifold learning and semi-supervised learning are two hot topics in the field of machine learning. But there are only a few researches on semi-supervised learning from the point of manifold learning, especially for semi-supervised regression. In this paper, semi-supervised regression on manifolds is studied, which can employ the manifold structure hidden in datasets to the problem of regression estimation. Firstly the framework of Laplacian regularization presented by M. Belkin et al. is introduced. Then the framework of Laplacian semi-supervised regression with a class of generalized loss functions is deduced. Under this framework, Laplacian semi-supervised regression algorithms with linear ε-insensitive loss functions, quadric ε-insensitive loss functions and Huber loss functions are presented. Their experimental results on S-curve dataset and Boston Housing dataset are given and analyzed. The problem of semi-supervised regression on manifolds is interesting but quite difficult. The aim of this paper is only to accumulate some experience for further research in the future. There are still many hard problems on semi-supervised regression estimation on manifolds, such as constructing statistical basis of the algorithm, looking for better graph regularizer in the framework of Laplacian semi-supervised regression, designing quicker algorithms, implementing the algorithm on more datasets and so on.
Abstract:
A well-known drawback in the least squares support vector machine (LS-SVM) is that the sparseness is lost. In this study, an effective pruning algorithm is developed to deal with this problem. To avoid solving the primal set of linear equations, the bottom to the top strategy is adopted in the proposed algorithm. During the training process of the algorithm, the chunking incremental and decremental learning procedures are used alternately. A small support vector set, which can cover most of the information in the training set, can be formed adaptively. Using the support vector set, one can construct the final classifier. In order to test the validation of the proposed algorithm, it has been applied to five benchmarking UCI datasets. In order to show the relationships among the chunking size, the number of support vector machine, the training time, and the testing accuracy, different chunking sizes are tested. The experimental results show that the proposed algorithm can adaptively obtain the sparse solutions without almost losing generalization performance when the chunking size is equal to 2, and also its training speed is much faster than that of the sequential minimal optimization (SMO) algorithm. The proposed algorithm can also be applied to the least squares support vector regression machine as well as LS-SVM classifier.
Abstract:
As a nonlinear extension of the classical MDS algorithm, ISOMAP is suitable to visualize nonlinear low-dimensional manifolds embedded in high-dimensional spaces. However, ISOMAP requires that the data belong to a single well-sampled cluster. When the data consists of multiple clusters, long geodesic distances may be badly approximated by the corresponding shortest path lengths, which makes the classical MDS algorithm used in ISOMAP unsuitable. Besides, the success of ISOMAP depends greatly on being able to choose a suitable neighborhood size; however, it's difficult to choose a suitable neighborhood size efficiently. When the neighborhood size is unsuitable, shortcut edges are introduced into the neighborhood graph so that the neighborhood graph cannot represent the right neighborhood structure of the data. To solve the above problems, a new variant of ISOMAP, i.e., GISOMAP, is presented, which uses a special case of MDS to reduce the influence of long geodesic distances and shortcut edges on distance preservation to a certain extent. Consequently, GISOMAP can visualize the data which consists of multiple clusters better than ISOMAP, and can also be less sensitive to the neighborhood size than ISOMAP, which makes GISOMAP be applied more easily than ISOMAP. Finally, the feasibility of GISOMAP can be verified by experimental results well.
Abstract:
A multi-candidate mathematical expression (ME) recognition system is proposed. The system includes three main components: image preprocessing, symbol segmentation and structure analysis. During the symbol segmentation period, a three-stage segmentation method based on dynamic programming (DP) is proposed. In the initial segmentation based on DP algorithm, the ME image is segmented into several blocks. In the vertical segmentation, each block is segmented into some blobs. In the horizontal segmentation based on DP algorithm, every blob is segmented into symbols. During the structure analysis period, hierarchical structure is adopted to analyze structure of ME. The hierarchical structure analysis method consists of three steps, i.e., matrix analysis, sub-expression analysis and script expression analysis. In matrix analysis (sub-expression analysis), an ME is decomposed into several basic matrixes (basic sub-expressions) and some sub-expressions (script expressions) by reconstructing the ME (sub-expression) global structure, and then every basic matrix (sub-expression) is analyzed from bottom to up. In script analysis, a graph rewriting algorithm is adopted to build script relation trees among symbols within a script expression. A spatial relation model is built to calculate spatial relations' confidence between two symbols. The experiments are implemented on a database with 3268 ME images and the results show that the proposed system works well. Top-1 ME recognition accuracy reaches 78.2%.
Abstract:
The machine-made trait of search engines brings great trouble to people when they retrieve information. Therefore, the technology of user profile is introduced to solve the problem which makes search engine fit for people' demands of individuation and intelligence better. A modeling method of user profile based on ontology is advanced. And the innovation of this paper is to do modeling with a method of combining tree graphics and spatial graphics together, to set up ontology nodes in spatial graphics and to introduce the theory of interval valued fuzzy sets. In addition, the paper explains the method in theory, brings forward some correlative definitions and formulae, and designs an algorithm of founding and updating nodes based on ontology (LW-FUNO). The modeling method is helpful in overcoming the deficiencies of traditional modeling methods and is in favor of user profile's foundation, use and perfection. Finally, a well-ordered implement is used to prove correctness of the algorithm strictly, and then, time complexity of the algorithm, which is T(n)=o(n\+2), is analyzed. It is proved theoretically that the algorithm has the traits of correctness, validity and low time complexity. It is considered that the work in this paper is a useful attempt at the research of user profile.
Abstract:
Reliable identification of protein families is a major challenge in bioinformatics. Clustering protein sequences may help to identify novel relationships among proteins. However, many clustering algorithms cannot be readily applied to protein sequences. One of the main problems is that the similarity between two protein sequences cannot be easily defined. A similarity analysis method based on traditional sequence alignment, which assumes conservation of contiguity between homologous segments, is inconsistent with genetic recombination. Especially for remote homology protein family members which possess similar structure or related function, this method cannot achieve correct results. Information about protein motifs is very important to the analysis of protein family sequences. In this paper, a novel protein sequence family mining algorithm called ProFaM is proposed. The ProFaM algorithm is a two-step method. In the first step, conserved motifs across protein sequences are mined using efficient prefix-projected strategy without candidate, and then based on these result motifs, combined with weight of motifs, a novel similarity measure function is constructed. In the second step, protein family sequences are clustered using a shared nearest neighbor method according to new similarity measure. Experiments on protein family sequences database Pfam show that the ProFaM algorithm improves performance. The satisfactory experimental results suggest that ProFaM may be applied to protein family analysis.
Abstract:
Currently, data mining techniques have been widely applied in various business and financial fields. The success of data mining techniques in these fields has sparked an interest of applying such analysis techniques to various scientific and engineering fields, such as chemistry, biology and structural mechanism. However, datasets arising in scientific and engineering fields tend to have a strong topological, geometric, and/or relational nature. Most of the existing data mining algorithms can not be directly applied since they usually assume that data can be described either as a set of transactions or as multi-dimensional vectors. As a general data structure, graph model can be used to model complicated relationships among data and has been extensively used in various scientific and engineering fields. So, developing efficient graph-based mining algorithms has become a hot research topic in the data mining community in recent years. Graph classification is an important research branch in graph mining. In this paper, a novel graph classification approach based on frequent closed emerging patterns, called CEP, is proposed. It first mines frequent closed graph patterns in the graph dataset, then obtains emerging patterns from the set of closed graph patterns, and finally constructs classification rules based on emerging patterns. Experimental results show that CEP can achieve better classification performance than the current state-of-the-art graph classification approaches when applied for classifying chemical compounds. Furthermore, classification rules generated by CEP can be easily understood and exploited by domain experts.
Abstract:
Mining frequent itemsets is a fundamental and essential problem in data mining application. Most of the proposed mining algorithms are a variant of Apriori. These algorithms show good performance with spare datasets. However, with dense datasets such as telecommunications and medical image data, where there are many long frequent itemsets, the performance of these algorithms degrades incredibly. In order to solve this problem, an efficient algorithm MFCIA and its updating algorithm UMFCIA for mining frequent closed itemsets are proposed. The set of frequent closed itemsets uniquely determines the exact frequency of all frequent itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets, thus lowering the algorithm computation cost. The algorithm UMFCIA makes use of the previous mining results to cut down the cost of finding new frequent closed itemsets. The experiments show that the algorithm MFCIA is efficient.
Abstract:
Formal concept analysis which is an unsupervised learning method for conceptual clustering constitutes an appropriate framework for data mining. However, due to the completeness of concept lattice, the task of constructing the lattice is known to be computationally expensive. The iceberg lattice of context, a substructure of the complete concept lattice, served as a condensed representation of frequent itemsets. And it is well suited for analyzing very large database. And building concept lattice by merging factor lattices drawn from data fragments may be adapted to distributed data mining environment. Inspired by those ideas, a novel algorithm called Icegalamera for iceberg concept lattice assembly from heterogeneous relational tables is presented and is utilized for closed frequent itemsets mining. The completeness of closed frequent itemsets produced by Icegalamera is proved both in theory and in empirical way, and then the merge mapping process is analyzed and implemented from partial iceberg concept lattices to global one. This algorithm avoids computation of structuring the complete concept lattice. Furthermore the merge and pruning strategies are adopted, which makes the algorithm's efficiency outperforms that of the apriori algorithm on generating frequent itemsets under condense and sparse data set.
Abstract:
Web services are the mainstream distributed computing technology to construct aservice-oriented architecture (SOA). Over the past few years, lots of works have been carried out in comparing SOAP with binary protocols, such as Java RMI and CORBA. These researches show that there is a dramatic difference in the amount of encoding necessary for data transmission, when XML is compared with the binary encoding style, and all these researches have proven that SOAP, because of its reliance on XML, is inefficient compared with its peers in distributed computing. Although the performance of Web services is adequate for many important purposes, processing speed unfortunately remains to be a problem in more demanding applications. Some limitations are inherent in core features of XML: it is text based, flexible in format, and carries redundant information. Aiming at this question, a new approach to improve Web services performance is proposed. To avoid the cost brought by the traditional XML parsing and Java reflection at runtime, a specific SOAP message processor is generated dynamically for each Web service, which can create Java object for service business invoking by scanning SOAP message only once. The given experiments show that the approach can effectively improve the performance of Web service by incorporating the SOAP message processor into the SOAP engine.
Abstract:
Location management (LM), which is a challenging problem in personal communication service (PCS) networks, is used to track mobile terminals (MTs). Basically, it consists of two operations: location update and paging. Location update is a process in which the MT informs the network of its current location, while paging is a process in which the network searches for a called MT. There are three dynamic LM schemes, namely, time-based, distance-based, and movement-based LM schemes. The movement-based is simple to implement, in which each MT simply counts the number of cell boundary crossings and initiates location update when this number exceeds the predefined movement threshold. In the LM scheme used in the existing PCS networks such as GSM, a blanket paging is used that all the cells in a location area (LA) are simultaneously paged. This paging scheme consumes extra resources since the MT only stays in one cell of the paged LA consisting of a group of cells. Therefore, sequential paging schemes are proposed to overcome the drawback. In this paper, emphasis is put on optimal sequential paging for movement-based LM scheme. Both the probability distribution of an MT's moving distance and expected moving distance in a movement-based scheme are derived on the condition that the incoming calls form a Poisson process and the MT's cell residence time has exponential probability distribution. Besides, based on the derived statistics, an optimal sequential paging algorithm is proposed. Finally, numeric results show that it outperforms some well-known sequential paging schemes.
Abstract:
The design and analysis of multivariate cryptosystems play an important role in theory research and practical use. The HFE cryptosystem presented by Jacques Patarin in 1996 has long been regarded as the most promising one of multivariate cryptosystems, and is a promising public key cryptosystem with many practical applications: very fast or very short digital signatures, fast public key encryption, etc. The security of the HFE cryptosystem is based on the problem of solving a system of multivariate quadratic equations over a finite field F. The problems about keys, which haven't been investigated in detail in literature, are very important in the HFE cryptosystem. Nontrivial public key and nontrivial private key are defined. If a reversible linear mapping φ:K→Fn is given where K is an extension of degree n of the finite field F and char(F)=2, there are the corresponding qn(n+1)∏ni=1(q\+i-1)\+2 nontrivial private keys for per nontrivial public key. A conclusion that solving a system of m multivariate quadratic equations with n variants (m≤n) over F is reduced to finding root of polynomial equation over K. This result leads to a deeper understanding of HFE and may yield a new kind of attack. In addition, two categories of weak keys on the HFE cryptosystem over F are introduced.
Abstract:
The watermarking technique robust to both signal processing attacks and geometrical attacks is one of the hotspots and difficulties in the digital watermarking research. Proposed in this paper is a video watermarking scheme efficiently robust to many kinds of signal processing attacks and rotation, scaling and translation (RST) attacks. In the embedding scheme, a geometrical invariable is proposed, which is based on the invariability of statistical feature in the circular image area. Using this invariability, the geometrical distortion is first retrieved. And then a significant watermarking and synchronization information is adaptively embedded into the discrete cosine transform (DCT) domain according to the characteristics of DCT coefficients. Moreover, in the algorithm a novel embedding method is adopted, which can embed multi-bit information in a selected position. In the extraction scheme, the embedded watermarking is located by the synchronization information and the geometrical distortion is retrieved making use of circular feature area if geometrical attacks are encountered, and finally the watermarking is obliviously extracted in the DCT domain. The experimental results show the proposed scheme can achieve good transparency and better resistance against RST attacks and excellent robustness to common attacks such as MPEG compression, frame loss, etc.
Abstract:
How to efficiently revoke group membership and how to cope with key exposure are two important issues in designing group signature schemes. Up to now, there are few group schemes that can resolve the two problems at the same time. The common drawback of the previously proposed scheme is that the computational cost of verifying linearly depends on the number of the revoked group members. The revocation method based on accumulator has common drawback: previously signed signatures can not pass the verifying algorithm under the updated public value after the signer is revoked. Based on the ACJT group signature scheme, two new group signature schemes are proposed. The main trait is that they have efficiently revocable property and forward secure property at the same time. The evolution of secret key in scheme Ⅰ is more efficient. But the size of group public key and the computational cost of signing and verifying in scheme Ⅰ linearly depend on the number of time periods. Scheme Ⅱ adopts another forward secure method and overcomes this defect. Both the schemes tackle the drawback of the revocation method based on accumulator and support retroactively publicly revocable group membership with backward unlinkability. The computational cost of signing and verifying is independent of the number of the current group members and the revoked group members.
Abstract:
Although publishing views supply convenience to data exchange, it will potentially threat data owner because some information may be disclosed in the publishing process, so guaranteeing the security of the publishing views becomes a new issue of database security. Theoretically, the methods preventing the information from disclosing in the publishing process can be divided into two kinds, one is for the people who accept views, and the other is for the people who publish views. In practical applications, it is very difficult to realize the first method, so, people focus study on the second method. So far, relevant evaluating algorithms and protecting models have been proposed, for example, query answering, K-anonymity, probabilistic independence event model and so on, but all these methods have the same weakness, namely, limitation and can't eliminate information disclosure effectively, so they cannot resolve the problem completely. In order to measure the leakage of the publishing views and eliminate disclosure, an information disclosure measuring method of relative deviation is proposed, which can facilitate the information disclosure calculation, and a disclosure elimination method based on critical tuples is presented, which can eliminate the information disclosure of the publishing views completely. The experiment results show that the information disclosure measuring method of relative deviation can measure information disclosure effectively, and the information disclosure method based on critical tuples can eliminate information disclosure effectively. They can resolve the problem of information disclosure in the publishing process completely, and can guarantee the security of views in the logic level.
Abstract:
Architecture is one of important parts in software product line, and architecture comes from domain requirements. Individual requirements are seldom independent of each other, but various kinds of dependency exist among them. Software product line architecture determines reusable assets across a domain by exploring domain requirements commonality and variability, so domain requirements dependencies have very strong influence on product line architecture. A feature is a set of tight-related requirements from stakeholders' viewpoints and feature dependencies reflect requirements dependencies. Among those existing feature oriented approaches to managing requirements dependencies in software product lines, few approaches deal with mapping from requirements to product line architecture. In order to decrease the inconsistencies between assets and increase the reuse in a product line, mapping rules from requirements to features and mapping rules from features to architecture are developed based on a feature dependencies classification defined in another paper. It is also validated whether consistencies between product assets are coincident with those between requirements based on mapping rules. Following these mapping rules, consistent assets are generated from consistent requirements, thus reducing time consuming work on consistent analysis of assets, and supporting easy generation of product line architecture from domain requirements. A case study for spot and future transaction domain is described to illustrate and validate the approach.
Abstract:
Cross-organizational workflow has some different characteristics compared with normal workflow, such as process-oriented, compositionality, abstraction, involving communication, and collaboration of several systems. Traditional workflow modeling methods cannot meet these new requirements because they don't have mechanisms to support abstraction, and there are no standards and concurrency operations to obtain bigger models by combining small ones in these modeling methods. Aimed at this problem, a cross-organizational workflow modeling method based on Pi-calculus is presented and the consistency and compositionality of the model are analyzed. Using concurrent operators Pi-calculus provided, a cross-organizational business process is modeled as a composition of a set of autonomy and concurrent intra-organizational sub-processes, and an intra-organizational sub-process is modeled as a composition of a local business process definition and control constraints between organizations as well. Based on weak bisimulation theory of Pi-calculus, external behavior equivalence of two cross-organizational sub-processes is defined, which can help building abstract of an internal private business process. Compared with the traditional methods, this new cross-organizational workflow modeling method builds a loosely coupled relationship among sub-processes, which makes this model adapt to dynamic cross-organizational environment. And this model is based on strict formal method, which facilitates the analysis and verification of business process models.
Abstract:
Logarithmic converter is one of the most important components of logarithmic multipliers, which trades the computing precision for speed. It can be used in real-time applications, such as digital signal processing, which can tolerate loss in precision to a certain extent. In addition, hardware implementation of algorithms is another most important way to accelerate the execution of algorithms. In this paper an improved 32-bit binary logarithmic converter implemented in FPGA is presented. It is mainly composed of leading one detector, shifting logic and error correction circuit. Fast 4, 16, and 32-bit leading-one detector circuits are designed to obtain the leading-one position of an input binary word in parallel, which reduces the area and power-consumption estate while keeping the low delay. In addition, an improved 6-region algorithm is developed to reduce the maximum percent errors from current 30 percent to 20 percent, which greatly decreases the average error of the first region. The error correction circuit is implemented in double pipelined cycles with a bit more components to enlarge the throughput of the system; it maintains the regularity to reduce the complexity, area and power consumption of the system, while improving the precision of the computing. The design makes it possible to accelerate the processing of time-critical applications while keeping high precision.
Abstract:
Power consumption is one of the most important problems used in electronic systems today. High level synthesis can quickly trade off different objectives for complex designs during architecture optimization. A design at high level synthesis in VLSI includes two important tasks: scheduling and interconnection. In order to lower power in design, the two aspects can be considered simultaneously. In this paper, a high level synthesis scheme based on multiple voltages is proposed for low power design in VLSI under the timing and the resource constraints. In this scheme both scheduling and interconnection are considered to reduce power. First, for a given control and data flow graph, scheduling is done in Gain. Then the buses are allocated by interconnection consumption. The register transfer level graph can be optimized by the scheme in the end. In Gain scheduling, the priority function includes the power gain, the mobility, and the computation density of an operation which are three main factors in VLSI design. In interconnection, the transition activities on the signal lines and the coupling capacitances of the lines are considered simultaneously based on RS model. This scheme is implemented in CDFG Toolkits. Experiments with a number of DSP benchmarks show that the proposed scheme achieves an effective energy reduction.
Abstract:
Optimization directed inlining is a good direction for inlining, but it does not consider the factor of the execution frequency and size of the function. Although a traditional inlining model considers the factor of execution frequency and size of the function, it does not consider the optimization after inlining. In this paper, a new inline model, loop fusion conscious inline model, is proposed to avoid these drawbacks of the inline model of the past. It considers both execution frequency and size and optimization. The inlining mothod which only considers loop fusion is implemented and is added into the ORC's original inline model. Then the new inline model is built and the model is tuned for high performance. In the experiment, some fact is found that temperature (execution frequency) isn't effective in some cases, and the reason is analyzed. Experiment result shows that the new model can greatly improve the performance of the compiler, and some SPEC CPU 2000 benchmark's peak performance can increase as high as 6%, and 1% on average.