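• China's Outstanding Science and Technology Journal
  • CCF-recommended Class A Chinese-language journal
  • T1-class high-quality science and technology journal in the computing field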

2017  Vol. 54  No. 8

Abstract:
By introducing broadcast distribution into TCP/IP, the Broadcast-Storage network has clear advantages in reducing redundant traffic on the Internet and alleviating the information overload problem. The uniform content label (UCL) is used to express users’ needs and to help users obtain information resources in the Broadcast-Storage network. In UCL recommendation, one key problem is how to improve the diversity of recommendations based on the features of the Broadcast-Storage network, e.g., rich semantic information and high novelty. To solve this problem, this paper proposes UDSCT, a diversification method for UCL recommendation based on a semantic cover tree. UDSCT consists of two components. The first constructs the semantic cover tree for UCLs, which obeys a set of proposed invariants and considers both the semantic information of UCLs and the ratings from users. In addition, new UCLs are given priority to improve the novelty of the whole UCL list. The second component is the query for a diversified UCL list, which uses a simple tree query and a heuristic list-supplement operation to obtain the diversified UCL list quickly and return the specified UCL sets according to users’ needs. Theoretical analysis and a series of experimental results show that UDSCT outperforms several benchmark algorithms and is suitable for the Broadcast-Storage network.
Abstract:
In this paper, an adaptive estimation of distribution algorithm based on Student’s t-distribution (EDA-t) is proposed to deal with large-scale global optimization problems. The proposed algorithm not only obtains optimal solutions with high precision, but also runs faster than EDAs and their variants. To reduce the number of parameters in the Student’s t-distribution, we adopt its closed form in latent space as a replacement and use the expectation-maximization algorithm to estimate its parameters. To escape from local optima, a new strategy that adaptively tunes the degrees of freedom of the t-distribution is also proposed. Because we introduce latent variables, the computational cost of EDA-t decreases significantly while the quality of the solution is maintained. The experimental results show that the performance of EDA-t is better than or equal to that of state-of-the-art evolutionary algorithms on large-scale optimization problems.
Abstract:
How to present knowledge in a more acceptable form has long been a difficult problem. In most traditional conceptualization methods, educators summarize and describe knowledge directly. Educational experience has shown that schematization, which depicts knowledge through its adjacent knowledge units, is more comprehensible to learners. In conventional knowledge representation methods, knowledge schematization must be completed manually. In this paper, an approach is proposed to perform knowledge schematization automatically. We explore the relationship between a given concept and its adjacent concepts on the basis of the Wikipedia concept topology (WCT) and then present an innovative algorithm to select the most related concepts. In addition, the state-of-the-art neural embedding model Word2Vec is used to measure the semantic correlation between concepts, aiming to further enhance the effectiveness of knowledge schematization. Experimental results show that using Word2Vec improves the effectiveness of selecting the most correlated concepts. Moreover, our approach can effectively and efficiently extract knowledge structure from the WCT and provide useful suggestions for students and researchers.
Abstract:
Much work has been done to implement metasearch engines with different rank aggregation methods. However, those methods cannot cope with the exploding volume of data from the huge number of Web sources or with the multiplying requirements of metasearch users. In this paper, we take the view that the rank aggregation problem can be solved with a multi-objective optimizer if the quality requirements of a user are considered along with the queries, and we find that the user’s preferences among those quality requirements can help reduce the solution space. Accordingly, we propose an evolutionary rank aggregation algorithm based on user preferences. We introduce a new encoding scheme for MOPSO, give new definitions of position and velocity, modify the initialization methods of the particle swarms, improve the turbulence operator, and adjust the strategies for external archive updating and leader selection. The aim is to build a discrete multi-objective optimizer based on decomposition and dominance (D³MOPSO) that finds the best aggregated ranking quickly and accurately in a large-scale discrete solution space. We test the proposed algorithm along with several state-of-the-art rank aggregation methods on four datasets of different sizes: the LETOR MQ2008-agg dataset, a Web dataset, a synthetically simulated dataset and an extended Web dataset. The experimental results demonstrate that our method outperforms machine-learning-based algorithms and other multi-objective evolutionary algorithms in convergence, performance and efficiency, especially on large-scale metasearch rank aggregation tasks.
Abstract:
Relational inference based on knowledge representation has been a crucial research issue in statistical relational learning and knowledge graph population in recent years. In this work, we perform a comparative study of the prevalent knowledge representation based reasoning models, with a detailed discussion of the general problems inherent in their basic assumptions. The major problem of these representation based relational inference models is that they often ignore the semantic diversity of entities and relations, which leaves them without enough semantic resolution to distinguish them, especially when there exists more than one type of relation between two given entities. This paper proposes a new assumption for relation reasoning in knowledge graphs: each relation between an entity pair reflects the semantic connection of some specific aspects of the corresponding entities, and can be modeled by selectively weighting the constituents of the embeddings to help alleviate the semantic resolution problem. A semantic aspect-aware relational inference algorithm is proposed to solve the semantic resolution problem, in which a nonlinear transformation mechanism is introduced to capture the effects of the different semantic aspects of the embeddings. Experimental results on public datasets show that the proposed algorithm has superior semantic discrimination capability for complex relation types and their associated entities, which effectively improves the accuracy of relational inference on knowledge graphs, and that it significantly outperforms the state-of-the-art approaches.
Abstract:
With the increasing number of vehicle-mounted sensors, the rapid change of urban landmarks and traffic facilities, and the complex traffic conditions of vehicles and pedestrians, the demand for real-time response capability in automatic driving is becoming ever more urgent. How to provide safety guarantees for automatic driving systems by handling the continuous events from sensors and completing the reasoning process via scheduling strategies is worth studying. In this paper, a hard real-time scheduling method for reasoning tasks in automatic driving systems is proposed. It includes a task model based on parallel directed acyclic graphs with hard deadlines, a scheduling algorithm, and an admission control algorithm that ensure the reasoning operations and reactions complete within their hard real-time constraints. The experimental results show that the proposed method increases the success ratio of automatic driving reasoning tasks by 9.62% and 7.31% on average compared with the direct scheduling algorithm and the model transformation scheduling algorithm, respectively, and improves admission control capability by 7.15% on average compared with the algorithm proposed by Baruah. The method is therefore promising for application in automatic driving systems where safety is a concern.
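The abstract does not give the admission control algorithm itself. As a minimal sketch of the kind of check such a scheme builds on, the following computes the total work and critical-path length (span) of a DAG task and applies two well-known necessary conditions for meeting a hard deadline on a given number of cores. The function names, the example task graph, and the parameter `cores` are assumptions for illustration, not the paper's method.

```python
from functools import lru_cache


def work_and_span(wcet, edges):
    """Return (total work, critical-path length) of a DAG task.

    wcet  : dict mapping subtask name -> worst-case execution time
    edges : list of (predecessor, successor) pairs
    """
    successors = {v: [] for v in wcet}
    for u, v in edges:
        successors[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(v):
        # Longest execution-time path that starts at subtask v.
        return wcet[v] + max((longest_from(s) for s in successors[v]), default=0)

    work = sum(wcet.values())
    span = max(longest_from(v) for v in wcet)
    return work, span


def passes_necessary_test(wcet, edges, deadline, cores):
    """Necessary (not sufficient) conditions for meeting a hard deadline."""
    work, span = work_and_span(wcet, edges)
    return span <= deadline and work <= cores * deadline


# Hypothetical reasoning task: sense -> {fuse, detect} -> plan
wcet = {"sense": 2, "fuse": 3, "detect": 4, "plan": 2}
edges = [("sense", "fuse"), ("sense", "detect"), ("fuse", "plan"), ("detect", "plan")]
print(work_and_span(wcet, edges))
print(passes_necessary_test(wcet, edges, deadline=8, cores=2))
```

A task that fails either condition can be rejected immediately; a task that passes still needs a real schedulability analysis before it is admitted.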
Abstract:
The acquisition of lung 4D computed tomography (4D-CT) data is limited by scanning time and radiation dose, so the sampling rate in the axial direction is much lower than that in the in-plane direction. To obtain better-quality 4D-CT images, a new method of image sequence super-resolution reconstruction based on the inherent self-similarity of medical images is proposed in this paper. The method uses combined local and global variational optical flow estimation to improve the quality of the enlarged 4D-CT images. Firstly, we present a combined local and global variational optical flow model to estimate the motion fields (i.e., the optical flow fields) between different phases at corresponding positions. Then, the optical flow field is obtained by solving the model with the fast alternating direction method of multipliers. Finally, according to the calculated motion fields, we employ an improved non-local iterative back projection (NLIBP) algorithm to reconstruct high-resolution lung images. The experimental results show that, in both quantitative measures and visual perception, this method outperforms the non-local iterative back projection algorithm and the iterative back projection technique based on full-search block matching. Furthermore, our method generates clear edges while enhancing image texture.
Abstract:
Clustering faces two problems: handling multiple views and interpreting results. In this paper, we propose interpretable clustering with a multi-view generative model (ICMG). ICMG obtains multiple clusterings based on multiple views while qualitatively and quantitatively interpreting the clustering results using the semantic information in the views. Firstly, we construct a multi-view generative model (MGM), which generates multiple views using Bayesian program learning (BPL) and the multi-view Bayesian case model (MBCM). Then we obtain multiple clusterings by clustering on the views’ matching degree. Finally, ICMG qualitatively and quantitatively interprets the clustering results using the semantic information in the views’ prototypes and important features. Experimental results show that ICMG obtains multiple interpretable clusterings and that its performance is superior to that of traditional multi-view clustering.
Abstract:
Unlike general sentiment analysis, aspect-based sentiment classification aims to infer the sentiment polarity of a sentence depending not only on the context but also on the aspect. For example, in the sentence “The food was very good, but the service at that restaurant was dreadful”, the sentiment polarity for the aspect “food” is positive, while the sentiment polarity for the aspect “service” is negative. Even within the same sentence, sentiment polarity can be completely opposite for different aspects, so we need to infer the sentiment polarities of different aspects correctly. The attention mechanism is a good fit for aspect-based sentiment classification. In current research, however, the attention mechanism is mostly combined with RNN or LSTM networks. Such neural network architectures generally rely on complex structures and cannot parallelize over the words of a sentence. To address these problems, this paper proposes a multi-attention convolutional neural network (MATT-CNN) for aspect-based sentiment classification. This approach captures deeper sentiment information and explicitly distinguishes the sentiment polarity of different aspects through a multi-attention mechanism, without using any external parsing results. Experiments on the SemEval2014 and Automotive-domain datasets show that our approach achieves better performance than traditional CNN, attention-based CNN and attention-based LSTM.
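To make the role of aspect-dependent attention concrete, here is a minimal sketch of dot-product attention between word vectors and an aspect vector, followed by a softmax. It is not MATT-CNN; the two-dimensional embeddings and the function names are made up for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aspect_attention(word_vecs, aspect_vec):
    """Attention weight of each word with respect to one aspect."""
    scores = [sum(w * a for w, a in zip(vec, aspect_vec)) for vec in word_vecs]
    return softmax(scores)


# Hypothetical 2-d embeddings: dimension 0 ~ "food-related", dimension 1 ~ "service-related".
words = ["food", "good", "service", "dreadful"]
word_vecs = [[1.0, 0.0], [0.8, 0.1], [0.0, 1.0], [0.1, 0.8]]

for aspect, vec in [("food", [1.0, 0.0]), ("service", [0.0, 1.0])]:
    weights = aspect_attention(word_vecs, vec)
    print(aspect, [round(w, 3) for w in weights])
```

With these toy vectors, "good" receives more weight than "dreadful" for the aspect "food", and the ordering flips for "service", which is the behavior the example sentence calls for.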
Abstract:
To improve the accuracy of meteorological forecasting, respond in real time to frequent local meteorological disasters, and process massive data more efficiently, this paper proposes a meteorological forecasting model using a Storm-based online sequential extreme learning machine. The model first initializes multiple online extreme learning machines. When new batches of data arrive, the model continually learns from the new data samples on top of the previous training results. It introduces the stochastic gradient descent method and an error weight adjustment method to provide error feedback on new prediction results and update the error weight parameters in real time, thereby improving prediction accuracy. In addition, the Storm stream processing framework is adopted to parallelize the proposed model and enhance its ability to handle massive high-dimensional data. The experimental results show that, compared with the Hadoop-based parallel extreme learning machine (PELM), the proposed model has higher prediction accuracy and better parallelism.
Abstract:
Ridge regression (RR) is one of the most classical machine learning algorithms and is used in many real applications such as face detection and cell prediction. Ridge regression has many advantages, such as a convex optimization objective, a closed-form solution, strong interpretability, and ease of kernelization. However, the optimization objective of ridge regression does not consider the structural relationship between instances. The supervised manifold regularized (MR) method is one of the most representative and successful regularization methods for ridge regression; it considers the structural relationship of instances within each class by minimizing each class’s variance. But considering the structural relationship within classes alone is not comprehensive. Taking a novel view of the recent principle of optimal margin distribution machine (ODM) learning, we find that the optimization objective of ODM can include both the local and the global structural relationship by optimizing the inter-class margin variance and the intra-class margin variance. In this paper, we propose a ridge regression algorithm called optimal margin distribution machine ridge regression (ODMRR), which fully considers the structural characteristics of the instances. Moreover, this algorithm retains all the advantages of ridge regression and manifold regularized ridge regression. Finally, experiments validate the effectiveness of our algorithm.
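The closed-form solution mentioned above is the standard one, w = (XᵀX + λI)⁻¹Xᵀy. The sketch below implements plain ridge regression only (not ODMRR) in dependency-free Python; the function name `ridge_fit` and the toy data are assumptions for illustration.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y by Gaussian elimination.

    X   : list of rows (each a list of feature values)
    y   : list of targets
    lam : regularization strength; lam > 0 makes the system positive definite
    """
    d = len(X[0])
    # Build the normal-equation matrix A and right-hand side b.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]

    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        s = sum(A[i][j] * w[j] for j in range(i + 1, d))
        w[i] = (b[i] - s) / A[i][i]
    return w


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(ridge_fit(X, y, lam=0.0))  # ordinary least squares on this toy data
print(ridge_fit(X, y, lam=1.0))  # coefficients shrink toward zero
```

Note that the objective treats every instance independently, which is the limitation the abstract points to when it says structural relationships between instances are ignored.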
Abstract:
The balanced traveling salesman problem (BTSP), a variant of the traveling salesman problem (TSP), is a combinatorial optimization problem that can be applied in many fields, such as the optimization of gas turbine engines (GTE). BTSP can only model optimization problems with a single traveling salesman and a single task; it cannot model and optimize problems with multiple salesmen and tasks at the same time. Therefore, this paper first provides a multi-objective balanced traveling salesman problem (MBTSP) model, which can model optimization problems with multiple salesmen and tasks. Specifically, it can be applied to real-world problems with multiple objectives or individuals, for example, the optimization of multiple GTEs. Previous studies have shown that the ITO algorithm and genetic algorithms perform well on combinatorial optimization problems; therefore, the paper uses a hybrid ITO algorithm (HITO) and hybrid genetic algorithms (GA) to solve MBTSP. HITO uses ant colony optimization (ACO) to produce a graph-based probabilistic generative model, then uses drift and volatility operators to update the model and obtain the optimum solution. For the hybrid GA, the first variant, GAG, is improved by a greedy method; the second, GAHC, is optimized by incorporating hill climbing; and the final one is GASA. To test the algorithms effectively, the paper conducts extensive experiments on MBTSP data ranging from small scale to large scale. The experiments show that the algorithms are effective and reveal their different characteristics in solving MBTSP.
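The abstract does not spell out the BTSP objective. Under the usual formulation, which is assumed here, a tour is scored by the gap between its longest and shortest edge, and the goal is to minimize that gap. The sketch below evaluates this objective and solves a tiny instance by brute force; it illustrates single-salesman BTSP only, not MBTSP, HITO or the hybrid GAs, and the distance matrix is made up.

```python
from itertools import permutations


def tour_edges(tour, dist):
    """Edge lengths of a closed tour (returns to the start city)."""
    n = len(tour)
    return [dist[tour[i]][tour[(i + 1) % n]] for i in range(n)]


def imbalance(tour, dist):
    """Assumed BTSP objective: longest edge minus shortest edge."""
    lengths = tour_edges(tour, dist)
    return max(lengths) - min(lengths)


def brute_force_btsp(dist):
    """Exhaustive search; only feasible for very small instances."""
    n = len(dist)
    candidates = ((0,) + p for p in permutations(range(1, n)))
    best = min(candidates, key=lambda t: imbalance(t, dist))
    return best, imbalance(best, dist)


# Hypothetical symmetric distance matrix for four cities.
dist = [
    [0, 2, 9, 4],
    [2, 0, 3, 8],
    [9, 3, 0, 5],
    [4, 8, 5, 0],
]
print(brute_force_btsp(dist))
```

Exhaustive search grows factorially with the number of cities, which is why metaheuristics such as those named in the abstract are used on larger instances.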
Abstract:
Objective function-based clustering is an important class of clustering analysis techniques, and almost all of its algorithms are built by optimizing a non-convex objective. As a result, these algorithms can hardly reach a globally optimal solution and are sensitive to the provided initialization. Recently, convex clustering has been proposed, which optimizes a convex objective function; it not only overcomes the shortcomings described above but also obtains a relatively stable solution. It has been shown that clustering performance can be improved effectively by combining useful auxiliary information from the real world (typically must-links and/or cannot-links) with the corresponding objective. To the best of our knowledge, all such semi-supervised objective function-based clustering algorithms are based on non-convex objectives, and semi-supervised convex clustering has not yet been proposed. We therefore attempt to combine pairwise constraints with convex clustering. However, existing methods, which add constraint penalty terms to the objective function, usually make the original convex objective lose its convexity. To deal with this problem, we introduce a novel semi-supervised convex clustering model that uses weakly supervised information. In particular, the key idea is to change the distance metric instead of adding constraint penalty terms to the objective function. As a result, the proposed method not only retains the advantages of convex clustering but also improves its performance.
Abstract:
With the rapid development of artificial intelligence and big data, data volumes have grown explosively and problems have grown in complexity, which increases the demand for parallel intelligent computing. Traditional theoretical models and methods face severe challenges. Methods inspired by physical laws and biological behavior in nature have gradually become a research hot spot. Inspired by the foraging behavior of physarum, a dynamic algorithm based on an energy mechanism is presented: the physarum-energy dynamic optimization algorithm (PEO). According to physarum’s dynamic characteristics, the energy mechanism is introduced in PEO to overcome the shortcomings of the existing physarum algorithm, such as its poor global information interaction ability. In addition, PEO introduces the concept of an age factor and a disturbance mechanism to adjust PEO’s optimization ability and convergence speed at different age stages, and the convergence of the algorithm model is proved theoretically. Finally, the validity and convergence of PEO are verified by experiments on TSP data sets, and the main parameters of PEO are analyzed experimentally. On complex problems, a comparison of simulation results between PEO and other optimization algorithms shows that PEO is significantly better than the other algorithms, with high accuracy and fast convergence.
Abstract:
Existing projection-based person re-identification methods usually suffer from long training time, high-dimensional projection matrices, and low matching rates. In addition, intra-class samples may be far fewer than inter-class samples when a training data set is built. To solve these problems, this paper proposes a distance-centralization based algorithm for similarity metric learning. When a training data set is built, the feature values of the same target person are centralized and the inter-class distances are built from these centralized values, while the intra-class distances are still built directly from the original samples. As a result, the number of intra-class samples and the number of inter-class samples become much closer, which reduces the risk of overfitting caused by class imbalance. In addition, while the projection matrix is being learned, the resulting projection vectors can be made approximately orthogonal by a strategy of updating the training data sets. In this way, the proposed method significantly reduces both computational complexity and storage space. Finally, the conjugate gradient method is used to learn the projection vectors; its advantage is quadratic convergence, which speeds up convergence. Experimental results show that the proposed algorithm is more efficient: the matching rate is significantly improved, and the training time is much shorter than that of most existing person re-identification algorithms.
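One way to read the centralization step is sketched below: intra-class pairs are formed from the original samples of each person, while inter-class pairs are formed between per-person centroids. This is an interpretation of the abstract, not the paper's exact construction, and the function names and toy features are hypothetical.

```python
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_pairs(samples_by_person):
    """Return (intra-class pairs, inter-class pairs).

    Intra-class pairs use the original samples of one person.
    Inter-class pairs use one centroid per person (the centralization step).
    """
    intra = [(a, b)
             for vecs in samples_by_person.values()
             for a, b in combinations(vecs, 2)]
    centers = {p: centroid(v) for p, v in samples_by_person.items()}
    inter = [(centers[p], centers[q]) for p, q in combinations(sorted(centers), 2)]
    return intra, inter


# Hypothetical data: 4 persons, 2 feature vectors each.
data = {
    "p1": [[0.0, 0.0], [0.2, 0.0]],
    "p2": [[1.0, 1.0], [1.0, 1.2]],
    "p3": [[2.0, 0.0], [2.2, 0.2]],
    "p4": [[0.0, 2.0], [0.2, 2.2]],
}
intra, inter = build_pairs(data)
print(len(intra), len(inter))
```

In this toy set, pairing raw samples across persons would give 6 x 4 = 24 inter-class pairs against 4 intra-class pairs; with centroids the inter-class count drops to 6, which is the balancing effect the abstract describes.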
Abstract:
Link prediction is one of the fundamental problems in data mining. Due to network complexity and data diversity, link prediction for different types of data in heterogeneous networks has become more and more complicated. Aiming at link prediction in bi-typed heterogeneous information networks, this paper proposes a link prediction method based on clustering and decision trees, called CDTLinks. One kind of object is treated as the features of the other kind of object, and the two kinds are then clustered separately. Three heuristic rules are proposed to construct decision trees for bi-typed heterogeneous networks, and the branch of the tree with the highest information gain is selected. Finally, we judge whether there is a link between two nodes using the clustering result and the decision tree model. In addition, we define the concept of potential link nodes and introduce the number of layers, which reduces the running time and improves the accuracy. The proposed CDTLinks method is validated on the DBLP and AMiner datasets. The experimental results show that the CDTLinks model can conduct link prediction effectively in bi-typed heterogeneous networks.
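The abstract does not say how information gain is computed in CDTLinks. The sketch below uses the standard Shannon-entropy definition as an assumption; the labels and candidate features are hypothetical and do not come from DBLP or AMiner.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy reduction obtained by splitting on one feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical data: label 1 = link exists, 0 = no link.
labels = [1, 1, 0, 0]
same_cluster = ["yes", "yes", "no", "no"]    # separates the labels perfectly
other_feature = ["a", "b", "a", "b"]         # carries no information

features = {"same_cluster": same_cluster, "other_feature": other_feature}
best = max(features, key=lambda name: information_gain(labels, features[name]))
print(best)
```

Selecting the split with the highest gain, as in the last two lines, mirrors the branch-selection step the abstract mentions.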
Abstract:
Machine translation quality estimation is an important task in natural language processing. Unlike traditional automatic evaluation of machine translation, quality estimation evaluates the quality of machine translation without human references. Currently, the feature extraction approaches of sentence-level quality estimation depend heavily on linguistic analysis, which leads to a lack of generalization ability and restricts the performance of the subsequent support vector regression algorithm. To solve this problem, we extract sentence embedding features using a context-based word prediction model and a matrix decomposition model from deep learning, and enrich these features with recurrent neural network language model features to further improve the correlation between the automatic quality estimation approach and human judgments. The experimental results on the WMT’15 and WMT’16 machine translation quality estimation subtasks show that extracting sentence embedding features with the context-based word prediction model yields better system performance than both the traditional QuEst method and the approach that extracts sentence embedding features with the continuous space language model. This indicates that the proposed feature extraction approach can significantly improve the performance of machine translation quality estimation without linguistic analysis.
Abstract:
As an information sharing platform, a location-based service (LBS) can help people obtain more useful information. But as the number of users increases, LBS faces a serious information overload problem. Using recommender systems to filter information and help users find valuable information has become a hot research topic in recent years. In LBS, only positive implicit feedback is available, and the user cold-start problem in this scenario has not been well studied. Based on these observations, we consider the characteristics of location-based service platforms and propose a recommender algorithm that combines collaborative PMF (probabilistic matrix factorization) with GBDT (gradient boosting decision tree) to solve the cold-start problem. The algorithm first uses multiple probabilistic matrix factorizations to learn user latent features in different dimensions, then uses a gradient boosting decision tree trained on the factors and labels to learn the user’s preferences, and finally uses an improved top-N recommender that considers the budget problem to produce the recommendation list. The experimental results on real data show that the proposed algorithm achieves better accuracy and F1 than other popular methods and can solve the cold-start problem in LBS recommendation.
Abstract:
Product image search is an important application of mobile visual search in e-commerce. The goal of product image search is to retrieve the exact product shown in a query image. The development of product image search not only makes shopping easier, but also moves e-commerce toward mobile users. As one of the most important performance factors in product image search, image representation suffers from complicated image backgrounds, small variance within each product category, and varying scale of the target object. To deal with complicated backgrounds and varying object scale, we present a multi-scale deep model for extracting image representations. Meanwhile, we learn image similarity from product category annotations. We also reduce the computation cost by reducing the width and depth of our model to meet the speed requirements of online search services. Experimental results on a million-scale product image dataset show that, compared with existing methods, our method improves retrieval accuracy while maintaining good computational efficiency.
Abstract:
Discourse topic structure analysis is fundamental research in natural language understanding. The lack of large, high-quality discourse corpus resources suitable for Chinese discourse analysis has seriously restricted research on discourse topic computing models. To solve this problem, we first study the theoretical representation system of Chinese discourse topic structure. Drawing on theme-rheme theory, rhetorical structure theory for English, the Penn Discourse Treebank system, and research on Chinese complex sentences and sentence groups, and taking Chinese characteristics into account, we propose a Chinese discourse micro-topic scheme based on theme-rheme theory and construct a Chinese discourse topic structure representation model based on the topic chain. On this basis, we adopt a top-down, backward-search annotation strategy and a human-machine combined corpus annotation method to construct the Chinese discourse topic corpus (CDTC). Moreover, we carry out a detailed statistical analysis of the CDTC, which contains a total of 500 documents. Compared with the OntoNotes corpus and the generalized topic structure theory, this micro-topic scheme representation model has some theoretical advantages and is consistent with Chinese characteristics. Finally, the consistency test shows that CDTC can fully reflect the difficulty of Chinese discourse topic analysis and can provide support for related research.
Abstract:
Group recommendation has recently received great attention in academia due to its significant utility in real applications. However, the available group recommendation methods mainly aggregate individual recommendation results or personal preferences directly, based on an analysis of the rating matrix. The relationship among users, groups, and services has not been considered comprehensively during group recommendation, which harms the accuracy of the recommendation results. Inspired by the latent factor model and the state space model, we propose latent group recommendation (LGR) based on a dynamic probabilistic matrix factorization model integrated with a convolutional neural network (DPMFM-CNN), which jointly considers the rating matrix, service description documents and the time factor, and analyzes the relationship among users, groups, and services. The proposed LGR method first obtains a prior distribution for the service latent factor model using a text representation method based on a convolutional neural network (CNN). Secondly, it integrates the state space model with the probabilistic matrix factorization model and derives the user latent vectors together with the service latent vectors. Thirdly, latent groups are detected by applying multiple clustering algorithms to the user latent vectors. Finally, group latent vectors are aggregated with an average strategy and group ratings are generated. In addition, simulations on MovieLens are performed, and the comparison results demonstrate that LGR achieves better efficiency and accuracy for group recommendation.
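The final aggregation step can be illustrated with a minimal sketch. It assumes, as in ordinary matrix factorization, that a rating is the dot product of a latent vector and a service latent vector; the abstract does not state the exact rating function, and the vectors below are made up.

```python
def average_vector(vectors):
    """Average strategy: component-wise mean of the members' latent vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def group_rating(member_vectors, service_vector):
    """Predicted group rating, assuming a dot-product rating function."""
    group_vector = average_vector(member_vectors)
    return sum(g * s for g, s in zip(group_vector, service_vector))


# Hypothetical latent vectors for a two-member group and one service.
members = [[1.0, 2.0], [3.0, 4.0]]
service = [0.5, 1.0]
print(average_vector(members))
print(group_rating(members, service))
```

Because the dot product is linear, averaging the members' vectors first gives the same result as averaging their individual predicted ratings.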