ISSN 1000-1239 CN 11-1777/TP

Table of Contents

01 August 2020, Volume 57 Issue 8
A Unified Momentum Method with Triple-Parameters and Its Optimal Convergence Rate
Ding Chengcheng, Tao Wei, Tao Qing
2020, 57(8):  1571-1580.  doi:10.7544/issn1000-1239.2020.20200194
Momentum methods have received much attention in the machine learning community owing to their ability to improve the performance of SGD. Following their success in deep learning, various formulations of momentum methods have been presented; in particular, two unified frameworks, SUM (stochastic unified momentum) and QHM (quasi-hyperbolic momentum), were proposed. Unfortunately, even for nonsmooth convex problems, existing derivations of the optimal average convergence rate still impose unreasonable restrictions, such as assuming that the number of iterations is fixed in advance and that the optimization problem is unconstrained. In this paper, we present a more general three-parameter framework for momentum methods, named TPUM (triple-parameters unified momentum), which includes SUM and QHM as special cases. For constrained nonsmooth convex optimization problems with time-varying step sizes, we then prove that TPUM attains the optimal average convergence rate. This indicates that adding momentum does not affect the convergence of SGD, and it provides a theoretical guarantee for the applicability of momentum methods to machine learning problems. Experiments on L1-ball-constrained hinge loss problems verify the correctness of the theoretical analysis.
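For context, the QHM-style update that such unified frameworks generalize can be sketched as follows. This is a minimal illustration using the standard QHM parameters β and ν; the triple-parameter TPUM form itself is defined only in the paper.

```python
def qhm_step(w, g, buf, lr=0.05, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum step: blend the plain gradient
    with an exponential moving average of past gradients."""
    buf = beta * buf + (1 - beta) * g        # momentum buffer (EMA)
    w = w - lr * ((1 - nu) * g + nu * buf)   # quasi-hyperbolic blend
    return w, buf
```

With `nu = 0` this reduces to plain SGD and with `nu = 1` to EMA-form momentum SGD, which is the sense in which such frameworks unify the two.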
Support Vector Machine with Eliminating the Random Consistency
Wang Jieting, Qian Yuhua, Li Feijiang, Liu Guoqing
2020, 57(8):  1581-1593.  doi:10.7544/issn1000-1239.2020.20200127
During human learning, objective evaluation of and feedback on the learning results is an important step. Owing to limited evidence, the learning results may contain consistency generated purely by randomness, and such rough feedback hinders the improvement of learning ability. Similarly, a machine learning system is driven by data and guided by a performance measure. Because data are limited, imbalanced, and noisy, the results of machine learning also contain random consistency. However, machine learning systems that use accuracy as the feedback index cannot discriminate random consistency, which damages generalization ability. In this paper, we define the random accuracy and the pure accuracy, and analyze the necessity of eliminating the random accuracy. Based on the defined pure accuracy, we then propose an SVM model that eliminates random consistency, called PASVM, and validate its efficiency on ten benchmark data sets downloaded from KEEL. The experimental results show that PASVM performs better than the traditional SVM method, the SVMperf method, and other methods that optimize the pure accuracy measure.
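The paper's pure accuracy is defined in its text; the general idea of discounting agreement that arises by chance can be illustrated with a kappa-style correction. The function name and formula below are illustrative, not the paper's exact definition.

```python
from collections import Counter

def chance_corrected_accuracy(y_true, y_pred):
    """Subtract the accuracy expected if predictions matched labels
    only by chance (computed from the class marginals), then rescale."""
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    ct, cp = Counter(y_true), Counter(y_pred)
    # expected agreement under independent label/prediction marginals
    a_rand = sum(ct[c] * cp.get(c, 0) for c in ct) / (n * n)
    return (acc - a_rand) / (1 - a_rand)
```

A constant predictor on a balanced binary problem scores 0 under this correction even though its raw accuracy is 0.5, which is exactly the failure of plain accuracy the abstract describes.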
Mondrian Deep Forest
He Yixiao, Pang Ming, Jiang Yuan
2020, 57(8):  1594-1604.  doi:10.7544/issn1000-1239.2020.20200490
Most studies on deep learning are built on neural networks, i.e., multiple layers of parameterized differentiable nonlinear modules trained by backpropagation. Recently, deep forest was proposed as a non-NN style deep model with far fewer parameters than deep neural networks. It shows robust performance under different hyperparameter settings and across different tasks, and its model complexity can be determined in a data-dependent manner. Represented by gcForest, the study of deep forest provides a promising way of building deep models from non-differentiable modules. However, deep forest is currently used offline, which inhibits its application in many real tasks, e.g., learning from data streams. In this work, we explore the possibility of building deep forest in the incremental setting and propose Mondrian deep forest. It has a cascade forest structure for layer-by-layer processing, which we further enhance with an adaptive mechanism that adjusts the attention paid to the original features versus the transformed features of the previous layer, thereby notably mitigating the deficiency of Mondrian forest in handling irrelevant features. Empirical results show that, while inheriting the incremental learning ability of Mondrian forest, Mondrian deep forest achieves a significant improvement in performance. Using the same default hyperparameter setting, it achieves satisfying performance across different datasets. In the incremental training setting, Mondrian deep forest achieves predictive performance highly competitive with a periodically retrained gcForest while being an order of magnitude faster.
A Bayesian Classification Algorithm Based on Selective Patterns
Ju Zhuoya, Wang Zhihai
2020, 57(8):  1605-1616.  doi:10.7544/issn1000-1239.2020.20200196
Data mining concerns the theories and methods for discovering knowledge from very large databases, and classification is an important topic within it. In classification research, the Naïve Bayesian classifier is a simple but effective learning technique that has been widely used. It assumes that the probability of each attribute value, given the class, is independent of all other attributes. However, in many contexts the dependencies between attributes are more complex. Constructing a classifier from specific patterns of "attribute-value" pairs is an important technique in much prior work, and the dependencies among attributes implied by these patterns have significant impacts on classification results, so the dependency between attributes is exploited adequately here. We propose a Bayesian classification algorithm based on selective patterns, which not only leverages the excellent classification ability of Bayesian network classifiers, but also further weakens the conditional independence assumption by analyzing the dependencies between attributes within the patterns. Classification accuracy benefits from fully considering the characteristics of the datasets, mining and employing patterns with high discrimination, and modeling the dependence between attributes in a proper way. Empirical results show that the average accuracy of the proposed algorithm on 10 datasets is increased by 1.65% and 4.29% over the benchmark algorithms NB and AODE, respectively.
Linear Regularized Functional Logistic Model
Meng Yinfeng, Liang Jiye
2020, 57(8):  1617-1626.  doi:10.7544/issn1000-1239.2020.20200496
Pattern recognition problems on functional data arise widely in fields such as medicine, economics, finance, biology, and meteorology; exploring classifiers with better generalization performance is therefore critical for accurately mining the knowledge hidden in functional data. To address the low generalization performance of the classical functional logistic model, a linear regularized functional logistic model based on functional principal component representation is proposed, and the model is obtained by solving an optimization problem. In this problem, the first term is constructed from the likelihood of the training functional samples and controls classification performance, while the second term is a regularizer that controls model complexity. The two terms are combined by linear weighting, which bounds the range of the regularization parameter and makes it convenient to give an empirically optimal value. Guided by this empirically optimal parameter, a logistic model with an appropriate number of principal components can then be selected for classifying functional data. Experimental results show that the generalization performance of the selected linear regularized functional logistic model is better than that of the classical logistic model.
Late Fusion Multi-View Clustering Based on Local Multi-Kernel Learning
Xia Dongxue, Yang Yan, Wang Hao, Yang Shuhong
2020, 57(8):  1627-1638.  doi:10.7544/issn1000-1239.2020.20200212
Graph-based multi-view clustering is one of the representative methods in the field, but existing models still have the following problems. First, most do not consider the difference in clustering capacity among views and force all views to share a common similarity graph. Second, some models construct the similarity graph and perform clustering in separate steps, so the constructed similarity graph is not optimal for the subsequent clustering task. Finally, although many models use kernel learning to handle nonlinear relationships between data points, most compute the self-expressive relationship in kernel space with global models; such global schemes do not fully explore local nonlinear relationships and incur a heavy computational load. This paper therefore proposes a late fusion multi-view clustering model based on local multi-kernel learning. We fuse information at the level of the class partition space rather than the similarity graph, and adopt a local multi-kernel learning scheme that fully preserves local nonlinear relationships while reducing the computational load. We also propose an alternating optimization scheme that solves the construction of the similarity graph, the combination of multiple kernels, and the generation of the class indicator matrix in a unified framework. Experiments on multiple datasets show that the proposed method achieves good multi-view clustering performance.
Adaptive Neighborhood Embedding Based Unsupervised Feature Selection
Liu Yanfang, Li Wenbin, Gao Yang
2020, 57(8):  1639-1649.  doi:10.7544/issn1000-1239.2020.20200219
Unsupervised feature selection algorithms can effectively reduce the dimensionality of high-dimensional unlabeled data, which not only reduces the time and space complexity of data processing but also avoids over-fitting of the feature selection model. However, most existing unsupervised feature selection algorithms use the k-nearest neighbor method to capture the local geometric structure of the data, ignoring the problem of uneven data distribution. To solve this problem, an unsupervised feature selection algorithm based on adaptive neighborhood embedding (ANEFS) is proposed. The algorithm determines the number of neighbors of each sample according to the distribution of the dataset and then constructs the similarity matrix. Meanwhile, an intermediate matrix mapping from the high-dimensional space to the low-dimensional space is introduced, and the Lagrange multiplier method is used to optimize the reconstruction function. Experimental results on six UCI datasets show that the proposed algorithm selects representative feature subsets that achieve higher clustering accuracy and normalized mutual information.
A Degree Corrected Stochastic Block Model for Attributed Networks
Zheng Yimei, Jia Caiyan, Chang Zhenhai, Li Xuanya
2020, 57(8):  1650-1662.  doi:10.7544/issn1000-1239.2020.20200158
Community detection is an important task in complex network analysis. Existing community detection methods mostly exploit the network structure alone, while methods that integrate network topology and node attributes are mainly aimed at the traditional community structure and fail to detect bipartite, mixed, and other general structures. Moreover, the degree of each node affects the composition of the links in the network as well as the distribution of the community structure. This paper proposes DPSB_PG, a community detection method for attributed networks based on the stochastic block model. Unlike other generative models for attributed networks, in this method the generation of both node links and node attributes follows Poisson distributions, with the connection probabilities between communities modeled as in the stochastic block model, and the idea of degree correction is integrated into the generation of node links. Finally, the expectation-maximization algorithm is used to infer the parameters of the model and obtain the community membership of the nodes. Experimental results on real networks show that DPSB_PG inherits the advantages of the stochastic block model and can detect general community structure in networks. Thanks to the degree correction, the model has good data fitting ability. Overall, it outperforms other existing state-of-the-art community detection algorithms on both attributed and non-attributed networks.
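The degree-corrected Poisson link generation the abstract refers to can be sketched with the standard rate from the degree-corrected stochastic block model, where the expected number of links between nodes i and j is θ_i·θ_j·ω[g_i, g_j]. Node attributes, which DPSB_PG also generates, are omitted, and all names are illustrative.

```python
import numpy as np

def dcsbm_log_likelihood(A, g, theta, omega):
    """Poisson log-likelihood of adjacency matrix A under a
    degree-corrected block model: the rate for entry (i, j) is
    theta[i] * theta[j] * omega[g[i], g[j]], so high-degree nodes
    can sit in the same community as low-degree ones."""
    rate = np.outer(theta, theta) * omega[np.ix_(g, g)]
    # drop the additive log(A!) constant, which is parameter-free
    return float(np.sum(A * np.log(rate) - rate))
```

In an EM-style inference such as the one the paper uses, this likelihood (plus an attribute term) is what the E and M steps alternately bound and maximize.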
Conditional Variational Time-Series Graph Auto-Encoder
Chen Kejia, Lu Hao, Zhang Jiajun
2020, 57(8):  1663-1673.  doi:10.7544/issn1000-1239.2020.20200202
Network representation learning (also called graph embedding) is the basis of graph tasks such as link prediction, node classification, community discovery, and graph visualization. Most existing graph embedding algorithms are designed for static graphs, which makes it difficult to capture the dynamics of real-world networks that evolve over time, and research on dynamic network representation learning is still inadequate. This paper proposes a conditional variational time-series graph auto-encoder (TS-CVGAE), which simultaneously learns the local structure and evolution pattern of a dynamic network. The model extends traditional graph convolution to a time-series graph convolution and uses it to encode the network within the framework of a conditional variational auto-encoder. After training, the middle layer of TS-CVGAE gives the final network embedding. Experimental results on four real dynamic network datasets show that the method outperforms related static and dynamic network representation learning methods on the link prediction task.
Exploiting Composite Relation Graph Convolution for Attributed Network Embedding
Chen Yiqi, Qian Tieyun, Li Wanli, Liang Yile
2020, 57(8):  1674-1682.  doi:10.7544/issn1000-1239.2020.20200206
Network embedding aims to learn a low-dimensional dense vector for each node in a network and has attracted much attention from researchers in recent years. Most existing studies focus on modeling the graph structure and neglect attribute information. Although attributed network embedding methods take node attributes into account, the informative relations between nodes and their attributes remain under-exploited. In this paper, we propose a novel framework that employs this abundant relation information for attributed network embedding. To this end, we first construct composite relations between the nodes and their attributes in attributed networks. We then develop a composite relation graph convolution network (CRGCN) to encode the composite relations in both types of networks. Extensive experiments on real-world datasets demonstrate the effectiveness of our model on various network analysis tasks.
High Dimensional Data Stream Clustering Algorithm Based on Random Projection
Zhu Yingwen, Chen Songcan
2020, 57(8):  1683-1696.  doi:10.7544/issn1000-1239.2020.20200432
High-dimensional data streams emerge ubiquitously in many real-world applications such as network monitoring. Clustering such streams differs from traditional data clustering, where the given datasets are static and can be read and processed repeatedly; it faces additional constraints such as bounded memory, single-pass access, real-time response, and concept-drift detection. Many methods of this type have been proposed recently, but when dealing with high-dimensional data they often incur high computational cost and poor performance due to the curse of dimensionality. To address this problem, we present a new stream clustering algorithm, RPFART, which combines random projection with the adaptive resonance theory (ART) model; ART has linear computational complexity, uses a single parameter (the vigilance parameter) to identify clusters, and is robust to modest parameter settings. To gain insight into the resulting performance improvement, we analyze and identify the major influence of random projection on ART. Although our method is embarrassingly simple, just incorporating random projection into ART, experimental results on a variety of benchmark datasets show that it achieves comparable or even better performance than the RPGStream algorithm even when the raw dimensionality is compressed to 10% of the original; for the ACT1 dataset, the dimensionality is reduced from 67500 to 6750.
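The random projection step at the core of such a pipeline can be sketched as a Gaussian projection, which approximately preserves pairwise distances (the Johnson-Lindenstrauss property), so cluster structure survives the compression that precedes the ART clustering step. This is a generic illustration; the paper's exact projection scheme and its coupling with ART are described there.

```python
import numpy as np

def random_project(X, d_out, seed=0):
    """Project rows of X (n x d_in) to d_out dimensions with a
    Gaussian matrix scaled so expected squared norms are preserved."""
    rng = np.random.default_rng(seed)
    d_in = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(d_out), size=(d_in, d_out))
    return X @ R
```

Because the projection is data-independent, it costs a single matrix multiply per arriving sample, which fits the single-pass, bounded-memory constraints of stream clustering.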
Deep Generative Recommendation Based on List-Wise Ranking
Sun Xiaoyi, Liu Huafeng, Jing Liping, Yu Jian
2020, 57(8):  1697-1706.  doi:10.7544/issn1000-1239.2020.20200497
Variational autoencoders have been successfully applied to recommendation in recent years. The advantage of this kind of nonlinear probabilistic model is that it can break through the limited modeling ability of linear models, which still dominate collaborative filtering research. Although recommendation methods based on variational autoencoders achieve excellent performance, some problems remain unresolved, such as the inability to generate personalized recommendation ranking lists for users from implicit feedback data. In this paper, we therefore propose a deep generative recommendation model that equips the variational autoencoder, with its multinomial likelihood, with a list-wise ranking strategy. The model can simultaneously generate point-wise implicit feedback data and create a ranked recommendation list for each user. To seamlessly combine the ranking loss with the variational autoencoder loss, the normalized discounted cumulative gain (NDCG) is adopted and approximated with a smoothed function. A series of experiments on three real-world datasets (MovieLens-100k, XuetangX, and Jester) shows that the variational autoencoder combined with list-wise ranking performs better at generating personalized recommendation lists.
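The exact (non-smooth) metric being approximated, NDCG, can be computed as follows; the smoothed surrogate actually used for training is defined in the paper.

```python
import math

def ndcg(relevances):
    """NDCG of a ranked list of relevance scores: gains 2**r - 1 are
    discounted by log2(rank + 1) and normalized by the ideal ordering."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

The sort inside the ideal term is what makes NDCG non-differentiable, which is why a smooth approximation is needed before it can serve as a VAE training loss.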
Mutual Linear Regression Based Supervised Discrete Cross-Modal Hashing
Liu Xingbo, Nie Xiushan, Yin Yilong
2020, 57(8):  1707-1714.  doi:10.7544/issn1000-1239.2020.20200122
Cross-modal hashing maps heterogeneous multimodal data into compact binary codes while preserving similarity, which provides great efficiency in cross-modal retrieval. Existing cross-modal hashing methods usually utilize two different projections to describe the correlation between hash codes and class labels. To capture the relation between hash codes and semantic labels efficiently, we propose mutual linear regression based supervised discrete cross-modal hashing (SDCH). Only one stable projection is used in the proposed method to describe the linear regression relation between hash codes and the corresponding labels, which enhances precision and stability in cross-modal hashing. In addition, we learn modality-specific projections for out-of-sample extension by preserving similarity and considering the feature distributions of the different modalities. Comparisons with several state-of-the-art methods on two benchmark datasets verify the superiority of SDCH under various cross-modal retrieval scenarios.
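The single linear regression between hash codes and labels that the abstract describes can be sketched as a regularized least-squares fit. This is a hypothetical minimal form with illustrative names; the actual SDCH alternates such a fit with discrete code updates and the modality-specific projections.

```python
import numpy as np

def fit_code_label_projection(B, Y, lam=1e-2):
    """Ridge regression relating label matrix Y (n x c) to hash codes
    B (n x k) via B ~ Y @ P: one shared projection P couples codes
    and semantics. lam is a small regularizer for stability."""
    c = Y.shape[1]
    # normal equations: (Y'Y + lam I) P = Y'B
    return np.linalg.solve(Y.T @ Y + lam * np.eye(c), Y.T @ B)
```

Using one projection in both directions (codes from labels and labels from codes) is what keeps the code-semantics coupling stable, per the abstract's argument.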
A Sequence-to-Sequence Spatial-Temporal Attention Learning Model for Urban Traffic Flow Prediction
Du Shengdong, Li Tianrui, Yang Yan, Wang Hao, Xie Peng, Horng Shi-Jinn
2020, 57(8):  1715-1728.  doi:10.7544/issn1000-1239.2020.20200169
Urban traffic flow prediction is a key technology for studying the behavior of traffic-related big data and predicting future traffic flow, and it is crucial for guiding early warning of traffic congestion in intelligent transportation systems. Effective traffic flow prediction is very challenging, however, as it is affected by many complex factors, e.g., the spatial-temporal dependency and temporal dynamics of traffic networks. In the literature, some works apply convolutional neural networks (CNN) or recurrent neural networks (RNN) to traffic flow prediction, but these models struggle to capture the spatial-temporal correlations in traffic flow time-series data. In this paper, we propose a novel sequence-to-sequence spatial-temporal attention framework for the urban traffic flow forecasting task. It is an end-to-end deep learning model based on convolutional LSTM layers and LSTM layers with an attention mechanism, which adaptively learns the spatial-temporal dependencies and nonlinear correlations in multivariate traffic flow sequence data. Extensive experimental results on three real-world traffic flow datasets show that our model achieves the best forecasting performance compared with state-of-the-art methods.
Student Performance Prediction Model Based on Two-Way Attention Mechanism
Li Mengying, Wang Xiaodong, Ruan Shulan, Zhang Kun, Liu Qi
2020, 57(8):  1729-1740.  doi:10.7544/issn1000-1239.2020.20200181
The prediction and analysis of student performance aims to provide personalized guidance to students and to improve both students' performance and teachers' teaching effectiveness. Student performance is affected by many factors, such as family environment, learning conditions, and personal behavior. Traditional performance prediction methods either treat all factors equally or treat all students equally, and thus cannot achieve personalized analysis and guidance. We therefore propose a two-way attention (TWA) based student performance prediction model, which assigns different weights to different influencing factors and pays more attention to the important ones, while also taking the individual features of students into account. First, we calculate the attention scores of the attributes with respect to the first-stage performance and the second-stage performance. Then we consider a variety of feature fusion approaches. Finally, we predict student performance from the fused features. We conduct extensive experiments on two public education datasets and visualize the prediction results. The results show that the proposed model predicts student performance accurately and has good interpretability.
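The per-factor weighting that attention provides can be illustrated with a minimal softmax attention over attribute scores. This is illustrative only; the model's two-way scoring over the two performance stages is defined in the paper.

```python
import numpy as np

def attention_weights(scores):
    """Softmax over per-attribute relevance scores, so influential
    factors receive larger weights in the fused representation."""
    e = np.exp(scores - np.max(scores))  # shift for numerical stability
    return e / e.sum()

def attend(features, scores):
    """Weighted sum of attribute feature vectors under the attention."""
    w = attention_weights(np.asarray(scores, dtype=float))
    return w @ np.asarray(features, dtype=float)
```

Because the weights are data-dependent, two students with different dominant factors (say, family environment versus study habits) get different fused representations, which is the personalization the abstract argues plain averaging cannot provide.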
Multi-Source Contextual Collaborative Recommendation for Medicine
Zheng Zhi, Xu Tong, Qin Chuan, Liao Xiangwen, Zheng Yi, Liu Tongzhu, Tong Guixian
2020, 57(8):  1741-1754.  doi:10.7544/issn1000-1239.2020.20200149
Recent years have witnessed the accumulation of electronic medical records (EMR) and the rapid development of data analytics techniques, which strongly support intelligent medical services such as automatic diagnosis and medicine recommendation. Unfortunately, owing to the simplicity of general EMR, a diagnosis model can easily be disturbed by common diseases or symptoms, so fine-grained, personally focused prescription is hard to achieve. At the same time, related context information, e.g., personal information such as age and sex, treatment records such as examinations, and external information such as weather and temperature, could all benefit the diagnosis and medicine recommendation task. However, current techniques cannot effectively extract and integrate this information, which constrains the performance of medicine recommendation. To that end, in this paper we propose a comprehensive framework based on collaborative awareness of multi-source context information. Specifically, we first use the bag-of-words model to process the EMR and the related context records. Along this line, an LDA-based contextual collaborative model called Medicine-LDA is designed to integrate the multi-source information while alleviating the combinatorial explosion of context information. Extensive experiments on a real-world dataset from a top-tier hospital demonstrate the effectiveness of our solution.
A Hierarchical Attention Mechanism Framework for Internet Credit Evaluation
Chen Yanmin, Wang Hao, Ma Jianhui, Du Dongfang, Zhao Hongke
2020, 57(8):  1755-1768.  doi:10.7544/issn1000-1239.2020.20200217
With the development of the Internet, online service products based on user credit have increasingly been applied in various fields. Internet user credit data contains diverse types of data describing many aspects of users, so how to use these data to evaluate users' credit ratings on the Internet is an important issue. Most previous research focuses on traditional credit evaluation based on extracting attributes from the credit domain; there is only a little work on Internet credit evaluation, and it lacks efficient methods that account for the different importance of multiple user attributes to credit history. To solve these problems, this paper presents a hierarchical attention mechanism framework for user credit evaluation based on user profiles. Specifically, the model first builds a user profile from attributes such as credit history and user behaviors to describe users at a coarse granularity. Then, the significance of each user attribute is gradually obtained through multiple attention layers to evaluate user credit ratings. Extensive experimental results on a public dataset demonstrate that this model achieves better performance on user credit evaluation than other benchmark algorithms.
SCONV: A Financial Market Trend Forecast Method Based on Emotional Analysis
Lin Peiguang, Zhou Jiaqian, Wen Yulian
2020, 57(8):  1769-1778.  doi:10.7544/issn1000-1239.2020.20200494
The stock market plays a critical role in the economic development of countries and is closely related to our daily life, and the sentiment of shareholders can be regarded as one of the factors affecting stock prices. This paper proposes a deep learning model for sentiment-based stock price prediction built on convolutional long short-term memory, named semantic convolution (SCONV). The model uses a long short-term memory network and word2vec to analyze sentiment, extract sentiment vectors, and compute a sentiment weight for each day. We then apply the corresponding daily weights to the average stock price of the previous day, the previous three days, and the previous week, and feed them, together with the stock price, into a ConvLSTM; a dropout layer between the ConvLSTM and an additional LSTM layer avoids over-fitting. In this paper, BABA.us (about 3 years), 000001.sh (about 1.5 years), and 000651.sz (about 5 months) are used as experimental data. Compared with traditional models, the experimental results show that SCONV predicts the trend of the stock price more precisely even on a smaller sample set.