High-Order Tensor Analysis Method for Information System Recommendations and Decisions

王贝伦; 张嘉琦; 蔡英豪; 王兆阳; 谈笑; 沈典

doi:10.7544/issn1000-1239.202330624

Abstract

Abstract: Tensor data (multi-dimensional arrays) are common in the information systems of many industries, for example functional magnetic resonance imaging (fMRI) data in medical systems and user-product data in product information systems. Using such data to predict the relationship between tensor features and a univariate response enables data empowerment, that is, more accurate services or solutions such as disease diagnosis decisions or product recommendations. Existing tensor regression methods, however, have two major shortcomings: they may lose the spatial information of the tensor, which makes predictions inaccurate, and their computational cost is too high, which makes services or solutions untimely. Both problems become more severe for large-scale data with high-order structure. To use tensor data to improve the quality and efficiency of information services and solutions, we propose the sparse and low-rank tensor regression model (SLTR). The model enforces sparsity and low-rankness on the tensor coefficient by directly applying the $\ell_1$ norm and the tensor nuclear norm to it, respectively, which preserves the structural information of the tensor and makes the data easy to interpret. To make the solving procedure scalable and efficient, SLTR uses the proximal gradient method to optimize the hybrid regularizer, which is easy to parallelize. In addition, a tight error bound for SLTR is proved theoretically. We evaluate SLTR on several simulated datasets and one video dataset. The experimental results show that, compared with previous methods, SLTR achieves better predictive performance in less time.
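As a rough illustration of the two regularizers named in the abstract, the sketch below shows the standard proximal operators that a proximal gradient method typically relies on: elementwise soft-thresholding for the $\ell_1$ norm and singular value thresholding for the nuclear norm. It is restricted to the matrix (second-order) case and assumes NumPy; the function names and the toy matrix are hypothetical, and this is not the SLTR algorithm itself, whose tensor nuclear norm and update rule are not specified in the abstract.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1: shrink every entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def singular_value_threshold(mat, tau):
    """Proximal operator of tau * ||mat||_* (matrix case): shrink the singular values."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    # Scale each left singular vector by its shrunken singular value, then recombine.
    return (u * soft_threshold(s, tau)) @ vt


# Toy coefficient matrix (illustrative values only).
w = np.array([[3.0, 0.0],
              [0.0, 0.5]])

sparse_w = soft_threshold(w, 1.0)                # small entries are set to zero
low_rank_w = singular_value_threshold(w, 1.0)    # small singular values are set to zero
```

Note that the proximal operator of the sum of the two penalties is in general not the composition of these two operators, so how a hybrid regularizer is handled is a design decision of the specific method.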
