The Category Representation of Machine Learning Algorithm

徐晓祥; 李凡长; 张莉; 张召

doi:10.7544/issn1000-1239.2017.20160350

Abstract: Representation has long been regarded as one of the bottleneck problems in machine learning, because the performance of a machine learning method depends heavily on the choice of data representation. The central question in representation learning is how best to learn meaningful and useful representations of data. Broadly construed, the field includes deep learning, feature learning, metric learning, compositional modeling, structured prediction, and reinforcement learning, and these techniques apply to a wide range of domains, including vision, speech recognition, and text understanding. Research on representation methods for machine learning is therefore a long-term and exploratory undertaking. On this basis, we use category theory to study the representation of machine learning methods and propose the basic concepts of category representation of machine learning methods. We analyze decision trees, support vector machines, principal component analysis, and deep neural networks, and propose a category representation classification algorithm, a category representation decision tree algorithm, slice category representations of principal component analysis and of the support vector machine, and a category functor representation of deep learning, together with the corresponding theoretical proofs and feasibility analysis. Further analysis of these five algorithms reveals an essential relationship between principal component analysis and the support vector machine. Finally, simulation experiments confirm the feasibility of the category representation method.
