Universal Approximation and Approximation Advantages of Quaternion-Valued Neural Networks

吴锦辉; 姜远

doi:10.7544/issn1000-1239.202440410

Abstract

Quaternion-valued neural networks extend real-valued neural networks to the algebra of quaternions. On tasks such as singular point compensation in polarimetric synthetic aperture radar, spoken language understanding, and robot control, they achieve higher accuracy or faster convergence than real-valued neural networks. Their performance is widely supported by empirical studies, but few works study their theoretical properties or their advantages over real-valued neural networks. In this paper, we investigate both questions from the perspective of representation power. First, we prove a universal approximation theorem for quaternion-valued neural networks with a non-split ReLU (rectified linear unit)-type activation function. Second, we study the approximation advantages of quaternion-valued neural networks over real-valued neural networks. For split ReLU-type activation functions, we show that a one-hidden-layer real-valued neural network needs about 4 times as many parameters to attain the same maximum number of convex linear regions as a one-hidden-layer quaternion-valued neural network. For the non-split ReLU-type activation function, we prove an approximation separation between one-hidden-layer quaternion-valued and one-hidden-layer real-valued neural networks: a quaternion-valued neural network can represent a real-valued neural network using the same number of hidden neurons and the same weight norm, whereas a real-valued neural network cannot approximate a quaternion-valued neural network unless it has exponentially many hidden neurons or exponentially large parameters. Finally, simulation experiments support the theoretical findings.
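As a rough illustration of the weight sharing that underlies quaternion-valued layers, the sketch below shows the standard fact that left-multiplication by a quaternion weight acts on the four real components of the input as a structured 4×4 real matrix with only 4 free parameters, whereas an unconstrained real 4×4 matrix has 16. It also shows a split activation, which here is assumed to mean ReLU applied to each component separately. This is not the paper's construction or proof; the function names `hamilton_product`, `left_mult_matrix`, and `split_relu` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    ])

def left_mult_matrix(w):
    """Real 4x4 matrix M such that M @ x equals hamilton_product(w, x).

    All 16 entries are determined by the 4 components of w.
    """
    a, b, c, d = w
    return np.array([
        [a, -b, -c, -d],
        [b,  a, -d,  c],
        [c,  d,  a, -b],
        [d, -c,  b,  a],
    ])

def split_relu(q):
    """Split activation (assumed form): ReLU applied to each real component."""
    return np.maximum(q, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=4)  # one quaternion weight: 4 real parameters
x = rng.normal(size=4)  # one quaternion input

# The quaternion product and the structured real matrix give the same result.
same = np.allclose(hamilton_product(w, x), left_mult_matrix(w) @ x)

# One hidden quaternion neuron with a split activation.
hidden = split_relu(hamilton_product(w, x))
```

The comparison is only meant to convey intuition: a real-valued layer acting on the same four components would need a full 4×4 weight matrix to be able to express every such map.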
