Citation: Wu Jinhui, Jiang Yuan. Universal Approximation and Approximation Advantages of Quaternion-Valued Neural Networks[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440410
Quaternion-valued neural networks extend real-valued neural networks to the algebra of quaternions. In some tasks, such as singular point compensation in polarimetric synthetic aperture radar, spoken language understanding, and robot control, quaternion-valued neural networks achieve higher accuracy or faster convergence than real-valued neural networks. This performance is widely supported by empirical studies, but few studies address the theoretical properties of quaternion-valued neural networks, in particular why they can be more efficient than real-valued neural networks. In this paper, we investigate the theoretical properties of quaternion-valued neural networks and their advantages over real-valued neural networks from the perspective of approximation. First, we prove the universal approximation property of quaternion-valued neural networks with a non-split ReLU (rectified linear unit)-type activation function. Second, we establish the approximation advantages of quaternion-valued neural networks over real-valued ones. For split ReLU-type activation functions, we show that a one-hidden-layer real-valued neural network needs about 4 times as many parameters to possess the same maximum number of convex linear regions as a one-hidden-layer quaternion-valued neural network. For the non-split ReLU-type activation function, we prove an approximation separation between one-hidden-layer quaternion-valued and real-valued neural networks: a quaternion-valued neural network can express a real-valued neural network using the same number of hidden neurons and the same parameter norm, whereas a real-valued neural network cannot approximate a quaternion-valued neural network unless its number of hidden neurons or its parameter norm is exponentially large. Finally, simulation experiments support our theoretical findings.
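To make the model concrete, the following is a minimal NumPy sketch of a one-hidden-layer quaternion-valued network with a split ReLU activation, which applies the real ReLU to each of the four quaternion components. All function names, parameter shapes, and the random initialization here are illustrative assumptions for exposition, not code from the paper; in particular, the paper's non-split ReLU-type activation is defined in the paper itself and is not reproduced here.

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of quaternions p = p0 + p1*i + p2*j + p3*k and q,
    each given as a length-4 NumPy array (real part first)."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([
        p0*q0 - p1*q1 - p2*q2 - p3*q3,   # real part
        p0*q1 + p1*q0 + p2*q3 - p3*q2,   # i component
        p0*q2 - p1*q3 + p2*q0 + p3*q1,   # j component
        p0*q3 + p1*q2 - p2*q1 + p3*q0,   # k component
    ])

def split_relu(q):
    """Split ReLU: apply the real ReLU to each of the 4 components."""
    return np.maximum(q, 0.0)

def qvnn_forward(x, W, b, V):
    """One-hidden-layer quaternion-valued network
    f(x) = sum_j V[j] * sigma(W[j] * x + b[j]),
    with quaternion input x and quaternion parameters W, b, V
    (each row of W, b, V is one quaternion)."""
    out = np.zeros(4)
    for w_j, b_j, v_j in zip(W, b, V):
        h = split_relu(hamilton_product(w_j, x) + b_j)
        out += hamilton_product(v_j, h)
    return out

# Tiny usage example with 3 hidden neurons and random (illustrative) parameters.
rng = np.random.default_rng(0)
n_hidden = 3
W = rng.standard_normal((n_hidden, 4))
b = rng.standard_normal((n_hidden, 4))
V = rng.standard_normal((n_hidden, 4))
x = rng.standard_normal(4)
print(qvnn_forward(x, W, b, V))
```

Note that each split-ReLU hidden neuron above applies the ReLU to 4 real linear functions of the 4 real input components, so one quaternion neuron induces up to 4 hyperplanes in R^4 while carrying 12 real parameters (4 each for w_j, b_j, v_j); this is the kind of hyperplane-arrangement counting behind the paper's comparison of convex linear regions between quaternion-valued and real-valued one-hidden-layer networks.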
References
[1] Oyama K, Hirose A. Phasor quaternion neural networks for singular point compensation in polarimetric-interferometric synthetic aperture radar[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 57(5): 2510−2519
[2] Parcollet T, Morchid M, Linarès G. Deep quaternion neural networks for spoken language understanding[C] //Proc of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop. Piscataway, NJ: IEEE, 2017: 504−511
[3] Wu Yue, Yuan Yongzhe, Yue Mingyu, et al. Feature mining method of multi-dimensional information fusion in point cloud registration[J]. Journal of Computer Research and Development, 2022, 59(8): 1732−1741 (in Chinese)
[4] Bayro-Corrochano E, Lechuga-Gutiérrez L, Garza-Burgos M. Geometric techniques for robotics and HMI: Interpolation and haptics in conformal geometric algebra and control using quaternion spike neural networks[J]. Robotics and Autonomous Systems, 2018, 104: 72−84 doi: 10.1016/j.robot.2018.02.015
[5] Parcollet T, Ravanelli M, Morchid M, et al. Quaternion recurrent neural networks[C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2024-07-14]. https://openreview.net/pdf?id=ByMHvs0cFQ
[6] Shoemake K. Animating rotation with quaternion curves[C] //Proc of the 12th Annual Conf on Computer Graphics and Interactive Techniques. New York: ACM, 1985: 245−254
[7] Parcollet T, Morchid M, Linarès G. A survey of quaternion neural networks[J]. Artificial Intelligence Review, 2020, 53(4): 2957−2982 doi: 10.1007/s10462-019-09752-1
[8] Arena P, Fortuna L, Muscato G, et al. Multilayer perceptrons to approximate quaternion valued functions[J]. Neural Networks, 1997, 10(2): 335−342 doi: 10.1016/S0893-6080(96)00048-2
[9] Valle M E, Vital W L, Vieira G. Universal approximation theorem for vector- and hypercomplex-valued neural networks[J]. arXiv preprint, arXiv: 2401.02277, 2024
[10] Ujang B C, Took C C, Mandic D P. Quaternion-valued nonlinear adaptive filtering[J]. IEEE Transactions on Neural Networks, 2011, 22(8): 1193−1206 doi: 10.1109/TNN.2011.2157358
[11] Cybenko G. Approximation by superpositions of a sigmoidal function[J]. Mathematics of Control, Signals and Systems, 1989, 2(4): 303−314 doi: 10.1007/BF02551274
[12] Funahashi K I. On the approximate realization of continuous mappings by neural networks[J]. Neural Networks, 1989, 2(3): 183−192 doi: 10.1016/0893-6080(89)90003-8
[13] Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators[J]. Neural Networks, 1989, 2(5): 359−366 doi: 10.1016/0893-6080(89)90020-8
[14] Leshno M, Lin V Y, Pinkus A, et al. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function[J]. Neural Networks, 1993, 6(6): 861−867 doi: 10.1016/S0893-6080(05)80131-5
[15] Seidl D R, Lorenz R D. A structure by which a recurrent neural network can approximate a nonlinear dynamic system[C] //Proc of the 1991 Int Joint Conf on Neural Networks. Piscataway, NJ: IEEE, 1991: 709−714
[16] Funahashi K I, Nakamura Y. Approximation of dynamical systems by continuous time recurrent neural networks[J]. Neural Networks, 1993, 6(6): 801−806 doi: 10.1016/S0893-6080(05)80125-X
[17] Chow T W, Li Xiaodong. Modeling of continuous time dynamical systems with input by recurrent neural networks[J]. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 2000, 47(4): 575−578 doi: 10.1109/81.841860
[18] Li Xiaodong, Ho J K, Chow T W. Approximation of dynamical time-variant systems by continuous-time recurrent neural networks[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2005, 52(10): 656−660 doi: 10.1109/TCSII.2005.852006
[19] Schäfer A M, Zimmermann H G. Recurrent neural networks are universal approximators[C] //Proc of the 16th Int Conf on Artificial Neural Networks. Berlin: Springer, 2006: 632−640
[20] Zhou Dingxuan. Universality of deep convolutional neural networks[J]. Applied and Computational Harmonic Analysis, 2020, 48(2): 787−794 doi: 10.1016/j.acha.2019.06.004
[21] Arena P, Fortuna L, Re R, et al. On the capability of neural networks with complex neurons in complex valued functions approximation[C] //Proc of the 1993 IEEE Int Symp on Circuits and Systems. Piscataway, NJ: IEEE, 1993: 2168−2171
[22] Voigtlaender F. The universal approximation theorem for complex-valued neural networks[J]. Applied and Computational Harmonic Analysis, 2023, 64: 33−61 doi: 10.1016/j.acha.2022.12.002
[23] Barron A R. Approximation and estimation bounds for artificial neural networks[J]. Machine Learning, 1994, 14(1): 115−133
[24] Arora R, Basu A, Mianjy P, et al. Understanding deep neural networks with rectified linear units[C/OL] //Proc of the 6th Int Conf on Learning Representations. 2018 [2024-07-14]. https://openreview.net/pdf?id=B1J_rgWRW
[25] Montufar G F, Pascanu R, Cho K, et al. On the number of linear regions of deep neural networks[C] //Advances in Neural Information Processing Systems 27. Cambridge, MA: MIT Press, 2014: 2924−2932
[26] Goujon A, Etemadi A, Unser M. On the number of regions of piecewise linear neural networks[J]. Journal of Computational and Applied Mathematics, 2024, 441: 115667 doi: 10.1016/j.cam.2023.115667
[27] Eldan R, Shamir O. The power of depth for feedforward neural networks[C] //Proc of the 29th Conf on Learning Theory. New York: PMLR, 2016: 907−940
[28] Telgarsky M. Benefits of depth in neural networks[C] //Proc of the 29th Conf on Learning Theory. New York: PMLR, 2016: 1517−1539
[29] Zhang Shaoqun, Gao Wei, Zhou Zhihua. Towards understanding theoretical advantages of complex-reaction networks[J]. Neural Networks, 2022, 151: 80−93 doi: 10.1016/j.neunet.2022.03.024
[30] Wu Jinhui, Zhang Shaoqun, Jiang Yuan, et al. Theoretical exploration of flexible transmitter model[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 35(3): 3674−3688
[31] Fukushima K. Visual feature extraction by a multilayered network of analog threshold elements[J]. IEEE Transactions on Systems Science and Cybernetics, 1969, 5(4): 322−333 doi: 10.1109/TSSC.1969.300225
[32] Maas A L, Hannun A Y, Ng A Y. Rectifier nonlinearities improve neural network acoustic models[C/OL] //Proc of the 2013 ICML Workshop on Deep Learning for Audio, Speech, and Language Processing. 2013 [2024-07-15]. http://robotics.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf
[33] He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification[C] //Proc of the 2015 IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2015: 1026−1034
[34] Zaslavsky T. Facing up to Arrangements: Face-Count Formulas for Partitions of Space by Hyperplanes[M]. Providence, RI: American Mathematical Society, 1975