Citation: Wu Jinhui, Jiang Yuan. Universal Approximation and Approximation Advantages of Quaternion-Valued Neural Networks[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440410
Quaternion-valued neural networks extend real-valued neural networks to the algebra of quaternions. In some tasks, such as singular point compensation in polarimetric synthetic aperture radar, spoken language understanding, and robot control, quaternion-valued neural networks achieve higher accuracy or faster convergence than real-valued neural networks. This performance is widely supported by empirical studies, but few studies address the theoretical properties of quaternion-valued neural networks, in particular why they can be more efficient than real-valued neural networks. In this paper, we investigate the theoretical properties of quaternion-valued neural networks and their advantages over real-valued neural networks from the perspective of approximation. First, we prove the universal approximation property of quaternion-valued neural networks with a non-split ReLU (rectified linear unit)-type activation function. Second, we establish the approximation advantages of quaternion-valued neural networks over real-valued ones. For split ReLU-type activation functions, we show that a one-hidden-layer real-valued neural network needs about 4 times as many parameters to possess the same maximum number of convex linear regions as a one-hidden-layer quaternion-valued neural network. For the non-split ReLU-type activation function, we prove an approximation separation between one-hidden-layer quaternion-valued and real-valued neural networks: a quaternion-valued neural network can express a real-valued neural network using the same number of hidden neurons and the same parameter norm, whereas a real-valued neural network cannot approximate a quaternion-valued neural network unless its number of hidden neurons or its parameter norm is exponentially large. Finally, simulation experiments support our theoretical findings.
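To make the model concrete, the following is a minimal NumPy sketch of a one-hidden-layer quaternion-valued network with a split ReLU activation, which applies the real ReLU to each of the four quaternion components. All function names, parameter shapes, and the random initialization here are illustrative assumptions for exposition, not code from the paper; in particular, the paper's non-split ReLU-type activation is defined in the paper itself and is not reproduced here.

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of quaternions p = p0 + p1*i + p2*j + p3*k and q,
    each given as a length-4 NumPy array (real part first)."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([
        p0*q0 - p1*q1 - p2*q2 - p3*q3,   # real part
        p0*q1 + p1*q0 + p2*q3 - p3*q2,   # i component
        p0*q2 - p1*q3 + p2*q0 + p3*q1,   # j component
        p0*q3 + p1*q2 - p2*q1 + p3*q0,   # k component
    ])

def split_relu(q):
    """Split ReLU: apply the real ReLU to each of the 4 components."""
    return np.maximum(q, 0.0)

def qvnn_forward(x, W, b, V):
    """One-hidden-layer quaternion-valued network
    f(x) = sum_j V[j] * sigma(W[j] * x + b[j]),
    with quaternion input x and quaternion parameters W, b, V
    (each row of W, b, V is one quaternion)."""
    out = np.zeros(4)
    for w_j, b_j, v_j in zip(W, b, V):
        h = split_relu(hamilton_product(w_j, x) + b_j)
        out += hamilton_product(v_j, h)
    return out

# Tiny usage example with 3 hidden neurons and random (illustrative) parameters.
rng = np.random.default_rng(0)
n_hidden = 3
W = rng.standard_normal((n_hidden, 4))
b = rng.standard_normal((n_hidden, 4))
V = rng.standard_normal((n_hidden, 4))
x = rng.standard_normal(4)
print(qvnn_forward(x, W, b, V))
```

Note that each split-ReLU hidden neuron above applies the ReLU to 4 real linear functions of the 4 real input components, so one quaternion neuron induces up to 4 hyperplanes in R^4 while carrying 12 real parameters (4 each for w_j, b_j, v_j); this is the kind of hyperplane-arrangement counting behind the paper's comparison of convex linear regions between quaternion-valued and real-valued one-hidden-layer networks.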
References
[1] Oyama K, Hirose A. Phasor quaternion neural networks for singular point compensation in polarimetric-interferometric synthetic aperture radar[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 57(5): 2510−2519
[2] Parcollet T, Morchid M, Linarès G. Deep quaternion neural networks for spoken language understanding[C] //Proc of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop. Piscataway, NJ: IEEE, 2017: 504−511
[3] Wu Yue, Yuan Yongzhe, Yue Mingyu, et al. Feature mining method of multi-dimensional information fusion in point cloud registration[J]. Journal of Computer Research and Development, 2022, 59(8): 1732−1741 (in Chinese)
[4] Bayro-Corrochano E, Lechuga-Gutiérrez L, Garza-Burgos M. Geometric techniques for robotics and HMI: Interpolation and haptics in conformal geometric algebra and control using quaternion spike neural networks[J]. Robotics and Autonomous Systems, 2018, 104: 72−84 doi: 10.1016/j.robot.2018.02.015
[5] Parcollet T, Ravanelli M, Morchid M, et al. Quaternion recurrent neural networks[C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2024-07-14]. https://openreview.net/pdf?id=ByMHvs0cFQ
[6] Shoemake K. Animating rotation with quaternion curves[C] //Proc of the 12th Annual Conf on Computer Graphics and Interactive Techniques. New York: ACM, 1985: 245−254
[7] Parcollet T, Morchid M, Linarès G. A survey of quaternion neural networks[J]. Artificial Intelligence Review, 2020, 53(4): 2957−2982 doi: 10.1007/s10462-019-09752-1
[8] Arena P, Fortuna L, Muscato G, et al. Multilayer perceptrons to approximate quaternion valued functions[J]. Neural Networks, 1997, 10(2): 335−342 doi: 10.1016/S0893-6080(96)00048-2
[9] Valle M E, Vital W L, Vieira G. Universal approximation theorem for vector- and hypercomplex-valued neural networks[J]. arXiv preprint, arXiv: 2401.02277, 2024
[10] Ujang B C, Took C C, Mandic D P. Quaternion-valued nonlinear adaptive filtering[J]. IEEE Transactions on Neural Networks, 2011, 22(8): 1193−1206 doi: 10.1109/TNN.2011.2157358
[11] Cybenko G. Approximation by superpositions of a sigmoidal function[J]. Mathematics of Control, Signals and Systems, 1989, 2(4): 303−314 doi: 10.1007/BF02551274
[12] Funahashi K I. On the approximate realization of continuous mappings by neural networks[J]. Neural Networks, 1989, 2(3): 183−192 doi: 10.1016/0893-6080(89)90003-8
[13] Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators[J]. Neural Networks, 1989, 2(5): 359−366 doi: 10.1016/0893-6080(89)90020-8
[14] Leshno M, Lin V Y, Pinkus A, et al. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function[J]. Neural Networks, 1993, 6(6): 861−867 doi: 10.1016/S0893-6080(05)80131-5
[15] Seidl D R, Lorenz R D. A structure by which a recurrent neural network can approximate a nonlinear dynamic system[C] //Proc of the 1991 Int Joint Conf on Neural Networks. Piscataway, NJ: IEEE, 1991: 709−714
[16] Funahashi K I, Nakamura Y. Approximation of dynamical systems by continuous time recurrent neural networks[J]. Neural Networks, 1993, 6(6): 801−806 doi: 10.1016/S0893-6080(05)80125-X
[17] Chow T W, Li Xiaodong. Modeling of continuous time dynamical systems with input by recurrent neural networks[J]. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 2000, 47(4): 575−578 doi: 10.1109/81.841860
[18] Li Xiaodong, Ho J K, Chow T W. Approximation of dynamical time-variant systems by continuous-time recurrent neural networks[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2005, 52(10): 656−660 doi: 10.1109/TCSII.2005.852006
[19] Schäfer A M, Zimmermann H G. Recurrent neural networks are universal approximators[C] //Proc of the 16th Int Conf on Artificial Neural Networks. Berlin: Springer, 2006: 632−640
[20] Zhou Dingxuan. Universality of deep convolutional neural networks[J]. Applied and Computational Harmonic Analysis, 2020, 48(2): 787−794 doi: 10.1016/j.acha.2019.06.004
[21] Arena P, Fortuna L, Re R, et al. On the capability of neural networks with complex neurons in complex valued functions approximation[C] //Proc of the 1993 IEEE Int Symp on Circuits and Systems. Piscataway, NJ: IEEE, 1993: 2168−2171
[22] Voigtlaender F. The universal approximation theorem for complex-valued neural networks[J]. Applied and Computational Harmonic Analysis, 2023, 64: 33−61 doi: 10.1016/j.acha.2022.12.002
[23] Barron A R. Approximation and estimation bounds for artificial neural networks[J]. Machine Learning, 1994, 14(1): 115−133
[24] Arora R, Basu A, Mianjy P, et al. Understanding deep neural networks with rectified linear units[C/OL] //Proc of the 6th Int Conf on Learning Representations. 2018 [2024-07-14]. https://openreview.net/pdf?id=B1J_rgWRW
[25] Montufar G F, Pascanu R, Cho K, et al. On the number of linear regions of deep neural networks[C] //Advances in Neural Information Processing Systems 27. Cambridge, MA: MIT Press, 2014: 2924−2932
[26] Goujon A, Etemadi A, Unser M. On the number of regions of piecewise linear neural networks[J]. Journal of Computational and Applied Mathematics, 2024, 441: 115667 doi: 10.1016/j.cam.2023.115667
[27] Eldan R, Shamir O. The power of depth for feedforward neural networks[C] //Proc of the 29th Conf on Learning Theory. New York: PMLR, 2016: 907−940
[28] Telgarsky M. Benefits of depth in neural networks[C] //Proc of the 29th Conf on Learning Theory. New York: PMLR, 2016: 1517−1539
[29] Zhang Shaoqun, Gao Wei, Zhou Zhihua. Towards understanding theoretical advantages of complex-reaction networks[J]. Neural Networks, 2022, 151: 80−93 doi: 10.1016/j.neunet.2022.03.024
[30] Wu Jinhui, Zhang Shaoqun, Jiang Yuan, et al. Theoretical exploration of flexible transmitter model[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 35(3): 3674−3688
[31] Fukushima K. Visual feature extraction by a multilayered network of analog threshold elements[J]. IEEE Transactions on Systems Science and Cybernetics, 1969, 5(4): 322−333 doi: 10.1109/TSSC.1969.300225
[32] Maas A L, Hannun A Y, Ng A Y. Rectifier nonlinearities improve neural network acoustic models[C/OL] //Proc of the 2013 ICML Workshop on Deep Learning for Audio, Speech, and Language Processing. 2013 [2024-07-15]. http://robotics.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf
[33] He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification[C] //Proc of the 2015 IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2015: 1026−1034
[34] Zaslavsky T. Facing up to Arrangements: Face-Count Formulas for Partitions of Space by Hyperplanes[M]. Providence, RI: American Mathematical Society, 1975