Citation: Jia Xibin, Li Chen, Wang Luo, Zhang Muchen, Liu Xiaojian, Zhang Yangyang, Wen Jiakai. A Multimodal Cross-Domain Sentiment Analysis Algorithm Based on Feature Disentanglement Meta-Optimization[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440624
Multimodal sentiment analysis aims to identify users' sentiment tendencies from multimodal data such as customer reviews. When such models are applied across domains, the common remedy for domain bias is unsupervised domain adaptation. However, these methods concentrate on extracting domain-invariant features and neglect the domain-specific features that matter in the target domain. To address this, a meta-optimization based domain-invariant and domain-specific feature disentanglement network is proposed. First, an image-text fused sentiment feature encoder is built by embedding adapters into a pre-trained large vision-language model and fine-tuning them to the sentiment task. Then, a feature disentanglement module based on a factorization operation applies domain adversarial training to the domain-invariant branch and domain classification to the domain-specific branch, together with a collaborative independence constraint, so that knowledge-transferable domain-invariant embeddings are learned while domain-specific features are extracted to strengthen sentiment classification in the target domain. To keep the optimization of feature disentanglement and sentiment classification consistent, a meta-learning-based meta-optimization training strategy is introduced to optimize the network synergistically. Comparative experiments on bidirectional sentiment transfer tasks constructed from the MVSA and Yelp datasets show that the proposed algorithm outperforms state-of-the-art image-text sentiment transfer algorithms in terms of precision, recall, and F1 score.
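The abstract does not give implementation details; the following is only a minimal PyTorch-style sketch of the disentanglement idea described above, in which a fused image-text feature is factorized into a domain-invariant branch (trained adversarially via gradient reversal) and a domain-specific branch (trained with an ordinary domain classifier), with an orthogonality term standing in for the independence constraint. All module names, dimensions, and concrete losses are illustrative assumptions; the adapter-based encoder and the meta-optimization training strategy are not shown.

import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    # Gradient reversal layer commonly used for domain-adversarial training.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DisentangleNet(nn.Module):
    def __init__(self, feat_dim=768, hid_dim=256, num_domains=2, num_classes=3):
        super().__init__()
        # Factorize the fused image-text feature into two branches.
        self.invariant_proj = nn.Sequential(nn.Linear(feat_dim, hid_dim), nn.ReLU())
        self.specific_proj = nn.Sequential(nn.Linear(feat_dim, hid_dim), nn.ReLU())
        self.domain_discriminator = nn.Linear(hid_dim, num_domains)  # adversarial head
        self.domain_classifier = nn.Linear(hid_dim, num_domains)     # plain domain head
        self.sentiment_head = nn.Linear(2 * hid_dim, num_classes)    # uses both branches

    def forward(self, fused_feat, lambd=1.0):
        z_inv = self.invariant_proj(fused_feat)    # domain-invariant part
        z_spec = self.specific_proj(fused_feat)    # domain-specific part
        d_adv = self.domain_discriminator(GradReverse.apply(z_inv, lambd))
        d_cls = self.domain_classifier(z_spec)
        y = self.sentiment_head(torch.cat([z_inv, z_spec], dim=-1))
        # Squared dot product as a simple independence (orthogonality) penalty.
        ortho = (z_inv * z_spec).sum(dim=-1).pow(2).mean()
        return y, d_adv, d_cls, ortho


# Example: a labeled source batch contributes all terms; an unlabeled target
# batch would contribute only the domain and independence terms.
model = DisentangleNet()
feats = torch.randn(8, 768)                       # fused features from the encoder
y, d_adv, d_cls, ortho = model(feats)
dom = torch.zeros(8, dtype=torch.long)            # 0 = source domain
loss = (nn.functional.cross_entropy(y, torch.randint(0, 3, (8,)))
        + nn.functional.cross_entropy(d_adv, dom)
        + nn.functional.cross_entropy(d_cls, dom)
        + 0.1 * ortho)
loss.backward()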
[1] Das R, Singh T D. Multimodal sentiment analysis: A survey of methods, trends, and challenges[J]. ACM Computing Surveys, 2023, 55(13s): 1−38
[2] Chan J Y L, Bea K T, Leow S M H, et al. State of the art: A review of sentiment analysis based on sequential transfer learning[J]. Artificial Intelligence Review, 2023, 56(1): 749−780 doi: 10.1007/s10462-022-10183-8
[3] Singhal P, Walambe R, Ramanna S, et al. Domain adaptation: Challenges, methods, datasets, and applications[J]. IEEE Access, 2023, 11: 6973−7020
[4] Azuma C, Ito T, Shimobaba T. Adversarial domain adaptation using contrastive learning[J]. Engineering Applications of Artificial Intelligence, 2023, 123: 106394 doi: 10.1016/j.engappai.2023.106394
[5] Zhou Qianyu, Gu Qiqi, Pang Jiangmiao, et al. Self-adversarial disentangling for specific domain adaptation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(7): 8954−8968
[6] Li Jingjing, Chen Erpeng, Ding Zhengming, et al. Maximum density divergence for domain adaptation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(11): 3918−3930
[7] Zhu Yongchun, Zhuang Fuzhen, Wang Jindong, et al. Deep subdomain adaptation network for image classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(4): 1713−1722
[8] Zhao Han, Des Combes R T, Zhang Kun, et al. On learning invariant representations for domain adaptation[C]//Proc of the 36th Int Conf on Machine Learning. New York: PMLR, 2019: 7523−7532
[9] Johansson F D, Sontag D, Ranganath R. Support and invertibility in domain-invariant representations[C]//Proc of the 22nd Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2019: 527−536
[10] Zadeh A, Chen Minghai, Poria S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proc of the 2017 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2017: 1103−1114
[11] Truong Q T, Lauw H W. Vistanet: Visual aspect attention network for multimodal sentiment analysis[C]//Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 305−312
[12] Gui Tao, Zhu Liang, Zhang Qi, et al. Cooperative multimodal approach to depression detection in Twitter[C]//Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 110−117
[13] Ling Yan, Yu Jianfei, Xia Rui. Vision-language pre-training for multimodal aspect-based sentiment analysis[C]//Proc of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2022: 2149−2159
[14] Ganin Y, Ustinova E, Ajakan H, et al. Domain-adversarial training of neural networks[J]. Journal of Machine Learning Research, 2016, 17(59): 1−35
[15] Jia Xibin, Li Chen, Zeng Meng, et al. An improved unified domain adversarial category-wise alignment network for unsupervised cross-domain sentiment classification[J]. Engineering Applications of Artificial Intelligence, 2023, 126: 107108 doi: 10.1016/j.engappai.2023.107108
[16] Huang Xuejian, Ma Tinghuai, Wang Gensheng. Multimodal learning method based on intra- and inter-sample cooperative representation and adaptive fusion[J]. Journal of Computer Research and Development, 2024, 61(5): 1310−1324 (in Chinese) doi: 10.7544/issn1000-1239.202330722
[17] Qi Fan, Yang Xiaoshan, Xu Changsheng. A unified framework for multimodal domain adaptation[C]//Proc of the 26th ACM Int Conf on Multimedia. New York: ACM, 2018: 429−437
[18] Ma Xinhong, Zhang Tianzhu, Xu Changsheng. Deep multi-modality adversarial networks for unsupervised domain adaptation[J]. IEEE Transactions on Multimedia, 2019, 21(9): 2419−2431 doi: 10.1109/TMM.2019.2902100
[19] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proc of the 31st Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates, 2017: 6000−6010
[20] Li Junnan, Selvaraju R, Gotmare A, et al. Align before fuse: Vision and language representation learning with momentum distillation[J]. Advances in Neural Information Processing Systems, 2021, 34: 9694−9705
[21] Chen Shoufa, Ge Chongjian, Tong Zhan, et al. Adaptformer: Adapting vision transformers for scalable visual recognition[J]. Advances in Neural Information Processing Systems, 2022, 35: 16664−16678
[22] Li Ya, Tian Xinmei, Gong Mingming, et al. Deep domain generalization via conditional invariant adversarial networks[C]//Proc of the 2018 European Conf on Computer Vision (ECCV). Berlin: Springer, 2018: 624−639
[23] Bui M H, Tran T, Tran A, et al. Exploiting domain-specific features to enhance domain generalization[J]. Advances in Neural Information Processing Systems, 2021, 34: 21189−21201
[24] Wei Guoqiang, Lan Cuiling, Zeng Wenjun, et al. Metaalign: Coordinating domain alignment and classification for unsupervised domain adaptation[C]//Proc of the IEEE/CVF Conf on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ: IEEE, 2021: 16643−16653
[25] Niu Teng, Zhu Shiai, Pang Lei, et al. Sentiment analysis on multi-view social data[C]//Proc of the 22nd Int Conf on Multimedia Modeling. Berlin: Springer, 2016: 15−27
[26] Liu Qiwei, Li Jun, Gu Beibei, et al. TSAIE: Text sentiment analysis model based on image enhancement[J]. Frontiers of Data & Computing, 2022, 4(3): 131−140 (in Chinese)
[27] Zhang Yuhao, Zhang Ying, Guo Wenya, et al. Learning disentangled representation for multimodal cross-domain sentiment analysis[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 34(10): 7956−7966
[28] Baltrušaitis T, Ahuja C, Morency L P. Multimodal machine learning: A survey and taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(2): 423−443
[29] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint, arXiv: 2010.11929, 2020
[30] Devlin J, Chang M W, Lee K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint, arXiv: 1810.04805, 2018
[31] Li Junnan, Li Dongxu, Xiong Caiming, et al. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation[C]//Proc of the 39th Int Conf on Machine Learning. New York: PMLR, 2022: 12888−12900
[32] Zhu Tong, Li Leida, Yang Jufeng, et al. Multimodal sentiment analysis with image-text interaction network[J]. IEEE Transactions on Multimedia, 2022, 25: 3375−3385
[33] Li Jingzhe, Wang Chengji, Luo Zhiming, et al. Modality-dependent sentiments exploring for multi-modal sentiment classification[C]//Proc of the 2024 IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP). Piscataway, NJ: IEEE, 2024: 7930−7934