Citation: Jia Xibin, Li Chen, Wang Luo, Zhang Muchen, Liu Xiaojian, Zhang Yangyang, Wen Jiakai. A Multimodal Cross-Domain Sentiment Analysis Algorithm Based on Feature Disentanglement Meta-Optimization[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440624
Multimodal sentiment analysis aims to identify users' sentiment tendencies from multimodal data such as customer reviews. When such models are applied across domains, the common remedy for domain bias is unsupervised domain adaptation. However, these methods concentrate on extracting domain-invariant features and neglect the domain-specific features that matter in the target domain. To address this, a meta-optimization based domain-invariant and domain-specific feature disentanglement network is proposed. First, an image-text fused sentiment feature encoder is built by embedding adapters into a pre-trained large vision-language model and fine-tuning them to the sentiment task. Then, a feature disentanglement module based on a factorization operation applies domain adversarial training to the domain-invariant branch and domain classification to the domain-specific branch, together with a collaborative independence constraint, so that knowledge-transferable domain-invariant embeddings are learned while domain-specific features are extracted to strengthen sentiment classification in the target domain. To keep the optimization of feature disentanglement and sentiment classification consistent, a meta-learning-based meta-optimization training strategy is introduced to optimize the network synergistically. Comparative experiments on bidirectional sentiment transfer tasks constructed from the MVSA and Yelp datasets show that the proposed algorithm outperforms state-of-the-art image-text sentiment transfer algorithms in terms of precision, recall, and F1 score.
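The abstract does not give implementation details; the following is only a minimal PyTorch-style sketch of the disentanglement idea described above, in which a fused image-text feature is factorized into a domain-invariant branch (trained adversarially via gradient reversal) and a domain-specific branch (trained with an ordinary domain classifier), with an orthogonality term standing in for the independence constraint. All module names, dimensions, and concrete losses are illustrative assumptions; the adapter-based encoder and the meta-optimization training strategy are not shown.

import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    # Gradient reversal layer commonly used for domain-adversarial training.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DisentangleNet(nn.Module):
    def __init__(self, feat_dim=768, hid_dim=256, num_domains=2, num_classes=3):
        super().__init__()
        # Factorize the fused image-text feature into two branches.
        self.invariant_proj = nn.Sequential(nn.Linear(feat_dim, hid_dim), nn.ReLU())
        self.specific_proj = nn.Sequential(nn.Linear(feat_dim, hid_dim), nn.ReLU())
        self.domain_discriminator = nn.Linear(hid_dim, num_domains)  # adversarial head
        self.domain_classifier = nn.Linear(hid_dim, num_domains)     # plain domain head
        self.sentiment_head = nn.Linear(2 * hid_dim, num_classes)    # uses both branches

    def forward(self, fused_feat, lambd=1.0):
        z_inv = self.invariant_proj(fused_feat)    # domain-invariant part
        z_spec = self.specific_proj(fused_feat)    # domain-specific part
        d_adv = self.domain_discriminator(GradReverse.apply(z_inv, lambd))
        d_cls = self.domain_classifier(z_spec)
        y = self.sentiment_head(torch.cat([z_inv, z_spec], dim=-1))
        # Squared dot product as a simple independence (orthogonality) penalty.
        ortho = (z_inv * z_spec).sum(dim=-1).pow(2).mean()
        return y, d_adv, d_cls, ortho


# Example: a labeled source batch contributes all terms; an unlabeled target
# batch would contribute only the domain and independence terms.
model = DisentangleNet()
feats = torch.randn(8, 768)                       # fused features from the encoder
y, d_adv, d_cls, ortho = model(feats)
dom = torch.zeros(8, dtype=torch.long)            # 0 = source domain
loss = (nn.functional.cross_entropy(y, torch.randint(0, 3, (8,)))
        + nn.functional.cross_entropy(d_adv, dom)
        + nn.functional.cross_entropy(d_cls, dom)
        + 0.1 * ortho)
loss.backward()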
[1] Das R, Singh T D. Multimodal sentiment analysis: A survey of methods, trends, and challenges[J]. ACM Computing Surveys, 2023, 55(13s): 1−38
[2] Chan J Y L, Bea K T, Leow S M H, et al. State of the art: A review of sentiment analysis based on sequential transfer learning[J]. Artificial Intelligence Review, 2023, 56(1): 749−780 doi: 10.1007/s10462-022-10183-8
[3] Singhal P, Walambe R, Ramanna S, et al. Domain adaptation: Challenges, methods, datasets, and applications[J]. IEEE Access, 2023, 11: 6973−7020
[4] Azuma C, Ito T, Shimobaba T. Adversarial domain adaptation using contrastive learning[J]. Engineering Applications of Artificial Intelligence, 2023, 123: 106394 doi: 10.1016/j.engappai.2023.106394
[5] Zhou Qianyu, Gu Qiqi, Pang Jiangmiao, et al. Self-adversarial disentangling for specific domain adaptation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(7): 8954−8968
[6] Li Jingjing, Chen Erpeng, Ding Zhengming, et al. Maximum density divergence for domain adaptation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(11): 3918−3930
[7] Zhu Yongchun, Zhuang Fuzhen, Wang Jindong, et al. Deep subdomain adaptation network for image classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(4): 1713−1722
[8] Zhao Han, Des Combes R T, Zhang Kun, et al. On learning invariant representations for domain adaptation[C]//Proc of the 36th Int Conf on Machine Learning. New York: PMLR, 2019: 7523−7532
[9] Johansson F D, Sontag D, Ranganath R. Support and invertibility in domain-invariant representations[C]//Proc of the 22nd Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2019: 527−536
[10] Zadeh A, Chen Minghai, Poria S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proc of the 2017 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2017: 1103−1114
[11] Truong Q T, Lauw H W. Vistanet: Visual aspect attention network for multimodal sentiment analysis[C]//Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 305−312
[12] Gui Tao, Zhu Liang, Zhang Qi, et al. Cooperative multimodal approach to depression detection in Twitter[C]//Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 110−117
[13] Ling Yan, Yu Jianfei, Xia Rui. Vision-language pre-training for multimodal aspect-based sentiment analysis[C]//Proc of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2022: 2149−2159
[14] Ganin Y, Ustinova E, Ajakan H, et al. Domain-adversarial training of neural networks[J]. Journal of Machine Learning Research, 2016, 17(59): 1−35
[15] Jia Xibin, Li Chen, Zeng Meng, et al. An improved unified domain adversarial category-wise alignment network for unsupervised cross-domain sentiment classification[J]. Engineering Applications of Artificial Intelligence, 2023, 126: 107108 doi: 10.1016/j.engappai.2023.107108
[16] Huang Xuejian, Ma Tinghuai, Wang Gensheng. Multimodal learning method based on intra- and inter-sample cooperative representation and adaptive fusion[J]. Journal of Computer Research and Development, 2024, 61(5): 1310−1324 (in Chinese) doi: 10.7544/issn1000-1239.202330722
[17] Qi Fan, Yang Xiaoshan, Xu Changsheng. A unified framework for multimodal domain adaptation[C]//Proc of the 26th ACM Int Conf on Multimedia. New York: ACM, 2018: 429−437
[18] Ma Xinhong, Zhang Tianzhu, Xu Changsheng. Deep multi-modality adversarial networks for unsupervised domain adaptation[J]. IEEE Transactions on Multimedia, 2019, 21(9): 2419−2431 doi: 10.1109/TMM.2019.2902100
[19] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proc of the 31st Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates, 2017: 6000−6010
[20] Li Junnan, Selvaraju R, Gotmare A, et al. Align before fuse: Vision and language representation learning with momentum distillation[J]. Advances in Neural Information Processing Systems, 2021, 34: 9694−9705
[21] Chen Shoufa, Ge Chongjian, Tong Zhan, et al. Adaptformer: Adapting vision transformers for scalable visual recognition[J]. Advances in Neural Information Processing Systems, 2022, 35: 16664−16678
[22] Li Ya, Tian Xinmei, Gong Mingming, et al. Deep domain generalization via conditional invariant adversarial networks[C]//Proc of the 2018 European Conf on Computer Vision (ECCV). Berlin: Springer, 2018: 624−639
[23] Bui M H, Tran T, Tran A, et al. Exploiting domain-specific features to enhance domain generalization[J]. Advances in Neural Information Processing Systems, 2021, 34: 21189−21201
[24] Wei Guoqiang, Lan Cuiling, Zeng Wenjun, et al. Metaalign: Coordinating domain alignment and classification for unsupervised domain adaptation[C]//Proc of the IEEE/CVF Conf on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ: IEEE, 2021: 16643−16653
[25] Niu Teng, Zhu Shiai, Pang Lei, et al. Sentiment analysis on multi-view social data[C]//Proc of the 22nd Int Conf on Multimedia Modeling. Berlin: Springer, 2016: 15−27
[26] Liu Qiwei, Li Jun, Gu Beibei, et al. TSAIE: Text sentiment analysis model based on image enhancement[J]. Frontiers of Data & Computing, 2022, 4(3): 131−140 (in Chinese)
[27] Zhang Yuhao, Zhang Ying, Guo Wenya, et al. Learning disentangled representation for multimodal cross-domain sentiment analysis[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 34(10): 7956−7966
[28] Baltrušaitis T, Ahuja C, Morency L P. Multimodal machine learning: A survey and taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(2): 423−443
[29] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint, arXiv: 2010.11929, 2020
[30] Devlin J, Chang M W, Lee K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint, arXiv: 1810.04805, 2018
[31] Li Junnan, Li Dongxu, Xiong Caiming, et al. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation[C]//Proc of the 39th Int Conf on Machine Learning. New York: PMLR, 2022: 12888−12900
[32] Zhu Tong, Li Leida, Yang Jufeng, et al. Multimodal sentiment analysis with image-text interaction network[J]. IEEE Transactions on Multimedia, 2022, 25: 3375−3385
[33] Li Jingzhe, Wang Chengji, Luo Zhiming, et al. Modality-dependent sentiments exploring for multi-modal sentiment classification[C]//Proc of the 2024 IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP). Piscataway, NJ: IEEE, 2024: 7930−7934