Jia Xibin, Li Chen, Wang Luo, Zhang Muchen, Liu Xiaojian, Zhang Yangyang, Wen Jiakai. A Multimodal Cross-Domain Sentiment Analysis Algorithm Based on Feature Disentanglement Meta-Optimization[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440624

A Multimodal Cross-Domain Sentiment Analysis Algorithm Based on Feature Disentanglement Meta-Optimization

Funds: This work was supported by the General Program of the National Natural Science Foundation of China (62476015) and the Cultivation Program for Excellent Achievements in Graduate Education and Teaching at Beijing University of Technology (GER202316).
More Information
  • Author Bio:

    Jia Xibin: born in 1969. Professor. Distinguished member of CCF(29295D). Her main research interests include visual computing, multi-modality deep learning, affective computing, intelligent medical imaging, and behavior understanding and computing

    Li Chen: born in 1998. Master. Her main research interests include multimodal sentiment analysis and transfer learning

    Wang Luo: born in 1990. PhD. Lecturer. Member of CCF(P4015M). His main research interests include computer vision and deep learning

    Zhang Muchen: born in 2002. Master candidate. Her main research interests include multi-modality deep learning and computer vision

    Liu Xiaojian: born in 1989. PhD. Senior engineer. Member of CCF(P2866M). His main research interests include software engineering and cybersecurity

    Zhang Yangyang: born in 1976. Master. Professorial senior engineer. Member of CCF(E9145M). Her main research interests include software and systems engineering, and IT standardization

    Wen Jiakai: born in 1979. Bachelor. Senior engineer. Member of CCF(C2990M). His main research interests include machine translation and big data

  • Received Date: July 19, 2024
  • Revised Date: January 14, 2025
  • Accepted Date: March 02, 2025
  • Available Online: March 02, 2025
  • Multimodal sentiment analysis aims to identify users' sentiment tendencies from multimodal data such as customer comments. To handle the domain bias that arises in cross-domain applications, unsupervised domain adaptation methods are the common solution. However, such methods focus on extracting domain-invariant features and neglect the value of domain-specific features in the target domain. We therefore propose a meta-optimization based network that disentangles domain-invariant and domain-specific features. First, an image-text fused sentiment feature encoder is constructed by embedding adapters into a pre-trained large model and fine-tuning them. Then, a feature disentanglement module built on a factorization operation applies domain adversarial training and domain classification, together with a collaborative independence constraint, to learn knowledge-transferable domain-invariant embeddings while extracting domain-specific features that strengthen sentiment classification in the target domain. Finally, to keep the optimization of feature disentanglement consistent with that of sentiment classification, a meta-learning based meta-optimization training strategy is put forward to synergistically train the sentiment analysis network (an illustrative sketch of these two steps follows below). Comparative experiments on bidirectional sentiment transfer tasks constructed from the MVSA and Yelp datasets demonstrate that the proposed algorithm outperforms other advanced image-text sentiment transfer algorithms in terms of three common metrics: precision, recall, and F1 score.
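
The following is a minimal PyTorch sketch of the two mechanisms the abstract describes: a factorization-style head that splits the fused image-text feature into a domain-invariant branch (trained adversarially through gradient reversal) and a domain-specific branch (trained with an ordinary domain classifier), coupled by an independence constraint; and a first-order meta-optimization step that takes a virtual update on the disentanglement loss before measuring the sentiment loss. This is a reconstruction under stated assumptions, not the authors' code: the class names, layer shapes, the cosine-based independence penalty, and the first-order approximation are all illustrative choices.

```python
# Illustrative sketch only; a reconstruction under stated assumptions,
# not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated (scaled)
    gradient in the backward pass, as in DANN-style domain adversaries."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None

class DisentangleHead(nn.Module):
    """Factorizes the fused image-text feature into domain-invariant and
    domain-specific parts (dimensions and layer shapes are assumptions)."""
    def __init__(self, dim=768, n_domains=2, n_classes=3):
        super().__init__()
        self.inv_proj = nn.Linear(dim, dim)    # domain-invariant branch
        self.spec_proj = nn.Linear(dim, dim)   # domain-specific branch
        self.domain_clf = nn.Linear(dim, n_domains)
        self.sent_clf = nn.Linear(2 * dim, n_classes)

    def forward(self, fused, lamb=1.0):
        z_inv, z_spec = self.inv_proj(fused), self.spec_proj(fused)
        # Domain adversary on the invariant branch (reversed gradients push
        # it toward domain confusion); plain domain classification on the
        # specific branch keeps it domain-discriminative.
        d_adv = self.domain_clf(GradReverse.apply(z_inv, lamb))
        d_spec = self.domain_clf(z_spec)
        # Independence constraint: one plausible choice is to penalize the
        # squared cosine similarity between the two branches.
        indep = (F.normalize(z_inv, dim=-1)
                 * F.normalize(z_spec, dim=-1)).sum(-1).pow(2).mean()
        sent_logits = self.sent_clf(torch.cat([z_inv, z_spec], dim=-1))
        return sent_logits, d_adv, d_spec, indep

def meta_step(model, optimizer, disent_loss_fn, sent_loss_fn, inner_lr=1e-3):
    """First-order meta-optimization sketch: take a virtual SGD step on the
    disentanglement loss, measure the sentiment loss at the updated weights,
    then apply the combined gradient, so the final update favors directions
    on which both objectives agree."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_d = torch.autograd.grad(disent_loss_fn(), params, allow_unused=True)
    backup = [p.detach().clone() for p in params]
    with torch.no_grad():                       # virtual inner step
        for p, g in zip(params, g_d):
            if g is not None:
                p.sub_(inner_lr * g)
    g_s = torch.autograd.grad(sent_loss_fn(), params, allow_unused=True)
    with torch.no_grad():                       # restore original weights
        for p, b in zip(params, backup):
            p.copy_(b)
    optimizer.zero_grad()
    for p, gd, gs in zip(params, g_d, g_s):     # combined update direction
        g = gd.clone() if gd is not None else None
        if gs is not None:
            g = gs.clone() if g is None else g + gs
        p.grad = g
    optimizer.step()
```

In a full training loop, `model` would wrap the adapter-augmented encoder together with this head; `disent_loss_fn` would sum cross-entropy losses on `d_adv` and `d_spec` with the `indep` term, and `sent_loss_fn` would be the sentiment cross-entropy, each closure recomputing a forward pass on the current batch.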

  • [1]
    Das R, Singh T D. Multimodal sentiment analysis: A survey of methods, trends, and challenges[J]. ACM Computing Surveys, 2023, 55(13s): 1−38
    [2]
    Chan J Y L, Bea K T, Leow S M H, et al. State of the art: A review of sentiment analysis based on sequential transfer learning[J]. Artificial Intelligence Review, 2023, 56(1): 749−780 doi: 10.1007/s10462-022-10183-8
    [3]
    Singhal P,Walambe R,Ramanna S,et al. Domain adaptation:Challenges,methods,datasets,and applications[J]. IEEE Access,2023,11:6973-7020(没有期
    [4]
    Azuma C, Ito T, Shimobaba T. Adversarial domain adaptation using contrastive learning[J]. Engineering Applications of Artificial Intelligence, 2023, 123: 106394 doi: 10.1016/j.engappai.2023.106394
    [5]
    Zhou Qianyu, Gu Qiqi, Pang Jiangmiao, et al. Self-adversarial disentangling for specific domain adaptation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(7): 8954−8968
    [6]
    Li Jingjing, Chen Erpeng, Ding Zhengming, et al. Maximum density divergence for domain adaptation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(11): 3918−3930
    [7]
    Zhu Yongchun, Zhuang Fuzhen, Wang Jindong, et al. Deep subdomain adaptation network for image classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(4): 1713−1722
    [8]
    Zhao Han, Des Combes R T, Zhang Kun, et al. On learning invariant representations for domain adaptation[C]//Proc of the 36th Int Conf on Machine Learning. New York: PMLR, 2019: 7523−7532
    [9]
    Johansson F D, Sontag D, Ranganath R. Support and invertibility in domain-invariant representations[C]//Proc of the 22nd Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2019: 527−536
    [10]
    Zadeh A, Chen Minghai, Poria S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proc of the 2017 Conf on Empirical Methods in Natural Language Processing. New York: PMLR, 2017: 1103−1114
    [11]
    Truong Q T, Lauw H W. Vistanet: Visual aspect attention network for multimodal sentiment analysis[C]//Proc of the 33rd Association for the Advancement of Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 305−312
    [12]
    Gui Tao, Zhu Liang, Zhang Qi, et al. Cooperative multimodal approach to depression detection in twitter[C]//Proc of the 33rd Association for the Advancement of Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 110−117
    [13]
    Ling Yan, Yu Jianfei, Xia Rui. Vision-language pre-Training for multimodal aspect-based sentiment analysis[C]//Proc of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2022: 2149−2159
    [14]
    Ganin Y, Ustinova E, Ajakan H, et al. Domain-adversarial training of neural networks[J]. Journal of Machine Learning Research, 2016, 17(59): 1−35
    [15]
    Jia Xibin, Li Chen, Zeng Meng, et al. An improved unified domain adversarial category-wise alignment network for unsupervised cross-domain sentiment classification[J]. Engineering Applications of Artificial Intelligence, 2023, 126: 107108 doi: 10.1016/j.engappai.2023.107108
    [16]
    黄学坚,马廷淮,王根生. 基于样本内外协同表示和自适应融合的多模态学习方法[J]. 计算机研究与发展,2024,61(5):1310−1324 doi: 10.7544/issn1000-1239.202330722

    Huang Xuejian, Ma Tinghuai, Wang Gensheng. Multimodal Learning Method Based on Intra- and Inter-Sample Cooperative Representation and Adaptive Fusion[J]. Journal of Computer Research and Development, 2024, 61(5): 1310−1324 (in Chinese) doi: 10.7544/issn1000-1239.202330722
    [17]
    Qi Fan, Yang Xiaoshan, Xu Changsheng. A unified framework for multimodal domain adaptation[C]//Proc of the 26th ACM Int Conf on Multimedia. New York, NY: ACM, 2018: 429−437
    [18]
    Ma Xinhong, Zhang Tianzhu, Xu Changsheng. Deep multi-modality adversarial networks for unsupervised domain adaptation[J]. IEEE Transactions on Multimedia, 2019, 21(9): 2419−2431 doi: 10.1109/TMM.2019.2902100
    [19]
    Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proc of the 31st Int Conf on Neural Information Processing Systems. New York: ACM, 2017: 6000−6010
    [20]
    Li Junnan, Selvaraju R, Gotmare A, et al. Align before fuse: Vision and language representation learning with momentum distillation[J]. Advances in Neural Information Processing Systems, 2021, 34: 9694−9705
    [21]
    Chen Shoufa, Ge Chongjian, Tong Zhan, et al. Adaptformer: Adapting vision transformers for scalable visual recognition[J]. Advances in Neural Information Processing Systems, 2022, 35: 16664−16678
    [22]
    Li Ya, Tian Xinmei, Gong Mingming, et al. Deep domain generalization via conditional invariant adversarial networks[C]//Proc of the 2018 European Conf on Computer Vision(ECCV). Berlin: Springer, 2018: 624−639
    [23]
    Bui M H, Tran T, Tran A, et al. Exploiting domain-specific features to enhance domain generalization[J]. Advances in Neural Information Processing Systems, 2021, 34: 21189−21201
    [24]
    Wei Guoqiang, Lan Cuiling, Zeng Wenjun, et al. Metaalign: Coordinating domain alignment and classification for unsupervised domain adaptation[C]//Proc of the IEEE/CVF Conf on Computer Vision and Pattern Recognition(CVPR). Piscataway, NJ: IEEE, 2021: 16643−16653
    [25]
    Niu Teng, Zhu Shiai, Pang Lei, et al. Sentiment analysis on multi-view social data[C]//Proc of the 22nd Int Conf on Multimedia Modeling. Berlin: Springer, 2016: 15−27
    [26]
    刘琦玮,李俊,顾蓓蓓,等. TSAIE:图像增强文本的多模态情感分析模型[J]. 数据与计算发展前沿,2022,4(3):131−140

    Liu Qiwei, Li Jun, Gu Beibei, et al. TSAIE: Text sentiment analysis model based on image enhancement[J]. Frontiers of Data & Computing, 2022, 4(3): 131−140 (in Chinese)
    [27]
    Zhang Yuhao, Zhang Ying, Guo Wenya, et al. Learning disentangled representation for multimodal cross-domain sentiment analysis[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 34(10): 7956−7966
    [28]
    Baltrušaitis T, Ahuja C, Morency L P. Multimodal machine learning: A survey and taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(2): 423−443
    [29]
    Dosovitskiy A. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint, arXiv: 2010.11929, 2020
    [30]
    Devlin J, Chang M W, Lee K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint, arXiv: 1810.04805, 2018
    [31]
    Li Junnan, Li Dongxu, Xiong Caiming, et al. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation[C]//Proc of the 39th Int Conf on Machine Learning. New York: PMLR, 2022: 12888−12900
    [32]
    Zhu Tong, Li Leida, Yang Jufeng, et al. Multimodal sentiment analysis with image-text interaction network[J]. IEEE Transactions on Multimedia, 2022, 25: 3375−3385
    [33]
    Li Jingzhe, Wang Chengji, Luo Zhiming, et al. Modality-dependent sentiments exploring for multi-modal sentiment classification[C]//Proc of the 2024 IEEE Int Conf on Acoustics, Speech and Signal Processing (ICASSP). Piscataway, NJ: IEEE, 2024: 7930−7934
