Abstract:
Recognizing the intensity of negative sentiment in financial public opinion on social media is important for financial risk prevention and market stability. Financial sentiment is often expressed through multimodal content that combines text and images, yet the structural correlations between these modalities remain underexplored. We therefore propose a multimodal negative sentiment intensity recognition model that combines graph convolutional networks with stacking ensemble learning. The model first uses CLIP to extract semantic features from text and images. It then constructs a multimodal graph and applies graph convolutional networks to embed the structural associations among features, which strengthens its ability to capture latent inter-modal correlations. Finally, a two-layer stacking ensemble integrates the predictions of multiple diverse base classifiers to improve the final sentiment intensity decision. Experimental results show that, under the best configuration, the model achieves an F1-score of 83.57% and an AUC of 95.05%, outperforming the best single-text model, the best single-image model, and the original CLIP multimodal model by 2.48%, 5.27%, and 1.87% in F1-score, respectively. These results confirm that the proposed method is effective at mining inter-modal correlations and improving recognition performance, and they provide technical support for intelligent financial public opinion analysis and financial risk monitoring.
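To make the graph convolution step more concrete, the following is a minimal sketch of a single propagation step over a toy multimodal graph. The abstract does not specify the GCN variant, the graph construction rule, or the feature dimensions, so everything here is an assumption for illustration: the symmetric normalization with self-loops is the commonly used formulation, the function names (`normalized_adjacency`, `gcn_layer`) are hypothetical, and the two-dimensional vectors stand in for CLIP embeddings.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in propagated]


# Toy graph (assumed layout): node 0 is the text of a post, node 1 is the
# image of the same post, node 2 is an unrelated post with no edges.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
features = [[1.0, 0.0],   # stand-in for a text embedding
            [0.0, 1.0],   # stand-in for an image embedding
            [2.0, 2.0]]
weight = [[1.0, 0.0],     # identity weight, so only the propagation
          [0.0, 1.0]]     # effect is visible in the output

out = gcn_layer(adj, features, weight)
```

In this sketch the connected text and image nodes end up with the same mixed representation, while the isolated node keeps its own features. In a trained model the weight matrix would be learned, and the resulting node embeddings would presumably be passed on to the base classifiers of the stacking ensemble.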