Aspect-term extraction and aspect-level sentiment classification together extract aspect-sentiment pairs from a sentence. They help social media platforms such as Twitter and Facebook mine users' sentiments toward different aspects, which is of great value for personalized recommendation. In the multimodal setting, existing methods use two independent models to complete the two subtasks separately: the former identifies products, important people, and other entities or entity aspects in the sentence, while the latter predicts the user's sentiment orientation toward a given aspect term. This approach has two problems. First, using two independent models breaks the continuity of the underlying features between the two tasks and cannot model the latent semantic associations within a sentence. Second, aspect-level sentiment classification predicts the sentiment of only one aspect at a time, which mismatches the throughput of the aspect-term extraction model that extracts multiple aspects simultaneously; the serial execution of the two models therefore makes aspect-sentiment pair extraction inefficient.

To solve these problems, this paper proposes a unified framework for multimodal aspect-term extraction and aspect-level sentiment classification. First, a shared feature module is built to model the latent semantic associations between the tasks, so each task only needs to attend to its task-specific upper layers, which reduces model complexity. Second, sequence tagging is used to output multiple aspects and their corresponding sentiment categories in a sentence simultaneously, which improves the efficiency of aspect-sentiment pair extraction.
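A common way to realize such joint output is a collapsed tagging scheme, where each token's label encodes both the aspect boundary and its sentiment. The sketch below is illustrative only: the tag names (B-POS, I-POS, B-NEG, etc.) are assumptions, not necessarily the scheme used in this paper, and the decoder simply collects aspect spans with their attached sentiment in a single pass.

```python
# Hypothetical decoder for a collapsed BIO tagging scheme in which
# each tag carries both the span boundary and the sentiment label,
# e.g. "B-POS" starts a positive aspect span. Tag names are assumed.

def decode_pairs(tokens, tags):
    """Collect (aspect term, sentiment) pairs from a collapsed tag sequence."""
    pairs, span, sentiment = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new aspect span begins
            if span:
                pairs.append((" ".join(span), sentiment))
            span, sentiment = [token], tag[2:]
        elif tag.startswith("I-") and span:  # continue the current span
            span.append(token)
        else:                             # "O" tag closes any open span
            if span:
                pairs.append((" ".join(span), sentiment))
            span, sentiment = [], None
    if span:                              # flush a span ending the sentence
        pairs.append((" ".join(span), sentiment))
    return pairs

tokens = ["The", "pizza", "was", "great", "but", "service", "slow"]
tags   = ["O", "B-POS", "O", "O", "O", "B-NEG", "O"]
print(decode_pairs(tokens, tags))  # [('pizza', 'POS'), ('service', 'NEG')]
```

One pass over the tag sequence thus yields every aspect-sentiment pair at once, which is the efficiency gain the unified framework targets over running two models serially.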
In addition, we introduce part-of-speech information into both tasks simultaneously: grammatical information is used to improve the performance of aspect-term extraction, and opinion-word information derived from parts of speech is used to improve the performance of aspect-level sentiment classification. Experimental results show that the unified model outperforms multiple baseline models on the two benchmark datasets Twitter2015 and Restaurant2014.
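The intuition behind the part-of-speech cues can be sketched as follows; this is a minimal illustration under assumed Penn-Treebank-style tags from an upstream tagger, not the paper's actual feature pipeline: nouns suggest aspect-term candidates, while adjectives and adverbs suggest opinion words for sentiment classification.

```python
# Minimal sketch (assumption, not the paper's method): given tokens and
# Penn-Treebank-style POS tags, nouns (NN*) hint at aspect-term
# candidates and adjectives/adverbs (JJ*/RB*) hint at opinion words.

def pos_cues(tokens, pos_tags):
    """Return (aspect candidates, opinion-word candidates) from POS tags."""
    aspects  = [t for t, p in zip(tokens, pos_tags) if p.startswith("NN")]
    opinions = [t for t, p in zip(tokens, pos_tags) if p.startswith(("JJ", "RB"))]
    return aspects, opinions

tokens = ["The", "pizza", "was", "really", "great"]
pos    = ["DT", "NN", "VBD", "RB", "JJ"]
print(pos_cues(tokens, pos))  # (['pizza'], ['really', 'great'])
```

In a neural model these cues would typically enter as POS embeddings concatenated to the word representations rather than as hard rules, but the sketch shows why the same grammatical signal can serve both subtasks.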