Abstract:
With the development of Internet, crowdsourcing short texts such as time-sync comments for videos are of significant importance for online media sharing platforms and leisure industry. It also provides a new research opportunity for the evolution of recommender system, artificial intelligence and so on, which have tremendous values for every walk of life. At the same time, there are many challenges for crowdsourcing short text analysis, because of its high noise, non-standard expressions and latent semantic implication. These have limited the application of traditional natural language processing (NLP) techniques, thus it needs a novel short text understanding method which is of high fault tolerance, and can capture the deep semantics. To this end, this paper proposes a deep semantic representation model based on recurrent neural network (RNN). It can avoid the effect of noise on text segmentation by exploiting the character-based RNN. To achieve the semantic representation, we apply the neural network to represent the latent semantics such that the outputted semantic vectors can deeply reflect the time-sync comments. Then we further design a time-sync comment explanation framework based on semantic retrieval, used for the validation of semantic representation. Finally, we compare them with others baselines, and apply many measures to validate the proposed model. The experimental results show that model can capture the semantics in these short texts more precisely, and the assumptions related to time-sync comments are reasonable.