Cross-media hot topic detection for web videos has become an active research area. However, web videos usually carry little descriptive text, so the text semantic feature space is sparse and the correlation between text semantic features is weak, which makes hot topics harder to mine. Existing methods mainly enrich the text semantic feature space with visual information. Because visual and text information are heterogeneous, however, the text and visual semantic features of the same topic differ considerably. This further weakens the correlation between text semantics under the same topic and poses a major challenge for cross-media hot topic detection on web videos. We therefore propose a new cross-media semantic association enhancement method. First, the core semantic features of the text are captured at the word and sentence levels through double-layer attention. Second, by understanding the visual content, a large number of text descriptions highly related to the video content are generated to enrich the text semantic space. Third, a text semantic graph and a visual semantic graph are constructed from text semantic similarity and visual semantic similarity, and a time decay function is constructed to establish correlations between cross-media data along the time dimension; this strengthens the correlation between text and visual semantics and allows the two semantic graphs to be smoothly fused into a hybrid semantic graph that realizes cross-media semantic complementarity. Finally, hot topics are detected with a graph clustering method. Extensive experiments show that the proposed model outperforms existing methods.
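
The fusion and clustering steps can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration and is not the authors' implementation: the two semantic structures are treated as weighted graphs built from cosine similarity, the time decay is taken to be exponential, `exp(-|t_i - t_j| / tau)`, the fusion is a weighted sum controlled by a hypothetical parameter `alpha`, and thresholded connected components stand in for the unspecified graph clustering method. All function and parameter names are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of a feature matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def time_decay_matrix(timestamps, tau):
    """Assumed decay: exp(-|t_i - t_j| / tau); the paper's form is not given."""
    diff = np.abs(timestamps[:, None] - timestamps[None, :])
    return np.exp(-diff / tau)


def hybrid_graph(text_feats, visual_feats, timestamps, alpha=0.5, tau=7.0):
    """Fuse text and visual similarity graphs, damped by temporal distance.

    alpha weights the text graph against the visual graph (assumed scheme).
    """
    s_text = cosine_similarity_matrix(text_feats)
    s_visual = cosine_similarity_matrix(visual_feats)
    decay = time_decay_matrix(timestamps, tau)
    return (alpha * s_text + (1.0 - alpha) * s_visual) * decay


def connected_components(weights, threshold):
    """Stand-in clustering: link nodes whose edge weight meets the threshold."""
    n = weights.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for nbr in range(n):
                if labels[nbr] == -1 and weights[node, nbr] >= threshold:
                    labels[nbr] = current
                    stack.append(nbr)
        current += 1
    return labels


if __name__ == "__main__":
    # Four toy videos: two about one topic, two about another a month later.
    text = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    visual = np.array([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0],
                       [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]])
    days = np.array([0.0, 1.0, 30.0, 31.0])
    graph = hybrid_graph(text, visual, days)
    print(connected_components(graph, threshold=0.5))
```

In this toy setup, videos that are similar in both modalities and close in time keep a strong edge, while edges between videos far apart in time are suppressed by the decay term regardless of their content similarity.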