Research on content-based multimedia retrieval is motivated by a growing amount of digital multimedia content in which video data is a big part. Interaction and integration of multi-modality media types such as visual, audio and textual data in video are the essence of video content analysis. Although any uni-modality type partially expresses limited semantics less or more, video semantics are fully manifested only by interaction and integration of any unimodal. Video data comprises plentiful semantics, such as people, scene, object, event and story, etc. A great deal of research has been focused on utilizing multi-modality features for better understanding of video semantics. Proposed in this paper is a new approach to detect semantic concepts in video using co-occurrence data embedding (CODE), SimFusion, and locality preserving projections (LPP) from temporal associated co-occurring multimodal media data in video. The authors experiments show that by employing these key techniques, the performance of video semantic concept detection can be improved and better video semantics mining results can be obtained.