    Enhancing Keyframe Selection via LLM-Generated Pseudo-Labels for Long-form Video Question Answering[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550537

    Enhancing Keyframe Selection via LLM-Generated Pseudo-Labels for Long-form Video Question Answering

    • Keyframe selection is an important technique for long-form video question answering: it accurately identifies key content amid redundant information and establishes an interpretable reasoning path. However, existing keyframe selection methods lack semantic sensitivity when trained end-to-end, so they admit a large number of irrelevant frames whose noise degrades both the model's accuracy and its interpretability. To address this, we propose the Pseudo-Labels Guided Keyframe Selection (PGKS) model. PGKS first uses a large language model to semantically integrate the question and answer into a global descriptive text. It then uses a multimodal alignment model to compute semantic similarity scores between this description and the sampled video frames, which serve as frame-level pseudo-labels. Guiding the computation of frame-level similarity scores with these pseudo-labels effectively suppresses irrelevant-frame noise, improving both the accuracy and the interpretability of keyframe selection. Furthermore, our method balances differentiable and non-differentiable sorting results and introduces a sliding time window mechanism to further improve the model's understanding of temporal relationships in videos. Experimental results show that PGKS achieves an accuracy of 62.67% on the long-form video question answering dataset NExT-QA, outperforming existing methods of comparable size and improving by 8.51% over the baseline without pseudo-label guidance.
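The abstract describes the pipeline only at a high level. The sketch below illustrates one plausible reading in plain Python, under loudly stated assumptions: the description and frame embeddings are taken as already produced by some text/image alignment encoder (not shown); pseudo-labels are a temperature-scaled softmax over cosine similarities (the temperature 0.1 is an arbitrary choice); pseudo-label guidance is modelled as a KL divergence; the sliding time window is modelled as a centred moving average over frame scores; and the balance between differentiable and non-differentiable sorting is modelled as a fixed linear mix of a hard top-k mask and soft softmax weights. All function names (`pseudo_labels`, `guidance_loss`, `sliding_window`, `select_keyframes`, `blended_weights`) are hypothetical, and none of this is claimed to be the PGKS implementation.

```python
# Hypothetical sketch of pseudo-label-guided keyframe selection.
# NOT the PGKS implementation; every design choice below is an assumption.
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(xs: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def pseudo_labels(desc_emb: Sequence[float],
                  frame_embs: Sequence[Sequence[float]],
                  temperature: float = 0.1) -> List[float]:
    """Assumed pseudo-labels: softmax over description-to-frame cosine similarities."""
    sims = [cosine(desc_emb, f) for f in frame_embs]
    return softmax(sims, temperature)


def guidance_loss(labels: Sequence[float], model_scores: Sequence[float]) -> float:
    """Assumed guidance term: KL(labels || softmax(model_scores)).
    Zero when the selector's score distribution matches the pseudo-labels."""
    probs = softmax(model_scores)
    return sum(p * math.log(p / q) for p, q in zip(labels, probs) if p > 0)


def sliding_window(scores: Sequence[float], window: int = 3) -> List[float]:
    """Assumed temporal window: average each score with its neighbours
    inside a centred window (truncated at the sequence ends)."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def select_keyframes(scores: Sequence[float], k: int) -> List[int]:
    """Hard (non-differentiable) top-k selection, returned in temporal order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def blended_weights(scores: Sequence[float], k: int,
                    alpha: float = 0.5, temperature: float = 1.0) -> List[float]:
    """Assumed balance of hard and soft ranking: alpha * top-k mask
    + (1 - alpha) * softmax weights. In an autodiff framework only the
    soft term would carry gradients."""
    hard = set(select_keyframes(scores, k))
    soft = softmax(scores, temperature)
    return [alpha * (1.0 if i in hard else 0.0) + (1 - alpha) * s
            for i, s in enumerate(soft)]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs.
    desc = [1.0, 0.0]
    frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0], [0.7, 0.7]]
    labels = pseudo_labels(desc, frames)
    print(select_keyframes(labels, 2))  # [0, 2]
```

In this toy run the frames whose embeddings point in the same direction as the description (frames 0 and 2) receive most of the pseudo-label mass, so a selector trained to minimise `guidance_loss` is pushed away from the orthogonal and opposite frames. The linear mix in `blended_weights` is only one simple way to combine a hard ranking with a soft one; the paper's actual balancing scheme is not specified in this abstract.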
