    Cheng Xiaotian, Ding Weiping, Geng Yu, Huang Jiashuang, Ju Hengrong, Guo Jing. Transformer Interpretation Method Based on Sequential Three-Way Mask and Attention Fusion[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440382

    Transformer Interpretation Method Based on Sequential Three-Way Mask and Attention Fusion

    • Transformers have gradually become the preferred solution for computer vision tasks, which has driven the development of methods for interpreting them. Traditional interpretation methods mostly generate an interpretation map from the perturbation mask produced by the final layer of the Transformer encoder. However, these methods ignore the uncertain information in the mask as well as the information lost during upsampling and downsampling, which can lead to coarse and incomplete localization of the object region. To address these problems, a Transformer explanation method based on a sequential three-way mask and attention fusion (SAF-Explainer) is proposed. SAF-Explainer consists mainly of two modules: the sequential three-way mask (S3WM) module and the attention fusion (AF) module. The S3WM module processes the mask under strict threshold conditions so that uncertain information in the mask does not corrupt the interpretation results, which allows the object position to be located effectively. The AF module then aggregates attention matrices into a relationship matrix that enables cross-layer information interaction; this matrix is used to refine the details of the interpretation results and produce clear and complete explanations. To verify the effectiveness of SAF-Explainer, comparative experiments were conducted on three natural image datasets and one medical image dataset. The results show that SAF-Explainer provides better explanations than the compared methods. This work advances visual explanation techniques by providing more accurate and clinically relevant interpretability for Transformer-based vision systems, particularly in medical diagnostic applications where precise region identification is crucial.
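    The abstract describes the two modules only at a high level. The Python/NumPy sketch below illustrates the general ideas under stated assumptions; it is not the authors' SAF-Explainer implementation. The threshold names `alpha` and `beta`, the coarse-to-fine list of evidence maps fed to the sequential step, and all function names are hypothetical. For the cross-layer step it swaps in attention rollout (mixing each layer's attention with the identity and multiplying across layers), a known aggregation scheme that may differ from the relationship matrix used by the AF module.

```python
import numpy as np


def sequential_three_way(evidence_maps, alpha=0.7, beta=0.3):
    """Sketch of sequential three-way decisions over patch scores.

    evidence_maps: list of same-shape arrays, ordered coarse to fine
    (hypothetical input). Each stage only re-examines patches that are
    still undecided. Returns labels: 1 = accept (object),
    -1 = reject (background), 0 = still uncertain after the last stage.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("expected 0 <= beta < alpha <= 1")
    maps = [np.asarray(m, dtype=float) for m in evidence_maps]
    if any(m.shape != maps[0].shape for m in maps):
        raise ValueError("all evidence maps must share one shape")
    labels = np.zeros(maps[0].shape, dtype=np.int8)
    for m in maps:
        undecided = labels == 0
        labels[undecided & (m >= alpha)] = 1   # strict acceptance
        labels[undecided & (m <= beta)] = -1   # strict rejection
    return labels


def attention_rollout(attentions):
    """Aggregate per-layer attention matrices across layers (attention rollout).

    attentions: list of (tokens, tokens) arrays, already averaged over heads,
    ordered from the first to the last layer. The identity is mixed in to
    account for residual connections, rows are re-normalized, and the
    matrices are multiplied layer by layer.
    """
    n = attentions[0].shape[0]
    joint = np.eye(n)
    for a in attentions:
        a_hat = 0.5 * np.asarray(a, dtype=float) + 0.5 * np.eye(n)
        a_hat /= a_hat.sum(axis=-1, keepdims=True)
        joint = a_hat @ joint
    return joint


def fuse(labels, attentions, grid):
    """Keep the CLS token's rollout relevance only on accepted patches.

    Rejected and still-uncertain patches are zeroed, so uncertain mask
    information does not leak into the final map. Output is scaled to [0, 1].
    """
    rollout = attention_rollout(attentions)
    cls_to_patches = rollout[0, 1:].reshape(grid)  # token 0 assumed to be CLS
    fused = np.where(labels == 1, cls_to_patches, 0.0)
    peak = fused.max()
    return fused / peak if peak > 0 else fused
```

    For example, with a coarse map [[0.9, 0.5], [0.1, 0.5]], a fine map [[0.9, 0.8], [0.1, 0.5]], and the default thresholds, the labels are [[1, 1], [-1, 0]]: the upper-right patch is accepted only at the finer stage, and the lower-right patch stays uncertain and is therefore excluded from the fused map. Deciding the easy patches early and deferring only the ambiguous ones to later stages is the core idea of sequential three-way decisions; the specific thresholds here are illustrative.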