    Category Adversarial Joint Learning Method for Cross-Prompt Automated Essay Scoring

    • Abstract: Automated essay scoring (AES) reduces teachers' grading workload and gives students objective, timely feedback, making it an important application of natural language processing in education. Cross-prompt AES aims to learn a transferable scoring model that can effectively score essays written for a target prompt. However, most existing cross-prompt AES methods assume that target-prompt data is available: they align the feature distributions of the source and target prompts to learn prompt-invariant feature representations, and from these a scoring model that transfers to the target prompt. Such methods cannot be applied when target-prompt data is unavailable. For this setting, we propose a cross-prompt AES method based on Category Adversarial Joint Learning (CAJL). First, we jointly model essay scoring as a classification task and a regression task and learn features shared by the two tasks, so that each task improves the other. Second, unlike existing methods that rely on prompt-agnostic features to improve generalization, we introduce a category adversarial strategy that targets the differences in category distribution across prompts: aligning category-level features across prompts yields fine-grained invariant feature representations and further improves generalization. We evaluate the method on the Automated Student Assessment Prize (ASAP) and ASAP++ datasets, predicting both overall essay scores and trait scores. Experimental results show that it outperforms six classical methods on the quadratic weighted kappa metric.
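The quadratic weighted kappa used as the evaluation metric is a standard agreement measure between two sets of integer ratings. The sketch below follows the usual textbook definition and is not taken from the paper; the function name `quadratic_weighted_kappa` and its arguments are illustrative assumptions, and scores are assumed to be integers in a known range.

```python
def quadratic_weighted_kappa(human, predicted, min_score, max_score):
    """Agreement between two integer score lists (illustrative sketch).

    Assumes every score lies in [min_score, max_score].
    Returns 1.0 for perfect agreement and about 0.0 for chance-level agreement.
    """
    if len(human) != len(predicted) or not human:
        raise ValueError("score lists must be non-empty and of equal length")

    n = max_score - min_score + 1
    total = len(human)

    # Observed confusion matrix: rows are human scores, columns are predictions.
    observed = [[0] * n for _ in range(n)]
    for h, p in zip(human, predicted):
        observed[h - min_score][p - min_score] += 1

    # Marginal histograms, used to build the expected (chance) matrix.
    hist_h = [sum(row) for row in observed]
    hist_p = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty; the usual 1/(n-1)^2 factor cancels in the ratio.
            weight = (i - j) ** 2
            expected = hist_h[i] * hist_p[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # Degenerate case: both lists contain a single identical score value.
    if denominator == 0:
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    print(quadratic_weighted_kappa([1, 2, 3], [1, 2, 3], 1, 3))
```

Because disagreements are penalized by the squared score distance, a prediction that is off by two points costs four times as much as one that is off by one, which suits ordinal essay scores.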

       
