Data Driven Prediction for the Difficulty of Mathematical Items

Tong Wei (佟威); Wang Fei (汪飞); Liu Qi (刘淇); Chen Enhong (陈恩红)

doi:10.7544/issn1000-1239.2019.20180366

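Abstract: Building a modern national item bank system is an important guarantee for the reform and development of educational examinations, and an important means of modernizing educational examinations in China. Item difficulty is a core parameter of every item entered into the bank and directly affects item writing, test assembly, score reporting, and even the fairness of the examination. Because of the characteristics of China's national examinations, it is hard to obtain item difficulty parameters in advance through pre-testing, as testing organizations in other countries do. Item difficulty has therefore traditionally been assessed manually, by the experts who write the items. This practice is time-consuming and labor-intensive, and its objectivity is hard to guarantee. Using advanced information technology to judge item difficulty automatically is thus of considerable research value, and it also embodies Chinese insight and a Chinese solution suited to an examination system with Chinese characteristics. With the goal of building a data-driven model that automatically predicts the difficulty of mathematics items from item text and answer records, we propose two difficulty prediction models, C-MIDP (CNN for mathematical item difficulty prediction) and R-MIDP (RNN for mathematical item difficulty prediction), based on a convolutional neural network (CNN) and a recurrent neural network (RNN) respectively, as well as H-MIDP (hybrid model for mathematical item difficulty prediction), which combines the two. Specifically, the proposed models learn item representations directly from the item text and are trained with each item's score rate in an exam as the label; the whole process requires no educational prior knowledge such as knowledge-concept labels. Then, because the student populations of different exams are not directly comparable, we propose a context-based training strategy. Finally, the difficulty of an item can be predicted by feeding its features into the trained model. The models achieve good results in experiments on real-world item data.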
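The training label is an item's score rate in a given exam. The sketch below shows one way such labels could be derived from answer records. It assumes each record is a hypothetical `(test_id, item_id, score, full_score)` tuple and that the score rate is the total score earned divided by the total attainable score (equivalently, the mean score over the item's full score); the record layout and the function name `score_rates` are illustrative, not taken from the paper. A lower score rate indicates a harder item, and keeping the test identifier in the key preserves the exam context.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout (an assumption): (test_id, item_id, score, full_score).
Record = Tuple[str, str, float, float]


def score_rates(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Return the score rate of every (test_id, item_id) pair.

    Score rate = total score earned / total attainable score.
    Assumes full_score > 0 for every record. Lower means harder.
    """
    earned: Dict[Tuple[str, str], float] = defaultdict(float)
    possible: Dict[Tuple[str, str], float] = defaultdict(float)
    for test_id, item_id, score, full_score in records:
        key = (test_id, item_id)  # keep the exam context in the key
        earned[key] += score
        possible[key] += full_score
    return {key: earned[key] / possible[key] for key in earned}


if __name__ == "__main__":
    logs = [
        ("t1", "q1", 5.0, 5.0),
        ("t1", "q1", 0.0, 5.0),
        ("t1", "q2", 3.0, 10.0),
        ("t2", "q3", 4.0, 4.0),
    ]
    for key, rate in sorted(score_rates(logs).items()):
        print(key, round(rate, 2))
```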

Abstract: The construction of an item banking system is an important guarantee for the reform and development of educational examinations, and an essential means of modernizing them. In such a system, item difficulty is one of the most important parameters, with a direct influence on item design, test paper assembly, score reporting, and even the fairness of the test. Unfortunately, because of the particular educational background and test characteristics in China, it is difficult to estimate item difficulty through pre-testing, as is done in some other countries. Item difficulty has therefore traditionally been evaluated manually by experts (e.g., experienced teachers), which is laborious, time-consuming, and to some degree subjective. It is thus of great value to judge item difficulty automatically with information technology. Along this line, we propose a data-driven solution that predicts the difficulty of mathematics items from historical test logs and the corresponding item text. Specifically, we propose a C-MIDP model and an R-MIDP model, based on CNN and RNN respectively, and a hybrid H-MIDP model that combines the two. The models learn a semantic representation of each item directly from its text and are trained on the score rates observed in tests, so the modeling requires no expert input such as knowledge labeling. Because the student groups taking different tests are not directly comparable, we adopt a context-dependent training strategy. Finally, the trained models can predict the difficulty of an item from its text alone. Extensive experiments on a real-world dataset demonstrate the effectiveness of the proposed models.
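Because raw score rates are only comparable among items answered by the same student group, one way to make training context-dependent is to compare items only within the same test. The sketch below assumes a pairwise formulation: for every pair of items from the same test, the gap between predicted score rates should match the gap between observed score rates, and pairs from different tests are never compared. This squared-gap loss, the function name `context_pairwise_loss`, and the flat input layout are illustrative assumptions, not necessarily the objective used in the paper.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def context_pairwise_loss(
    test_ids: Sequence[str],
    predicted: Sequence[float],
    observed: Sequence[float],
) -> float:
    """Mean squared error over within-test pairs of item score rates.

    For items i and j from the same test, the loss penalizes any mismatch
    between (predicted[i] - predicted[j]) and (observed[i] - observed[j]).
    Items from different tests are never paired, because their student
    groups are assumed to be incomparable. Returns 0.0 if no pairs exist.
    """
    if not (len(test_ids) == len(predicted) == len(observed)):
        raise ValueError("inputs must have the same length")

    # Group item indices by the test (context) they appeared in.
    groups: Dict[str, List[int]] = {}
    for idx, test_id in enumerate(test_ids):
        groups.setdefault(test_id, []).append(idx)

    total, n_pairs = 0.0, 0
    for indices in groups.values():
        for i, j in combinations(indices, 2):
            pred_gap = predicted[i] - predicted[j]
            true_gap = observed[i] - observed[j]
            total += (pred_gap - true_gap) ** 2
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


if __name__ == "__main__":
    tests = ["t1", "t1", "t2", "t2"]
    observed = [0.8, 0.4, 0.9, 0.6]
    # Predictions shifted by a constant within each test: the loss is zero
    # up to floating-point rounding.
    shifted = [0.7, 0.3, 0.5, 0.2]
    print(context_pairwise_loss(tests, shifted, observed))
```

Because only within-test differences enter this loss, an offset shared by all items of one test, such as one produced by a stronger student group, leaves the loss unchanged.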


Data Driven Prediction for the Difficulty of Mathematical Items