Abstract:
As an important branch of question answering research, automatic reading comprehension (RC) involves reading a short passage of text and answering a series of questions pertaining to that text. Among the question types studied in RC (who, what, when, where, and why), answer extraction for why-questions must draw on the discourse structure of the text, and the answer is not a named entity. To address these differences between why-questions and the other types, this paper presents an answer sentence extraction approach for RC why-questions based on question topic identification and causal rhetorical relation identification. The approach uses a machine learning model to rank the sentences in the text by their probability of being the answer sentence. The model uses two kinds of features: one to identify the text sentence corresponding to the question topic, and the other to identify the causal rhetorical relation between the question topic and the sentence context. Specifically, idf and semantic role similarity features are used to identify the sentence corresponding to the question topic, while the remaining features, including cue phrases, special semantic roles, causal relation entailment probabilities between words mined from large-scale document collections, and the position and expression format of the sentence context, are used to identify causal rhetorical relations. Experimental results on the Remedia corpus show that the method significantly improves the performance of why-question answering in reading comprehension.
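
To make the ranking idea concrete, the following is a minimal illustrative sketch, not the paper's model. It assumes only two of the feature families named above (idf-weighted word overlap with the question, and causal cue phrases) and combines them with hand-set weights in place of a trained machine learning model. The cue phrase list, the weights `w_topic` and `w_cue`, the smoothed idf formula, the example passage, and all function names are assumptions introduced for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical cue phrases; the actual list used in the paper is not given here.
CUE_PHRASES = ("because", "so that", "in order to", "since", "due to")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def idf_table(sentences):
    """Smoothed idf computed over the passage, one 'document' per sentence."""
    n = len(sentences)
    df = Counter()
    for sentence in sentences:
        df.update(set(tokenize(sentence)))
    return {word: math.log((n + 1) / (count + 1)) + 1.0 for word, count in df.items()}


def topic_score(question, sentence, idf):
    """Sum of idf weights of the words shared by the question and the sentence."""
    shared = set(tokenize(question)) & set(tokenize(sentence))
    return sum(idf.get(word, 0.0) for word in shared)


def cue_score(sentence):
    """1.0 if the sentence contains any causal cue phrase, else 0.0."""
    lowered = sentence.lower()
    for cue in CUE_PHRASES:
        if re.search(r"\b" + re.escape(cue) + r"\b", lowered):
            return 1.0
    return 0.0


def rank_sentences(question, sentences, w_topic=1.0, w_cue=2.0):
    """Return (score, sentence) pairs sorted from most to least likely answer.

    The weights are hand-set assumptions standing in for a learned model.
    """
    idf = idf_table(sentences)
    scored = [
        (w_topic * topic_score(question, s, idf) + w_cue * cue_score(s), s)
        for s in sentences
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up example passage and question, for illustration only.
passage = [
    "The class visited the science museum on Friday.",
    "The museum closed early because a pipe burst in the basement.",
    "Everyone went home by bus.",
]
question = "Why did the museum close early?"

ranked = rank_sentences(question, passage)
print(ranked[0][1])
```

In this toy passage, the second sentence ranks first because it both overlaps with the question topic and contains a cue phrase. A fuller system along the lines described above would replace the fixed weights with a trained ranker and add the semantic role, entailment probability, position, and expression format features.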