Abstract:
Traditional approaches to English text chunking reduce chunking to a word-level tagging problem analogous to part-of-speech tagging. Such approaches cannot account for the relationships between neighboring parts of speech or for the cohesion among all the parts of speech within a phrase. This paper proposes a headword-extension strategy and a relative-degree evaluation strategy and applies them to the identification of English text chunks. The main features of the method are: 1) each phrase is treated as a cluster whose kernel is its headword, which makes full use of the regularities governing how a phrase is composed; 2) the chunking result is evaluated dynamically using a doubt degree and a reliability measure. In tests on a public corpus, the method runs faster than the other methods compared and achieves an F score of 94.05%, which is on par with the state of the art.
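The reported F score is conventionally the harmonic mean of chunk-level precision and recall, where a predicted chunk counts as correct only if both its boundaries and its phrase label match the gold chunk exactly. The sketch below shows that computation. It assumes the paper uses this exact-match, CoNLL-style evaluation, which the abstract does not state, and the `chunk_prf` function and the `(start, end, label)` span representation are hypothetical names chosen for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a chunk is (start, end, label), a half-open
# token span [start, end) together with its phrase type (e.g. "NP", "VP").
Chunk = Tuple[int, int, str]


def chunk_prf(gold: Iterable[Chunk], predicted: Iterable[Chunk]) -> Tuple[float, float, float]:
    """Exact-match chunk precision, recall, and F1 (assumed evaluation scheme)."""
    gold_set: Set[Chunk] = set(gold)
    pred_set: Set[Chunk] = set(predicted)
    # A predicted chunk is correct only if its boundaries and label both match.
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F score: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Usage: "He reckons the current account deficit will narrow"
gold = [(0, 1, "NP"), (1, 2, "VP"), (2, 6, "NP"), (6, 8, "VP")]
# The predicted output wrongly splits "the current account deficit" into two NPs.
predicted = [(0, 1, "NP"), (1, 2, "VP"), (2, 4, "NP"), (4, 6, "NP"), (6, 8, "VP")]
p, r, f = chunk_prf(gold, predicted)
print(f"P={p:.4f} R={r:.4f} F={f:.4f}")  # P=0.6000 R=0.7500 F=0.6667
```

Here, splitting one gold noun phrase into two produces one missed chunk and two spurious ones. This lowers both precision and recall, which is why exact-match chunk F scores penalize boundary errors heavily.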