Abstract:
With the overwhelming volume of news stories created and stored electronically every day, there is an increasing need for techniques that analyze and present news stories to users in a more meaningful manner. Most previous research focuses on organizing news sets into flat collections (topics) of stories. However, a news topic is more than a mere collection of stories: it is characterized by a definite structure of inter-related events. Unfortunately, identifying events within a topic is very difficult, because stories about the same topic are usually very similar to each other irrespective of the events they belong to. To deal with this problem, two methods based on event key terms are proposed: one to identify events and one to discover the relations between them. For event identification, tight term clusters are first captured as term committees of potential events; these committees are then used to find the core story sets of the potential events; finally, each story is assigned to one event's core story set. For event relation discovery, the term committees are also used to improve the calculation of story and event similarity. Experimental results on two Linguistic Data Consortium (LDC) datasets show that the proposed methods for event identification and relation discovery significantly outperform previous methods.
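The event identification pipeline summarized above (term committees, then core story sets, then story assignment) can be illustrated with a minimal Python sketch. It is not the method evaluated in the paper. It assumes the term committees are already given, whereas the paper derives them by term clustering. The coverage rule for picking core stories, the `core_threshold` parameter, the cosine-to-centroid assignment, and the names `cosine` and `identify_events` are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_events(stories, committees, core_threshold=0.5):
    """Return one event index per story.

    stories:    list of token lists, one per story
    committees: list of sets of key terms, one per potential event
                (ASSUMED given; committee discovery is not shown here)
    core_threshold: ASSUMED cutoff on the fraction of a story's tokens
                that must belong to a committee for it to be a core story
    """
    vectors = [Counter(tokens) for tokens in stories]

    # Core story set of each committee: stories dominated by its key terms.
    cores = []
    for committee in committees:
        core = []
        for i, vec in enumerate(vectors):
            total = sum(vec.values())
            covered = sum(c for t, c in vec.items() if t in committee)
            if total and covered / total >= core_threshold:
                core.append(i)
        cores.append(core)

    # Represent each core story set by the sum of its story vectors.
    centroids = []
    for core in cores:
        centroid = Counter()
        for i in core:
            centroid.update(vectors[i])
        centroids.append(centroid)

    # Assign every story to the event whose core centroid is most similar.
    return [
        max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
        for vec in vectors
    ]


# Toy topic with two events (made-up data, not from the LDC datasets).
stories = [
    ["quake", "rescue", "quake", "city"],
    ["rescue", "quake", "survivors"],
    ["aid", "donation", "aid", "city"],
    ["donation", "aid", "funds"],
    ["city", "quake", "quake", "aid", "officials"],  # not a core story
]
committees = [{"quake", "rescue"}, {"aid", "donation"}]
labels = identify_events(stories, committees)
print(labels)
```

In this toy run the last story falls below the coverage cutoff for both committees, so it belongs to no core set and is placed only in the final assignment step. That mirrors the idea of fixing unambiguous core stories first and attaching the remaining stories afterwards.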