Abstract:
Attribute-value extraction is an important information extraction task. However, heterogeneous attributes and the bottleneck of natural language processing make the problem difficult and complex. In addition, most quantity attributes are single-valued and variable, so it is difficult to determine their accurate values. Most existing work relies on semi-supervised methods or lexico-syntactic patterns; however, these methods overlook the properties of quantity attributes and require much effort to ensure the reliability of the extraction results. To avoid these drawbacks, this paper defines the notion of meta-property and proposes a novel meta-property-based approach to attribute-value extraction. The approach is implemented in a system whose overall structure and major components are presented, including textual information source selection, candidate extraction, candidate evaluation, and automatic verification. Experiments are carried out on 5 entity types and their 9 subtypes from Baidu Encyclopedia. The experimental results show that the new approach achieves an average precision of up to 71% and an average recall of 89%, significantly higher than general query-based approaches and traditional lexico-syntactic pattern-based methods. The new approach generalizes better on open-domain attribute-value extraction, especially for single-valued attributes.