Extracting Relations from the Web via Weakly Supervised Learning
Abstract
In the era of big data, large-scale information extraction has become an important topic in natural language processing and information retrieval. In particular, weak supervision, a novel framework that requires no human involvement and can be easily adapted to new domains, is receiving increasing attention. Current studies of weak supervision focus primarily on English, using conventional features such as lexical features based on word segments and syntactic features based on dependency parses. However, such lexical features often suffer from the data sparsity problem, while syntactic features rely heavily on the availability of syntactic analysis tools. This paper proposes using n-gram features, which can alleviate to some extent the data sparsity problem caused by lexical features. We also observe that n-gram features are important for multilingual relation extraction; in particular, they can compensate for syntactic features in languages where syntactic analysis tools are unreliable. To address the quality of the training data used in weakly supervised learning models, a bootstrapping approach, co-training, is introduced into the framework to improve this extraction paradigm. We study the strategies used to combine the outputs from different training views. Experimental results on both English and Chinese datasets show that the proposed approach effectively improves the performance of weak supervision in both languages and has the potential to work well in a multilingual scenario with more languages.