Wang Yu, Tan Songbo, Liao Xiangwen, Zeng Yiling. Extended Domain Model Based Named Attribute Extraction[J]. Journal of Computer Research and Development, 2010, 47(9): 1567-1573.

Extended Domain Model Based Named Attribute Extraction

More Information
  • Published Date: September 14, 2010
  • Web information extraction is an important task in Web mining, and applications such as the semantic Web, vertical search, and sentiment analysis could all benefit from advances in this area. Current techniques require a great deal of human interaction, which precludes the universal application of Web information extraction. To automate the extraction process, recent work identifies features specific to particular domains and extracts information with machine learning techniques. Because these methods depend on domain-specific features, however, they are very difficult to extend to other domains. In this paper, the Web information extraction problem is analyzed and a new subtask, named attribute extraction, is proposed. Statistics from multiple datasets show that named attribute extraction covers more than 60% of the attributes in these domains, which demonstrates the importance of the subtask. Named attributes are attributes of objects that are encoded in name-value pair form; that is, the name and the value of an attribute are located near each other in the Web page. Therefore, once the attribute names are located, the values can be extracted automatically. An extended domain model is proposed to summarize the attribute names of a domain, and an information extraction method based on this model is developed. Experiments show that the method extracts named attributes with a precision of 80% and a recall higher than 90%.
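The name-value pair idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not describe how the extended domain model is built or matched, so the `DOMAIN_MODEL` dictionary, the helper names, and the "value is inline or in the next text node" rule are all assumptions made for illustration only.

```python
# Hypothetical domain model for a "book" domain: canonical attribute -> known names.
# The structure of the paper's extended domain model is not given in the abstract.
DOMAIN_MODEL = {
    "author": {"author", "written by"},
    "publisher": {"publisher", "published by"},
    "price": {"price", "list price"},
}


def normalize(text):
    """Collapse whitespace and lowercase so 'List  Price' matches 'list price'."""
    return " ".join(text.split()).lower()


def build_lookup(model):
    """Invert the model into a flat map: attribute name -> canonical attribute."""
    return {alias: attr for attr, aliases in model.items() for alias in aliases}


def match_name(node, lookup):
    """Return (canonical attribute or None, inline value after the first colon)."""
    name, _, rest = node.partition(":")
    return lookup.get(normalize(name)), rest.strip()


def extract_named_attributes(text_nodes, model=DOMAIN_MODEL):
    """Locate attribute names in page text nodes and take the nearby text as the value.

    Assumption: the value is either in the same node after a colon
    ('Price: $35.00') or in the immediately following node.
    """
    lookup = build_lookup(model)
    found = {}
    for i, node in enumerate(text_nodes):
        attr, inline_value = match_name(node, lookup)
        if attr is None or attr in found:
            continue
        if inline_value:
            found[attr] = inline_value
        elif i + 1 < len(text_nodes):
            next_attr, _ = match_name(text_nodes[i + 1], lookup)
            # Skip when the next node is itself an attribute name (value missing).
            if next_attr is None:
                found[attr] = text_nodes[i + 1].strip()
    return found


# Made-up text nodes, in document order, standing in for a parsed product page.
nodes = ["Product details", "Author:", "Jane Doe",
         "Publisher: Example Press", "Price", "$35.00", "Add to cart"]
print(extract_named_attributes(nodes))
```

In this sketch the only domain-specific part is the dictionary of attribute names, which mirrors the abstract's point that locating the names is the key step; a real page would also need DOM-aware notions of "nearby" that this example leaves out.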
