Li Shijun, Yu Junqing, Ou Weijie. Web Information Extraction Based on HTML Pattern Algebra[J]. Journal of Computer Research and Development, 2006, 43(9): 1644-1650.

Web Information Extraction Based on HTML Pattern Algebra

  • Published Date: September 14, 2006
  • Efficiently generating wrappers for extracting Web data has broad application prospects, but it remains a difficult problem that has not yet been solved satisfactorily. To tackle this problem, a pattern algebra for HTML documents is introduced, with key concepts such as the consistent pattern set and the addition operation on patterns, and a new approach to Web information extraction is built on it. The approach induces a consistent pattern set, which represents the identifying rules of each attribute, by exploring the whole set of samples, and then extracts data using the multiple patterns in that set. It can be applied to Web pages with tabular structure in which attributes may be missing, take multiple values, appear in different orders, or be organized hierarchically, and it has been validated experimentally in a prototype.
