
Cross-Domain Text Sentiment Classification Based on Grouping-AdaBoost Ensemble

Zhao Chuanjun, Wang Suge, Li Deyu, Li Xin

Citation: Zhao Chuanjun, Wang Suge, Li Deyu, Li Xin. Cross-Domain Text Sentiment Classification Based on Grouping-AdaBoost Ensemble[J]. Journal of Computer Research and Development, 2015, 52(3): 629-638. DOI: 10.7544/issn1000-1239.2015.20140156. CSTR: 32373.14.issn1000-1239.2015.20140156


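Funding: National Natural Science Foundation of China (61175067, 61272095, 61405109); National High Technology Research and Development Program of China (863 Program) (2015AA015407); Shanxi Province Research Project for Returned Overseas Scholars (2013-014); Natural Science Foundation of Shanxi Province (2013011066-4); Shanxi Province Science and Technology Research Project (20110321027-02)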
  • CLC number: TP391


  • Abstract: In cross-domain sentiment classification, labeled data in the target domain is often scarce and costly to obtain. To address this problem, this paper proposes a grouping-AdaBoost ensemble method for cross-domain text sentiment classification. The method combines semi-supervised learning, bootstrapping, data grouping, AdaBoost and ensemble learning. First, a small amount of manually labeled target-domain data is used to generate a number of synthetic (virtual) samples with the synthetic minority over-sampling technique (SMOTE). On this basis, a bootstrapping procedure obtains more target-domain data with high-confidence labels. To build the classifier, the labeled source-domain data is split into equal-sized groups, and each group is combined with the labeled target-domain data to form a combined data set. On each combined data set, AdaBoost is used to train multiple classifiers in a boosting manner. These classifiers are then linearly combined into a single ensemble classifier. Experiments on sentiment data sets from four domains of Amazon product reviews show that the proposed method improves the accuracy of cross-domain text sentiment classification to some extent.
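The abstract names the stages of the method but not their details. Below is a minimal Python sketch of how those stages could fit together. It is not the authors' implementation and rests on these assumptions:

- Inputs are dense numeric feature vectors with labels -1/+1. The paper's text-feature representation is not given here.
- Decision stumps serve as the AdaBoost weak learner.
- The absolute boosted margin |score| is the "high-confidence" criterion for bootstrapping.
- The linear ensemble gives every group the same weight.

All function names (`smote_like`, `train_stump`, `adaboost`, `bootstrap_labels`, `grouping_adaboost_ensemble`) are hypothetical.

```python
import math
import random

def smote_like(samples, n_new, k=3, rng=random):
    """SMOTE-style oversampling for ONE class: interpolate a random seed toward
    one of its k nearest same-class neighbours (needs >= 2 samples)."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(samples)
        others = [s for s in samples if s is not x]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s)))
        nb = rng.choice(others[:k])
        gap = rng.random()  # random position on the segment x -> nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

def train_stump(X, y, w):
    """Weighted decision stump (assumed weak learner); labels are -1/+1."""
    best = None
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        # Midpoints between consecutive distinct values as candidate thresholds.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for thr in thresholds:
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[j] > thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, x):
    _, j, thr, sign = stump
    return sign if x[j] > thr else -sign

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost over stumps; returns a list of (alpha, stump)."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid log(0) when a stump is perfect
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_score(model, x):
    return sum(alpha * stump_predict(s, x) for alpha, s in model)

def bootstrap_labels(X, y, unlabeled, threshold, iterations=3, rounds=10):
    """Self-training: label unlabeled points whose |boosted score| >= threshold
    (assumed confidence criterion), add them, and repeat."""
    X, y, pool = list(X), list(y), list(unlabeled)
    for _ in range(iterations):
        if not pool:
            break
        model = adaboost(X, y, rounds)
        scored = [(x, adaboost_score(model, x)) for x in pool]
        confident = [(x, s) for x, s in scored if abs(s) >= threshold]
        if not confident:
            break
        for x, s in confident:
            X.append(x)
            y.append(1 if s > 0 else -1)
        pool = [x for x, s in scored if abs(s) < threshold]
    return X, y

def grouping_adaboost_ensemble(source_X, source_y, target_X, target_y,
                               n_groups=3, rounds=10, rng=random):
    """Split source data into n_groups equal parts, join each part with the
    labeled target data, boost one model per part, combine them linearly."""
    idx = list(range(len(source_X)))
    rng.shuffle(idx)
    size = len(idx) // n_groups  # leftover source samples are dropped
    members = []
    for g in range(n_groups):
        part = idx[g * size:(g + 1) * size]
        X = [source_X[i] for i in part] + list(target_X)
        y = [source_y[i] for i in part] + list(target_y)
        members.append(adaboost(X, y, rounds))
    weights = [1.0 / n_groups] * n_groups  # assumption: equal linear weights

    def predict(x):
        score = sum(wt * adaboost_score(m, x) for wt, m in zip(weights, members))
        return 1 if score >= 0 else -1

    return predict
```

The stages are meant to be chained as follows:

1. Call `smote_like` separately on each class of the few labeled target samples.
2. Pass the real and synthetic points to `bootstrap_labels`, together with unlabeled target data.
3. Hand the enlarged target set, plus the source data, to `grouping_adaboost_ensemble`.

Equal group weights are simply the plainest linear combination. The paper's actual weighting scheme is not stated in this abstract.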
Publication history
  • Published: 2015-02-28
