ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (3): 474-486. doi: 10.7544/issn1000-1239.2020.20190625

Special topic: 2020 Special Topic on Service-Oriented, Group-Intelligence-Based, Ecosystem-Based Software Development Methods

• Software Technology •

面向技术论坛的问题解答状态预测

沈明珠,刘辉   

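  1. (School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081) (3120181025@bit.edu.cn)
  • Published: 2020-03-01
  • Funding: 
    Major Program of the National Natural Science Foundation of China (61690205)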

Status Prediction for Questions Posted on Technical Forums

Shen Mingzhu, Liu Hui   

  1. (School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081)
  • Online: 2020-03-01
  • Supported by: 
    This work was supported by the Major Program of the National Natural Science Foundation of China (61690205).

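Abstract: When they run into technical problems, developers often post questions on technical forums such as Stack Overflow and wait for answers. Such QA systems are also an important form of Internet-based group intelligence software development. However, questions posted on a forum do not always receive satisfactory answers, so posting a question and passively waiting for an answer is not always the best strategy. To address this, a deep neural network based approach is proposed to automatically predict whether a question will receive a satisfactory answer. Knowing in advance whether a question will receive a timely and useful reply lets developers plan their response ahead of time. The approach not only makes full use of the text of the question itself but also treats information about the asker as a main basis for prediction. It applies recent deep learning techniques to mine the intrinsic relationship between the input features and the question's answer status. Experimental results on the dataset provided by Stack Overflow show that the proposed approach can predict whether questions get answered: for predicting whether a question has a satisfactory answer, precision is 58.87% and recall is 46.68% (random guessing yields a precision of 38.77% and a recall of 35.26%), and the approach outperforms the machine learning method KNN and the shallow neural network FastText.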

Keywords: group intelligence software, community question answering, status prediction, deep learning, text classification

Abstract: When they encounter technical problems, developers often post questions on technical forums such as Stack Overflow and wait for satisfying answers. QA forums are also an important manifestation of Internet-based group intelligence software development. However, questions posted on these forums may not receive satisfying answers, so asking a question and passively waiting for a solution is not always the best strategy. To this end, we propose a deep neural network based approach that automatically predicts whether a question will obtain a satisfying answer. Knowing in advance whether a question is likely to be answered effectively, developers can plan the best strategy for their technical problems ahead of time. The approach not only makes full use of the text of the question itself, but also exploits information about the person who asks it. Using recent deep learning techniques, it mines the intrinsic relationship between the input features and the question's solving status. Experimental results on the dataset provided by Stack Overflow suggest that the proposed approach can predict the solving status of questions better than random guessing. For predicting well-answered questions, precision is 58.87% and recall is 46.68% (in contrast, random guessing yields a precision of 38.77% and a recall of 35.26%), and the approach outperforms KNN and FastText.

Key words: group intelligence software, QA forum, status prediction, deep learning, text classification
