ISSN 1000-1239 CN 11-1777/TP


• Software Technology •


  • Published: 2014-02-15

A Topic-Oriented Clustering Approach for Domain Services

Li Zheng1,2, Wang Jian2, Zhang Neng2, Li Zhao2, He Chengwan3, and He Keqing2   

  1(School of Computer and Information Engineering, Henan University, Kaifeng, Henan 475001) 2(State Key Laboratory of Software Engineering (Wuhan University), Wuhan 430072) 3(School of Computer Science and Technology, Wuhan Institute of Technology, Wuhan 430073)
  • Online: 2014-02-15

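Abstract: With the rapid growth in the scale of service resources on the Internet, discovering services efficiently and accurately has become a key problem that urgently needs to be solved. Service clustering is an important technique for facilitating service discovery. However, existing service clustering methods cluster only a single type of service document and do not consider the domain characteristics of services. To address this problem, on the basis of classifying services by domain, a probability-based service clustering model that incorporates domain characteristics, the domain service clustering model (DSCM), is proposed, and a topic-oriented service clustering method is then proposed based on this model. Finally, the proposed method is validated on a real service set provided by the ProgrammableWeb website. Experimental results show that the method can accurately cluster different types of service documents. Compared with methods such as classical latent Dirichlet allocation (LDA) and K-means, the method performs better on both cluster purity and F-measure, thereby providing better support for on-demand service discovery and service composition.

Keywords: service clustering, latent Dirichlet allocation, topic, probability, feature dimensionality reduction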

Abstract: With the development of SOA and SaaS technologies, the number of services on the Internet is growing rapidly. Given such abundant and heterogeneous services, efficiently and accurately discovering the services users want has become a key issue in service-oriented software engineering. Service clustering is an important technique for facilitating service discovery. However, existing clustering approaches handle only a single type of service document and do not consider the domain characteristics of services. To overcome these limitations, this paper builds on domain-oriented service classification to propose a probability-based service clustering model that incorporates domain characteristics, the domain service clustering model (DSCM), and then proposes a topic-oriented clustering approach for domain services based on DSCM. The approach can cluster services described in WSDL, OWL-S, and plain text, which removes the restriction to a single service document type. Experiments on real services from ProgrammableWeb demonstrate the effectiveness of the approach: it clusters services more accurately and, compared with classical latent Dirichlet allocation (LDA) and K-means, achieves higher cluster purity and F-measure, which better supports on-demand service discovery and composition.

Keywords: service clustering, latent Dirichlet allocation, topic, probability, feature dimensionality reduction