    Ji Yimu, Zhang Yongpan, Lang Xianbo, Zhang Dianchao, Wang Ruchuan. Parallel of Decision Tree Classification Algorithm for Stream Data[J]. Journal of Computer Research and Development, 2017, 54(9): 1945-1957. DOI: 10.7544/issn1000-1239.2017.20160554

    Parallel of Decision Tree Classification Algorithm for Stream Data

    • With the rise of cloud computing, the Internet of Things, and related technologies, streaming data has become widespread in telecommunications, the Internet, finance, and other fields as a new form of big data. Compared with traditional static data, stream data is characterized by high speed, continuity, and change over time. In addition, changes in the implicit distribution of a data stream give rise to the concept drift problem. To meet the requirements that big data places on stream classification, traditional static, offline classification algorithms must be improved; we therefore propose P-HT, a parallel algorithm built on the distributed computing platform Storm. To suit the Storm stream processing platform, we improve the flexibility and generality of the algorithm through a sliding window mechanism, an alternative tree mechanism, and a parallel processing mechanism, so that the algorithm adapts well to concept drift in the data stream. Finally, we verify the validity and efficiency of the algorithm experimentally. The results show that the improved P-HT algorithm achieves higher throughput and faster processing than the traditional C4.5 algorithm with no reduction in accuracy.
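    The abstract names a sliding window mechanism and an alternative tree mechanism but does not describe how they work. The following is a minimal sketch of one way such a combination could operate, under the assumption that P-HT follows the usual Hoeffding-tree approach: the error of the current tree and of a candidate alternative tree is tracked over a sliding window, and the alternative replaces the current tree once the error gap exceeds the Hoeffding bound. The names `hoeffding_bound`, `WindowedErrorTracker`, and `should_swap`, and the default `delta`, are hypothetical and are not taken from the paper.

```python
import math
from collections import deque


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the true mean lies
    within this epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class WindowedErrorTracker:
    """Tracks the error rate of one tree over the most recent examples.
    (Hypothetical helper; the paper's window design may differ.)"""

    def __init__(self, window_size):
        # deque with maxlen drops the oldest entry automatically
        self.window = deque(maxlen=window_size)

    def record(self, was_error):
        self.window.append(1 if was_error else 0)

    def error_rate(self):
        return sum(self.window) / len(self.window) if self.window else 0.0


def should_swap(main, alt, delta=0.05):
    """Return True when the alternative tree is better than the main tree
    by more than the Hoeffding bound (assumed swap rule)."""
    n = min(len(main.window), len(alt.window))
    if n == 0:
        return False
    # Error indicators lie in [0, 1], so the value range is 1.0
    eps = hoeffding_bound(1.0, delta, n)
    return main.error_rate() - alt.error_rate() > eps
```

    In this sketch, a tree that was accurate before a drift but errs afterwards loses its old correct predictions as they slide out of the window, so its windowed error rate rises and the swap condition can fire.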
