Abstract:
Biclustering algorithms cluster correlated patterns in subspaces of a data matrix. However, most existing biclustering algorithms address only linearly correlated patterns, or a particular kind of linearly similar pattern, and leave untouched the nonlinearly correlated patterns that are often hidden in real data sets. In this paper, a novel biclustering algorithm called MI-TSB is proposed to find and report all nonlinearly correlated patterns in time series gene expression data. We first derive an efficient formula for computing quadratic mutual information using matrix theory. Based on quadratic mutual information and a sliding window technique, we then introduce a nonlinear similarity model for time series data and a simple variant of the generalized suffix tree. Using the suffix tree as an index structure, the MI-TSB algorithm enumerates all biclusters effectively and efficiently. Compared with general biclustering algorithms, one of the most important advantages of MI-TSB is its ability to discover nonlinearly correlated patterns within a sliding window. Experiments on a real gene expression data set and a synthetic data set show that MI-TSB successfully discovers nonlinearly correlated patterns that cannot be found by other ordinary biclustering algorithms. In addition, Gene Ontology annotation of the genes in the discovered biclusters demonstrates that MI-TSB finds biologically meaningful results.
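To make the role of quadratic mutual information concrete, the following is a minimal sketch rather than the MI-TSB formula itself, which the abstract does not state. It assumes the Euclidean-distance form of quadratic mutual information estimated with Gaussian Parzen windows, a standard formulation from information-theoretic learning, written in terms of kernel (Gram) matrices. The names `gaussian_gram`, `quadratic_mi`, `sliding_qmi`, and the bandwidth `sigma` are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_gram(values, sigma):
    """Gram matrix of a Gaussian kernel with variance 2*sigma**2.

    Assumption: this is the kernel obtained by convolving two Parzen
    windows of bandwidth sigma; it is not taken from the paper.
    """
    var = 2.0 * sigma * sigma
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    return [[norm * math.exp(-((a - b) ** 2) / (2.0 * var)) for b in values]
            for a in values]


def quadratic_mi(x, y, sigma=0.25):
    """Euclidean-distance quadratic mutual information of two samples.

    Computes V_joint - 2 * V_cross + V_marginal from the two Gram matrices,
    which equals the integrated squared difference between the Parzen
    estimate of the joint density and the product of the marginals.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    kx = gaussian_gram(x, sigma)
    ky = gaussian_gram(y, sigma)
    row_x = [sum(row) / n for row in kx]  # row means of the x Gram matrix
    row_y = [sum(row) / n for row in ky]  # row means of the y Gram matrix
    v_joint = sum(kx[i][j] * ky[i][j]
                  for i in range(n) for j in range(n)) / (n * n)
    v_cross = sum(a * b for a, b in zip(row_x, row_y)) / n
    v_marginal = (sum(row_x) / n) * (sum(row_y) / n)
    return v_joint - 2.0 * v_cross + v_marginal


def sliding_qmi(a, b, window, sigma=0.25):
    """Quadratic mutual information of a and b in every window position."""
    return [quadratic_mi(a[s:s + window], b[s:s + window], sigma)
            for s in range(len(a) - window + 1)]


# Illustrative data: y_dep depends on x only nonlinearly (y = x^2), so its
# linear correlation with the symmetric x is zero; y_ind holds the same
# values in shuffled order, which destroys the dependence.
x = [-1.0 + 2.0 * i / 59 for i in range(60)]
y_dep = [v * v for v in x]
y_ind = list(y_dep)
random.Random(0).shuffle(y_ind)

print("dependent pair:", quadratic_mi(x, y_dep))
print("shuffled pair: ", quadratic_mi(x, y_ind))
```

In this sketch the score is larger for the quadratic pair than for the shuffled pair, even though a linear correlation measure would not separate them. The bandwidth `sigma` is a free parameter here, and how MI-TSB chooses its estimator settings and window length is not specified in the abstract.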