Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics
Author | Sirinukunwattana, Korsuk |
Author | Savage, Richard S. |
Author | Bari, Muhammad F. |
Author | Snead, David R.J. |
Author | Rajpoot, Nasir M. |
Available date | 2016-03-31T14:00:58Z |
Publication Date | 2013-10 |
Publication Name | PLoS ONE |
Resource | Scopus |
Citation | Sirinukunwattana K, Savage RS, Bari MF, Snead DRJ, Rajpoot NM (2013) Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics. PLoS ONE 8(10): e75748. |
ISSN | 1932-6203 |
Abstract | Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters that is close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than those of several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/. |
Sponsor | Korsuk Sirinukunwattana is partly funded by Qatar National Research Fund grant no. NPRP5-1345-1-228 and partly by the Department of Computer Science, University of Warwick. RSS acknowledges the support of a Medical Research Council Biostatistics Fellowship (G0902104). MFB acknowledges the support of Higher Education Commission and Dow University of Health Science, Pakistan. Funding for the collection of lung tissue was from the West Midlands Lung Tissue Consortium. |
Language | en |
Publisher | Public Library of Science |
Subject | Bayes theorem; Bayesian hierarchical clustering; classification algorithm; cluster analysis; conjugate; gene cluster; gene expression; normal distribution; nucleotide sequence; tumor gene; algorithm; gene expression profiling; gene expression regulation; genetics |
Type | Article |
Issue Number | 10 |
Volume Number | 8 |
This item appears in the following Collection(s): Computer Science & Engineering