Author: Sirinukunwattana, Korsuk
Author: Savage, Richard S.
Author: Bari, Muhammad F.
Author: Snead, David R.J.
Author: Rajpoot, Nasir M.
Available date: 2016-03-31T14:00:58Z
Publication Date: 2013-10
Publication Name: PLoS ONE
Resource: Scopus
Citation: Sirinukunwattana K, Savage RS, Bari MF, Snead DRJ, Rajpoot NM (2013) Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics. PLoS ONE 8(10): e75748.
ISSN: 1932-6203
URI: http://dx.doi.org/10.1371/journal.pone.0075748
URI: http://hdl.handle.net/10576/4302
Abstract: Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters that is close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/.
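Illustration (not part of the original record): the abstract states that GBHC places a normal-gamma conjugate prior on the mean and precision of each Gaussian component. The sketch below shows the standard textbook update for a one-dimensional Gaussian with unknown mean and precision, and the resulting closed-form marginal likelihood, which is the kind of quantity a Bayesian model-selection step can compare across candidate clusters. It is not the authors' implementation. The names `NormalGamma`, `posterior`, and `log_marginal_likelihood` and the hyperparameter values in the usage note are assumptions made for this example.

```python
import math
from typing import NamedTuple, Sequence


class NormalGamma(NamedTuple):
    """Hypothetical container for normal-gamma hyperparameters."""
    mu: float     # prior location of the Gaussian mean
    kappa: float  # pseudo-count scaling the precision of the mean
    alpha: float  # shape of the gamma prior on the precision
    beta: float   # rate of the gamma prior on the precision


def posterior(prior: NormalGamma, data: Sequence[float]) -> NormalGamma:
    """Standard conjugate update of the hyperparameters given observations."""
    n = len(data)
    if n == 0:
        return prior
    mean = sum(data) / n
    sum_sq = sum((x - mean) ** 2 for x in data)
    kappa_n = prior.kappa + n
    mu_n = (prior.kappa * prior.mu + n * mean) / kappa_n
    alpha_n = prior.alpha + n / 2
    beta_n = (prior.beta + 0.5 * sum_sq
              + prior.kappa * n * (mean - prior.mu) ** 2 / (2 * kappa_n))
    return NormalGamma(mu_n, kappa_n, alpha_n, beta_n)


def log_marginal_likelihood(prior: NormalGamma, data: Sequence[float]) -> float:
    """Log evidence of the data with mean and precision integrated out."""
    n = len(data)
    post = posterior(prior, data)
    return (math.lgamma(post.alpha) - math.lgamma(prior.alpha)
            + prior.alpha * math.log(prior.beta)
            - post.alpha * math.log(post.beta)
            + 0.5 * (math.log(prior.kappa) - math.log(post.kappa))
            - 0.5 * n * math.log(2 * math.pi))
```

A possible use, under the same assumptions: with `prior = NormalGamma(0.0, 1.0, 2.0, 2.0)`, compare `log_marginal_likelihood(prior, a + b)` against `log_marginal_likelihood(prior, a) + log_marginal_likelihood(prior, b)` for two lists `a` and `b`. A larger first value favours treating the two groups as one Gaussian component. The published algorithm also weights such comparisons with a prior over merges, which this sketch omits.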
Sponsor: Korsuk Sirinukunwattana is partly funded by Qatar National Research Fund grant no. NPRP5-1345-1-228 and partly by the Department of Computer Science, University of Warwick. RSS acknowledges the support of a Medical Research Council Biostatistics Fellowship (G0902104). MFB acknowledges the support of the Higher Education Commission and Dow University of Health Science, Pakistan. Funding for the collection of lung tissue was from the West Midlands Lung Tissue Consortium.
Language: en
Publisher: Public Library of Science
Subject: Bayes theorem; Bayesian hierarchical clustering; classification algorithm; cluster analysis; conjugate; gene cluster; gene expression; normal distribution; nucleotide sequence; tumor gene; algorithm; gene expression profiling; gene expression regulation; genetics
Title: Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics
Type: Article
Issue Number: 10
Volume Number: 8
dc.accessType: Open Access

