Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics
Author | Sirinukunwattana, Korsuk |
Author | Savage, Richard S. |
Author | Bari, Muhammad F. |
Author | Snead, David R.J. |
Author | Rajpoot, Nasir M. |
Date available | 2016-03-31T14:00:58Z |
Date published | 2013-10 |
Journal | PLoS ONE |
Source | Scopus |
Citation | Sirinukunwattana K, Savage RS, Bari MF, Snead DRJ, Rajpoot NM (2013) Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics. PLoS ONE 8(10): e75748. |
ISSN | 1932-6203 |
Abstract | Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than several other state-of-the-art methods. This suggests that GBHC can serve as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/. |
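The abstract describes GBHC as BHC with a Gaussian likelihood and a normal-gamma conjugate prior on each component's mean and precision. The following is a minimal Python sketch of how those pieces could fit together, not the GBHC implementation linked above. It assumes: (i) each feature is modelled independently, so per-dimension evidences multiply; (ii) the hyperparameters `MU0`, `KAPPA0`, `ALPHA0`, `BETA0` (a rate) and the concentration `alpha` are fixed illustrative values rather than values fitted to data; (iii) the merge probability r_k and the cut at r_k < 0.5 follow the standard BHC recursion of Heller and Ghahramani. All function names are hypothetical.

```python
import math

# Assumption: arbitrary illustrative hyperparameters, not values from the GBHC paper.
MU0, KAPPA0, ALPHA0, BETA0 = 0.0, 1.0, 1.0, 1.0


def log_marginal_normal_gamma(xs, mu0=MU0, kappa0=KAPPA0, alpha0=ALPHA0, beta0=BETA0):
    """Log evidence of 1-D data xs under x_i ~ N(mu, 1/lam) with
    (mu, lam) ~ Normal-Gamma(mu0, kappa0, alpha0, beta0), beta0 being a rate."""
    n = len(xs)
    if n == 0:
        return 0.0
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2.0 * math.pi))


def log_evidence(points):
    """p(D | H1). Assumption: independent features, so per-dimension evidences multiply."""
    return sum(log_marginal_normal_gamma([p[d] for p in points])
               for d in range(len(points[0])))


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def leaf(x, alpha):
    # A single point: d = alpha, and its tree evidence equals its H1 evidence.
    return {"points": [x], "children": None, "log_d": math.log(alpha),
            "log_p_tree": log_evidence([x]), "log_r": 0.0}


def merge(a, b, alpha):
    """Score the BHC hypothesis that all points under a and b form one cluster."""
    points = a["points"] + b["points"]
    log_ag = math.log(alpha) + math.lgamma(len(points))  # log(alpha * Gamma(n_k))
    log_dd = a["log_d"] + b["log_d"]                      # log(d_i * d_j)
    log_d = log_add(log_ag, log_dd)                       # d_k = alpha*Gamma(n_k) + d_i*d_j
    log_pi = log_ag - log_d                               # prior merge weight pi_k
    log_h1 = log_evidence(points)                         # p(D_k | H1)
    # p(D_k|T_k) = pi_k p(D_k|H1) + (1 - pi_k) p(D_i|T_i) p(D_j|T_j), where 1 - pi_k = d_i d_j / d_k
    log_p_tree = log_add(log_pi + log_h1,
                         log_dd - log_d + a["log_p_tree"] + b["log_p_tree"])
    return {"points": points, "children": (a, b), "log_d": log_d,
            "log_p_tree": log_p_tree, "log_r": log_pi + log_h1 - log_p_tree}


def bhc(data, alpha=1.0):
    """Greedy agglomeration: repeatedly merge the pair with the highest r_k."""
    nodes = [leaf(x, alpha) for x in data]
    while len(nodes) > 1:
        best, bi, bj = None, -1, -1
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                cand = merge(nodes[i], nodes[j], alpha)
                if best is None or cand["log_r"] > best["log_r"]:
                    best, bi, bj = cand, i, j
        nodes = [nd for k, nd in enumerate(nodes) if k not in (bi, bj)] + [best]
    return nodes[0]


def count_clusters(node):
    """Cut the tree where r_k < 0.5; each subtree with r_k >= 0.5 is one cluster."""
    if node["children"] is None or node["log_r"] >= math.log(0.5):
        return 1
    return sum(count_clusters(c) for c in node["children"])


root = bhc([[0.0], [0.1], [-0.1], [10.0], [10.1], [9.9]])
print(count_clusters(root))
```

The exhaustive pair search is cubic in the number of points and only makes the recursion concrete; it would not scale to genome-wide gene clustering. Because the number of clusters comes from where r_k drops below 0.5, it does not have to be fixed in advance, which is the property the abstract attributes to BHC.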
Sponsor | Korsuk Sirinukunwattana is partly funded by Qatar National Research Fund grant no. NPRP5-1345-1-228 and partly by the Department of Computer Science, University of Warwick. RSS acknowledges the support of a Medical Research Council Biostatistics Fellowship (G0902104). MFB acknowledges the support of the Higher Education Commission and Dow University of Health Sciences, Pakistan. Funding for the collection of lung tissue was from the West Midlands Lung Tissue Consortium. |
Language | en |
Publisher | Public Library of Science |
Subject | Bayes theorem; Bayesian hierarchical clustering; classification algorithm; cluster analysis; conjugate; gene cluster; gene expression; normal distribution; nucleotide sequence; tumor gene; algorithm; gene expression profiling; gene expression regulation; genetics |
Type | Article |
Issue | 10 |
Volume | 8 |