Dissecting Crucial Gene Markers Involved in HPV-Associated Oropharyngeal Squamous Cell Carcinoma from RNA-Sequencing Data through Explainable Artificial Intelligence
Author | Sekaran, Karthik |
Author | Varghese, Rinku Polachirakkal |
Author | Krishnan, Sasikumar |
Author | Zayed, Hatem |
Author | El Allali, Achraf |
Author | Doss, George Priya C |
Available date | 2024-08-28T04:23:43Z |
Publication Date | 2024 |
Publication Name | Frontiers in Bioscience - Landmark |
Resource | Scopus |
ISSN | 2768-6701 |
Abstract | Background: The worldwide incidence of oropharyngeal squamous cell carcinoma (OPSCC) is alarming. Clinicians urgently need to understand the etiology of OPSCC in order to deliver effective treatments. Methods: This study presents an integrative genomics approach for identifying key oncogenic drivers involved in OPSCC pathogenesis. The dataset contains RNA-Sequencing (RNA-Seq) samples from 46 human papillomavirus-positive head and neck squamous cell carcinoma cases and 25 normal uvulopalatopharyngoplasty cases. Differential marker selection between the groups, using a log2 fold change (FC) threshold of 2 and an adjusted p-value < 0.01, yielded 714 genes. The Particle Swarm Optimization (PSO) algorithm then selected a candidate gene subset, reducing the set to 73 genes. State-of-the-art machine learning algorithms were trained on both the differentially expressed genes and the PSO candidate subset. Results: Analysis of the predictive models using Shapley Additive exPlanations (SHAP) revealed that seven genes contribute significantly to model performance. ECT2, LAMC2, and DSG2 had the greatest influence on differentiating the sample groups, followed in importance by FAT1, PLOD2, COL1A1, and PLAU. The Random Forest and Bayes Net algorithms also achieved perfect validation scores when using the PSO features. Furthermore, gene set enrichment analysis, protein-protein interactions, and disease ontology mining revealed a significant association between these genes and the target condition. Survival analysis of three key genes indicated by SHAP revealed strong over-expression in samples from The Cancer Genome Atlas. Conclusions: Our findings elucidate critical oncogenic drivers in OPSCC, offering vital insights for developing targeted therapies and enhancing the understanding of its pathogenesis. |
Sponsor | The authors would like to thank the authorities of the Vellore Institute of Technology, India, for providing the necessary support in completing the manuscript. The authors acknowledge the Indian Council of Medical Research (ICMR), a Government of India agency, for the research grants (No. BMI/12(13)/2021, ID No: 2021-6359) and (No. VIR/COVID-19/31/2021/ECD-I, ID No: 2021-5570). |
Language | en |
Publisher | IMR Press Limited |
Subject | biomarker discovery; explainable artificial intelligence; human papillomavirus; oropharyngeal squamous cell carcinoma; RNA-sequencing; Shapley additive explanations |
Type | Article |
Pagination | 220 |
Issue Number | 6 |
Volume Number | 29 |
Collection | Biomedical Sciences |
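
The differential marker selection step described in the abstract (log2 fold change threshold of 2, adjusted p-value < 0.01) can be illustrated with a minimal sketch. It is not the authors' pipeline: the `GeneResult` record, the `select_markers` function, the use of the absolute fold change, the inclusive `>=` comparison, and the toy values are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class GeneResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2_fold_change: float
    padj: float  # multiple-testing adjusted p-value


def select_markers(results: List[GeneResult],
                   lfc_cutoff: float = 2.0,
                   padj_cutoff: float = 0.01) -> List[str]:
    """Keep genes with |log2FC| >= lfc_cutoff and padj < padj_cutoff.

    Using the absolute value (up- and down-regulated genes alike) and an
    inclusive fold-change bound are assumptions; the abstract states only
    the two threshold values.
    """
    return [
        r.gene
        for r in results
        if abs(r.log2_fold_change) >= lfc_cutoff and r.padj < padj_cutoff
    ]


# Made-up values for illustration only; not taken from the study.
toy = [
    GeneResult("GENE_A", 3.1, 0.0004),   # passes both thresholds
    GeneResult("GENE_B", -2.4, 0.002),   # strongly down-regulated, passes
    GeneResult("GENE_C", 1.2, 0.0001),   # fold change too small
    GeneResult("GENE_D", 2.8, 0.2),      # not significant after adjustment
]

selected = select_markers(toy)
print(selected)
```

In the study, a filter of this kind reduced the expression data to 714 genes, which PSO then narrowed to 73 before model training.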