MACHINE LEARNING PREDICTION OF DIABETES FROM A PUBLICLY AVAILABLE DATASET
Date
2024-01
Abstract
A wide array of medical conditions necessitate invasive diagnostic techniques, diabetes being among the best known. To address this challenge, a predictive model was developed using a publicly available dataset comprising over a million participants from the USA, sourced from the Behavioral Risk Factor Surveillance System (BRFSS). The dataset spans the years 2019, 2020, and 2021. After a thorough literature review to identify features associated with a participant's diabetic status, three primary class features and 30 additional features were chosen from this three-year dataset. These class and feature selections were adapted to be compatible with the application used to construct the predictive model, Weka. Missing data were transformed, and the Principal Components filter was applied to reduce the data from a multidimensional space to a 2D space. Four different classifiers were trained, with Random Forest emerging as the most effective. To validate the model, an unseen dataset from 2021 was evaluated using the supplied test set option in Weka. The model correctly predicted more than 70% of the cases and 98% of the controls. This modeling approach can be used to predict various health conditions from publicly available data, reducing the reliance on invasive diagnostic methods. Furthermore, it holds promise for the Hamad Medical Corporation in assessing the risk of diabetes development among individuals in Qatar by leveraging the features incorporated in this model.
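The reported share of cases and share of controls predicted correctly can be read as sensitivity and specificity; that reading is an assumption here, since the abstract does not name the metrics. The following minimal Python sketch shows how the two figures would be computed from true and predicted labels. The function name `sensitivity_specificity` and the toy labels are hypothetical illustrations, not the study's data or its Weka workflow.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = share of true cases predicted as cases.
    Specificity = share of true controls predicted as controls.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # case correctly identified
            else:
                fn += 1  # case missed
        else:
            if pred == positive:
                fp += 1  # control wrongly flagged
            else:
                tn += 1  # control correctly identified
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up toy labels (1 = diabetic case, 0 = control); not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting the two rates separately matters when cases are much rarer than controls, because overall accuracy alone can look high even if many cases are missed.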
DOI/handle
http://hdl.handle.net/10576/51491
Collections
- Biomedical Sciences [64 items]