Predicting hypertension using machine learning: Findings from Qatar Biobank Study
Abstract
Hypertension, a global burden, is associated with several risk factors and can be treated by lifestyle modifications and medications. Prediction and early diagnosis is important to prevent related health complications. The objective is to construct and compare predictive models to identify individuals at high risk of developing hypertension without the need of invasive clinical procedures. This is a cross-sectional study using 987 records of Qataris and long-term residents aged 18+ years from Qatar Biobank. Percentages were used to summarize data and chi-square tests to assess associations. Predictive models of hypertension were constructed and compared using three supervised machine learning algorithms: decision tree, random forest, and logistics regression using 5-fold cross-validation. The performance of algorithms was assessed using accuracy, positive predictive value (PPV), sensitivity, F-measure, and area under the receiver operating characteristic curve (AUC). Stata and Weka were used for analysis. Age, gender, education level, employment, tobacco use, physical activity, adequate consumption of fruits and vegetables, abdominal obesity, history of diabetes, history of high cholesterol, and mother's history high blood pressure were important predictors of hypertension. All algorithms showed more or less similar performances: Random forest (accuracy = 82.1%, PPV = 81.4%, sensitivity = 82.1%), logistic regression (accuracy = 81.1%, PPV = 80.1%, sensitivity = 81.1%) and decision tree (accuracy = 82.1%, PPV = 81.2%, sensitivity = 82.1%. In terms of AUC, compared to logistic regression, while random forest performed similarly, decision tree had a significantly lower discrimination ability (p-value<0.05) with AUC's equal to 85.0, 86.9, and 79.9, respectively. Machine learning provides the chance of having a rapid predictive model using non-invasive predictors to screen for hypertension. Future research should consider improving the predictive accuracy of models in larger general populations, including more important predictors and using a variety of algorithms.
Collections
- Public Health [435 items ]