A COMPARISON STUDY: EVALUATING SOME STATISTICAL AND AI TECHNIQUES FOR MEDICAL APPLICATION
Abstract
Analyzing medical data with artificial intelligence and statistical techniques can improve healthcare by helping to distinguish important from irrelevant features in data collection and disease diagnosis. More accurate data, in turn, supports better healthcare quality and more effective treatment. Python and RStudio were used for the analysis, applying machine learning algorithms and statistical techniques to classification and prediction. The machine learning algorithms were Decision Tree, Random Forest, Logistic Regression, Support Vector Machine, Naïve Bayes, and K-Nearest Neighbors. Traditional statistical techniques, namely Logistic Regression and Discriminant Analysis, were also used to evaluate data accuracy and the performance of the predictive models. The simulation results showed that structural issues arise when working with such data, in particular missing values and imbalance between the patient and non-patient classes. Algorithms such as Random Forest and K-Nearest Neighbors helped address these issues. As the sample size increased, the accuracy of Logistic Regression and Random Forest improved significantly, indicating their ability to handle large datasets. Applying SMOTE, on the other hand, reduced overall accuracy but improved precision and recall for the rare classes. Discriminant analysis revealed that the variable means were similar across classes; this high similarity reduced prediction accuracy, so the method did not provide clear insights. In contrast, the Decision Tree algorithm offered clearer interpretation of the variables through its tree diagram. Random Forest was the best algorithm for classifying imbalanced medical data with missing values.
While machine learning is superior in terms of accuracy on medical data, statistical techniques remain essential for understanding the data and making informed decisions based on precise analysis of trends and patterns.
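The trade-off described for SMOTE (lower overall accuracy, better precision and recall on the rare class) can be illustrated with a minimal sketch. It is not the study's code: the function names `smote_oversample` and `precision_recall`, the toy `patients` rows, and the choice of `k=3` are all assumptions made for illustration, and a real analysis would more likely use an established library implementation. The sketch uses only the Python standard library and assumes at least two minority rows.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    randomly chosen minority row and one of its k nearest minority
    neighbours (the core idea of SMOTE). Hypothetical helper."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding the row itself
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the `positive` (rare) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: four "patient" rows facing a larger non-patient class.
patients = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_rows = smote_oversample(patients, n_new=6)
balanced_patients = patients + new_rows  # 10 rows after oversampling
```

Because each synthetic row lies on a segment between two real minority rows, oversampling adds no new information; it only shifts the classifier's attention toward the rare class, which is why precision and recall on that class are the more informative metrics to report alongside accuracy.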
DOI/handle
http://hdl.handle.net/10576/62728
Collections
- Mathematics, Statistics and Physics [35 items]