MULTI-TASK LEARNING MODEL FOR MOBILE MALWARE DETECTION AND CLASSIFICATION
Abstract
The rapid growth of mobile devices, especially those running the Android operating system, has made them attractive targets for cybercriminals. The increasing sophistication of mobile malware, including zero-day threats, challenges traditional signature-based detection methods, which struggle to identify new and evolving malware families. To address these limitations, this thesis proposes a multi-task learning (MTL) model capable of simultaneously performing binary classification (malware detection) and multi-class classification (malware family identification) by utilizing shared representations across tasks. A Systematic Literature Review (SLR) was conducted to assess the current landscape of MTL applications in cybersecurity. While MTL has shown promise in other areas such as network intrusion detection, a significant research gap was identified in its application to mobile malware detection. This thesis aims to bridge that gap by developing an MTL model that improves both malware detection and classification performance, contributing to advancements in mobile security. The proposed MTL model was trained and evaluated on the CCCS-CIC-AndMal-2020 dataset, which contains API-based static features of Android applications. To enhance computational efficiency, Principal Component Analysis (PCA) was employed for feature reduction, and class imbalance was mitigated using a weighted loss function. Hyperparameter tuning with Optuna further optimized key parameters, including layer configurations, learning rate, and loss weights, ensuring robust model performance. Experimental results demonstrate that the MTL model outperforms Single-Task Learning (STL) models in both malware detection and malware family classification. The model achieved 97% accuracy in detecting malware and 91% accuracy in identifying malware families, demonstrating superior generalization across different malware types.
The weighted loss function improved the detection of minority classes, addressing class imbalance challenges, while hyperparameter tuning resulted in reduced validation loss and improved stability. This research contributes to the field of mobile malware detection by introducing an MTL-based model that addresses the shortcomings of STL models. The findings indicate that MTL, combined with feature selection and optimized hyperparameters, provides a scalable and effective solution for improving the accuracy and robustness of malware detection systems. Future work could explore integrating dynamic analysis features and deploying the model for real-time malware detection.
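The combination of a class-weighted loss with per-task loss weights described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not state the exact loss, so the inverse-frequency class weighting, the function names, the default task weights `w_det` and `w_fam`, and the family labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of -log p(true class).

    probs[i] is a dict mapping each class to its predicted probability.
    """
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        weight_sum += w
    return total / weight_sum


def multitask_loss(det_probs, det_labels, fam_probs, fam_labels,
                   det_class_w, fam_class_w, w_det=0.5, w_fam=0.5):
    """Weighted sum of the detection loss and the family loss.

    w_det and w_fam are hypothetical task weights; the abstract only says
    that loss weights were among the tuned hyperparameters.
    """
    det = weighted_cross_entropy(det_probs, det_labels, det_class_w)
    fam = weighted_cross_entropy(fam_probs, fam_labels, fam_class_w)
    return w_det * det + w_fam * fam


# Toy usage with made-up labels: a rare family receives a larger weight.
fam_labels = ["adware", "adware", "adware", "banker"]
fam_class_w = inverse_frequency_weights(fam_labels)
```

In this sketch a misclassified sample from a rare family contributes more to the family loss than one from a common family, and the two task losses are traded off through `w_det` and `w_fam`, which could be exposed to a tuner as ordinary hyperparameters.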
DOI/handle
http://hdl.handle.net/10576/62825
Collections
- Computing [103 items]