Uncertain training data set conceptual reduction: A machine learning perspective

Rezk, Eman; Babi, Syrinne; Islam, Fahad; Jaoua, Ali

Date

2016

Abstract

Knowledge discovery from data is a challenging problem of significant importance in many fields, such as biology, economics, and the social sciences. Real-world data is often incomplete and ambiguous, and its rapid growth in size complicates analysis. Data reduction techniques that account for data uncertainty are therefore needed. In this paper, our objective is to conceptually reduce uncertain data without losing information. We propose two reduction methods, both rooted in formal concept analysis. The first method targets approximate data reduction; it uses the result of Baixeries et al. for detecting functional dependencies by transforming an instance of a database into an approximate formal context. The second method performs fuzzy data reduction, employing the algorithm of Elloumi et al., which is based on Łukasiewicz logic. We compare these methods with three other machine-learning-based reduction algorithms in a classification case study on breast cancer data. We report classification accuracy, root mean square error, and reduced data size, and the results show that training sets reduced with our methods yield very accurate classifiers with minimal data size. Moreover, the reduced data lowers communication time and memory requirements.
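
As a rough illustration of the idea behind the first method, the sketch below builds the classical formal context used in formal concept analysis to read off functional dependencies: the objects are pairs of database rows, and a pair has an attribute when the two rows agree on it. A dependency X -> Y then holds exactly when every pair that agrees on X also agrees on Y. This is not the paper's implementation. The function names, the toy rows, and the `tol` parameter (a hypothetical stand-in for "approximate" agreement on numeric values) are all assumptions made for this example, and the rows are invented rather than taken from the breast cancer data.

```python
from itertools import combinations


def pair_context(rows, attributes, tol=0.0):
    """Build a formal context whose objects are pairs of row indices.

    A pair (i, j) has attribute a when rows i and j agree on a.
    `tol` is a hypothetical knob (an assumption of this sketch): numeric
    values that differ by at most `tol` are treated as agreeing.
    """
    def agree(x, y):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            return abs(x - y) <= tol
        return x == y

    return {
        (i, j): frozenset(a for a in attributes if agree(rows[i][a], rows[j][a]))
        for i, j in combinations(range(len(rows)), 2)
    }


def fd_holds(context, lhs, rhs):
    """Return True when every pair agreeing on all of lhs also agrees on all of rhs."""
    lhs, rhs = set(lhs), set(rhs)
    return all(rhs <= attrs for attrs in context.values() if lhs <= attrs)


# Toy rows, invented for illustration only.
rows = [
    {"size": 1, "shape": 1, "diagnosis": "benign"},
    {"size": 1, "shape": 2, "diagnosis": "benign"},
    {"size": 5, "shape": 1, "diagnosis": "malignant"},
    {"size": 5, "shape": 3, "diagnosis": "malignant"},
]
attributes = ["size", "shape", "diagnosis"]
ctx = pair_context(rows, attributes)

size_determines_diagnosis = fd_holds(ctx, ["size"], ["diagnosis"])
shape_determines_diagnosis = fd_holds(ctx, ["shape"], ["diagnosis"])
print(size_determines_diagnosis, shape_determines_diagnosis)
```

In this toy table `size` determines `diagnosis` while `shape` does not, so a reduction step guided by such dependencies could drop `shape` and still recover the class label from what remains. How the paper defines the approximate context and selects what to keep is not stated in the abstract, so that part is left out here.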

DOI/handle

http://dx.doi.org/10.1109/FUZZ-IEEE.2016.7737914
http://hdl.handle.net/10576/18262

Collections

Computer Science & Engineering [2484 items]