Uncertain training data set conceptual reduction: A machine learning perspective
Abstract
Knowledge discovery from data is a challenging problem of significant importance in many fields, such as biology, economics and the social sciences. Real-world data is incomplete and ambiguous; moreover, its rapid growth in size complicates analysis. Data reduction techniques that account for data uncertainty are therefore needed. In this paper, our objective is to conceptually reduce uncertain data without losing information. We propose two reduction methods, both rooted in formal concept analysis. The first method targets approximate data reduction; it uses the result of Baixeries et al. for detecting functional dependencies by transforming a database instance into an approximate formal context. The second method targets fuzzy data reduction and employs the algorithm of Elloumi et al., which is based on Lukasiewicz logic. We compare these methods with three other machine learning based reduction algorithms in a classification case study on breast cancer data. We report classification accuracy, root mean square error and reduced data size, which show that training sets reduced with our methods yield very accurate classifiers with minimal data size. Moreover, the reduced data lowers communication time and memory requirements.
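For readers unfamiliar with the Lukasiewicz logic mentioned above, the following is a minimal sketch of its two standard connectives applied to a small fuzzy formal context. It is not the algorithm of Elloumi et al.; the context values, the attribute names (`m1`, `m2`) and the helper `implication_degree` are hypothetical and serve only as illustration.

```python
from typing import Dict

# A fuzzy formal context: object -> attribute -> membership degree in [0, 1].
FuzzyContext = Dict[str, Dict[str, float]]


def lukasiewicz_tnorm(a: float, b: float) -> float:
    """Lukasiewicz conjunction (t-norm): max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def lukasiewicz_implication(a: float, b: float) -> float:
    """Lukasiewicz implication (residuum): min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def implication_degree(context: FuzzyContext, premise: str, conclusion: str) -> float:
    """Degree to which `premise` implies `conclusion` over all objects.

    Hypothetical helper: takes the minimum of the Lukasiewicz implication
    across every object in the context.
    """
    return min(
        lukasiewicz_implication(row[premise], row[conclusion])
        for row in context.values()
    )


# Hypothetical toy data, not taken from the paper.
context: FuzzyContext = {
    "x1": {"m1": 0.8, "m2": 0.5},
    "x2": {"m1": 0.3, "m2": 0.9},
}

if __name__ == "__main__":
    print(implication_degree(context, "m1", "m2"))
```

A degree close to 1 suggests that the conclusion attribute is almost entirely determined by the premise attribute across the objects; how such degrees drive the actual reduction is defined by the paper's method, not by this sketch.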