A Novel Class Noise Detection Method for High-Dimensional Data in Industrial Informatics
Date
2021-03-01
Author
Guan, Donghai
Chen, Kai
Han, Guangjie
Huang, Shuqiang
Yuan, Weiwei
Guizani, Mohsen
Shu, Lei
Abstract
Data in industrial informatics may be both high-dimensional and mislabeled. Irrelevant or noisy features make mislabeling detection in high-dimensional data considerably harder. Traditional methods usually adopt a two-step solution: they first find the relevant subspace and then use it for mislabeling detection. Because this separates feature selection from label error detection, the two-step approach struggles to achieve optimal detection performance. To solve this problem, in this article we integrate the two steps and propose a sequential ensemble noise filter (SENF). In SENF, relevant features are selected and used to generate a noise score for each instance. In turn, these noise scores guide feature selection through regression learning. SENF therefore falls within the scope of sequential ensemble learning. We evaluate our approach on several high-dimensional benchmark datasets with substantial label noise. The results show that SENF performs significantly better than existing label noise detection methods.
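The alternation described in the abstract (score instances on the current features, then let the scores guide the next feature selection) can be illustrated with a rough sketch. This is not the authors' implementation. The abstract does not specify the noise scorer or the regression model, so the sketch assumes a k-nearest-neighbour label-disagreement score and swaps in a simple ranking of features by absolute correlation with the noise scores as a stand-in for regression learning. The function names, the parameters `n_keep`, `n_rounds` and `k`, and the toy data are all hypothetical.

```python
import math


def knn_noise_scores(X, y, features, k=3):
    """Assumed scorer: fraction of the k nearest neighbours, measured in the
    chosen feature subspace, whose label disagrees with the instance's label."""
    scores = []
    for i, xi in enumerate(X):
        dists = sorted(
            (math.dist([xi[f] for f in features], [xj[f] for f in features]), j)
            for j, xj in enumerate(X)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        scores.append(sum(y[j] != y[i] for j in neighbours) / k)
    return scores


def abs_correlation(col, target):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(col)
    mc, mt = sum(col) / n, sum(target) / n
    cov = sum((c - mc) * (t - mt) for c, t in zip(col, target))
    vc = sum((c - mc) ** 2 for c in col)
    vt = sum((t - mt) ** 2 for t in target)
    return abs(cov) / math.sqrt(vc * vt) if vc > 0 and vt > 0 else 0.0


def senf_sketch(X, y, n_keep, n_rounds=3, k=3):
    """Alternate noise scoring and score-guided feature ranking.

    Returns the per-instance noise score averaged over all rounds and the
    feature subset chosen after the last round.
    """
    n_features = len(X[0])
    features = list(range(n_features))  # round 1 uses every feature
    history = []
    for _ in range(n_rounds):
        scores = knn_noise_scores(X, y, features, k)
        history.append(scores)
        # Stand-in for regression learning: rank each feature by how strongly
        # it correlates with the current noise scores, keep the top n_keep.
        relevance = [
            abs_correlation([row[f] for row in X], scores)
            for f in range(n_features)
        ]
        ranked = sorted(range(n_features), key=lambda f: -relevance[f])
        features = sorted(ranked[:n_keep])
    final = [sum(col) / len(col) for col in zip(*history)]
    return final, features


# Toy data (hypothetical): feature 0 separates the classes, features 1 and 2
# are irrelevant, and instance 3 carries a flipped label.
X = [
    [0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 8.0, 4.0], [0.3, 3.0, 7.0],
    [10.0, 4.0, 2.0], [10.1, 9.0, 8.0], [10.2, 2.0, 3.0], [10.3, 7.0, 6.0],
]
y = [0, 0, 0, 1, 1, 1, 1, 1]

scores, kept = senf_sketch(X, y, n_keep=1)
```

Averaging the scores over rounds is one plausible way to combine the sequence of filters into an ensemble; a sparse regression such as the lasso would be a closer match to "regression learning" than the correlation ranking used here.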
Collections
- Computer Science & Engineering [2402 items]