Undergraduate Thesis (CD)
Comparison of Resampling and Hybrid Resampling Techniques for Handling Classification of Imbalanced Class Data
ABSTRACT
Class imbalance is a common problem in classification, where one class contains more observations than the other. This study compares the performance of techniques for handling imbalanced classes on three data sets: simulated data, German credit card data, and housing and environmental health indicator data. The techniques compared are the resampling methods SMOTE and Tomek Link and the hybrid resampling method SMOTE-Tomek. The simulated data, generated under a normal distribution assumption, were classified well without any resampling, with accuracy, precision, recall, F1 score, AUC, and G-Mean values of 0.8994, 0.8999, 0.9992, 0.9470, 0.5021, and 0.0258, respectively. On the German credit card data, the best classification results were achieved after balancing the data with SMOTE, with an accuracy of 0.9016. On the housing and environmental health indicator data, the best results were likewise achieved after balancing with SMOTE, with an accuracy of 0.9998. For the secondary data in this study, it can therefore be concluded that the resampling technique performs better than the hybrid resampling technique.
Keywords: Resampling technique, hybrid resampling, classification, unbalanced class data.
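The evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the thesis's own code: the function name `imbalance_metrics` and the counts are hypothetical, and G-Mean is assumed to follow the usual binary definition, the square root of recall times specificity. AUC is omitted because it needs predicted scores rather than counts.

```python
import math


def imbalance_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; G-Mean is assumed to be
    sqrt(recall * specificity).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts (NOT the study's data): 900 majority-class positives,
# 100 minority-class negatives, and a classifier that labels almost
# everything as positive.
metrics = imbalance_metrics(tp=899, fp=99, fn=1, tn=1)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With counts like these, accuracy, precision, recall, and F1 stay high because they are dominated by the majority class, while G-Mean collapses because almost no minority-class observations are recognized. This is why imbalanced-data studies report G-Mean and AUC alongside accuracy.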