Methods of Balanced Random Sets and Data Normalisation for Improvement of Classification Quality
Abstract
Direct application of standard classification models often yields poor results. In this paper we consider two examples. The first concerns the popular imbalanced «Credit» data set from the Kaggle platform, with the standard function nnet (neural networks) in the R environment used as the classifier. Applied directly, this function ignores the important minority class. As a solution, we propose to consider a large number of relatively small, balanced subsets whose elements are selected at random from the training set. The second example concerns the well-known MNIST data and the standard function svm (support vector machine) in the Python environment; here we demonstrate the necessity of normalising the original features.
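The two ideas in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `balanced_random_subsets` and `min_max_scale`, the subset sizes, and the choice of min-max scaling as the normalisation are all assumptions made for illustration. The abstract does not say how the models fitted on the subsets are combined, so that step is left out.

```python
import random
from collections import Counter


def balanced_random_subsets(labels, n_subsets, per_class, seed=0):
    """Return n_subsets index lists, each holding per_class random indices per class.

    Hypothetical helper: the name and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Group training-set indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    subsets = []
    for _ in range(n_subsets):
        subset = []
        for members in by_class.values():
            if per_class <= len(members):
                # Sample without replacement inside one subset.
                subset.extend(rng.sample(members, per_class))
            else:
                # Class smaller than requested: fall back to sampling with replacement.
                subset.extend(rng.choices(members, k=per_class))
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets


def min_max_scale(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0.

    Min-max scaling is an assumed choice of normalisation, used here as an example.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - lo for c, lo in zip(cols, lows)]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


# Toy imbalanced labels: 95 majority-class and 5 minority-class examples.
labels = [0] * 95 + [1] * 5
subsets = balanced_random_subsets(labels, n_subsets=10, per_class=5)
counts = Counter(labels[i] for i in subsets[0])

# Toy pixel-like features on a 0..255 scale, with one constant column.
scaled = min_max_scale([[0, 10], [128, 10], [255, 10]])
```

In this sketch each subset contains the same number of examples from every class, so a classifier fitted on it cannot reach low error simply by predicting the majority class. A separate model would be fitted on each subset and their predictions aggregated in some way, for example by averaging.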
This work is licensed under a Creative Commons Attribution 4.0 International License.