Methods of Balanced Random Sets and Data Normalisation for Improvement of Classification Quality

Vladimir Nikolaevich Nikulin; Ilya Sergeevich Kanishchev; Ivan Vladimirovich Bagaev

Vladimir Nikolaevich Nikulin, Vyatka State University, Kirov, Russia
Ilya Sergeevich Kanishchev, Vyatka State University, Kirov, Russia
Ivan Vladimirovich Bagaev, Vyatka State University, Kirov, Russia

Keywords: machine learning, data mining, neural networks, homogeneous ensemble, imbalanced data, pattern recognition, support vector machine

Abstract

In many cases, direct application of standard classification models leads to poor results. In this paper we consider two examples. The first example uses the popular imbalanced “Credit” dataset from the Kaggle platform, with the standard nnet function (neural networks) in the R environment as the classifier. Applied directly, this function ignores the important minority class. As a solution, we propose to consider a large number of relatively small, balanced subsets whose elements are selected randomly from the training set. The second example uses the well-known MNIST dataset and the standard svm function (support vector machine) in the Python environment. It demonstrates the necessity of normalising the original features.
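The two ideas in the abstract can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' implementation. The function names (`balanced_random_subsets`, `fit_centroids`, `predict_centroids`, `ensemble_predict`, `minmax_scale`), the toy nearest-centroid base learner standing in for nnet, the majority vote, and the min-max scaling are all assumptions made for illustration; the abstract does not state how the subset models are combined or which normalisation is applied.

```python
import random
from statistics import mean


def balanced_random_subsets(labels, n_subsets, per_class, seed=0):
    """Yield index lists holding `per_class` random examples of every class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    for _ in range(n_subsets):
        subset = []
        for indices in by_class.values():
            # Sample without replacement, capped at the class size.
            subset.extend(rng.sample(indices, min(per_class, len(indices))))
        yield subset


def fit_centroids(X, y):
    """Toy base learner (a stand-in for nnet): one mean vector per class."""
    centroids = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        centroids[c] = [mean(col) for col in zip(*rows)]
    return centroids


def predict_centroids(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def ensemble_predict(models, x):
    """Combine the subset models by majority vote (an assumed rule)."""
    votes = [predict_centroids(m, x) for m in models]
    return max(set(votes), key=votes.count)


def minmax_scale(X):
    """Rescale each feature to [0, 1]; constant features map to 0.0."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    span = [max(c) - l for c, l in zip(cols, lo)]
    return [
        [(v - l) / s if s else 0.0 for v, l, s in zip(row, lo, span)]
        for row in X
    ]


# Hypothetical imbalanced toy data: 95 majority (0) and 5 minority (1) points.
X = [[(i % 5) * 0.1, (i % 7) * 0.1] for i in range(95)]
X += [[10 + j * 0.1, 10 - j * 0.1] for j in range(5)]
y = [0] * 95 + [1] * 5

# Train one base model per balanced subset (an odd count avoids tied votes).
models = []
for subset in balanced_random_subsets(y, n_subsets=7, per_class=5, seed=1):
    models.append(fit_centroids([X[i] for i in subset], [y[i] for i in subset]))

prediction = ensemble_predict(models, [9.0, 9.0])
```

Each base model sees the two classes in equal proportion, so the minority class cannot be ignored by any single model. For pixel data such as MNIST, `minmax_scale` would map raw intensities in the range 0 to 255 onto [0, 1] before a model such as an SVM is trained.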

Author Biographies

Vladimir N. Nikulin: PhD, Associate Professor in Computer Science, Department of Mathematical Methods, Vyatka State University, Kirov, Russia

Ilya S. Kanishchev: Vyatka State University, Kirov, Russia

Ivan V. Bagaev: Vyatka State University, Kirov, Russia
