Application of Transformers and Machine Learning Methods in an Academic Supervisor Recommendation System
Abstract
This paper proposes a recommendation system for selecting an academic supervisor, based on a transformer architecture and modern machine learning methods. The system analyzes a student's academic record, including the courses taken and the grades received, together with the professional characteristics of teachers. Experimental results demonstrate that the proposed approach substantially outperforms traditional methods: in testing, it achieved a recommendation accuracy of 0.3230, versus 0.1106 for a method based on the frequency of positive grades and 0.1637 for an approach using machine-learning classification. These results confirm the effectiveness of the system in streamlining supervisor selection, improving the quality of students' research work through personalized matching of competencies.
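The "personalized matching of competencies" described above can be pictured as an embed-and-rank step: encode the student's academic record and each teacher's professional profile as dense vectors, then rank teachers by similarity. The following is a minimal sketch only, assuming the sentence-transformers library and the publicly available intfloat/multilingual-e5-base embedding model; the supervisor names, profile texts, and plain cosine-similarity ranking are illustrative assumptions, not the paper's exact pipeline.

```python
# Illustrative sketch: rank supervisors by cosine similarity between a
# student's course/grade record and supervisors' competency profiles.
# Model choice and all data below are hypothetical, not from the paper.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")

# Hypothetical student record: courses taken and grades received.
student = "query: Machine Learning (A); Linear Algebra (A); Databases (B)"

# Hypothetical supervisor profiles (professional characteristics).
supervisors = {
    "Supervisor 1": "passage: deep learning, recommender systems, NLP",
    "Supervisor 2": "passage: databases, distributed systems",
    "Supervisor 3": "passage: numerical methods, optimization",
}

# e5-family models expect "query:"/"passage:" prefixes; normalized
# embeddings let cosine similarity reduce to a dot product.
student_vec = model.encode(student, normalize_embeddings=True)
sup_vecs = model.encode(list(supervisors.values()), normalize_embeddings=True)

scores = sup_vecs @ student_vec
for name, score in sorted(zip(supervisors, scores), key=lambda x: -x[1]):
    print(f"{name}: {score:.3f}")
```

In a full system, this similarity score would be one input among others (grade history, supervision load, topic constraints), rather than the sole ranking criterion.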