Models for automatic generation of educational tasks: a comparative analysis

  • Dmitry Butenko, Saint Petersburg State Electrotechnical University “LETI”, Professora Popova ul. 5, building 3, St. Petersburg, 197022, Russian Federation
Keywords: automatic task generation, large language models, code generation, academic dishonesty, automated generation models

Abstract

The article presents a comparative analysis of models for the automatic generation of assessment tasks for university courses in the context of a growing mismatch between student and instructor numbers and an increase in cases of academic dishonesty.
The aim of the study is to compare existing generation models according to three criteria (task variability, effort required for implementation, and explainability of results) in order to reduce instructors’ workload while preserving the quality of the educational process.
The methodology includes an analysis of publications from 2020–2025 and a classification of models into template-based, grammar-based, statistical, and graph-based approaches, recurrent neural networks, evolutionary algorithms, and large language models (LLMs).
Key findings: LLMs outperform alternative approaches in the diversity of generated content and computational efficiency when pre-trained models are used. Template-based and grammar-based systems are constrained by low variability, evolutionary algorithms require significantly more time, and recurrent networks are inferior in maintaining semantic coherence. Critical drawbacks of LLMs are limited explainability and a tendency to hallucinate, which necessitates mandatory expert oversight of outputs.
The work has practical relevance for developers of educational systems and for instructors seeking to scale instruction while retaining pedagogical control.
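
The low variability attributed above to template-based systems can be illustrated with a minimal sketch. The template text, slot names, and slot values below are hypothetical and are not taken from the article or from any of the reviewed systems; the point is only that a slot-filling template can never produce more distinct tasks than the product of its slot sizes.

```python
import itertools
import random

# Hypothetical template and slot values, invented for illustration only.
TEMPLATE = "Write a function that returns the {operation} of a list of {count} {kind}."
SLOTS = {
    "operation": ["sum", "maximum", "minimum"],
    "count": [5, 10],
    "kind": ["integers", "floats"],
}


def all_tasks(template, slots):
    """Enumerate every task the template can produce."""
    names = list(slots)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in itertools.product(*(slots[name] for name in names))
    ]


def random_task(template, slots, rng):
    """Draw one task by filling each slot independently at random."""
    return template.format(
        **{name: rng.choice(values) for name, values in slots.items()}
    )


tasks = all_tasks(TEMPLATE, SLOTS)
print(len(tasks))  # 3 * 2 * 2 = 12 distinct tasks, the hard upper bound
print(random_task(TEMPLATE, SLOTS, random.Random(0)))
```

Under these assumptions, raising variability means writing more templates or more slot values by hand, which is the kind of implementation effort the comparison weighs against variability.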

Author Biography

Dmitry Butenko, Saint Petersburg State Electrotechnical University “LETI”, Professora Popova ul. 5, building 3, St. Petersburg, 197022, Russian Federation

Assistant Lecturer at the Department of Software Engineering and Computer Applications, ETU “LETI”

References

UNESCO, Higher education: figures at a glance. Paris, France: UNESCO, 2025. [Online]. Available: https://unesdoc.unesco.org/ark:/48223/pf0000394112

W. M. To and B. T. Yu, “Rise in higher education researchers and academic publications,” Emerald Open Research, vol. 1, no. 3, pp. 1–15, 2023; doi:10.1108/eor-03-2023-0008

Ministry of Science and Higher Education of the Russian Federation, “Report on the implementation of state policy in the field of higher education and corresponding additional professional education,” 2024 (in Russian).

R. Yavich and N. Davidovitch, “Plagiarism among higher education students,” Education Sciences, vol. 14, no. 8, p. 908, 2024; doi:10.3390/educsci14080908

P. M. Newton and K. Essex, “How Common is Cheating in Online Exams and did it Increase During the COVID-19 Pandemic? A Systematic Review,” Journal of Academic Ethics, vol. 22, no. 2, pp. 323–343, 2023; doi:10.1007/s10805-023-09485-5

OECD, “TALIS 2018 Results (Volume I): Teachers and School Leaders as Lifelong Learners,” TALIS, OECD Publishing, 2019; doi:10.1787/1d0bc92a-en

R. Denkin, “On perception of prevalence of cheating and usage of generative AI,” 2024. [Online]. Available: https://arxiv.org/abs/2405.18889

M. Hulme, G. Beauchamp, J. Wood, and C. Bignell, “Teacher workload research report 2024,” University of the West of Scotland, Paisley, Scotland, 2024.

G. Kurdi, J. Leo, B. Parsia, U. Sattler, and S. Al-Emari, “A systematic review of automatic question generation for educational purposes,” International Journal of Artificial Intelligence in Education, vol. 30, no. 1, pp. 121–204, 2020; doi:10.1007/s40593-019-00186-y

R. Weegar and P. Idestam-Almquist, “Reducing workload in short answer grading using machine learning,” International Journal of Artificial Intelligence in Education, vol. 34, no. 2, pp. 247–273, 2024; doi:10.1007/s40593-022-00322-1

A. Gobrecht et al., “Beyond human subjectivity and error: a novel AI grading system,” 2024. [Online]. Available: https://arxiv.org/abs/2405.04323

A. Formica, I. Mele, and F. Taglino, “A template-based approach for question answering over knowledge bases,” Knowledge and Information Systems, vol. 66, no. 1, pp. 453–479, 2024; doi:10.1007/s10115-023-01966-8

M. A. Maslova, “Review of existing methods for automatic generation of test tasks in natural language,” Computational Nanotechnology, vol. 10, no. 4, pp. 46–55, 2023 (in Russian); doi:10.33693/2313-223X-2023-10-4-46-55

L. Yun et al., “The Price of Format: Diversity Collapse in LLMs,” 2025. [Online]. Available: https://arxiv.org/abs/2505.18949

J. K. Pugh, L. B. Soros, and K. O. Stanley, “Quality diversity: A new frontier for evolutionary computation,” Frontiers in Robotics and AI, vol. 3, no. 40, pp. 1–17, 2016; doi:10.3389/frobt.2016.00040

D. Wright et al., “Epistemic Diversity and Knowledge Collapse in Large Language Models,” 2025. [Online]. Available: https://arxiv.org/abs/2510.04226

T. Speith, B. Crook, S. Mann, A. Schomacker, and M. Langer, “Conceptualizing understanding in explainable artificial intelligence (XAI): an abilities-based approach,” Ethics and Information Technology, vol. 26, no. 40, pp. 1–15, 2024; doi:10.1007/s10676-024-09769-3

F. Sovrano and F. Vitali, “An objective metric for Explainable AI: How and why to estimate the degree of explainability,” Knowledge-Based Systems, vol. 278, p. 110866, 2023; doi:10.1016/j.knosys.2023.110866

D. N. Biryukov and A. S. Dudkin, “Explainability and interpretability: important aspects of the safety of decisions made by intelligent systems (review),” Scientific and Technical Journal of Information Technologies, Mechanics and Optics, vol. 25, no. 3, pp. 373–386, 2025 (in Russian); doi:10.17586/2226-1494-2025-25-3-373-386

H. Ye, T. Liu, A. Zhang, W. Hua, and W. Jia, “Cognitive mirage: A review of hallucinations in large language models,” arXiv preprint, arXiv:2309.06794, 2023. [Online]. Available: https://arxiv.org/abs/2309.06794

A. Saha, B. Gupta, A. Chatterjee, and K. Banerjee, “You believe your LLM is not delusional? Think again! a study of LLM hallucination on foundation models under perturbation,” Discover Data, vol. 3, no. 1, p. 20, 2025; doi:10.1007/s44248-025-00029-1

Z. Zhang et al., “LLM hallucinations in practical code generation: Phenomena, mechanism, and mitigation,” Proceedings of the ACM on Software Engineering, vol. 2, no. ISSTA, pp. 481–503, 2025; doi:10.1145/3728894

E. Cambria et al., “XAI meets LLMs: A survey of the relation between explainable AI and large language models,” 2024. [Online]. Available: https://arxiv.org/abs/2407.15248

Scientific electronic library ’CyberLeninka’, 2025. [Online]. Available: https://cyberleninka.ru/

Google Scholar, 2025. [Online]. Available: https://scholar.google.com/

R. Circi, J. Hicks, and E. Sikali, “Automatic item generation: foundations and machine learning-based approaches for assessments,” Frontiers in Education, vol. 8, p. 858273, 2023; doi:10.3389/feduc.2023.858273

V. V. Kruchinin and V. V. Kuzovkin, “Review of existing methods for automatic generation of problems with conditions in natural language,” Computer Tools in Education, no. 1, pp. 85–96, 2022 (in Russian); doi:10.32603/2071-2340-2022-1-85-96

N. Willert and J. Thiemann, “Template-Based Generator for Single-Choice Questions,” Technology, Knowledge and Learning, vol. 29, pp. 355–370, 2024; doi:10.1007/s10758-023-09659-5

A. N. Shvetsov and A. P. Sergusheva, “Experience in applying the method of automatic test task generation,” Open Education, no. 4, pp. 318–333, 2017 (in Russian).

H. Park et al., “SAGE: Specification-Aware Grammar Extraction for Automated Test Case Generation with LLMs,” 2025. [Online]. Available: https://arxiv.org/abs/2506.11081

P. M. Maurer, “Generating test data with enhanced context-free grammars,” IEEE Software, vol. 7, no. 4, pp. 50–55, 1990; doi:10.1109/52.56422

DeepMind, “code_contests,” Hugging Face Datasets. [Online]. Available: https://huggingface.co/datasets/deepmind/code_contests

C. Hansen et al., “Sequence modelling for analysing student interaction with educational systems,” arXiv preprint, 2017. [Online]. Available: https://arxiv.org/abs/1708.04164

M. Lippi et al., “Natural language statistical features of LSTM-generated texts,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3326–3337, 2019; doi:10.1109/tnnls.2019.2890970

L. Galke et al., “Are we really making much progress in text classification? A comparative review,” 2022. [Online]. Available: https://arxiv.org/abs/2204.03954

M. Bugueño and G. De Melo, “Connecting the Dots: What Graph-Based Text Representations Work Best for Text Classification using Graph Neural Networks?,” 2023. [Online]. Available: https://arxiv.org/abs/2305.14578

S. Kuntur et al., “Comparative Analysis of Graph Neural Networks and Transformers for Robust Fake News Detection: A Verification and Reimplementation Study,” Electronics, vol. 13, p. 4784, 2024; doi:10.3390/electronics13234784

J. Protopopova and S. Kulik, “Educational intelligent system using genetic algorithm,” Procedia Computer Science, vol. 169, pp. 168–172, 2020; doi:10.1016/j.procs.2020.02.130

L. L. Custode et al., “Comparing large language models and grammatical evolution for code generation,” in Proc. of the Genetic and Evolutionary Computation Conference Companion, pp. 1830–1837, 2024; doi:10.1145/3638530.3664162

D. Sobania, M. Briesch, and F. Rothlauf, “Choose your programming copilot: a comparison of the program synthesis performance of GitHub Copilot and genetic programming,” in Proc. of the Genetic and Evolutionary Computation Conference, pp. 1019–1027, 2022; doi:10.1145/3512290.3528700

J. Xing, “Comparative and Performance Analysis of Different Deep Learning Models in Text Generation,” Applied and Computational Engineering, vol. 154, pp. 212–218, 2025; doi:10.54254/2755-2721/2025.tj23212

R. Meissner et al., “LLM-generated competence-based e-assessment items for higher education mathematics: Methodology and evaluation,” Frontiers in Education, vol. 9, 2024; doi:10.3389/feduc.2024.1427502

N. Scaria, S. Dharani Chenna, and D. Subramani, “Automated educational question generation at different Bloom’s skill levels using large language models: Strategies and evaluation,” in International Conference on Artificial Intelligence in Education, pp. 165–179, 2024; doi:10.1007/978-3-031-64299-9_12

S. Pasch, “AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals,” 2025. [Online]. Available: https://arxiv.org/abs/2505.15365

S. Saadaoui and E. Alonso, “Coordinated LLM Multi-Agent Systems for Collaborative Question-Answer Generation,” Knowledge-Based Systems, vol. 330, p. 114627, 2025; doi:10.1016/j.knosys.2025.114627

Published
2026-03-31
How to Cite
Butenko, D. (2026). Models for automatic generation of educational tasks: a comparative analysis. Computer Tools in Education, (1), 74-90. https://doi.org/10.32603/2071-2340-2026-1-74-90
Section
Artificial intelligence and machine learning