Numerical experiment on the computational capabilities of modern chatbots in solving problems in mathematical analysis and computational mathematics
Abstract
The paper describes a numerical experiment in which chatbots (Yandex GPT 2, ChatGPT 3.5, Gemini, Copilot) solve mathematical problems. The test set comprises 693 problems on topics of mathematical analysis (limits, derivatives, integrals) and 45 problems in computational mathematics (solving nonlinear equations, solving systems of linear equations, function interpolation, numerical integration). The main characteristics of these virtual assistants are considered, and research on the application of artificial intelligence to mathematical problems across various tests and data sets is reviewed. The paper examines the shortcomings that the chatbots exhibit and analyzes their performance on specific data sets. The systems are compared by the number of correctly solved problems. The main difficulties that each chatbot encounters when solving computational mathematics problems are discussed in detail. The study may be of practical interest to researchers, developers, teachers, and other users who rely on these virtual assistants in their work. The experiment makes it possible to better evaluate how effectively the considered systems can be applied in mathematics.
This work is licensed under a Creative Commons Attribution 4.0 International License.