Research Project as a Tool for Teaching Text Analysis Methods: Predicting the Post Class in the Social Network

  • Алена Владимировна Суворова St. Petersburg Institute for Informatics and Automation of the RAS, Saint Petersburg, Russia
  • Карина Руслановна Смирнова North-West Institute of Management, branch of RANEPA, Saint Petersburg, Russia
  • Евгений Александрович Будин North-West Institute of Management, branch of RANEPA, Saint Petersburg, Russia
  • Татьяна Валентиновна Тулупьева St. Petersburg Institute for Informatics and Automation of the RAS, Saint Petersburg, Russia
  • Александр Львович Тулупьев St. Petersburg Institute for Informatics and Automation of the RAS, Saint Petersburg, Russia
  • Максим Викторович Абрамов St. Petersburg Institute for Informatics and Automation of the RAS, Saint Petersburg, Russia
Keywords: problem-based learning, social networks, machine learning, text analysis, classification, research automation, R language

Abstract

The article describes a student research project in which the class of a social network post is predicted from the post's text. The project is presented as an integral part of a curriculum in data analysis methods, one that covers text analysis methods and tools that machine learning courses often leave out. The authors describe the problem statement, the stages of its solution, the order in which new methods are introduced as ways to solve problems the students run into, and the R tools used. They also outline how the task can be extended and modified to suit the students' level of preparation.
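As a rough illustration of the core task (assigning a class to a post based on its text), the following minimal sketch trains a bag-of-words classifier. It is not the authors' pipeline: the article works in R, while this sketch uses Python, and multinomial Naive Bayes with Laplace smoothing is swapped in here as one simple, standard text classifier. The toy posts, the class labels `ad` and `personal`, and the names `tokenize` and `NaiveBayesTextClassifier` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of word characters (\w also matches Cyrillic letters).
    return re.findall(r"\w+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class (for priors)
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict_one(self, text):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of the post's words.
            score = math.log(n / self.n_docs)
            denom = self.total_words[label] + self.alpha * v
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


# Hypothetical toy training data: three advertising posts and three personal posts.
train_texts = [
    "buy now discount sale free delivery",
    "huge sale buy two get one free",
    "limited offer discount code inside",
    "had a great weekend with friends",
    "my cat slept all day today",
    "visited my grandmother this weekend",
]
train_labels = ["ad", "ad", "ad", "personal", "personal", "personal"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict(["big discount sale today", "my weekend with the cat"]))
# prints ['ad', 'personal']
```

With real posts, especially Russian-language ones, such a sketch would typically also need lemmatization, stop-word removal, and an evaluation on held-out posts before its predictions could be trusted.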

Author Biographies

Алена Владимировна Суворова, St. Petersburg Institute for Informatics and Automation of the RAS, Saint Petersburg, Russia

Alena V. Suvorova: PhD, Senior Researcher, Theoretical and Interdisciplinary Computer Science Laboratory, SPIIRAS; Associate Professor, HSE University; 199178, Russia, St. Petersburg, 14th Line VO, 39, suvalv@gmail.com

Карина Руслановна Смирнова, North-West Institute of Management, branch of RANEPA, Saint Petersburg, Russia

Karina R. Smirnova: student, NWIM RANEPA, Smirnova.KR@mail.ru

Евгений Александрович Будин, North-West Institute of Management, branch of RANEPA, Saint Petersburg, Russia

Evgeniy A. Budin: student, NWIM RANEPA, moyapochta456@gmail.com

Татьяна Валентиновна Тулупьева, St. Petersburg Institute for Informatics and Automation of the RAS, Saint Petersburg, Russia

Tatiana V. Tulupyeva: PhD, Associate Professor, Senior Researcher, Theoretical and Interdisciplinary Computer Science Laboratory, SPIIRAS; Associate Professor, NWIM RANEPA; Associate Professor, Computer Science Department, SPSU, tvt100a@mail.ru

Александр Львович Тулупьев, St. Petersburg Institute for Informatics and Automation of the RAS, Saint Petersburg, Russia

Alexander L. Tulupyev: PhD, Dr. Sci., Associate Professor, Leading Researcher, Theoretical and Interdisciplinary Computer Science Laboratory, SPIIRAS; Professor, Computer Science Department, SPSU, alt@iias.spb.ru

Максим Викторович Абрамов, St. Petersburg Institute for Informatics and Automation of the RAS, Saint Petersburg, Russia

Maxim V. Abramov: PhD, Researcher, Theoretical and Interdisciplinary Computer Science Laboratory, SPIIRAS; Senior Lecturer, Computer Science Department, SPSU, mva16@list.ru

Published
2018-06-29
How to Cite
Суворова, А. В., Смирнова, К. Р., Будин, Е. А., Тулупьева, Т. В., Тулупьев, А. Л., & Абрамов, М. В. (2018). Research Project as a Tool for Teaching Text Analysis Methods: Predicting the Post Class in the Social Network. Computer Tools in Education, (3), 49-64. https://doi.org/10.32603/2071-2340-3-49-64
Section
Computers in the teaching process