Leveraging Large Language Models for Textual Geotagging: A Novel Approach to Location Inference

Azamat Sultanov

doi:10.32603/2071-2340-2024-3-2

Azamat Sultanov, The Ping IT Inc., 1712 Pioneer Ave Ste 179, Cheyenne, WY 82001, USA


Keywords: Large Language Model, LLM, GPT, Geotagging, Natural Language Processing, NLP, Artificial Intelligence

Abstract

This study explores the application of Large Language Models (LLMs), particularly GPT-4o, to textual geotagging and introduces a novel dataset of tweets with geographical annotations. Using zero-shot and few-shot approaches, we demonstrate GPT-4o's ability to infer location from both explicit and implicit textual references in tweets, achieving average errors as low as 43 km for explicit mentions. Our experiments reveal that LLMs possess robust geographical knowledge and adapt to geotagging tasks with minimal context. The research also highlights the potential of LLMs to advance geographical inference from text, identifies challenges related to data quality and its effects, and outlines opportunities for improving model performance on implicit references and noisy data.
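The abstract reports errors in kilometres but does not specify here how the distance is measured. A common choice for point-based geotagging is the great-circle (haversine) distance between predicted and annotated coordinates, averaged over the evaluation set. The sketch below assumes that metric, a spherical Earth with a mean radius of 6371 km, and (latitude, longitude) pairs in decimal degrees; the function names `haversine_km` and `mean_error_km` are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence, Tuple

# Assumption: spherical Earth with mean radius 6371 km.
EARTH_RADIUS_KM = 6371.0

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def mean_error_km(predicted: Sequence[LatLon], actual: Sequence[LatLon]) -> float:
    """Average haversine error between paired predictions and ground truth."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal length")
    total = sum(haversine_km(p, t) for p, t in zip(predicted, actual))
    return total / len(predicted)
```

For example, `mean_error_km([(0, 0), (0, 1)], [(0, 0), (0, 0)])` averages an exact hit with a one-degree miss along the equator (about 111 km), giving roughly 55.6 km. Reporting the median alongside the mean is often useful, since a few badly wrong guesses on implicit references can dominate an average.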

Author Biography


Artificial Intelligence Engineer, The Ping IT Inc., USA. Location: Dushanbe, Tajikistan. Email: azamat.sultanov@theping.co
