Who and Why Quits Smoking in Russia (Based on Social Media Data and the Use of Neural Networks)
DOI:
https://doi.org/10.14515/monitoring.2025.4.2996
Keywords:
self-preservation behavior, smoking, generative artificial intelligence, large language models, digital demography, social networks
Abstract
The aim of this work is to identify what motivates Russian-speaking social media users to quit, or not to quit, smoking. The authors study these users' opinions on self-preservation behavior through thematic analysis of social media content using large language models (LLMs). The dataset compiled for this purpose includes more than 58 thousand comments in Russian, collected under smoking-related videos that the authors manually selected in the Russian-language segment of YouTube.
The study presents and tests an algorithm for classifying the arguments of social media users about motivation for smoking and motivation for quitting smoking, develops and validates algorithms for classifying the gender and age of a comment's author, and constructs distributions of reasons for (not) quitting smoking, both overall and by users' demographic characteristics (gender and age). The analysis of the compiled dataset showed that the main arguments in favor of quitting smoking are health and saving money, with the former occurring twice as often as the latter; among the arguments for maintaining the habit, concerns about excess weight stand out. At the same time, no significant gender or age differences in the arguments for quitting or not quitting smoking were found.
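The final step described above, building distributions of reasons overall and by demographic group, can be illustrated with a minimal sketch. It assumes each comment has already been assigned an argument label, a gender and an age group by upstream classifiers. The label names, field layout and sample rows are hypothetical and are not taken from the study's dataset or code.

```python
from collections import Counter, defaultdict

# Hypothetical labelled comments: (argument label, predicted gender, predicted age group).
# Labels and values are illustrative only, not taken from the study's dataset.
comments = [
    ("health", "female", "18-34"),
    ("health", "male", "35+"),
    ("money", "male", "18-34"),
    ("health", "female", "35+"),
    ("weight_gain", "female", "18-34"),
    ("money", "female", "35+"),
]


def argument_shares(rows):
    """Return the share of each argument label among the given rows."""
    counts = Counter(label for label, _, _ in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def shares_by_group(rows, group_index):
    """Split rows by a demographic field (1 = gender, 2 = age group)
    and compute the argument shares within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_index]].append(row)
    return {group: argument_shares(members) for group, members in groups.items()}


overall = argument_shares(comments)
by_gender = shares_by_group(comments, 1)
by_age = shares_by_group(comments, 2)
```

Comparing `by_gender` or `by_age` against `overall` is one simple way to check whether the mix of arguments differs between groups. A formal significance test would be needed before drawing conclusions of the kind reported in the abstract.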
Acknowledgments. The study was conducted within the framework of the research project “Population Reproduction in Socio-Economic Development” 122041800047-9 and an internal grant from the Faculty of Economics of Lomonosov Moscow State University “Demographic Determinants of Assessing the Quality of Medical Services and Smoking Cessation: Analysis of Russians’ Opinions Based on the Use of Neural Networks and Generative Artificial Intelligence”. The authors thank their colleagues Anton Kolotusha and Sofia Zhuravleva who participated in annotating the arrays of social media user comments.
License
Copyright (c) 2025 Monitoring of Public Opinion: Economic and Social Changes

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.




