Published October 11, 2021 | Version v1
Dataset Open

Automatic extraction of opinions of users of social networks on reproductive behavior issues

  • 1. Lomonosov Moscow State University
  • 2. NRC "Kurchatov institute"
  • 3. MGIMO UNIVERSITY

Contributors

Project leader:

  • 1. Lomonosov Moscow State University

Description

The database contains an export of Russian-language text comments from the social network VKontakte in .tsv format (UTF-8 encoding). The comments were collected from communities that discuss pregnancy, childhood, motherhood, fatherhood, etc.
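
As a minimal sketch of reading such a file, the snippet below parses UTF-8 tab-separated text with Python's standard csv module. The column names (text, label), the sample rows and the file name train.tsv are assumptions for illustration only; the actual layout of the files is not given in this description.

```python
import csv
import io

# Hypothetical sample standing in for one of the .tsv files; the real
# column names and layout are not stated in the dataset description.
SAMPLE_TSV = "text\tlabel\nПервый комментарий\t1\nВторой комментарий\t0\n"


def read_tsv(stream):
    """Parse tab-separated rows with a header line into a list of dicts."""
    # QUOTE_NONE keeps quotation marks inside comments as literal characters.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_tsv(io.StringIO(SAMPLE_TSV))

# For a file on disk (the path is an assumption):
# with open("train.tsv", encoding="utf-8", newline="") as f:
#     rows = read_tsv(f)
```

Passing encoding="utf-8" explicitly matters on systems whose default encoding is not UTF-8, since the comments are in Russian.
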
The database contains train, test and valid sets for machine learning. To analyze opinions on reproductive behavior, two sets of VKontakte groups were selected: nine groups whose names or descriptions explicitly contained the word "childfree" or its variations, and 341 groups whose names or descriptions contained the keywords "mother", "mothers", "children", etc. and which had more than 10,000 subscribers. This uneven distribution reflects the different activity levels of the groups: supporters of a childless lifestyle produced significantly more posts and comments on the topics of interest. Using data from different VKontakte groups avoids data homogeneity, one of the weak points of sentiment analysis. The database is therefore suitable for analyzing two specific demographic groups, which we conditionally call “anti-natalists” and “pronatalists”. Note that the pronatalist group consists largely of people who follow a small-family model of reproductive behavior. The sample contains stance data on 6 topics: "maternity capital / benefits", "abortion", "large families", "childlessness", "parental leave", "individualism". The topics were selected using a set of keywords such as abortion, childfree, rest, no child and so on.
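
The group selection rule described above can be sketched roughly as follows. This is an illustration, not the authors' actual procedure: the Group class, the keyword tuples and the returned category names are hypothetical, the real keywords were presumably Russian, and the sketch assumes that the 10,000-subscriber threshold applies only to the second set of groups.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical record for a social network group."""
    name: str
    description: str
    subscribers: int


# Illustrative English keywords; the real keyword lists are not published here.
CHILDFREE_KEYWORDS = ("childfree",)
FAMILY_KEYWORDS = ("mother", "children")
MIN_SUBSCRIBERS = 10_000


def mentions(group, keywords):
    """True if any keyword occurs in the group's name or description."""
    text = f"{group.name} {group.description}".lower()
    return any(keyword in text for keyword in keywords)


def classify_group(group):
    """Return 'childfree', 'family', or None if the group is not selected."""
    if mentions(group, CHILDFREE_KEYWORDS):
        return "childfree"
    # Assumption: the subscriber threshold applies only to this set of groups.
    if mentions(group, FAMILY_KEYWORDS) and group.subscribers > MIN_SUBSCRIBERS:
        return "family"
    return None
```

Substring matching means that "mother" also covers "mothers"; a real pipeline would likely need morphology-aware matching for Russian.
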

Sentences from the collected sample were randomly selected for annotation. Each sentence was annotated by three annotators. Since a sentence could discuss several issues, the annotators marked each sentence on all seven topics. The sentences were annotated mainly by professional demographers and linguists.
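
The description does not say how the three annotations per sentence were combined. One common option, shown here purely as an assumption, is a per-topic majority vote; the label values ("for", "against", "none") and the function names are hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else None


def aggregate(annotations):
    """Map {topic: [one label per annotator]} to {topic: majority label or None}."""
    return {topic: majority_label(labels) for topic, labels in annotations.items()}


# Hypothetical annotations of one sentence by three annotators.
sentence_annotations = {
    "abortion": ["against", "against", "none"],
    "childlessness": ["for", "against", "none"],
}
result = aggregate(sentence_annotations)
```

With three annotators, a topic on which all three disagree has no majority and maps to None, so such sentences can be dropped or sent for re-annotation.
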

Notes

The database is an addition to and a development of the topic outlined in the publications Kalabikhina IE, Banin EP (2020) Database "Pro-family (pronatalist) communities in the social network VKontakte". Population and Economics 4(3): 98-130. https://doi.org/10.3897/popecon.4.e60915 and Kalabikhina, I.E.; Banin, E.P.; Abduselimova, I.A.; Klimenko, G.A.; Kolotusha, A.V. The Measurement of Demographic Temperature Using the Sentiment Analysis of Data from the Social Network VKontakte. Mathematics 2021, 9, 987. https://doi.org/10.3390/math9090987

Files

dem_dataset.zip

Files (574.6 kB)

Name Size
md5:45997f47c4064c8c45c565acfac4982b 573.1 kB
md5:760fba723917538aa7c61e293d180d4c 1.5 kB