Automatic extraction of opinions of users of social networks on reproductive behavior issues
Creators
- 1. Lomonosov Moscow State University
- 2. NRC "Kurchatov institute"
- 3. MGIMO UNIVERSITY
Description
The database contains an export of Russian-language text comments from the social network Vkontakte in .tsv format (UTF-8 encoding). The comments were collected from communities that discuss pregnancy, childhood, motherhood, fatherhood, etc.
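Because the files are tab-separated and UTF-8 encoded, they can be read with a standard library parser. The following is a minimal sketch only: the function name `read_tsv` is hypothetical, and whether the files have a header row, and which columns they contain, is not stated in this description, so rows are returned as plain lists of strings.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Read a UTF-8, tab-separated file into a list of rows (lists of strings)."""
    with Path(path).open(encoding="utf-8", newline="") as handle:
        # Assumption: quote characters inside comments are ordinary text,
        # so quoting is disabled rather than interpreted by the parser.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader]
```

Calling `read_tsv` on one of the extracted files (the file names inside the archive are not listed here) would give one list per line, which can then be inspected to work out the column layout.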
The database contains training, test, and validation sets for machine learning. To analyze opinions on reproductive behavior, we selected nine Vkontakte groups whose names or descriptions explicitly contained the word "childfree" or its variations, and 341 groups whose names or descriptions contained keywords such as "mother", "mothers", "children", etc. and which had more than 10,000 subscribers. This uneven split reflects the groups' differing activity: supporters of a childless lifestyle produced significantly more posts and comments on the topics of interest. Drawing data from different Vkontakte groups avoids data homogeneity, one of the weak points of sentiment analysis. The database is therefore suitable for analyzing two specific demographic groups, which we provisionally call "anti-natalists" and "pronatalists". Note that the pronatalist group consists largely of people who follow a small-family model of reproductive behavior. The sample contains stance data on 6 topics: "maternity capital / benefits", "abortion", "large families", "childlessness", "parental leave", and "individualism". The topics were selected using a set of keywords such as "abortion", "childfree", "rest", "no child", and so on.
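Keyword-based topic selection of the kind described above can be illustrated with a simple substring match. This is a sketch under stated assumptions, not the procedure actually used: the `TOPIC_KEYWORDS` mapping and the `match_topics` function are hypothetical, the keyword lists are illustrative English stand-ins (the real comments are in Russian and the full keyword set is not given here), and only two of the topics are shown.

```python
# Hypothetical, illustrative keyword lists; the real set is not published here.
TOPIC_KEYWORDS = {
    "abortion": ["abortion"],
    "childlessness": ["childfree", "no child"],
}


def match_topics(text, topic_keywords=TOPIC_KEYWORDS):
    """Return the topics whose keywords occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [
        topic
        for topic, keywords in topic_keywords.items()
        if any(keyword in lowered for keyword in keywords)
    ]
```

A sentence may match several topics at once, which is consistent with the note that a single sentence can discuss several issues.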
Sentences from the collected sample were randomly selected for annotation. Each sentence was annotated by three annotators. Since a sentence could discuss several issues, the annotators labeled each sentence on all seven topics. The sentences were annotated mainly by professional demographers and linguists.
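With three annotators per sentence, one common way to derive a single label per topic is majority voting. The description does not say how the three annotations were combined, so the sketch below is an assumption for illustration; the function name `majority_label` and the label values in the usage note are hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    # A strict majority is required; a three-way split yields no label.
    return label if count * 2 > len(labels) else None
```

For example, `majority_label(["for", "for", "against"])` yields `"for"`, while three different labels yield `None`, which could be used to flag the sentence for review.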
Files
dem_dataset.zip
(574.6 kB)
Checksum | Size |
---|---|
md5:45997f47c4064c8c45c565acfac4982b | 573.1 kB |
md5:760fba723917538aa7c61e293d180d4c | 1.5 kB |
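The md5 values listed above can be used to check that a download is intact. A minimal sketch using Python's standard `hashlib` module follows; the function names `md5_of_file` and `matches_checksum` are hypothetical, and which checksum belongs to which file should be confirmed on the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the hexadecimal md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's md5 with an expected value such as 'md5:<hex digest>'."""
    # Accept the value with or without the 'md5:' prefix used in the listing.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For instance, `matches_checksum("dem_dataset.zip", "md5:45997f47c4064c8c45c565acfac4982b")` would return `True` only if the local file's digest equals that value.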