Helicopter parenting through the lens of reddit: A text mining study
C. Keerthigha et al.

The study aimed to understand Reddit users’ experience with helicopter parenting through first-hand accounts. Text mining and natural language processing techniques were employed to extract data from the subreddit r/helicopterparents. A total of 713 original posts were processed from unstructured texts to tidy formats. Latent Dirichlet Allocation (LDA), a popular topic modeling method, was used to discover hidden themes within the corpus. The data revealed common environmental contexts of helicopter parenting (i.e., school, college, work, and home) and its implications for college decisions, privacy, and social relationships. Collectively, these findings suggested the importance of autonomy-supportive parenting and mindfulness interventions as viable solutions to the problems posed by helicopter parenting. In addition, the findings lent support to past research that has identified more maternal than paternal models of helicopter parenting. Further research on the implications of the COVID-19 pandemic for helicopter parenting is warranted.


Introduction
Social media is gaining ground in our everyday life. Online communication and sharing on these platforms (e.g., Facebook, Twitter, and Instagram) often provide information about our psychological characteristics, attitudes, and behaviours [1]. The availability of social media data provides researchers with opportunities to gain deeper insights into users' social behaviour in an uncontrolled setting [2]. Assessing big data thus allows the unobtrusive analysis of trends and patterns that are specific to a particular issue. The present study examines data from the social media platform Reddit. Reddit users (N = 430,000,000; [3]) subscribe to user-created discussion groups termed 'subreddits' that often reflect specific interests (e.g., r/baking and r/dogs) and experiences (e.g., r/depression and r/mindfulness). Reddit's anonymous user identity allows for free responses without inhibition [4]. This enables the collection of sensitive and meaningful information to create an accurate map of users' experience with helicopter parenting (HP) in the present study. Posts from the subreddit r/helicopterparents are analysed using computational linguistics and qualitative synthesis. It is anticipated that the findings will contribute to the ongoing research on the impact of HP among emerging adults.
of parental overinvolvement and micro-management [5][6][7]. HP, commonly referred to as overparenting, describes parents who exhibit high involvement, control, and autonomy-limiting behaviours toward their child [8]. Behaviourally, these are expected to manifest in an increase in advice and other directive behaviours, an attempt to shield the child from negative outcomes, instrumental support, and a preoccupation with the child's happiness [7]. Examples of HP include problem-solving and decision-making for their children, and intervening in their affairs [9,10]. The causes of HP are yet to be fully understood, with recent studies suggesting parental anxiety and regret [11,12], culture [13,14], and parent-child gender combinations [15][16][17] as strong contributing factors to HP.
HP has been primarily studied in the emerging adult population, which refers to the phase of life between adolescence and adulthood (typically ages 18-29; [18]). During this transitional period, individuals experience a growing need for autonomy, whereby their decisions are guided only by personal preferences [13,19]. However, HP tends to impede autonomy development. Emerging adults encounter a multitude of stressors, such as identity and relationship formation, and those with helicopter parents may face even greater difficulties in this stressful climate.
Insights from advanced imaging technology have demonstrated the structural and functional changes of the brain during the transitional period of development [20] and how these relate to social cognition. Social cognition refers to the ability to infer and reason, which plays a critical role in the successful negotiation of complex social interactions and decisions [21]. Consistent with this, research has shown that HP exacerbates poor social adjustment and increases alienation from college peers [22]. Reports from emerging adults also indicate that HP may foster a family and social environment that impairs their relationships with others [7].
There is a significant amount of evidence linking high levels of anxiety, depression, low life satisfaction, and poor psychological well-being to HP [6,7,23,24,25]. Identifying and addressing the sources of distress among emerging adults has become paramount to practitioners, higher education institutions, and families [26]. As emerging adults increasingly take to sharing their distress on social media, a growing body of research focuses on such self-disclosures as an unobtrusive way to uncover the hidden or suppressed views of their adversities and how these affect them [27].

Mining social data
With the advent of social media, online communities have emerged to provide help, advice, and support for those who share similar experiences [28]. Perceived support in these communities plays a prominent role in improving mental health by reducing stress [29], increasing self-efficacy [30], and fostering positive behaviour changes [31]. Most online platforms allow their users to post anonymously, providing a sense of security to discuss their experiences without the fear of being stigmatized or discriminated against [32][33][34]. Notably, social media facilitates communication among its users and produces a colossal amount of social data [35].
Social data is mined through text mining, an artificial intelligence technology that attempts to extract meaningful information from unstructured textual data [36]. The data is automatically indexed in specific ways (e.g., via common schemas) to create models which explain patterns and trends. Researchers have used this technology to analyse large amounts of textual data in business [37], health science [36], and educational [38,39] domains. They have also leveraged data to investigate human behaviour and interaction [40]. These have collectively provided solutions to real-world problems such as the detection of depression [41], childhood sexual abuse [42], and suicidal ideation [43].
Recent text mining applications have allowed social scientists to observe natural online behaviour, gaining deeper insights into users' real-time psychosocial characteristics [2]. Natural Language Processing (NLP), a subfield of Artificial Intelligence, is an emerging technology that uses machines to understand human languages [44]. The NLP process comprises the following steps: (a) text preprocessing, in which the dataset is cleaned by removing non-textual information (e.g., emojis, images, and HTML tags); (b) text representation, where the dataset is transformed into word vectors; (c) model training, during which algorithms are utilised to train a model (e.g., sentiment analysis, opinion mining, or topic modeling); and (d) model evaluation, where the model is evaluated to ensure it has optimal generalisability to other corpora [45].
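Step (b), text representation, can be illustrated with a minimal bag-of-words sketch. The example posts and the helper name `bag_of_words` are hypothetical, and counting words is only one possible way of building word vectors; the paper does not specify which representation it used.

```python
from collections import Counter


def bag_of_words(token_lists):
    """Turn tokenised documents into count vectors over a shared vocabulary."""
    # A sorted vocabulary gives every word a stable column index.
    vocabulary = sorted({tok for tokens in token_lists for tok in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Hypothetical, already pre-processed posts (not taken from the corpus).
docs = [["mom", "call", "school"], ["mom", "track", "phone", "mom"]]
vocabulary, vectors = bag_of_words(docs)
```

Each document becomes one row of counts, so documents of different lengths share the same columns and can be passed to a model in step (c).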
The use of NLP in psychological studies requires the text corpus to be processed according to vocabulary [46]. An open-vocabulary approach does not rely on a priori word or category judgments [47,48]. Instead, it aims to identify distinct sets of linguistic features (e.g., words, n-grams, and topics) in the corpus. In contrast, a closed-vocabulary approach is based on theoretical and empirical evidence on linguistic features [49]. This typically involves using word dictionaries that match words with a target psychological variable (e.g., anxiety, disgust, and happiness). Taken together, NLP offers social science a novel approach to data analysis which, in turn, can create accurate maps of a phenomenon.
The complexity of HP [50] necessitates more nuanced and in-depth examination through various scholarly inquiries such as text mining of publicly available social data. Furthermore, despite the vast amount of literature on the adverse impact of HP on emerging adults [13,51,52,53], more significant effort is warranted to understand their negative experiences through first-hand perspectives and words [14]. This reduces the disparities between lived experiences and their accounts, thus allowing authentic communication of their problems.
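The contrast between the two approaches can be sketched in a few lines. The word list `ANXIETY_WORDS`, the function names, and the example tokens are all hypothetical; a real closed-vocabulary study would rely on a validated dictionary rather than this illustrative set.

```python
from collections import Counter

# Hypothetical mini-dictionary for a closed-vocabulary pass (illustration only).
ANXIETY_WORDS = {"worried", "nervous", "afraid"}


def closed_vocabulary_score(tokens, lexicon):
    """Share of tokens that match a predefined psychological category."""
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def open_vocabulary_features(tokens, n=2):
    """Count n-grams found in the text itself, with no predefined categories."""
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)


tokens = ["my", "mom", "is", "worried", "my", "mom", "calls"]
score = closed_vocabulary_score(tokens, ANXIETY_WORDS)
bigrams = open_vocabulary_features(tokens)
```

The closed-vocabulary score can only register words chosen in advance, whereas the open-vocabulary counts surface whatever recurs in the text, here the bigram "my mom".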

The present study
Given that research on HP has heavily relied upon empirical research [14], the present study aims to understand Reddit users' experience with HP using NLP techniques, particularly its open-vocabulary approach. In this paper, a subreddit in which users discuss their experiences with helicopter parents (r/helicopterparents) will be examined. This serves as an attempt to contribute to the ongoing efforts to amalgamate the distinct fields of psychology and computational linguistics research. It is thus anticipated that this study will make novel, scholarly contributions to the existing literature on HP. Due to the exploratory nature of this study, a research question (RQ), rather than an a priori hypothesis, is employed.
RQ: What are Reddit users' experiences with HP?

Research design and participants
A Big Data approach was used to extract and analyse textual data from an online forum, Reddit. This study used secondary data; thus, no participants were recruited. Reddit users are typically between the ages of 18 and 29 years [54] and were thus assumed to be emerging adults.

Ethical considerations
Ethical clearance for the study was granted by the Human Research Ethics Committee of James Cook University, Australia (Ref. H8491). There are no formal guidelines in the National Statement on Ethical Conduct in Human Research [55] about Big Data approaches. Hence, the study adhered to the recommendations of [56,57] for best practices. The recommended ethical code of conduct and the corresponding explanations for compliance in the present study are presented in the Appendix.

Corpus
The corpus was based on r/helicopterparents, a subreddit that discusses user experiences with HP. Reddit (http://www.reddit.com) is a popular online social networking and news exchanging platform. Reddit is largely based on threaded conversations, which are common communication patterns that effectively capture information on a particular topic [35]. Reddit has multiple publicly available subreddits, which are topically focused sub-communities. In these subreddits, users start conversations by sharing their experiences while others can choose to respond to the thread. As of January 24, 2022, r/helicopterparents had been active for approximately seven years and had 15,118 members.

Data acquisition
The corpus was constructed using Reddit's official Application Programming Interface (API; Reddit, 2021), accessed through the Python Reddit API Wrapper (PRAW; [58]). An API is a software intermediary that enables data transmission from one application to another without exposing the underlying implementation [59]. PRAW is designed to respect all of Reddit's API rules. Data extraction of posts from November 11, 2014 to January 24, 2022 was conducted in compliance with the Reddit API Terms of Service [60]. Usernames and post titles were excluded while collecting texts for the corpus. Only textual data found in the body of the posts were extracted. Responses to the original posts were not included in the corpus, as this study aimed to identify common experiences with helicopter parents rather than describe the conversations surrounding these situations. In total, 713 original posts across all threads were retained.
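The extraction rules described above (date window, body text only, no usernames or titles) can be sketched as follows. This is not the authors' script: the function names, the UTC interpretation of the dates, and the choice to skip empty bodies are assumptions. The PRAW calls are kept in a separate function that is only needed for a live run with registered API credentials.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

# Study window; treating the dates as UTC is an assumption.
START = datetime(2014, 11, 11, tzinfo=timezone.utc).timestamp()
# Exclusive upper bound so that all of January 24, 2022 is included.
END = datetime(2022, 1, 25, tzinfo=timezone.utc).timestamp()


def collect_post_bodies(submissions, start=START, end=END):
    """Keep only the body text of posts created inside the study window."""
    bodies = []
    for post in submissions:
        in_window = start <= post.created_utc < end
        # Usernames and titles are never read, so they never enter the corpus.
        # Skipping posts with an empty body is an assumption of this sketch.
        if in_window and post.selftext.strip():
            bodies.append(post.selftext)
    return bodies


def fetch_submissions(client_id, client_secret, user_agent):
    """Stream original posts through PRAW (requires the praw package and API keys)."""
    import praw  # imported lazily so the filter above runs without PRAW installed

    reddit = praw.Reddit(
        client_id=client_id, client_secret=client_secret, user_agent=user_agent
    )
    return reddit.subreddit("helicopterparents").new(limit=None)


# Offline demonstration with stand-in objects instead of live API results.
fake_posts = [
    SimpleNamespace(created_utc=START + 60, selftext="My mom reads my texts."),
    SimpleNamespace(created_utc=START - 60, selftext="Posted before the window."),
    SimpleNamespace(created_utc=START + 120, selftext="   "),
]
bodies = collect_post_bodies(fake_posts)
```

Because only `selftext` and `created_utc` are touched, comments on each post are never requested, which mirrors the decision to exclude responses from the corpus.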

Data pre-processing
The corpus was prepared for analysis using the Natural Language Toolkit (NLTK) [61] on Python 3.9.5. NLTK provided a collection of text processing utilities for steps such as lowercase conversion, tokenization, stop-word removal, and punctuation removal. Stemming was also applied to reduce words to their root form and group them by similarity. Web addresses and Unicode characters (e.g., symbols and emojis) were automatically removed. These steps collectively helped eliminate noise and correct inconsistencies in the data, transforming unstructured texts to tidy formats. The pre-processed data was used to produce basic corpus statistics (i.e., time series and n-gram analyses) and extract topics.
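The cleaning steps listed above can be sketched with the standard library alone. This is a simplified stand-in, not the study's NLTK pipeline: the stop-word set is a tiny illustrative list, and `crude_stem` is a hypothetical placeholder for a real stemmer such as NLTK's `PorterStemmer`.

```python
import re
import string

# Tiny stand-in stop-word list; NLTK ships a much longer English list.
STOP_WORDS = {"a", "an", "and", "at", "is", "me", "my", "the", "to"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Map every punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def crude_stem(word):
    """Very rough suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop URLs, non-ASCII symbols, punctuation and stop words, then stem."""
    text = URL_PATTERN.sub(" ", text.lower())
    # Encoding to ASCII and ignoring errors removes emojis and other symbols.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    tokens = text.translate(PUNCT_TABLE).split()
    return [crude_stem(tok) for tok in tokens if tok not in STOP_WORDS]


tokens = preprocess(
    "My mom is TRACKING my phone 😡 and called the school! https://example.com"
)
```

The order matters in this sketch: URLs are removed before punctuation is stripped, otherwise the address would be split into stray tokens such as "https" and "com".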

Feature extraction
The present study used latent Dirichlet allocation (LDA), an open-vocabulary feature extractor, to study the corpus [62]. LDA uses Bayesian probabilistic modeling to extract a set of topics from a corpus of text. Specifically, LDA generates estimates of words associated with each topic (i.e., word-topic probabilities) and estimates of topics describing the document (i.e., document-topic probabilities; [63]). Inspecting the highest word-topic and document-topic probabilities allows the identification of the theme of each latent topic. Furthermore, the unsupervised learning algorithm of LDA reduces researcher bias in selecting keywords, generating topics, and ranking topic prevalence in the document [64].
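The two sets of probabilities can be made concrete by writing out the generative assumptions of LDA. The notation below follows the common smoothed formulation and is introduced here as an assumption, since the paper does not define symbols: K topics, D documents, word-topic distributions \varphi_k, document-topic distributions \theta_d, and Dirichlet priors \alpha and \beta.

```latex
\begin{align*}
\varphi_k &\sim \operatorname{Dirichlet}(\beta), \quad k = 1,\dots,K
  && \text{(word-topic probabilities)} \\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), \quad d = 1,\dots,D
  && \text{(document-topic probabilities)} \\
z_{d,n} \mid \theta_d &\sim \operatorname{Categorical}(\theta_d)
  && \text{(topic of the $n$-th word in document $d$)} \\
w_{d,n} \mid z_{d,n}, \varphi &\sim \operatorname{Categorical}(\varphi_{z_{d,n}})
  && \text{(observed word)}
\end{align*}
```

Only the words w are observed; fitting the model means inferring \theta and \varphi from them. Small values of \alpha and \beta favour documents that concentrate on few topics and topics that concentrate on few words.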

Qualitative synthesis
A qualitative approach was employed to contextualise the LDA model. This approach relied on manual, in-depth interpretations of topics by the authors, which allowed for exploratory analysis of the findings. Each set of topical words was collectively assigned meanings, inductively developing themes for the latent topics generated by machine algorithms. Fig. 1 illustrates the methodological framework used in this study.

Corpus statistics
A series of uptrends in the subscriber count has been observed since late 2019 [65]. Fig. 2 illustrates the subscription rate for the subreddit r/helicopterparents.

Unigram analysis
The top 100 most frequently occurring unigrams are presented in a word cloud (see Fig. 3). The more frequently a word appears in the corpus, the bigger its representation in the word cloud. Findings consistently revealed the terms 'mother' and 'mom' as the most frequently used words in the data.
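The frequency ranking behind such a word cloud amounts to counting tokens across all posts. The sketch below assumes the posts are already tokenised; the function name `top_unigrams` and the three example documents are hypothetical.

```python
from collections import Counter


def top_unigrams(token_lists, n=100):
    """Return the n most frequent tokens across all documents with their counts."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return counts.most_common(n)


# Hypothetical pre-processed posts (not taken from the corpus).
docs = [["mom", "school", "mom"], ["mother", "phone"], ["mom", "phone"]]
top = top_unigrams(docs, n=2)
```

The resulting (word, count) pairs are the kind of input a word cloud library scales its font sizes from.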

Hyperparameter tuning
Topic coherence measures the degree of semantic similarity between high-scoring words in a topic [66], which is essential for human interpretation of topic models. Therefore, Gensim's CoherenceModel was used to evaluate topic models [67]. Fig. 4 plots the coherence score against the number of topics with a fixed alpha of 0.01 and beta of 0.1. The maximum coherence score of 0.26 was achieved when the number of topics was set to four.
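To show what a coherence score captures, the sketch below implements the UMass measure, which is simpler than the C_v measure reported in Fig. 4 and is used here only as a stand-in. It rewards topics whose top words tend to occur in the same documents. The function names and example data are hypothetical, and in `coherence_by_k` only the value for four topics echoes the paper; the other scores are invented for illustration.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    `top_words` must be ordered from most to least probable, and every word
    must occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in doc for w in words) for doc in doc_sets)

    score = 0.0
    # combinations() yields (higher-ranked, lower-ranked) pairs in order.
    for earlier, later in combinations(top_words, 2):
        score += math.log((doc_freq(earlier, later) + 1) / doc_freq(earlier))
    return score


def best_num_topics(coherence_by_k):
    """Pick the number of topics whose model has the highest coherence."""
    return max(coherence_by_k, key=coherence_by_k.get)


docs = [["mom", "school", "home"], ["mom", "school"], ["mom", "phone"], ["dad", "work"]]
coherent = umass_coherence(["mom", "school"], docs)
mixed = umass_coherence(["mom", "work"], docs)

# Only the k = 4 score comes from the paper; the rest are made up.
coherence_by_k = {2: 0.21, 4: 0.26, 6: 0.24}
best_k = best_num_topics(coherence_by_k)
```

Words that co-occur ("mom", "school") score higher than words that never share a document ("mom", "work"), and the model-selection step simply keeps the topic count with the highest score.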

Topic model visualization
pyLDAvis was used with the Gensim topic model to produce an interactive visualization of topics [68]. Fig. 5 shows the topic model visualization of the present study. Each bubble represents a topic; the more prevalent a topic is, the bigger its bubble. Overlapping bubbles indicate similarities between topics, while the words listed on the right panel represent the top 30 most salient keywords found in the dataset.
Each topic is represented by its top 30 most salient and frequent keywords. Table 1 summarises the most representative words associated with each topic as generated from the dataset, together with the corresponding themes.
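Reading representative words off a fitted model can be sketched as a ranking over the word-topic probabilities. The probabilities, vocabulary, and function names below are hypothetical, and plain probability ranking is a simplification of the saliency ordering used by the visualization tool.

```python
def top_words_per_topic(topic_word_probs, vocabulary, n=3):
    """List the n most probable words for each topic."""
    tops = []
    for probs in topic_word_probs:
        ranked = sorted(zip(vocabulary, probs), key=lambda pair: pair[1], reverse=True)
        tops.append([word for word, _ in ranked[:n]])
    return tops


def dominant_topic(doc_topic_probs):
    """Index of the topic with the highest probability in one document."""
    return max(range(len(doc_topic_probs)), key=doc_topic_probs.__getitem__)


# Hypothetical probabilities for a two-topic model over a five-word vocabulary.
vocabulary = ["mom", "school", "college", "phone", "privacy"]
topic_word_probs = [
    [0.40, 0.30, 0.20, 0.05, 0.05],
    [0.30, 0.05, 0.05, 0.35, 0.25],
]
tops = top_words_per_topic(topic_word_probs, vocabulary)
main_topic = dominant_topic([0.2, 0.8])
```

A frequent word such as "mom" can rank highly in several topics, which is why themes are assigned from the whole word list rather than from any single keyword.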

Discussion
This paper sought to understand Reddit users' experience with HP by identifying common experiences shared by users with helicopter parents. The key findings revealed four main domains, including the common environmental contexts of HP (i.e., school, college, work, and home) and its implications for college decisions, privacy, and social relationships. The following sections first discuss these specific impacts of HP and their implications for emerging adults. Next, the study's overall contributions, limitations, and directions for future research are highlighted.

Environmental contexts
The data suggest that places such as school, college, work, and home are environmental contexts in which Reddit users are exposed to HP behaviours the most. This is in line with research that indicates helicopter parents readily intervene in their children's affairs with peers, professors, and employers [9,10]. Given that autonomy is a key characteristic of the emerging adulthood developmental period [18], parents who impose excessive restrictions may imply a lack of faith in their children's ability to navigate situations independently. An autonomy-restrictive parenting style, such as HP, may negatively impact an individual's psychological well-being and ability to adjust in college or work settings [69]. Therefore, it is crucial for parents to provide levels of autonomy that are developmentally appropriate for their children's age.

Fig. 1. Methodological Framework of Present Study
Note. The process consists of data selection, data cleaning, feature extraction, and evaluation and interpretation of results.

Loss of control over college decisions
The data suggest that HP exerts control over users' college decisions, and this is manifested in a lack of control over peer relationships, school, accommodation, and lifestyle choices. This is consistent with past findings that suggest HP is related to a host of negative outcomes in college, including poor academic achievement, self-esteem, life satisfaction, and peer relationships [9,22,24,50,70,71]. For example, peer relationships are central to emerging adulthood as they offer significant provisions such as support, security, and companionship in times of stress [72]. The quality of peer relationships has been linked to self-esteem, prosocial behaviour, life satisfaction, and scholastic competence [22]. This suggests that peer relationships play a pivotal role in remedying the ill effects of HP. It is thus proposed that colleges actively create opportunities for students to connect with peers and assist students in the arduous task of establishing healthy boundaries with helicopter parents by encouraging individual decision-making.

Invasion of privacy
The data suggest that helicopter parents supervise users both online (e.g., usage of the phone) and offline (e.g., people they meet), thus directly invading privacy. This is in line with previous research that highlights the association between HP and excessive parental monitoring [73]. HP stunts youth disclosure and parental knowledge [74], which may in turn lead to youth relational aggression [75] and internalising problems [76]. Although HP may have legal ramifications related to privacy rights [77], research on privacy management by helicopter parents remains largely unexplored. A previous study [78] found that open communication among family members predicted the likelihood of emerging adults revealing private information to their parents. This suggests that privacy invasion is counterproductive to parents' efforts to remain knowledgeable about their children. Taken together, the findings discourage excessive parental monitoring while encouraging enhanced family communication patterns (e.g., adopting autonomy-supportive parental monitoring strategies [79] and negotiations on privacy).

Note (Table 1). Four unique but broadly defined experiences that users face with helicopter parents were identified through manual interpretation of the topical words.

Impaired social relationships
The data suggest that HP interferes with users' social lives with family members and romantic partners. Research has shown that HP is strongly associated with emerging adults' insecure attachment to romantic partners [80] and stronger beliefs that being single is more advantageous than being married [25]. Overparented emerging adults generalise their low self-efficacy to their dating incompetence [81]. Moreover, HP may underlie attachment insecurity given that helicopter parents have constantly fed their children with attention, approvals, and reassurances which may not be received in similar amounts from their romantic partners. That is, helicopter parents are inadvertently promoting long-term singlehood while preventing their children from having negative experiences [25]. Therefore, it is recommended that helicopter parents consider autonomy-supportive parenting, which, in turn, may facilitate healthy relationship formation and maintenance.

Reddit as an inexpensive source for high-quality data
This study is among the first to elucidate the substantial potential of social data in understanding parenting outcomes. Empirical research has often been critiqued for the limited generalisability of its findings due to social desirability bias exhibited by respondents [82] and experimental manipulations that may not be applicable in real-world settings [83]. In the present study, Reddit data offer the distinct advantage of obtaining candid qualitative responses [84,85], which stands out as a strength compared to previous studies that are primarily reliant on questionnaires [14]. Reddit is also uniquely responsive to topic modelling, given that all its posts are publicly available, unlike other social media platforms such as Facebook and Instagram [64]. The abundance of social media data thus provides a valuable chance for researchers to address research inquiries in innovative ways.

Parental gender differences in HP
The results of the analysis consistently revealed that the terms 'mother' and 'mom' were most frequently used in the data. This is in line with studies that have found mothers engage in more HP than fathers from the perspectives of parents [86][87][88], children [22,86], and college administrators [89]. Notably, this study provides evidence that in conditions where users are not restricted by the methodological choices of researchers, spontaneous responses revealed predominantly maternal models of HP. This suggests that HP is present among fathers, albeit to a much lesser degree than among mothers. Given that HP manifests itself differently for mothers and fathers, there is merit in further examining gender differences in HP.

Parenting during the COVID-19 pandemic
The results also showed a rapid increase in the number of subscribers of the subreddit r/helicopterparents since late 2019. In March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic [90]. The effect of COVID-19 on parenting stress has negatively impacted parent-child relationships [91]. The series of uptrends in the subscriber count parallels the period in which parents were working from home. This may be an indicator that HP is burgeoning in times of COVID-19. Research has also shown more salient effects of HP among students living with their parents as compared to those living away from their parents [92]. Therefore, when conducting research during or in the aftermath of the COVID-19 pandemic, it is important to consider how the pandemic has affected different contexts around the respondents, especially family relationships, and its potential long-term effects.

Mindfulness as a potential solution
The present study offers the potential to tailor meaningful interventions for HP that are directly relevant to users' lived experiences. Practitioners can provide parent-focused prevention and interventions that emphasise the importance of parental autonomy support in mitigating the ill effects of HP [13]. Furthermore, the prominence of HP among mothers and during the COVID-19 pandemic can become important targets for intervention. The effectiveness of mindfulness interventions has been extensively studied in relation to parenting [93][94][95][96], the pandemic [97,98], and emerging adults [99]. Thus, mindfulness training poses a viable solution to the problems of HP.

Limitations and future directions
While this study is the first to provide insights from social data about HP, there are three notable limitations. First, the sociodemographic information of Reddit users is not publicly available [84]. Hence, caution must be exercised when generalising the results of this study. Perhaps creating a subreddit specific to an age group (e.g., emerging adulthood), gender, or geographic location may provide additional context to the phenomenon of HP. Further, information on the users' and their parents' social, cognitive, emotional, and mental functioning is warranted to fully understand the normativeness of the research population. This may help appraise the generalisability of findings obtained from online forum conversations.
Second, not all topics may have been identified by the LDA model since responses to the original Reddit posts were not included in the corpus. Comments for each Reddit post may contain broad and varied discussions on the main topic [100]. These threads may contain information that is sensitive or less commonly experienced [64]. Thus, it is recommended that future research analyse Reddit posts and comments in tandem to increase the potential of topic extraction, so that new findings add to the existing literature on HP. Last, using a traditional approach of human coding may have provided more context to the findings, given that the computer-based approach of topic modelling is developed solely based on a cluster of words. Although recent research suggests that both methods are comparable [101], more work is required to ascertain the viability of natural language processing techniques in psychological research.

Conclusion
This research adopted a novel, Big Data approach to understanding Reddit users' experience with HP, contributing to the emerging efforts to amalgamate the distinct fields of psychology and computational linguistics research. Results revealed common environmental contexts of HP (i.e., school, college, work, and home) and its implications for college decisions, privacy, and social relationships. Our findings emphasise the lack of autonomy that HP fosters and encourage enhanced family communication patterns and mindfulness interventions. Furthermore, as with past research, our exploratory study supports more maternal than paternal models of HP, warranting further examination of gender in behavioural manifestations of HP. Considering the COVID-19 pandemic, future research must study the implications of high levels of parent-child contact for HP. Overall, these findings offer meaningful insights into HP through first-hand accounts.

Fig. 2. Subscription Rate for the Subreddit r/helicopterparents
Note. A rapid increase in the number of subscribers is observed since late 2019.

Fig. 4. Determining the Optimal Number of Topics based on Topic Coherence Score
Note. The optimal number of topics (k) vs coherence score (C_v) is presented.

Fig. 5. Topic Model Visualization Using pyLDAvis in Gensim with Four Topics
Note. The bubbles illustrate overall topic distribution while the bars represent overall word frequencies.

Table 1
Summary of topical words and corresponding themes.
Topic 1
Topical words: mom, time, friend, school, home, house, life, dad, college, work
Theme: Environmental Contexts
Description: School, college, work, and home are environmental contexts in which users are exposed to HP behaviours the most.
Topic 2
Topical words: time, mom, friend, life, home, school, room, mother