Observation of public sentiment toward human papillomavirus vaccination on Twitter

Abstract Background: Although human papillomavirus (HPV) is a vaccine-preventable illness, many individuals continue to resist vaccination for themselves and their children. We aimed to systematically analyze Twitter messages to obtain a unique view into public sentiment around HPV vaccination. Methods: We developed a Python-based tool to collect one week of live tweets from February 7–13, 2015 using Twitter’s application programming interface. We retrieved data related to the HPV vaccine via 22 purposefully-selected key search terms. We developed a codebook using a hybrid approach that involved both a grounded theory approach and the addition of several key codes based on prior work. Two trained coders independently coded tweets, and interrater reliability was assessed using Gwet’s AC1. Results: We collected 20,408 usable tweets. To maintain feasibility, we used a computerized random generator to obtain a sub-sample of 2,000 of these tweets for in-depth qualitative coding. The four categories that accounted for the largest proportion of tweets included news and media coverage of current events related to the HPV vaccine, discussion of possible associations between receiving the vaccine and sexual behavior, safety of the vaccine, and effectiveness of the vaccine. Multiple myths surrounding the vaccine, such as the misconception that it is only appropriate for females, were noted. Conclusions: Examination of Twitter chatter around HPV vaccination offers valuable insights, particularly into barriers around vaccination. It would be valuable to develop interventions aimed at countering misinformation promoted on this medium and augmenting valuable information found on it.

ABOUT THE AUTHOR Priam Chakraborty studied infectious diseases and microbiology while earning her Master of Public Health degree at the University of Pittsburgh. The current paper reports on the research undertaken for completion of that degree. Through this process, she received project mentorship from coauthors associated with the Department of Infectious Diseases and Microbiology and the School of Medicine at the University of Pittsburgh.
Project data were collected in collaboration with the Center for Research on Media, Technology, and Health (MTH). MTH conducts research investigating associations between media messages, technological innovations, and health outcomes (particularly behavioral and mental health, preventive medicine, and online health literacy). The research reported in this paper is a stepping stone toward future projects, with an overarching goal of leveraging Twitter to improve public health.

PUBLIC INTEREST STATEMENT
With social media, information and misinformation can travel quickly. We examined 2,000 tweets collected from Twitter that discussed the human papillomavirus (HPV) to gain insight into public opinion. We found that the largest proportion of tweets included news and media coverage of current events related to the HPV vaccine. Other prevalent discussions were related to possible associations between receiving the vaccine and sexual behavior, vaccine safety, and vaccine effectiveness. There were multiple inaccurate myths surrounding the vaccine, such as the misconception that it is only appropriate for females. Because HPV is a vaccine-preventable illness, this study is especially valuable in that it provides insight into publicly held misconceptions that might prevent people from getting vaccines for themselves or their children. Using Twitter to monitor real-time health topics can be valuable for being able to react immediately to misinformation by dispelling myths and posting reliable information.

Introduction
Human papillomavirus, commonly referred to as HPV, is a DNA papillomavirus that is transmitted through sexual contact. While infections from many of the 170 known strains of HPV are asymptomatic, certain strains have been definitively linked to conditions such as genital warts or cancer (Ghittoni, Accardi, Chiocca, & Tommasino, 2015). Persistent HPV infections are most commonly associated with cancers of the cervix, vulva, vagina, penis, and anus (Stanley, Winder, Sterling, & Goon, 2012). HPV has also recently been associated with other cancers, such as oropharyngeal cancer, among those who engage in oral sex (US Centers for Disease Control & Prevention, 2017).
In 2006, the United States Food and Drug Administration approved Gardasil, a prophylactic vaccine that protects against four of the most prevalent types of HPV (Colgrove, 2009; Kaplan & Haenlein, 2011; McNab, 2009; Yeganeh, Curtis, & Kuo, 2010). By 2008, 41 states approved and recommended Gardasil (Colgrove, 2009). In December of 2014, Gardasil 9, which protects against an additional 5 serotypes, was approved by the FDA (US Food & Drug Administration, 2017). Despite evidence of safety and efficacy of Gardasil and Gardasil 9, vaccination rates remain low (Reagan-Steiner et al., 2015). This has resulted in continued increases in the prevalence of HPV. For example, in the United States, approximately 79 million people are estimated to be currently infected with HPV, with about 14 million newly infected every year (US Centers for Disease Control & Prevention, 2017).
Previous research on low vaccination rates has uncovered several barriers to HPV vaccination such as parental concerns about vaccine safety and stigma, lack of vaccine promotion, and issues of accessibility. A primary barrier is that parents receive seemingly authoritative information, which is not evidence-based, related to the safety and acceptability of the HPV vaccination. For example, some religious institutions or notable public figures perpetuate misinformation about vaccination risks or directly discourage vaccination on moral grounds (Perkins, Pierre-Joseph, Marquez, Iloka, & Clark, 2010). The age at which children receive the vaccine may also worry parents who experience denial that their children are in or approaching a developmental stage in which sexual activity is increasingly likely (Barth, Cook, Downs, Switzer, & Fischhoff, 2002; Yeganeh et al., 2010). This is a particular concern as Gardasil is recommended for boys and girls aged 11-12 years old, and is permissible for girls as young as age 9 (Markowitz et al., 2007). In relation to vaccine promotion, many schools and colleges require incoming students to be vaccinated for infectious diseases, but HPV is generally not among the requirements (Ciolli, 2008). Physicians' general lack of advocacy for the Gardasil vaccine has also been noted as a barrier to widespread uptake (Vadaparampil, Murphy, Rodriguez, Malo, & Quinn, 2013). Further, as the Gardasil vaccine schedule requires multiple office visits to complete, families of low socioeconomic status or with limited access to medical care have disproportionately higher barriers to overcome (Chando, Tiro, Harris, Kobrin, & Breen, 2013).
Social media platforms offer potential inroads to improve dissemination of reliable health information related to HPV vaccination. This is particularly relevant as social media is the preferred source for health information among teenagers and young adults aged 18-30 (Vance, Howe, & Dellavalle, 2009), and presents a fast and cost-effective method of disseminating health information on a large scale (Dredze, 2012; McNab, 2009). A particularly salient form of social media use that allows users to publicly share thoughts in succinct posts is called microblogging (Kaplan & Haenlein, 2011). Of microblogging platforms, Twitter is the most popular (Aichner & Jacob, 2015). As of the first quarter of 2015, Twitter had roughly a half-billion worldwide users, with about 300 million regularly active users among them (Twitter, Inc., 2015). Twitter is also unique as it over-represents individuals in the 18-29 age group and minorities (Duggan, Ellison, Lampe, Lenhart, & Madden, 2014). These are demographics which present ideal targets for increasing knowledge about availability and acceptability of HPV vaccination (Daniel-Ulloa, Gilbert, & Parker, 2016).
The Twitter platform provides an opportunity to observe consistently sized messages, known as "tweets," which are a maximum of 140 characters. Tweets are publicly available from an estimated 88% of Twitter users (Beevolve, Inc., 2014), including individuals and non-personal accounts (e.g. corporations, governments, non-profit organizations). Public health agencies, such as the World Health Organization, the Centers for Disease Control and Prevention, and the New York City Department of Health, also leverage data and engage with the public using this platform (Thackeray, Neiger, Smith, & Van Wagenen, 2012). Researchers have also leveraged the organic nature of Twitter conversations to better understand topics such as electronic cigarette use, H1N1, and suicide attempts (Colditz, Welling, Smith, James, & Primack, 2017; Jashinsky et al., 2013; Paul & Dredze, 2011).
Recently, Twitter data have helped to understand broad public sentiment toward HPV vaccination (Massey et al., 2016). These findings indicate that Twitter users are generally exposed to more positive than negative information about the HPV vaccines on this platform. However, a substantial amount of negativity is also present, and remains to be contextualized in depth. Such additional context about divisive aspects of HPV vaccination may provide useful insights into communication approaches to enhance reliable health information and counter misinformation about HPV vaccination online. Therefore, the purpose of this study was to systematically assess tweets to better understand overarching themes of discussion related to public sentiment about HPV vaccination. We also specifically wished to assess content related to the relatively newly marketed 9-valent vaccine.

Data collection
We used Twitter's Public Streams Application Programming Interface (API) to collect live data from Twitter (Twitter, Inc., 2017). To access the stream of Twitter data, we developed a Python script, using Python(x, y) software (Raybaut & Nyo, 2014), which relied on basic functionality of the Twython package (McGrath, 2014). Our script allowed us to selectively retrieve data from the Twitter API that was specifically relevant to our topic. Technical difficulties can arise if the stream flow collected from Twitter's public streams exceeds 1% of the entire flow (Morstatter, Pfeffer, Liu, & Carley, 2013). However, because our topic was highly specific and the code created to filter the tweets used highly specific terms, the stream did not exceed the 1% threshold. Thus, we were able to successfully capture all relevant data, and no known relevant content was omitted.

Search terms
Between November of 2014 and January 2015, we systematically endeavored to select an optimal set of terms that would be parsimonious enough to be feasible yet broad enough to capture sufficient relevant information. This process alerted us to the importance of searching for common misspellings (e.g. "cervarix" and "vaxine") as well as commonly accepted slang (e.g. "cervical shot" and "cervical vaxx") in order to ensure that sufficient relevant information was captured. Other times, however, terms were too inclusive. For example, the word "cervical" on its own nearly always returned irrelevant messages related to cervical spine and/or neck problems. Combining two terms within a single search string required both words to be present in a tweet, but did not require them to appear consecutively. The following is the finalized list of keywords that were employed during data collection: HPV, papilloma, pappiloma, papiloma, pappilomavirus, gardasil, gardisil, guardasil, guardisil, cervarix, cervical shot, cervical shots, cervical vaccine, cervical vaccines, cervical vax, cervical vaxine, cervical vaxines, cervical vaxx, cervical vaxxine, cervical vaxxines, cervical vaccination, and cervical vaccinations.
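The matching rule described above (any one search string may match, and every word of a multi-word string must be present, in any position) can be illustrated with a minimal Python sketch. This is an offline approximation written for illustration, not the collection script itself: the function names `tokenize` and `matches_any` are hypothetical, and the simple alphanumeric tokenization is an assumption that may differ from how the Twitter API tokenizes tweets.

```python
import re

# The 22 search strings listed above. A multi-word string matches only if
# every one of its words occurs somewhere in the tweet.
SEARCH_STRINGS = [
    "HPV", "papilloma", "pappiloma", "papiloma", "pappilomavirus",
    "gardasil", "gardisil", "guardasil", "guardisil", "cervarix",
    "cervical shot", "cervical shots", "cervical vaccine", "cervical vaccines",
    "cervical vax", "cervical vaxine", "cervical vaxines", "cervical vaxx",
    "cervical vaxxine", "cervical vaxxines", "cervical vaccination",
    "cervical vaccinations",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens.

    Assumption: this simple tokenizer stands in for the API's own matching.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_any(text, search_strings=SEARCH_STRINGS):
    """Return True if all words of at least one search string occur in text."""
    tokens = tokenize(text)
    return any(set(s.lower().split()) <= tokens for s in search_strings)
```

Under this sketch, a tweet mentioning only "cervical spine" is rejected, while one containing both "cervical" and "shot" anywhere in the text is retained, which mirrors the rationale given for pairing "cervical" with a second term.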

Search procedures
We collected all tweets matching at least one of the search strings stated above during the period from 12:00 am on Saturday, 7 February 2015 until 11:59 pm on Friday, 13 February 2015 (Eastern Standard Time). This time frame was selected to include each day of the week and for convenience. While data collection had been attempted in the previous two weeks, each prior period had lapses in data collection due to technical issues related to Twitter API errors. Therefore, we selected the first complete week-long period without such a lapse for data collection. This process resulted in a total of 20,408 usable tweets. Tweets from both personally maintained accounts and organization-managed accounts were included. To maintain feasibility, we used a computerized random generator to obtain a sub-sample of 2,000 of these tweets for coding. In order to maximize confidentiality, all personal identifiers such as Twitter usernames were omitted during the coding process. Data collection procedures were approved by the University of Pittsburgh Institutional Review Board (IRB # PRO14070505).
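Drawing a simple random sub-sample without replacement can be sketched as follows. The text does not specify which random generator was used, so the use of Python's `random` module, the function name `draw_subsample`, the seed value, and the stand-in list of integer identifiers are all assumptions made for illustration.

```python
import random


def draw_subsample(tweet_ids, k=2000, seed=None):
    """Draw k distinct items uniformly at random, without replacement.

    A fixed seed (hypothetical here) makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(tweet_ids, k)


# Stand-in identifiers for the 20,408 usable tweets (not real tweet IDs).
all_ids = list(range(20408))
subsample = draw_subsample(all_ids, k=2000, seed=2015)
```

Recording the seed alongside the sub-sample would allow the same 2,000 tweets to be re-drawn later if the full collection is retained.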

Codebook development
We developed our codebook using a hybrid approach that involved direct assessment of the tweets themselves in a grounded theory approach with the addition of several key important codes based on prior work and prior theory (Strauss & Corbin, 2007). The grounded theory phase involved three iterative rounds of axial coding by individually working researchers who met periodically to discuss adding, deleting, and/or combining codes. During this process, the decision was made to make all codes dichotomous. For example, we originally divided sentiment into a 3-level categorical variable (positive, neutral, or negative). However, we ultimately determined that it was important to capture whether there were simultaneous positive and negative sentiments in the same tweets. Therefore, we ended up with two dichotomous codes, one for each of negative and positive. In addition to determining codes based on grounded theory, we supplemented our code list based on prior work in this area and current events of importance we wished to capture. For example, we were acutely interested in attitudes and discussion around the new 9-valent vaccine. Thus, we included this as a code. Similarly, we ended up with specific variables capturing whether a tweet was related to factors such as policy, cost, and access to vaccines.
Practice coding was performed on tweets not included in the final set. During this process, two independently working coders assessed sets of 200 tweets each, met to discuss any differences, subsequently met with a supervisor to resolve any remaining discrepancies, and then modified the codebook as necessary. Using this iterative process, a final codebook was developed with clear definitions and positive and negative examples of each code.
The final codebook included 13 codes. An initial code was used to determine whether the tweet was related to our topic of interest (instead of, for example, an organization with the acronym "HPV"). Two separate codes were used to assess whether overall sentiment was positive or negative. A pair of codes was also used to signify whether the tweet claimed that the vaccine was safe or unsafe. Two separate codes were also used to assess whether the tweet explicitly claimed that the HPV vaccine increases sexual behavior or does not increase sexual behavior. If a tweet suggested that the HPV vaccine actually decreased sexual behavior, it was coded in this latter category.

Coding procedures
Two trained coders independently coded each of the first 200 tweets in the data-set. We assessed interrater reliability using Gwet's AC1 coefficient, which is a preferred method of computing interrater reliability when code counts are relatively low (Gwet, 2008). Because interrater reliability was sufficient for each of the 13 variables (AC1 > 0.95 for all variables except news-related, where AC1 = 0.77), coders divided the remaining tweets. In the rare cases of disagreement between the two initial coders, they worked together to achieve consensus. In a few cases, the two coders met with a third researcher on the team for adjudication.
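For readers unfamiliar with the statistic, the following Python sketch shows how Gwet's AC1 can be computed for two raters and a single dichotomous code. It uses the standard two-category form, AC1 = (p_a - p_e) / (1 - p_e), where p_a is observed agreement and p_e = 2π(1 - π), with π the mean proportion of "present" ratings across the two raters. The function name `gwet_ac1` and the example ratings are hypothetical; this is not the software used in the study.

```python
def gwet_ac1(rater_a, rater_b):
    """Gwet's AC1 for two raters and one dichotomous (0/1) code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Mean proportion of "present" (1) ratings across the two raters.
    pi = (sum(rater_a) + sum(rater_b)) / (2 * n)
    # Chance agreement for two categories; never exceeds 0.5.
    p_e = 2 * pi * (1 - pi)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical low-prevalence code: 200 tweets, one disagreement.
coder_1 = [1] * 5 + [0] * 195
coder_2 = [1] * 4 + [0] * 196
ac1 = gwet_ac1(coder_1, coder_2)
```

Because chance agreement p_e shrinks as a code becomes rare, AC1 stays high when raters almost always agree on a low-prevalence code, which is the situation described above.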

Analysis
We generated counts and frequencies for all codes. Then, we convened to explore examples of each code and assess for deeper meanings. We then synthesized findings and selected exemplary quotations for illustrative purposes. This process was guided by the principles of thematic synthesis in which codes are organized into descriptive and then analytic themes (Braun & Clarke, 2006).
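Generating counts and frequencies for dichotomous codes amounts to tallying how many tweets carry each code and dividing by the number of tweets. A minimal sketch follows; the data layout (one dictionary of 0/1 values per tweet), the function name `code_frequencies`, and the toy records are assumptions for illustration only.

```python
def code_frequencies(coded_tweets, codes):
    """Return {code: (count, percent)} for dichotomous (0/1) codes."""
    n = len(coded_tweets)
    result = {}
    for code in codes:
        # Count tweets in which this code was marked present.
        count = sum(1 for tweet in coded_tweets if tweet.get(code, 0) == 1)
        percent = 100.0 * count / n if n else 0.0
        result[code] = (count, percent)
    return result


# Toy data: four hypothetical coded tweets, not drawn from the study.
toy = [
    {"positive": 1, "negative": 0, "news": 1},
    {"positive": 0, "negative": 0, "news": 1},
    {"positive": 0, "negative": 1, "news": 0},
    {"positive": 0, "negative": 0, "news": 0},
]
freqs = code_frequencies(toy, ["positive", "negative", "news"])
```

Because each code is tallied independently, a single tweet can contribute to several codes at once, so the percentages need not sum to 100.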

Results
Of the 2,000 tweets, 1,887 (94.4%) were relevant to HPV. Thus, these 1,887 tweets composed our final data-set. The vast majority of tweets (n = 1,668, 88.4%) originated from unique user accounts. Two hundred and four tweets (10.8%) were from users who posted between two and five times, and 30 (1.6%) were from users who posted 10 or more times.
Ninety-eight (5.2%) of the 1,887 tweets were coded as having positive sentiment. These tweets actively encouraged vaccination or otherwise described vaccination in a highly positive manner. Words and phrases found in this set of tweets included terms such as "works well," "recommend," "vaccines work," and "vaccinate your kids." Ninety-five (5.0%) tweets were coded as having negative sentiment. These tweets generally discouraged Gardasil vaccination or expressed a particularly negative view toward it. Specific terms characteristic of these tweets included "beware," "destroys lives," and "mystery illness." While sentiments were nearly always directly incorporated into the tweets themselves, negative sentiment could also often be inferred from the hashtags at the end of the tweet. For example, #CDCWhistleBlower was a common hashtag used in negative sentiment tweets; this hashtag was included by users who disagreed with the CDC's encouragement of vaccination.
One hundred thirty-two (7.0%) tweets explicitly claimed that the vaccine was safe. Examples of phrases that implied safety included "safety of the HPV vaccination is reaffirmed." However, 78 (4.1%) tweets explicitly claimed lack of safety. Examples of phrases included in tweets describing lack of safety included "Gardasil ruins live [sic]" and "girl dies shortly after receiving HPV vaccine." About one-fourth of all tweets in the sample (n = 516, 27.3%) explicitly rejected a negative effect of getting the HPV vaccine on sexual behavior (see Table 1). Most of these tweets referenced an online news commentary from the Harvard Medical School titled "HPV vaccination not linked to riskier sex" (Miller, 2015), based on recent work published in JAMA Internal Medicine (Jena, Goldman, & Seabury, 2015). Specific wording used by this category of tweets included "HPV vaccine will not turn your daughter in to a slut" and "HPV vaccine linked to less risky behavior." Only 17 (0.9%) tweets explicitly suggested that the HPV vaccine increases risky sexual behavior. One example claimed that "HPV Vaccines make you promiscuous." Over 40% (n = 787) of tweets were coded as being related to a news report. These were often direct re-tweets of posts from major newspapers, magazines, or TV channels that had reported a story related to the HPV vaccine. Coders identified such tweets by key terms such as "coverage," "article," and "story," and/or by direct mentions of known media corporations such as @TorontoStar, @USATODAY, or @ABC (Table 1).
The codebook contained four other categories (legal and policy matters, barriers to vaccination, the Gardasil 9 vaccine, and parental attitude), but few tweets were coded in these categories. Legal or policy related tweets represented 1.4% (n = 26) of the sample and often included terms such as "conservative" or "liberal," and they frequently directly mentioned politicians, government agencies, or possible policy measures. One tweet, for example, claimed that "Governor Perry's Gardasil vaccine mandate cost young girls lives." Tweets coded with the term "barriers" represented 0.9% (n = 17) of the sample and often referred to factors such as access to the vaccination and prohibitive cost. Only 0.6% (n = 11) of tweets referred specifically to the 9-valent vaccine. Finally, 0.3% (n = 6) of tweets were coded as related to parental attitudes; for example, some individuals specifically indicated terms such as "my child" or "my son/daughter" as they discussed the vaccine.

Table 1. Prevalence and examples of coded variables among 1,887 tweets in February of 2015
*All individual Twitter user names have been replaced with "@UserName" to protect confidentiality. Similarly, we replaced links originally provided by users with the generic "[web link]" in order to reduce the risk of breach of confidentiality.

Discussion
Our systematic analysis of one week of tweets related to the HPV vaccine yielded four major findings. First, it was noteworthy that a plurality of messages (over 40%) were related to news (generally, media interpretations of scientific findings). Second, we noted that over a quarter of tweets discussed the question of whether the HPV vaccine engenders increases in sexual behavior, with the vast majority of tweets suggesting that it does not. Third, overall positive sentiment and overall negative sentiment were approximately equally represented. Fourth, although slightly more tweets suggested that the vaccination is safe, a sizeable number used anecdotes to suggest that it is not safe.
One reason that there were so many news and media related tweets during the collection period was related to a specific story in the Toronto Star. The newspaper printed a front-page article titled "A wonder drug's dark side" about potential concerns around use of Gardasil (Toronto Star, 2015). However, the article was subsequently heavily criticized by the medical, scientific, and public health communities for being anecdotal and lacking scientific accuracy. The Toronto Star acknowledged this criticism, retracted the article, and replaced the online version with one called "Science shows HPV vaccine has no dark side" (Guichon & Kaul, 2015). Individual users expressed personal opinions about the situation such as "Never lost respect for a publication as fast as I lost respect for @TorontoStar with their HPV vaccine coverage." Other quotations from tweets included "@TorontoStar botched a story about #HPV vaccine," and "this is appalling, ignorant, irresponsible journalism." Another reason for many news and media related tweets was that a scientific study was published that week suggesting that "HPV vaccines do not lead teen girls to risky sex." This may also be a reason why there were so many more tweets suggesting that the vaccine does not lead to increased sexual promiscuity.
The specific news messages noted in this data-set were quite interesting and could be used for prevention and/or intervention. For example, the Toronto Star issue demonstrated that Twitter remains an important self-policing community in which medical professionals and other advocates can correct misinformation. This also suggests that more could be done to correct misinformation on this highly influential platform. Similarly, though Twitter users shared the Harvard Medical School article about HPV vaccination not increasing risky sexual behavior, this message could have been more widely distributed with improved infrastructure around online public health communication.
We found that positive and negative sentiments were each about equally represented (~5%). One reason for the relatively low values was that our codebook, which was designed to improve interrater reliability, specified that only tweets that directly promoted or discouraged Gardasil were to be labeled as positive or negative. While it was out of the scope of the current research, it may be valuable for future research to explore whether finer-grained assessments (e.g. not positive, somewhat positive, very positive) could be made in this regard.
Our findings regarding the approximately even rate of positive and negative sentiment are not entirely consistent with offline research demonstrating strong acceptance of Gardasil among young adults (Boehner, Howe, Bernstein, & Rosenthal, 2003; Gerend, Lee, & Shepherd, 2007; Lambert, 2001). However, most studies that chose to focus on young adults recruited participant samples solely from college or university settings and used typical survey methodology that may have been prone to social desirability bias. While the present study found 49% of the tweets with sentiment to be negative, recent work by Massey et al. (2016) utilized a broader sampling frame and estimated negativity at roughly 39%. This still reflects a substantial proportion of Twitter users who advocate against HPV vaccination.
The 9-valent Gardasil vaccine was approved by the US Food and Drug Administration (FDA) in December 2014. While very few tweets specifically addressed this version of the vaccine, the tweets noted give clues as to barriers that at least some individuals face with regard to acceptance of this vaccine. For example, one message read "One reason Merck could possibly have for DOUBLING the amount of ALUMINUM in the new Gardasil 9 shot is to kill faster," and this message was re-tweeted 5 times and favorited 2 times. Even though there was not overwhelming support for this statement, it still offers a window into one specific concern (increase in amount of aluminum). Thus, this information could be translated into interventions aimed at dispelling this and other myths noted.
While few tweets specifically captured the perspective of a parent, those tweets were generally consistent with prior literature around parental attitudes and behaviors around HPV vaccination. For example, parents are moderately knowledgeable of HPV and accepting of the Gardasil shot for their children; studies have suggested that willingness to use the vaccine is in the 60-75% range (Constantine & Jerman, 2007; Davis, Dickman, Ferris, & Dias, 2004). However, parental concerns noted in our data, such as potential harm from the vaccine and possible increase in sexual activity due to the vaccination, have also been noted in prior research (Boehner et al., 2003; Constantine & Jerman, 2007; Davis et al., 2004). In this way, survey studies and qualitative assessments such as ours complement each other; survey studies provide accurate prevalence data while descriptive qualitative work uncovers compelling and specific examples that may help guide intervention.
Similarly, recent research has focused on quantifying longer-term sentiment toward HPV vaccination on the Twitter platform, using machine learning algorithms to expand the breadth of data classification and bolster generalizability (Massey et al., 2016). Our findings on the qualitative contexts of HPV-related sentiment directly complement this approach. In particular, while broader classification offers opportunities to estimate the prevalence of sentiment across a wide breadth of tweets, qualitative approaches are useful to uncover the depth of "lived experience" underlying such sentiment (Colditz et al., 2017). With quantitative and qualitative groundworks better established, future studies in this realm might expand on extant work using other novel methodologies. Future possibilities include time-series analysis for trend detection, or geo-spatial and network analyses to better understand how content about HPV vaccination spreads on the Twitter platform. Such approaches constitute pivotal next steps toward leveraging the Twitter platform to effectively communicate evidence-based information, counter misinformation, and encourage broader public discourse related to HPV vaccination.

Limitations
One factor that limited the generalizability of this study was the relatively short one-week span over which tweets were collected. We were able to collect valuable information on a few specific topics related to HPV vaccination, including opinions on current Gardasil coverage in newspapers and newly published data on the effect of Gardasil on sexual behavior. However, these topics were only representative of issues that were important that particular week. It should also be emphasized that we selected a random subset of 2,000 tweets for analysis because of feasibility concerns around examining all 20,408 tweets. While basic analyses suggest that our subset was representative of the larger population, it is possible that different information may have surfaced had we selected a different subsample. In addition, coding natural language such as tweets can be challenging. For example, if a user is being sarcastic, something that seems to have positive sentiment may actually have negative sentiment. While this is a known limitation of qualitative coding of complex data, we endeavored to develop a well-structured codebook to maximize consistency. A final factor that limited the generalizability of our data was our population. Twitter is a platform in which all posts are voluntary, and our particular data-set provided limited discernibility as to the identity of the person or organization that communicated the information. Therefore, the views and opinions we collected on Twitter about HPV vaccination may not represent the views of the general public but rather the views of those who are strongly either for or against HPV vaccination. However, it remains important to capture these "loudest voices" because they are the ones likely to be preferentially diffused and influential.

Conclusion
Despite these limitations, systematic analysis of one week of Twitter messages centering on the HPV vaccine seems to have offered valuable insights that may be useful in the honing and crafting of interventions. While coding can be challenging, collecting data through Twitter is fast and efficient; it requires less time and labor than methods such as interviews and surveys. Ultimately, the Twitter platform might be leveraged to gain timely insights into public health topics, and to quickly and effectively communicate important public health messages.