Raising Awareness About Cervical Cancer Using Twitter: Content Analysis of the 2015 #SmearForSmear Campaign

Background: Cervical cancer is the second most common cancer among women under age 45. To counter the decline in smear test (ST) coverage in the United Kingdom, a Twitter campaign called #SmearForSmear was launched in 2015 for the European Cervical Cancer Prevention Week. Its aim was to encourage women to take a selfie showing their lipstick smeared over the edge of their lips and to post it on Twitter with an awareness-raising message promoting cervical cancer screening. The estimated audience was 500 million people. Other public health campaigns, such as Movember, have been launched on social media to encourage participation and self-engagement. Their results were unsatisfactory, as their aim was diluted until they became mainly a social buzz. Objective: Our objectives were to identify the tweets delivering an awareness-raising message promoting cervical cancer screening (sensitizing tweets) and to describe the characteristics of the Twitter users posting about this campaign. Methods: We conducted a 3-step content analysis of the English-language tweets tagged #SmearForSmear and posted on Twitter during the 2015 European Cervical Cancer Prevention Week. Data were collected using the Twitter API. Their extraction was based on an analysis grid generated by two independent researchers using thematic analysis and validated by a strong Cohen’s kappa coefficient. In total, 7 themes were coded for sensitizing tweets and 13 for Twitter users’ status. Verbatims were analyzed thematically and then statistically. Results: A total of 3018 tweets were collected and 1881 were analysed. Of these, 70% had been posted by people living in the United Kingdom. Overall, 57.4% of users were women, and sex was unknown in 36% of cases. In addition, 54.4% of the users had posted at least one selfie with smeared lipstick, and 32.2% of tweets were sensitizing. Independent factors associated with posting sensitizing tweets were being a woman who had experienced an abnormal ST (OR 13.456 [95% CI 3.101-58.378], p = 0.0005), female sex (OR 3.752 [95% CI 2.133-6.598], p < 0.0001) and living in the United Kingdom (OR 2.097 [95% CI 1.447-3.038], p < 0.0001). Non-sensitizing tweets were significantly more often posted by non-health, non-media companies (OR 0.558 [95% CI 0.383-0.814], p = 0.0024). Conclusions: This study demonstrates that the success of a public health campaign on social media depends on its ability to get its targets involved. It also suggests the need to use social marketing to help its dissemination. The clinical impact of this Twitter campaign on cervical cancer screening uptake has yet to be evaluated.


MAIN ARTICLE BODY:
Introduction: Cervical cancer is the second most common cancer among women under age 45 and leads to significant mortality [1]. It is caused by human papillomavirus (HPV) [2]. The smear test (Papanicolaou test) detects precancerous changes and early-stage cervical cancers. Its introduction has led to a dramatic decline in cervical cancer incidence and death rates in many countries, particularly developed countries [3]. In England, an organized national screening program was established in 1988. Thanks to this program, the incidence of cervical cancer in women aged 20-79 in England almost halved between 1982 and 2006. However, incidence has been rising among 20-29-year-olds since 1996 in most regions of England [4]. From 1999 to 2013, the proportion of women who had not attended a smear test over a 5-year period progressively increased from 16% to 22% [5]. This suggests that organized screening alone is not sufficient to maintain a high coverage rate. Social media have great potential to promote behavior change because they are interactive tools that encourage participation and self-engagement rather than "top-down" information [6][7][8]. Facebook, Twitter and Instagram had more than 1.86 billion, 317 million and 500 million monthly active users, respectively, in December 2016. On Twitter, more than 500 million tweets are exchanged every day [9]. Social media have become a valuable source of information for health professionals and clinicians seeking to discover health-related topics and behaviors online [32,36,37,38]. Public health campaigns have already tried to take advantage of the ability of social media to make a campaign go viral. The goal of the ALS Ice Bucket Challenge was to publicize and raise funds for amyotrophic lateral sclerosis (ALS). It involved many celebrities worldwide.
As of 1 September 2014, more than 17 million videos had been shared on Facebook and had been watched more than 10 billion times by more than 440 million people [10]. Thanks to this campaign, more than 100 million dollars were raised by the ALS Association [11]. Hundreds of thousands of people tweeted daily about ALS, far more than about multiple sclerosis, a disease better known to the public [12]. Movember is an annual event, organized each November since 2003, whose goal is to raise awareness of and funds for men's diseases such as prostate or testicular cancer. Participants grow a moustache and post selfies on social media to raise awareness among their contacts and show their involvement in the campaign. In Denmark, after the launch of the 2011 Movember campaign, a significant decline in the PSA level at referral and an increase in the number of patients referred for suspected prostate cancer were observed. However, only minor differences in referral patterns and prostate cancer diagnoses were detected [13]. Such campaigns may be overtaken by the buzz they sought to create and may convey non-health-related messages. A content analysis of the 2013 Movember Canada campaign on Twitter showed that it did not meet the stated campaign objective of creating conversations about men's health and, specifically, about prostate and testicular cancers [14]. To counter the decline in smear test coverage in the United Kingdom, a Twitter campaign called #SmearForSmear was launched in 2015 by Jo's Cervical Cancer Trust for the European Cervical Cancer Prevention Week. Its goal was to encourage women to take a picture of themselves (selfie) showing their lipstick smeared over the edge of their lips and to post it on Twitter with an awareness message promoting cervical cancer screening. The estimated audience was 500 million people [15].
Our objectives were to identify the tweets delivering raising awareness messages (RAM) about cervical cancer screening and to describe the characteristics of the Twitter users posting about this campaign.

Methods:
We conducted a 3-step content analysis of the English tweets posted on Twitter during the 2015 European Cervical Cancer Prevention Week.

Data collection and extraction:
To collect the tweets, we used the Twitter Application Programming Interface (API). It allows the user to conduct manual searches for keywords in tweets with specific parameters such as hashtags, language, and date range. The parameters used for this research were: the hashtag #SmearforSmear, English language, and a posting date between 25 January 2015 and 31 January 2015 inclusive (European Cervical Cancer Prevention Week). All tweets were manually collected and assessed. Only original tweets, rather than retweets, were analyzed. Only the verbatim of each tweet was transcribed. Hashtags and content preceded by "@" were removed unless doing so made the verbatim unintelligible. We also considered all hyperlinks leading to another verbatim on another web platform (e.g., Instagram). The corresponding verbatim was transcribed only if it was informative.
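The selection and cleaning rules described above can be illustrated with a minimal Python sketch. It is not the procedure we actually followed, since collection and transcription were done manually; the dictionary keys (text, lang, posted, is_retweet) and the function names are hypothetical, and the sketch cannot reproduce the manual judgment of whether removing a hashtag or mention leaves the verbatim intelligible.

```python
import re
from datetime import date

# Study window: European Cervical Cancer Prevention Week 2015 (inclusive).
CAMPAIGN_START = date(2015, 1, 25)
CAMPAIGN_END = date(2015, 1, 31)
HASHTAG = "#smearforsmear"


def is_eligible(tweet: dict) -> bool:
    """Apply the inclusion criteria to one tweet.

    `tweet` is a hypothetical record with the keys 'text', 'lang',
    'posted' (a datetime.date) and 'is_retweet'.
    """
    return (
        HASHTAG in tweet["text"].lower()
        and tweet["lang"] == "en"
        and CAMPAIGN_START <= tweet["posted"] <= CAMPAIGN_END
        and not tweet["is_retweet"]
    )


def clean_verbatim(text: str) -> str:
    """Remove hashtags and @mentions, then collapse repeated whitespace."""
    without_tags = re.sub(r"[#@]\w+", "", text)
    return re.sub(r"\s+", " ", without_tags).strip()
```

For example, clean_verbatim applied to a tweet ending with the campaign hashtag and a user mention returns only the sentence written by the user. A real pipeline would still need a manual pass for tweets whose meaning depends on the removed hashtag or mention.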

Data analysis:
A total of 3019 tweets that met the search criteria were imported into Excel for data extraction. An analysis grid was created based on 200 tweets initially collected and thematically analyzed by two independent researchers to extract the themes (topics) of the tweets' verbatim and the Twitter users' statuses. This grid was then tested on 50 new tweets. No new themes were identified, confirming that category saturation had been achieved. The thematic analysis methodology used consists of transforming qualitative content into a quantitative form by establishing coding categories. The number of data units (such as phrases, messages, responses) that fall into each coding category is counted. Finally, they are categorized based on similar meanings and overt or inferred communication [16,17]. Themes were not restricted to preexisting themes. They emerged through an inductive process whereby open coding of data revealed themes that moved from the specific to the general [18]. The two researchers, both general practitioners trained in qualitative research, developed a 7-theme codebook, based on the tweets, to identify whether the tweets delivered raising awareness messages about cervical cancer screening: incentive to carry out the smear test, evocation of smear test importance without any precision, reminder of the smear test preventive nature, reminder of the low incidence of smear test, allusion to the mortality or morbidity of cervical cancer, reminder of the incidence of cervical cancer and testimony of an experience related to smear test or cervical cancer. If a tweet contained at least one of these awareness-raising messages, it was considered a sensitizing tweet. The reproducibility of the classification of 300 tweets by the two independent researchers was tested with Cohen's kappa coefficient. The agreement was strong and varied between 0.8842 and 1.
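As an illustration of the reproducibility measure mentioned above, Cohen's kappa compares the observed agreement between two coders with the agreement expected by chance from each coder's label frequencies. The following Python sketch is a generic implementation under that standard definition; it is not the code used for this study, and the function name and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e the agreement expected by chance.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of marginal proportions, summed over labels.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical coding of one theme on ten tweets (1 = theme present).
coder_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
coder_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
kappa = cohens_kappa(coder_1, coder_2)
```

In a setting like ours, the function would be applied once per theme of the codebook, which is why a range of kappa values is reported rather than a single figure.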
The following information was collected about each tweet: verbatim, posting date, retrieval date, presence of a selfie with lipstick going over the edge, picture or video referring to the campaign, user sex, user location, number of followers at the date of retrieval, and user's status. To classify the users, we used their Twitter status. If it did not exist or was incomplete, we extracted this information from links on their Twitter profile whenever possible. The analysis grid listed 14 themes regarding Twitter users' status: health company, media company, non-health and non-media company, marketing company, fashion company, blogger or youtuber, health professional, National Health Service (NHS), politician, woman who experienced cervical cancer or relatives, woman with unspecified cancer or relatives, woman who experienced an abnormal smear test, general public and unknown. The "unknown" status was attributed when no information was available to categorize the user. Only the "unknown", "general public" and "NHS" statuses were exclusive. An initial global description of the sample was performed using the frequencies of the different categories for the qualitative variables. As the distribution of quantitative variables was not always Gaussian (Shapiro-Wilk test), they were expressed by their mean, standard deviation, median, minimum and maximum values and interquartile range. Means were compared with Student's t-test when the distribution was Gaussian, and with the Mann-Whitney test otherwise. Qualitative variables were compared with the chi-square test, or with Fisher's exact test when the conditions for applying the chi-square test were not met. A multivariate logistic modeling process was then carried out to identify the independent factors associated with the presence of a sensitizing message in the tweets and with each type of sensitizing message.
A "step by step" (stepwise) variable selection procedure was used, with entry and removal thresholds set at 0.10 and 0.05, respectively. The significance threshold was set at 5%. Statistical analysis was performed by the Department of Medical Information at Montpellier teaching hospital with SAS version 9. This research did not require ethics approval since we only used publicly available Twitter content.
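To help read the odds ratios (OR) and 95% confidence intervals reported in this article, the following Python sketch computes an unadjusted OR with a Wald confidence interval from a 2x2 table. It is only an illustration: the ORs in this study come from a multivariate logistic model fitted in SAS, which adjusts for the other variables, and the function name and the counts used below are hypothetical.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed, outcome present      b: exposed, outcome absent
    c: unexposed, outcome present    d: unexposed, outcome absent
    z: normal quantile (1.959964 gives a 95% interval).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("All four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: sensitizing vs non-sensitizing tweets by user group.
or_value, ci_low, ci_high = odds_ratio_wald_ci(20, 10, 10, 20)
```

An OR above 1 with a confidence interval excluding 1 indicates that the user group posts sensitizing tweets more often than the reference group; an OR below 1 indicates the opposite.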

Results:
A total of 3018 tweets met the search criteria; 1138 were removed (retweets or copies of tweets) and 1881 were analysed. Of these, 608 tweets (32.2%) were sensitizing and included from 1 to 5 raising awareness messages (RAM), mean 0.54 +/- 0.93 (Table 1). Incentive to carry out the smear test was the most frequent RAM. Univariate and multivariate analysis: Statistically significant associations between posting sensitizing tweets and Twitter user status are detailed in Table 2. The "step by step" selection procedure identified the independent factors influencing whether a tweet was sensitizing (Table 3). [13]. Many factors may explain this gap. On the one hand, this campaign was created using social marketing in a holistic approach. Its objective was clear, and its title referred to its objective. Jo's Cervical Cancer Trust posted key messages reflecting the need to adhere to cervical cancer screening, which were used to fill the content of tweets. A slogan, "Attend your smear, reduce your risk", was also created and widely retweeted during the campaign. On the other hand, this campaign, targeting an exclusively female cancer, was based on two elements of women's social construct, lipstick and selfies [19,20]. This approach was possible because the campaign was designed for the UK, where cervical cancer screening is organized. Targeted women automatically receive a letter explaining what to do to get screened and where. Receiving an invitation letter is an independent factor associated with a greater likelihood of cervical cancer screening [21]. As for the Twitter users, our expectations were broadly confirmed.
From a general point of view, Twitter users posting sensitizing tweets were people personally involved in cervical cancer screening: women, women affected by a female cancer or their relatives, people living in the UK (where this English-language campaign took place), the NHS as a partner of this campaign, and women who had experienced an abnormal ST. As peers, women raised awareness by insisting on the preventive aspect of the ST and directly encouraged other women to attend their ST. Peer influence is known to be an important social lever for health-related behaviour change [22]. Likewise, women who had experienced a pathological state (abnormal ST, cervical cancer or an unspecified cancer), or their relatives, had the greatest potential among categories of Twitter users to post a sensitizing tweet. Hashtags such as #SmearforSmear tend to create communities that behave as support groups [23]. Unveiling elements of private life is conducive to trust and emotional bonding [24]. Fashion companies were a user status with a significant potential to post tweets about the importance of the smear test without further detail. Actively participating in the campaign by posting selfies and pictures or videos linked to the #SmearforSmear campaign helped encourage people to attend their ST and disseminate the importance of the ST. Women's magazines also act as a guidebook and reinforce women's individual responsibility to create and maintain good health for themselves and their families [25]. As for the other user categories, the RAM in their tweets was in line with expectations. Politicians broadcast information about the low incidence of ST and how it helped prevent cervical cancer, in keeping with their use of social media to communicate with the press and the public [26]. Health companies' RAM was more direct, encouraging people to attend their ST. The general public warned about the incidence of cervical cancer. The NHS insisted on the importance of the ST without giving more information.
This was probably because the NHS was only a partner of this campaign and only helped disseminate it. Finally, health professionals were a blind spot: this status did not emerge as a relevant category. Their participation in health campaigns on social media is of interest, as it has been shown that the information contained in their posts is more likely to be true than that posted by other groups [27]. This under-representation was probably due to the shortness of the studied campaign period. Conversely, "non-sensitizing" tweets had a much greater probability of being sent either by users not directly concerned by cervical cancer, such as men (as it is an exclusively female cancer), or by users who participated but only broadcast information without getting involved: media, marketing companies and non-health/non-media companies. This calls their participation in this campaign into question. Was it an opportunistic appropriation of a viral campaign? This is probably one of the main limitations of the virality of health campaigns on social media. Most tweets posted for the 2013 Movember campaign and the breast cancer prevention month did not spark conversations about prostate and testicular cancer, nor did they promote any specific preventive behavior regarding breast cancer [14,28]. Such users may also be an interesting lever for social stimulation.

Strengths & Limitations:
To our knowledge, no study analyzing the content of the #SmearforSmear campaign on Twitter has been published yet. Our findings are corroborated by content analyses of other health campaigns on Twitter. We used a content analysis method based on a double analysis of the sensitizing capacity of each tweet, in an exploratory process. We also mined Twitter to gather information about users' characteristics and to complete the tweets' content. Because this method is highly demanding, we decided early on to restrict our study to one week. This choice was also relevant, in our opinion, as the campaign had been created for the European Cervical Cancer Prevention Week. Compared with other Twitter campaigns, the proportion of sensitizing tweets we found was relatively high, which raises the question of whether this campaign could maintain such a proportion in other countries (particularly where cervical cancer screening is not organized) and over time.
The choice to collect the tweets based on the hashtag #SmearforSmear may have limited their number by omitting those that did not use it. As for the content analysis, two safeguards were used: analyzing the content of tweets to create the categories before the study, and evaluating the reproducibility of the classification by two independent researchers with Cohen's kappa coefficient, which was strong in this study. The shortness of Twitter posts, limited to 140 characters, may have caused a loss of information, as users often used hyperlinks to get around this limit. We therefore chose to manually mine Twitter to complete the tweets' content and gather information about users' characteristics.

Perspectives:
The #SmearforSmear campaign disseminated sensitizing messages about cervical cancer screening and became viral. It relied on a well-designed campaign, a receptive audience and a facilitating health system with organized screening. Choosing a social media platform suited to the target is a major concern for a successful campaign. Twitter is of interest because it is well suited to appointment campaigns such as #SmearforSmear or the ALS Ice Bucket Challenge. It is also a social media platform used by young adults to keep up with news in real time [29]. However, its audience is mainly male and urban. Although diverse, its share of users with college educations and incomes over $50,000 is much higher than that of Facebook or Instagram. Users of Instagram are mainly female, but 72% of online American adults use Facebook, and its audience is the most engaged, with 70% logging on daily [29]. Health campaigns on social media must be seen as a way to reduce social inequities in health. In the UK, the main decline in screening concerned women aged 25-49 and Black and Asian ethnic minorities [11]. The targeted audience must be present on the chosen social media platform, and campaigns must then adapt as that platform's audience evolves.
The impact of facilitators remains to be studied. As previously shown, many Twitter users who took part in this campaign did not fully engage in it, as they did not post sensitizing tweets. However, they participated and helped broadcast it to their audience. Models such as Cara Delevingne also posted a selfie to support the campaign to her millions of followers (8.5 million in May 2017) [30]. Such users may boost a campaign as influencers and role models. The present findings show a clear need for studies capable of automatically analyzing the data and extracting useful insights from the #SmearForSmear Twitter campaign. We propose the use of machine learning to tackle these challenges, and we suggest three perspectives for future work. First, we plan to undertake a large-scale analysis using a collection of tweets that we have been crawling since February 2017. This analysis will include (i) the application of Latent Dirichlet Allocation (LDA) to extract the topics [31][32][33] emerging from the discussions about the campaign, as well as (ii) the exploration of the linguistic style [34] of the Twitter users. Second, we could benefit from statistical learning techniques [35] to automatically predict the categories of all tweets about the campaign. This study may allow us to assess our findings and generalize our results.
We will train a model with the One-Vs-the-Rest multi-class classifier on an annotated data set and apply it to all tweets about the campaign. We will compare the results with the manual processing and annotation done so far. Moreover, with a sufficiently large data set, we can take advantage of machine learning models to use more complex features to characterize the users tweeting about the campaign. We suggest focusing on user groups including health professionals, celebrities, the general public and politicians. This will help us understand which group of users is prominent and could therefore influence others, prompting them to act by retweeting messages relevant to the campaign, liking and replying to tweets or, more importantly, donating money. Third, we plan to