Mapping HCI research methods for studying social media interaction: A systematic literature review

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


To grasp the wide range of work on 'what people do on social media,' we use the term social media interaction. The notions of social media interaction (Hall, 2018) and social media exhibition (Hogan, 2010) allow us to better situate our understanding. Hall's concept of social media interaction comprises unfocused interaction (e.g., "liking" and "favoriting"), routine impersonal interaction (e.g., happy birthday messages, retweeting, reposting, and sharing), and focused social interaction (e.g., chatting, commenting, tagging photos) (Hall, 2018). Hogan's social media exhibition refers to the extended presentation of the self in the social media environment (Hogan, 2010); this "sort of 'interaction' where people view and react to the submitted content of others" (p. 384) includes social attention behaviors (e.g., browsing, lurking, scanning friends' updates, tweets, or photos). In this vein, we understand social media interaction as a term encompassing the meanings of both Hall's (2018) social media interaction and Hogan's (2010) social media exhibition.
Despite the copious research interest in the study of social media interaction, no systematic literature reviews have so far been conducted in HCI regarding the research methods applied in this domain. This is surprising given the HCI interest in the study of emerging technologies that bring to the fore new socio-technical arrangements, which inevitably demand a reconsideration of the methodological apparatus in use (Barkhuus & Rode, 2007; McGrath, 1994). Taking stock of the research methods applied in the study of social media interaction can help us reflect on the relationship between methodological choices and the knowledge gained so far. Moreover, such an intellectual exercise is critical for pushing the study of social media interaction in HCI further and is a useful means of increasing clarity and collective understanding of methods (Gentles et al., 2016). Against this background, we conducted a systematic literature review on research methods applied to the study of social media interaction, examining 149 peer-reviewed articles published between 2008 and 2020 in major HCI conference proceedings and journals.
The research questions guiding this systematic literature review are:
・ RQ1: What are the most prominent research topics studied?
・ RQ2: What are the research methods that are most/least applied for the study of social media interaction?
・ RQ3: What types of data have been collected?
By addressing the above research questions, we outline the state of HCI research on social media interaction and how the choice of methods may shape the future of social media interaction research.
The remainder of the paper is structured as follows: we introduce the related work (Section 2) and the methodology applied in this systematic literature review (Section 3). Following this, we provide the results of the systematic literature review of 149 peer-reviewed scientific HCI articles in the domain of social media interaction (Section 4). Here, we further scrutinize the selected works in terms of the social media topics studied, methodological trends, and knowledge gaps so far. Based on these results, we point out future research opportunities (Section 5).
First, we observe that there has been an interest in reflecting on the progress of the HCI and CSCW fields with respect to the research methods applied in them. This is the case, for example, of the work by Barkhuus & Rode (2007), who point at changes in the role of evaluation, in empirical evaluation, and in subject selection throughout 24 years of research in HCI. Reflections on the use of evaluation methods and how they have affected the HCI/CSCW fields can also be traced in the work of Wallace et al. (2017). By reviewing 1209 publications published between 1990 and 2005 at the ACM Conference on Computer Supported Cooperative Work (CSCW), these authors show that the choice of research methods at CSCW has shifted toward the study of collaborative work environments in practice, rather than the development of novel systems in experimental settings.
The authors also point out that CSCW research has, methodologically speaking, concentrated predominantly on single-device studies, despite the upsurge of artifact ecologies at home and in the workplace (Wallace et al., 2017).
Second, literature reviews focused on research methods for studying social media interaction are not commonplace in HCI. We found only Snelson's study (2016), which offers a systematic literature review of trends in qualitative and mixed-methods approaches in social media research. Snelson (2016) finds that the majority of qualitative and mixed-methods social media studies were conducted with established methods such as interviews, surveys, focus groups, or content analysis, while emergent social media research designs combined these with network analysis. Snelson also finds that terminologies associated with mixed-methods research designs have not yet been widely adopted by researchers conducting social media studies (Snelson, 2016).
Third, most literature reviews on methods in HCI focus on data-collection methods, design methods, or special groups of users. However, they neither connect their reviews with the field of social media interaction nor contribute to an overall reflection on the evolution of the HCI field, as the previously mentioned reviews do.

Regarding data-collection methods, we find reviews on user experience evaluation methods by HCI scholars (Pettersson et al., 2018; Saket et al., 2016). Such works concentrate on the types of methods used and the reasons behind these choices. For instance, Pettersson et al. (2018) find a trend toward mixed-methods research in HCI in recent years; in particular, the combination of questionnaire and interview data has been the most common way to mix methods.
Similarly, Saket et al. (2016) reviewed HCI evaluation metrics specifically in visualization. The authors argued that evaluations of visualization need to utilize not only traditional usability-focused evaluation methods but also user-experience-focused evaluation methods addressing, for example, visualizations' memorability, engagement, and the sense of enjoyment and fun.
Regarding design methods, we found work focused on persona methods (Moser et al., 2012; Salminen et al., 2020), exploring how HCI scholars have used qualitative, quantitative, and mixed-methods research approaches to create personas of elderly people and children.
Literature reviews on research methods in HCI have also addressed special groups of users (Wärnestål et al., 2014; Brule et al., 2020). This is the case of studies on visually impaired people by Brule et al. (2020), who conducted a literature review of method usage comprising 178 articles. The authors find challenges to implementing quantitative empirical evaluations in HCI, including a limited number of participants and heterogeneity in visual abilities, age, associated impairments, access to education, type of assistive technology, and previous experience.
Summarizing, our search for related work revealed a variety of literature reviews on specific data-collection methods, design methods, and methods for studying specific groups of people in HCI. None of these reviews, however, looks specifically at social media interaction, even though social media research has become an important research area in HCI. We therefore notice a lack of knowledge on HCI methods specifically aimed at the study of social media interaction. Our work aims to fill this knowledge gap. Systematic literature reviews, particularly those covering early-stage and growing research areas, have proven instrumental in HCI for identifying research trends and discovering unexplored or underexplored areas of opportunity (Dillahunt et al., 2017; Elisabeth et al., 2017; Mekler et al., 2014; Paraschivoiu et al., 2019; Quinn & Bederson, 2011; Wärnestål et al., 2014; Moser et al., 2012; Hansson et al., 2021).
Systematic method overviews are intricate and less often addressed across research fields, yet they are essential for further research development (Gentles et al., 2016). By conducting a systematic literature review on methods, this study provides an overview of epistemic value for the study of social media interaction in HCI. We believe this is important because method choices have a great impact on the knowledge we gain and influence how research fields evolve (Audy Martínek, 2021; Gentles et al., 2016; McGrath, 1994).
In the next section, we introduce the methodology adopted for conducting our systematic literature review.
PRISMA has already been used to target similar research goals in reviews of the growing HCI literature (Dillahunt et al., 2017; Elisabeth et al., 2017; Mekler et al., 2014; Paraschivoiu et al., 2019; Quinn & Bederson, 2011). PRISMA consists of an evidence-based minimum set of items for reporting in systematic reviews, designed to help systematic reviewers transparently report why the review was done, what the authors did, and what they found (Page et al., 2021).

Source selection
To construct our corpus, we accessed Conference on Human Factors in Computing Systems (CHI) and Computer Supported Cooperative Work (CSCW) proceedings through the ACM Digital Library (https://dl.acm.org/). These two conference proceedings have been used as sources in previous HCI literature review articles (e.g., Schlesinger et al., 2017; Wallace et al., 2017). In addition, we included the Web of Science (https://login.webofknowledge.com/) as an additional literature source database. We did so because systematic reviews should reduce selection bias by avoiding literature retrieval from a single database. Web of Science was chosen because it has a wider scope than the ACM Digital Library and has been used as a literature review source by HCI scholars (see Hardy et al., 2019; Razi et al., 2020).

Search criteria
To add articles to our corpus, both of the following criteria had to be fulfilled:
(1) Social media search term: an article's title, abstract, author keywords, or full text has to contain at least one social media-related term (i.e., "SNS(s)", "social media", "Twitter", "Facebook", "Tumblr", "Myspace", or "social network").
(2) HCI search term: To focus on the HCI context, we added the keyword "Human (-) Computer Interaction" in the Web of Science. When querying the ACM Digital Library, we did not include the HCI search term.
With the above criteria, we found 285 articles in Web of Science, 360 articles in CHI proceedings, and 161 articles in CSCW proceedings. The corpus collection was conducted on August 24th-29th, 2020.

Eligibility assessment for the final analysis corpus
To evaluate the eligibility of the retrieved articles, the first author manually checked the inclusion criteria listed below by reading the collected articles' titles and abstracts.
When it was impossible to reach a clear decision, other contents of the article, especially the methodology and data collection descriptions, were examined in consultation with the second author.
We only incorporated an article into the corpus when all three inclusion criteria were met:
・ Peer-reviewed: the article is a peer-reviewed publication, either in conference proceedings or a peer-reviewed journal.
・ Topic relevance: the article's topic is relevant to social media interaction. More precisely, we manually evaluated whether the aim, goal, or research questions of a study relate to social media interaction, including social attention (e.g., browsing, lurking, scanning friends' updates, tweets, or photos), unfocused interaction (e.g., "like" and "favorite"), routine impersonal interaction (e.g., happy birthday messages; retweets, reposts, and sharing), focused social interaction (e.g., chatting, commenting, tagging photos) (Hall, 2018), and social media exhibition (Hogan, 2010), as well as related phenomena around these social media interactions.
・ Language: an article is written in English.
We excluded an article if it met any of the following criteria:
・ No-match topic: an article whose topic is irrelevant to social media interaction was excluded. For example, an article that used social media only for recruiting survey or experiment participants was excluded because it used social media merely as a recruitment tool and did not study social media itself.
・ Bibliographic, design-and-evaluation, and not-empirical articles: we excluded three research type categories: bibliographic articles, which systematically collect publications and describe patterns of publication and other features; design-and-evaluation articles, which mainly describe technical aspects of newly developed algorithms or systems; and not-empirical articles, which generally present a new tool/model without evaluation (Wallace et al., 2017; Wärnestål et al., 2014).
・ Non-English articles: we excluded articles whose main bodies are written in a language other than English, even if their titles and abstracts are in English, because we could not conduct the subsequent review in a non-English language.
・ Lack of method and data descriptions: articles lacking explanations of their study methods or data were excluded. For example, we found that many extended abstracts and workshop abstracts lack methodological explanations.
The data collection and screening process is shown in Figure 1. This procedure resulted in 149 papers representing our corpus.

Categorizing the articles based on methods
The first author read each article in the corpus and categorized it according to its methodology design: quantitative, qualitative, or mixed methods. Methods are often described in articles' titles and abstracts; when they were not, we checked the main body of the articles, particularly the sections related to methods and data. For the categorizing process, we used the following definitions:
・ Quantitative methods: "an approach for testing objective theories by examining the relationship among variables. These variables, in turn, can be measured, typically on instruments, so that numbered data can be analyzed using statistical procedures" (Creswell, 2009, p.4).
・ Qualitative methods: "The process of research involves emerging questions and procedures, data typically collected in the participant's setting, data analysis inductively building from particulars to general themes, and the researcher making interpretations of the meaning of the data" (Creswell, 2009, p.4).
・ Mixed-methods: research that "combines elements of qualitative and quantitative research approaches [...] for the broad purpose of breadth and depth of understanding and collaboration" (Johnson et al. 2007, p. 123).
With these definitions, a study combining two qualitative methods does not fall into mixed methods, even if its authors claim it does.

Categorizing the articles based on research types
Following the HCI and CSCW research type categories (Wallace et al., 2017; Wärnestål et al., 2014), we categorized the articles' research types as either descriptive or explanatory. The operational definitions of each research type are as follows:
・ Descriptive: describing a particular work environment/situation where collaboration or communication, in general, is a central aspect.
・ Explanatory: explaining experiments that support or falsify some hypothesis.

Extracting topics by author keywords of the articles
To identify topic trends and how they have changed over time, we conducted an analysis of the keywords given by the articles' original authors. Almost all of the articles have a keyword section, except for 12 articles (e.g., articles published in the International Journal of Human-Computer Interaction and Human-Computer Interaction do not have author keyword sections). Thus, we collected the author keywords from all articles except these 12.
For the topic analysis, we excluded the keywords corresponding to our search criteria, as they carry no additional information. Such author keywords are: "social media," "social," "SNS," "social networking (sites)," "HCI," "Human(-)Computer Interaction," and "interaction." Afterward, we decomposed keywords into single terms (e.g., "participatory design" into "participatory" and "design") and standardized the terms by lemmatization (e.g., "gratifications" is converted to "gratification," and "media" to "medium"). In this way, it was possible to bundle the decomposed author keywords for a frequency analysis to identify topic trends.
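The filtering, decomposition, and lemmatization steps above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the stop list is a subset of the search terms, and the tiny lemma table stands in for a full lemmatizer.

```python
from collections import Counter

# Search-criteria terms to drop (a subset, for illustration).
SEARCH_TERMS = {"social media", "social", "sns", "hci",
                "human computer interaction", "interaction"}

# Minimal hand-made lemma table standing in for a full lemmatizer.
LEMMAS = {"gratifications": "gratification", "media": "medium"}

def preprocess(keywords):
    """Drop search-criteria keywords, decompose the rest into single
    terms, and map each term to its lemma."""
    terms = []
    for kw in keywords:
        kw = kw.lower().strip().strip(",.")
        if kw.replace("-", " ") in SEARCH_TERMS:
            continue  # carries no information beyond the search criteria
        for term in kw.replace("-", " ").split():
            terms.append(LEMMAS.get(term, term))
    return terms

# Frequency analysis over the decomposed, lemmatized terms.
freq = Counter(preprocess(["Participatory design", "Social media",
                           "Gratifications"]))
```

Here "participatory design" is split into two terms, "social media" is dropped as a search term, and "gratifications" is lemmatized before counting.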

Lastly, we created a word cloud, a text visualization based on the word frequencies of the author keywords contained in the selected articles. We created the word clouds with the Python package WordCloud. In the word clouds, the font size represents the frequency of a topic's appearance within the corpus' author keywords (the maximum font size is set to 50 pt and decreases with decreasing word frequency, per 1 pt).

Results
This paper structures the results of our analysis as follows. This section (Section 4) provides descriptive results, and the next section (Section 5) discusses them in relation to methodological reflections from previous HCI work, drawing on examples from the corpus identified through the manual reading of individual studies.
Below, we first characterize the corpus (Subsection 4.1). Secondly, we zoom in and identify the topics studied in the selected works (Subsection 4.2) and provide the trends in methods (Subsection 4.3). Following this, we describe the data collection methods and data types identified in the corpus (Subsection 4.4). Lastly, the relationships between topic trends, data types, and methods are presented in Subsection 4.5.

General Description of the corpus
The articles in our corpus represent 38 publication venues, including CHI and CSCW as well as other peer-reviewed conference proceedings and journals (see Figure 2). The most frequent venue was CHI (48 articles), followed by CSCW (43 articles). From the journals International Journal of Human-Computer Interaction and Human-Computer Interaction, we have five articles each in our corpus.
The remaining venues in our corpus are bundled as "Others" in Figure 2.
Figure 2. Distribution of venues and journals in the corpus, i.e., publication journals/conference proceedings. Venues and journals with fewer than five articles were included in "Others".
In our corpus, the majority of the articles are based on either quantitative methods alone (n = 67) or qualitative methods alone (n = 61). Mixed-methods studies are less applied in the domain (n = 22). The earliest work we could find with our keyword-based search, which did not include a time frame criterion, is from 2008: we found two articles from 2008 (Dugan et al., 2008; Joinson et al., 2008) and one article from 2010 (Ji et al., 2010). Joinson et al. (2008) and Ji et al. (2010) studied users' motivations to use social media, while Dugan et al. (2008) investigated user profiles on social media.

Topic trends
In this subsection, we address the most prominent research topics studied (RQ1). We identified the topic trends through the analysis of author keywords (see 3.4.3 for the detailed method design). We acknowledge that the author keyword analysis gives us only limited dimensions of the topic trends in our corpus, although we believe it provides some elements for identifying trends. Thus, to complement the author keyword analysis, the first author manually read the articles in our corpus to gain a deeper understanding of the topics. While we give descriptive results in this section, we discuss topic trends in the Discussion section (Section 5.1). Our analysis revealed three phases: (1) early studies (2008-2012), (2) a growing stage (2013-2016), and (3) the latest studies (2017-2020). These year boundaries were found through our analysis: we tested several periods of time (e.g., two, three, and four years per phase, and a specific volume of articles per phase) until we saw clear patterns of keyword trends. In addition, the first author manually read the articles in each phase to test whether the year boundaries were aligned with the articles' contents. Furthermore, we found that these year boundaries can be connected with landmarks in social media development. Facebook reached about 1 billion monthly active users in 2013 (the boundary between phases (1) and (2)) and 2 billion in 2017 (the boundary between phases (2) and (3)) (Our World in Data, n.d.). In addition, 2017 was the year when social media platform companies began to grudgingly accept their responsibility for how they are affecting the real world by, for example, removing disturbing children's videos from YouTube (Manjoo, 2017). The topic trends of the three distinct phases are visualized in Figure 5, while all-year topic trends are presented in Figure 4.
Figure 5. Topic frequencies of the corpus' author keywords in the three phases. There was no study earlier than 2008.
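The per-phase bucketing and counting described above can be sketched as follows. The records here are hypothetical examples, not the actual corpus; only the phase boundaries come from the paper.

```python
from collections import Counter

# Hypothetical (publication year, decomposed author keywords) records.
articles = [
    (2009, ["facebook", "user"]),
    (2014, ["privacy", "facebook"]),
    (2018, ["design", "algorithm"]),
    (2020, ["design", "mental", "health"]),
]

# Phase boundaries reported in the paper.
PHASES = {
    "early (2008-2012)":   range(2008, 2013),
    "growing (2013-2016)": range(2013, 2017),
    "latest (2017-2020)":  range(2017, 2021),
}

def keyword_trends(records):
    """Count keyword frequencies separately for each phase."""
    trends = {phase: Counter() for phase in PHASES}
    for year, keywords in records:
        for phase, years in PHASES.items():
            if year in years:
                trends[phase].update(keywords)
    return trends

trends = keyword_trends(articles)
```

With these toy records, "design" dominates the latest phase, mirroring the trend reported below.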

Early Studies (2008-2012): Studying users' behavioral patterns
In the early studies, the most frequently used topic within the author keywords was "Facebook," followed by "user" (Figure 6). Facebook expanded its user base and became the most popular social media platform around 2010 (Our World in Data, n.d.). In this phase, many articles were interested in users' motivations and behavioral patterns on social media. Example studies that chose "user" as a keyword include Dugan et al. (2008), who studied "user" profiles. Other articles covered social media users' commitment to and retention in online communities (Farzan et al., 2011) and how Facebook usage affected romantic relationships (Zhao et al., 2012). Another remarkable keyword in phase one is "disaster." Large disasters (e.g., the 2008 Sichuan earthquake, the 2008 California wildfires, and the 2011 Tohoku earthquake and tsunami) fostered social media interaction and increased the number of social media users.

The latest studies (2017-2020): From understanding people on social media to designing human-centered online spheres
In the past four years (2017-2020), "design" is the most frequently used topic in the articles' author keywords (Figure 6). This may be a response to the rising need to think about how the designs of social media impact people and how we can (re)design better social media interaction. For example, Alvarado and Waern (2018) conducted participatory workshops exploring social media users' experiences with platforms' algorithms, with the aim of suggesting recommendations for social media platform design. This trend contrasts with the early years, when scholars focused more on understanding people's behaviors and on generalizing their findings to broader or "general" users (Zolkepli & Kamarulzaman, 2015).
All manually categorized terms can be found in Table A1 in the Appendix, where we categorized the research topics according to the studies' research fields and their original keywords. This categorization was done manually and recursively, mainly by the first author in consultation with the coauthors.
In addition, as shown in Table 3, the topics covered by the corpus have become broader in the last four years, along with the growth in publication numbers. For example, "security," "transparency," "democracy," and the "digital divide" on social media have been covered by multiple HCI articles, while privacy has been studied continuously. Furthermore, mental and psychological perspectives on social media have been included lately, as topics related to mental health and cyberbullying have appeared frequently. We will further discuss these results in Section 5.1.

Distribution of the methodological approaches reflected in the corpus
In this subsection, we address the research methods that are most and least applied for studying social media interaction (RQ2).
The number of research publications about social media interaction has been steadily growing since the first appearance in 2008 (see Figure 3). This growth may correspond to growing public acceptance of social media technology and its adoption as a research topic in the HCI community. Major social media platforms launched in the early 2000s (i.e., MySpace in 2003, Facebook in 2004, and Twitter in 2006), and the number of social media users started to grow in the aftermath. Over time, we can see that HCI studies on social media interaction have increased since 2013, and mixed methods became more popular. In 2018, the proportion of mixed-methods research dropped to zero and then slowly increased again until 2020.
In particular, we observe the following distribution in terms of quantitative, qualitative, and mixed methods studies.
Quantitative studies
Examples of quantitative studies in our corpus include the work of Park et al. (2015); a study exploring users' motivations to use social media based on a survey (Kim et al., 2017); and a study identifying and quantifying the outcomes of users' common and critical real-world experiences from social media data (Olteanu et al., 2017).
Some of the selected articles apply multiple quantitative methods, including articles that combine social media's digital trace data with conventional quantitative research methods. For example, Riedl et al. (2013) used two types of quantitative methods: an online survey and an analysis of digital logs of social media usage. Nichols et al. (2013) conducted a content analysis of online reviews on social media together with a survey study.

Qualitative studies
Likewise, qualitative methods have been widely and continuously used by HCI scholars to study social media interaction. Interestingly, in recent years, qualitative methods have been the most prevalent in our corpus. We assume this trend corresponds to the respective studies' topics (we discuss this point in Section 4.4).

Mixed-methods studies
Among the 22 mixed-methods research articles in our corpus, some explicitly describe why they used mixed methods, while others state their methodological decision only implicitly or briefly. According to the mixed-methods literature (Greene & Caracelli, 1989), the purposes of conducting mixed-methods research fall into five types: (1) Triangulation (seeking convergence, corroboration, and contextualization of results from different methods studying the same phenomenon); (2) Complementarity (seeking elaboration, enhancement, illustration, and clarification of the results); (3) Development (seeking to develop or inform the other method); (4) Initiation (seeking the discovery of paradox, contradiction, and new perspectives); and (5) Expansion (seeking to extend the breadth and range of inquiry). With these five categories in mind, we investigated the mixed-methods papers in our corpus.
Most studies in our corpus chose mixed methods with the aim of gaining a richer and more concrete understanding of people's online interactions through contextualization, which can be related to Greene & Caracelli's (1989) purposes. For instance, the authors of one study (2017) explained that they asked "why there is a gender gap" in the qualitative study.
Similarly, Laumer et al. (2017) explained that their qualitative study identified "why and how interventions can positively influence" (p. 988) user acceptance. These whys and hows might have motivated scholars to use mixed methods and to contextualize quantitative data with qualitative data. In other words, we can assume that a quantitative study is used to capture a trend appearing at the surface of a technological phenomenon, while qualitative research helps examine why the observed trend occurred by contextualizing it within an ecosystem that goes beyond technological environments.

Relations of Topics and Methods
We also looked at the relationship between topics and methodologies in our corpus. Figure 6 visualizes the topic frequencies in connection with the methodology applied in the selected studies; for supplemental information, Table A2 in the Appendix shows the term frequencies and method selections. In this figure, topics of the quantitative and qualitative approaches appearing three or more times, and mixed-methods topics appearing two or more times, are shown. For detailed term frequencies, see Table A2 in the Appendix, and see Section 5.1 for more discussion of the relationships between methods and topics in our corpus. Figure 6 presents compelling trends among the three methodologies: each method has unique connections with topics. Here we provide descriptive findings on the relationships between topics and methods derived from Figure 6, while we provide a deeper discussion of these trends in Section 5, the Discussion.

Data Collection Methods
In this subsection, we turn our attention to the data that scholars collected in the articles of our corpus. In doing so, we manually checked each work's data usage for its analysis. Through these observations, we found eight types of data in our corpus:
・ survey and questionnaire data (e.g., survey results, questionnaire answers)
・ interview data and focus group discussion data (e.g., transcriptions)
・ social media content data (e.g., communication contents and user profile contents)
・ workshop data and fieldwork data (e.g., ethnographic data)
・ diary data (e.g., user report data, cultural probes)
・ digital trace log data (e.g., number of likes, shares, and connections with users; when and how many times a user logged in and stayed on social media)
・ physical trace log data and biometrics data (e.g., eye-tracking data)
Depending on how researchers collected them, survey and questionnaire data and social media content data can be regarded as either quantitative or qualitative data.
Figure 7. Distribution of data types in the corpus (in absolute numbers). A study can have more than one type of data. The numbers in brackets represent the total number of articles in each data type.
As summarized in Figure 7, the most used data type in the corpus is survey and questionnaire data, followed by interview and focus group data, whereas physical trace log data and diary data are the least used data categories for studying social media interaction.
Among quantitative studies, surveys and questionnaires are the most used data.
Interestingly, the second most used type of data for quantitative methods is digital trace log data, followed by social media content data; both are unique to the domain of social media analysis. Digital trace log data include the number of likes, users' network data, and usage logs (e.g., when a user logs in and which features they use for how long).
Among qualitative studies, interview and focus group discussion data are the most used data type. The second most used data type is social media content data, including users' posts, comments, photos, and videos, which is primarily used for content analysis.
The most common data type in mixed-methods studies is survey and questionnaire data, while the second most common is interview data.

Relationships between data types and methods over time
Additionally, we here zoom in on the relationships between data types, methods, and the three phases that we identified through topic analysis (see Section 4.2). Figure 8 summarizes the distributions of data types and methods per phase. In Figure 8, an article is counted more than once if it uses more than one type of data.
Among the early studies (2008-2012), interview and focus group discussion data are the most common data types, followed by survey and questionnaire data. Among articles in the growing stage (2013-2016), survey and questionnaire data became more popular among HCI scholars. Also during the growing stage, various data types, including interview and focus group discussion data, digital trace log data, social media content data, and workshop and fieldwork data, are used, reflecting the variety of research topics covered by the articles of this phase. Among the latest studies (2017-2020), scholars use social media content data and workshop and fieldwork data more, reflecting the rising popularity of qualitative methods in the corpus in this phase.
Interestingly, despite the increasing availability of digital trace log data for quantitative research, digital trace log data feature less in the latest studies than in the growing phase. This may correspond to increasing concerns about privacy and other socio-technical issues. In addition, quantitative research seems to have slowed down as scholars gained a better understanding of user behaviors, which led to design suggestions that require qualitative perspectives to capture user actions on social media more comprehensively. Figure 8. Distribution of data types in the corpus in each phase (see 4.2 for phase descriptions). A study can have more than one type of data. The numbers in parentheses represent the total numbers of articles in each data type per phase.
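The per-phase tally behind Figure 8 can be sketched as follows. The article records are hypothetical; only the phase boundaries come from our topic analysis (Section 4.2). Note that an article contributes one count per data type it uses, so phase totals can exceed article counts.

```python
from collections import Counter

# Phase boundaries identified through the topic analysis (Section 4.2).
PHASES = [("early", 2008, 2012), ("growing", 2013, 2016), ("latest", 2017, 2020)]

def phase_of(year):
    """Map a publication year to its research phase."""
    for name, start, end in PHASES:
        if start <= year <= end:
            return name
    raise ValueError(f"year {year} outside the review window")

# Hypothetical records; an article is counted once per data type it uses.
articles = [
    {"year": 2009, "data": ["interview", "survey"]},
    {"year": 2014, "data": ["survey", "digital_trace"]},
    {"year": 2018, "data": ["social_media_content", "fieldwork"]},
    {"year": 2019, "data": ["interview"]},
]

# Tally (phase, data type) occurrences for the stacked-bar figure.
per_phase = Counter(
    (phase_of(a["year"]), d) for a in articles for d in a["data"]
)
```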

Summary of the results
Our systematic literature review found that studies in HCI on social media interaction have continuously grown since 2008.
In terms of topic trends (RQ1), we distinguish three phases. Regarding the data types in the corpus (RQ3), we observe that HCI researchers used both qualitative and quantitative data (i.e., interview, questionnaire, and survey data). In addition, data unique to social media (i.e., digital trace log data and social media content data) have been used. The relationship between methods and topics revealed trends in topic appearances for each method. Topics related to "design" and to specific groups and communities have mainly been covered by qualitative methods, while topics related to behavioral data appear to be covered more by quantitative methods.
In the next section, we discuss these results and point to knowledge gaps.

Discussion
Our work distinguished three phases in the study of social media interaction in HCI.
Namely, the early phase (2008-2012), the growing phase (2013-2016), and the latest phase (2017-2020) (see Section 4.2). In this section, we discuss our findings according to these three phases. In Section 5.1, we discuss trends in methodology, topics, and data. In Section 5.2, we discuss less-studied research areas. Lastly, in Section 5.3, we list the limitations of our systematic literature review and provide insights regarding future studies.

Methodological, topic, and data trends
Throughout all three phases, the majority of the articles were built upon either a quantitative (67 articles) or a qualitative (60 articles) approach. While both quantitative and qualitative approaches were used during the early phase (2008-2012) and the growing phase (2013-2016), qualitative approaches seemed to be applied increasingly during the latest phase (2017-2020), particularly in the last couple of years. In contrast, mixed-methods approaches were used less throughout the three phases (22 articles in total). This methodological trend seems to be interconnected with topic trends.
The identified topic trends indicate a shift from "understanding aggregated data on people" to "designing inclusive online spheres" by exploring specific groups of people on social media with qualitative and mixed-methods approaches (Figure 6). During the early phase, social media users were mostly studied in a general and aggregated way. In the growing phase, critical aspects of social media, e.g., users' health and privacy-related issues, started to gain more attention. In the latest phase, the conceptions of target users for analysis became increasingly diverse. In particular, to study a specific community or group, qualitative approaches were mainly chosen to first characterize that group (Figure 6).
Regarding data trends, HCI scholars have used both qualitative and quantitative data (e.g., survey and questionnaire data). In our corpus, the quantitative data unique to studying social media are digital trace data and social media content data. Similar to the trend in wider study fields related to social media (Snelson, 2016), however, digital trace and social media content data are not the most chosen data in any of the phases. Data collection among mixed-methods approaches in our corpus is also similar to that in broader social science research, in which more than 80 percent of mixed-methods research used surveys, followed by interviews (Bryman, 2006).
Notably, mixing digital trace data and qualitative data is still in its infancy but has large potential, as it can overcome some of the shortcomings of studying social media data (Bakhshi et al., 2016; Chan et al., 2014; Dugan et al., 2008; Kairam et al., 2016; Lampe et al., 2014; Laumer et al., 2017; Pielot et al., 2014). The potential of digital trace data includes its ability to capture the granular level of interactivity and the dynamics of longitudinal changes in interactions (Shibuya, 2020), and its capacity to discover neglected outliers and isolates, which can deepen our understanding of human communication behavior (Choi, 2020). However, social media trace data have been criticized for oversimplifying and flattening constructed, situated, and temporal aspects (Ledford, 2020; Luka et al., 2018; boyd & Crawford, 2012; Strasser & Edwards, 2017), as have the ways the data are gathered, processed, and provided (Choi, 2020). Mixed methods offer one way to construct an integrated, reflexive stance toward the relationship between data captured through technology and the underlying social phenomena (Bornakke & Due, 2018; Charles & Gherman, 2019; Evans & Aceves, 2016; Goggins et al., 2013; Hamm et al., 2020). We observed this conception in our corpus too. For example, Laumer et al. (2017) leveraged log data to capture social media usage trends of office workers, while their interview data helped dig into the reasons behind the users' behavioral changes.
In summary, this systematic literature review revealed that mixed methods were the least used methods in the selected works. Nonetheless, the mixed-methods studies among the included articles clearly present the benefits of their method choices. A major benefit is the contextualization of quantitative data by combining it with qualitative data. A further benefit is engaging with the broader context of the data to investigate people and related phenomena both inside and outside social media environments.
Note that we do not try to show which methodology is better; rather, we explore alternative ways to establish an essential dialogue between methods and research aims in HCI. HCI has long emphasized "context", that is, all the relevant temporal, locational, and situational features of the "surround" within which human systems are embedded, to understand phenomena (McGrath 1995, p.153). Given the complexity of socio-technical research topics, e.g., privacy, security, mental health, cyberbullying, democracy, and under-resourced communities, mixed-methods approaches would be beneficial for studying the growing field of social media interaction. We do not intend to devalue quantitative or qualitative methods.
Rather, we believe that methods should be determined by the issue under study and the research questions. As stated by previous research, quantitative methods usually emphasize quantification in the collection and analysis of data, while qualitative methods usually emphasize words rather than quantification (Bryman, 2016). Qualitative methods, in turn, are often criticized for not meeting the quality standards of quantitative research, such as the transparency and comprehensibility of interpretations and results (Flick, 2018). Similarly, drawbacks of mixed methods include the lack of universal agreement on what constitutes quality and on the criteria by which such research should be judged (Fàbregues et al., 2019). In addition, mixed methods may consume more resources, and not all researchers have the skills or can afford to conduct them (Frauenberger and Purgathofer, 2019; McDonald et al., 2019).
In the next subsection, we unpack the insights gained regarding this under-used approach, mixed methods, by reflecting on its philosophical underpinnings.

Toward an alternative philosophical paradigm for the study of social media interaction in HCI
To understand method trends in HCI research, philosophical positions are critical to consider. In particular, approaches to and uses of mixed methods align with the scholarly discussion around methods in general. Generally, quantitative research connects with the philosophical paradigm of post-positivism, which is reductionistic, logical, empirical, cause-and-effect-oriented, and deterministic based on prior theories, considering that every cause-and-effect relationship is a probability that may or may not occur. Mixed methods try to embrace both perspectives through dialectical pragmatism (Teddlie and Johnson, 2009), in which researchers must carefully listen to, consider, and dialogue with qualitative studies' subjective reality (e.g., individual, personal, experiential) and intersubjective reality (e.g., social structures, languages, institutions, nonmaterial cultures), and quantitative studies' objective reality (e.g., material/physical things, physical/causal processes), and learn from and reconcile the natural opposing tensions between these perspectives (Teddlie and Johnson, 2009; Johnson and Onwuegbuzie, 2007; Johnson and Gray, 2010). The mixed-methods design takes an anti-dualistic stance, in which individuals view the world in terms of continua rather than binaries, and positions itself as ontological pluralism, which fully acknowledges the "realities" discussed in both qualitative and quantitative research and rejects singular reductionisms and dogmatisms (Johnson and Gray, 2010). In our corpus, a limited number of articles builds upon the dialectical paradigm, yet it seems to have gained more attention during the growing phase. For example, Chan et al. (2014) combined a large-scale survey, a task-based diary study, and reenactment sessions to provide "a broad description of the kinds of contexts in which co-watching occurs, the social influence on when and what to watch, and co-watching of TV inside the home". Pretorius et al. (2020) explored key design factors for online mental health resources that can support young people's help-seeking by using mixed methods.
In the wake of the post-positivist philosophy of science, feminist scholars have contributed to these debates, and this applies to HCI as well, in which scholars have increasingly engaged with matters of social change and taken on scientific and moral concerns (Bardzell and Bardzell, 2011). Feminist HCI is concerned with the study of marginalized or vulnerable groups, e.g., persons of color, the LGBTQ+ community, disabled people, low socioeconomic-status individuals, and the global south, and not only women (D'Ignazio and Klein, 2019; Bardzell, 2010; Costanza-Chock, 2020; Crenshaw, Black Feminist Thought). According to Star (1990), feminism as a method creates robust findings through the articulation of multiplicity, contradiction, and partiality, while standing in a politically situated, collective model, to understand core problems in information systems design: how to preserve the integrity of information without a priori standardization and its attendant violence. Mixed methods are recommended for pursuing feminist approaches (Bardzell and Bardzell, 2011; Hesse-Biber, 2010; Sprague and Zimmerman, 1989; Reinharz and Davidman, 1992), although qualitative methods were long recognized as more feminist than quantitative approaches (Hesse-Biber, 2010). The feminist approach can integrate seamlessly and productively into all stages of the HCI design process, including user research, prototyping, and evaluation (Bardzell, 2010). Bardzell and Bardzell (2011) argued that a feminist methodology will not lose sight of the fact that methods must be chosen and used based on assumptions, commitments, and goals, and that these should cohere and be acknowledged as a methodology.
In our corpus, we found no articles explicitly building upon feminist approaches with mixed methods. However, given the recent research topic trends in social media interaction, in which marginalized groups and communities have been addressed, our literature review sheds light on the potential of applying feminist approaches in this domain.
In light of the feminist approach, the designs of social media providers have been found to offer limited affordances for marginalized communities to organize and foster movement building (Costanza-Chock, 2020). In addition, D'Ignazio and Klein (2019) showed how data and data science are representative of unequal power and argued for the need to "trace biased data back to their source" (p.13), including the economic system around data that is wielded unjustly and monetized. As D'Ignazio and Klein (2019) pointed out, there is a need to re-emphasize what Haraway (1991) called "situated knowledge" in social media and online spheres, and to acknowledge that there is no neutral or objective data and that context is essential for conducting accurate, ethical analysis.
Furthermore, our literature review underscored the benefits of using mixed methods to go beyond one dimension of social media data to the various dimensions of the context behind it (Arora et al., 2018; McGrath, 1994; Bardzell, 2010).
On this note, one of the topics that future studies can pursue with the feminist approach is approaching "invisible people": for example, people who use social media on behalf of another person, people who create one-time, tentative "throw-away" accounts to post or ask something without connecting it to their identity, multiplicity in conceiving of users (Baumer and Brubaker, 2017), and users who only "listen" to online discourses without "having a voice" or consciously choose not to participate in discourse and activities on social media (Portwood-Stacer, 2013).
These wide ranges of people should be incorporated more into social media interaction studies in HCI, embracing the view that marginalities are not pre-given but are systemic consequences of new socio-technical arrangements that create new and fluid forms of exclusion (Bowker et al., 2016).
Lastly, considering the rise of complex socio-technical research topics in HCI, it is surprising to observe so little mixed-methods research with feminist approaches so far. Further research has yet to be done to better understand the skillsets of HCI researchers, and we recommend revising the curricula of study programs so that future researchers and practitioners can not only write software but also critically reflect on their positionality and contribution to society (Frauenberger and Purgathofer, 2019).

In sum, our work helps to understand methodological, topic, and data trends according to the following three research phases in the study of social media interaction in HCI: (1) the early phase (2008-2012), in which the majority of the studies were built upon quantitative post-positivism paradigms or constructivism paradigms; (2) the growing phase (2013-2016), in which dialectic-paradigm-based studies seemed to become one of the major approaches; and (3) the latest phase (2017-2020), in which constructivism-paradigm-based studies seem to have become the majority.
Considering the variety of topics in recent years and the foci on underprivileged groups, we argued that HCI researchers could gain a better understanding of these complex phenomena by considering mixed-methods research with feminist approaches, which HCI scholars have so far chosen less often to study social media interaction.

Limitations
This study has the following limitations.
First, our corpus is shaped by our selection of source databases. For instance, papers from other HCI-related conference venues and databases we did not access, such as IEEE Xplore, were not searched, which limits our coverage. Similarly, the keyword criteria and the CCS category selection for our corpus may not cover all existing social media platforms and related works.
Second, the screening of the articles in the corpus was manual, and the assessment of a paper's relevance to the target topic and other criteria may have introduced errors and misinterpretations.
Third, that only English was used as the target language may also have affected the results.

Fourth, in this study, topics are drawn from author keywords (see Section 3.4). This approach cannot capture topics that authors did not flag explicitly. If authors, for example, look at motivations in their study but do not list motivation as an author keyword, it will not appear in our discussion of topics. For the analysis of author keywords, we tried to build a word cloud from two-word topics, but it was not possible to aggregate two-word topics into meaningful groups because the topics were too diverse; for this reason, we stuck with the one-word cloud for visualization and interpreted these figures only together with the manual reading.
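The keyword aggregation described above can be sketched as follows. The keyword lists are hypothetical, and we treat multi-word author keywords as stand-ins for the "two-word topics" that proved too diverse to group; the unigram counts are the kind of frequencies that feed a one-word cloud.

```python
from collections import Counter

# Hypothetical author-keyword lists from three articles.
keyword_lists = [
    ["social media", "privacy", "user behavior"],
    ["privacy", "design", "social media"],
    ["mental health", "social media"],
]

# One-word topics: split every keyword into tokens and tally them.
unigrams = Counter(
    token.lower()
    for keywords in keyword_lists
    for kw in keywords
    for token in kw.split()
)

# Two-word topics: tally multi-word keywords as whole units. In our corpus
# these were too diverse to aggregate into meaningful groups.
two_word = Counter(
    kw.lower() for keywords in keyword_lists for kw in keywords if " " in kw
)
```

A word-cloud library could render `unigrams` directly from these frequencies; the point here is only that the counting happens before, and independently of, the visualization.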