Digital platforms as socio-cultural artifacts: developing digital methods for cultural research

ABSTRACT Social media platforms are increasingly looked at as means to investigate social phenomena like collective events, issues or causes. Digital methods – techniques exclusively focused on online data and shaped by the environment hosting these data – have become part and parcel of these investigations, often approaching platforms as hybrid assemblages of users, infrastructures, and algorithms. In its ‘online groundness’, this type of digital methods research, however, often tends to skim over the socio-cultural, contextual dimension of both wider social phenomena and social media uses and practices. In this paper, we advance a threefold contribution aimed at both sparking future efforts to address this limitation and aligning digital methods inquiry with contemporary epistemological debates that counter universalistic views of platforms and data. First, we question the degree to which digital methods can inform social investigations of collective events, issues or causes. Second, we advance a digital methods paradigm that addresses platforms as socio-cultural artifacts rather than hybrid assemblages. Finally, by reflecting on how we accessed, handled, and explored 9,000 Instagram visuals and around 400,000 Facebook comments to understand influences on middle class understandings of food consumption in Brazil and South Africa, we illustrate a way to design culturally sensitive digital methods research built on ‘quanti-quali’ practices.


Introduction
In this paper, we look at the potential of digital methods for cultural (and cross-cultural) research. With 'digital methods' we refer to techniques that 'follow the medium' and focus on 'born digital data' (Rogers, 2019), namely, medium-specific methods that, usually exploiting computational techniques or tools, allow researchers to collect and/or analyse data that originated online.
We argue that within discussions focused on the extent to which digital methods can enhance social research, medium research and/or a mix of the two (e.g., Marres, 2015; Pearce et al., 2020; Rogers, 2019), further attention should be drawn to the way these methods are relevant to the study of the contextual dimension of collective phenomena. This paper advances a threefold contribution. First, it problematizes the way digital methods can inform social investigations of collective events, issues or causes. Second, talking to debates that counter universalistic approaches to platforms (Chan, 2013; Steinberg & Li, 2017) and data (Milan & Treré, 2019), it advances a digital methods paradigm that addresses platforms as socio-cultural artifacts rather than hybrid assemblages. Finally, it illustrates a way, among others, to design culturally sensitive digital methods research built on 'quanti-quali' practices (e.g., Rogers, 2019, p. 211; Venturini et al., 2015).
In the following sections we frame the emergence and development of digital methods as techniques designed to access and analyse platform data to understand both platforms (as media) and social phenomena (as intrinsically related to everyday platform use). We then address contemporary debates on the centrality of cross-platform multimodal approaches to develop comprehensive medium and social research of collective phenomena. Finally, we draw from our own work with SCArFEthics (Sustainable Consumption, the middle classes and AgriFood Ethics in the Global South), a multi-country project researching middle class understandings of sustainable food consumption in Brazil, China and South Africa, to show how cross-cultural digital methods research can be put in practice.

Digital methods across 'open land' and 'walled gardens'
Web 1.0, namely the 'hyperlink web', could be easily 'scraped' and 'crawled', with automated tools (e.g., HTTrack) allowing researchers to massively download (i.e., scrape) web data and map (i.e., crawl) hyperlink connections. Crawling, in particular, became essential to develop (hyperlink) network analyses (Park & Thelwall, 2003) aimed at mapping, for instance, protest or advocacy networks (e.g., Vicari, 2014, 2017) or, more broadly, issue networks (e.g., Marres, 2015; Marres & Moats, 2015; Rogers, 2019, pp. 43-49) and issue publics (e.g., Bruns, 2007). These early digital methods had the fascinating strength to allow the exploration of different 'spaces' in and of the web. In other words, they made users' potentially web-wide navigations of issues, events and topics accessible and traceable. Ultimately, as part of the computational turn in the social sciences, digital methods seemed to grant an unprecedented big-scale traceability of collective phenomena (Venturini et al., 2015).
'Platformization' (Helmond, 2015), however, namely the increasing influence of social media on web data flows and online user practices, marked the shift from the 'hyperlink web' (i.e., web 1.0) to the 'social web' (i.e., web 2.0), bringing the sudden proliferation of 'walled gardens' (i.e., platforms) where once was 'open land' (i.e., the web) (see, for instance, Plantin et al., 2018, pp. 301-394; Rogers, 2019, p. 204). In infrastructural terms, this translated into the transition from an open and decentralised architecture, which allowed 'uniform access by humans and computational agents through browsers and other web-based apps (e.g., Google's web crawler)' (Plantin et al., 2018, p. 302), to the contemporary gateway system of platform APIs. In this gateway system, APIs dictate the terms for the flow of data in and out of platforms, control the landscape of data access and exchange, and ultimately centralise a process that was once decentralised (see also Pearce et al., 2020, pp. 162-163).
It is exactly this transition from 'open land' to 'walled gardens', coupled with digital methods' medium-dependency and rising concerns related to user privacy in the context of social media research (Nissenbaum, 2009), that ultimately resulted in (1) the proliferation of single-platform digital methods studies (see, for instance, Özkula et al., 2022) and (2) the narrowing of their object of investigation to 'events, disasters, elections, revolutions and social causes' (Rogers, 2019, p. 221). This happened via a concatenation of events: the early API gateway system allowed researchers to generate big datasets from almost any platform, bolstering the emergence of single-platform studies. However, some of these early studies, coupled with data breaches such as the Cambridge Analytica scandal (Bruns, 2019; Venturini & Rogers, 2019), raised ethical concerns in both academic circles and the wider public, with platform companies reacting with the implementation of often questionable restrictions on data access 1 (Walker et al., 2019). Following these restrictions, studies of the self gradually resorted to 'post-API' (Perriam et al., 2020) methodological scenarios. These scenarios have included resorting to the web scraping techniques that were common with web 1.0 research or identifying innovative ways to access data, for instance circumventing the ephemerality of Instagram stories by accessing their repurposed version on YouTube (Bainotti et al., 2021).
Ultimately, platform restrictions have narrowed the application of API-dependent data collections to studies interested in collective phenomena. In this transition, Twitter has remained the only Western mainstream platform still allowing non-platform affiliated researchers to generate big datasets in a reasonably accessible way (Weller, 2015, p. 284), albeit with a number of standing limitations (e.g., on historical data analysis or sampling design) (Bruns, 2019).
From single-platform to cross-platform and multimodal digital methods research
Single-platform digital methods research is well suited to address 'medium research questions' (Rogers, 2019, p. 220): it can provide insight into a platform's overall culture and/or internal ecosystem (Burgess & Baym, 2020; Burgess & Green, 2018; Murthy, 2018) or produce deep understanding of a specific element within it, for instance, a platform's local vernacular (e.g., Gibbs et al., 2015).
When the research focus shifts from a platform to a collective phenomenon, namely from a medium to a social research question, single-platform research can still provide insight into the way a platform relates to that phenomenon (e.g., the role of Twitter for the Black Lives Matter social movement). However, it can neither tell us how different platform affordances and uses contribute to it (e.g., Twitter versus Instagram use motivations among Black Lives Matter activists) nor comprehensively picture the phenomenon itself (e.g., Black Lives Matter as a social movement).
In fact, the application of single-platform research to the study of collective phenomena assumes that 'social media' can be used as a 'collapsed category' (Rogers, 2019, p. 214), an assumption based on two wrong premises: (1) that all platforms have equal affordances and meet the same user needs and (2) that all platforms, inclusive of their digital objects (e.g., hashtags, likes) and overall user cultures, work in the same way (Mayr & Weller, 2017, p. 111). A body of research has now long rejected the first premise. Early social media research showed, for instance, that in its early years Facebook was used primarily for entertainment and sociability while instant messaging (IM) was more geared towards long-term relationship maintenance (Quan-Haase & Young, 2010). Research into worldwide protest events of the early 2010s showed that Twitter and Facebook played different roles for activism, with the former often enhancing visibility and live organising and the latter allowing emotional bonding (Gerbaudo, 2012). Digital methods scholars have been particularly active in refuting the second premise by discussing how digital objects like hashtags work differently across platforms, with Twitter users, for instance, hashtagging their posts much less frequently than Instagram users (Rogers, 2019). More broadly, platform studies have explored the way user cultures depend on device cultures, that is, on how platforms position or recommend content differently, based on often obscure algorithmic choices. In August 2014, for instance, following the killing of Michael Brown by police officer Darren Wilson, the protests in the US city of Ferguson became a trending topic for Twitter users around the world while no Ferguson-related content was being displayed by Facebook news feeds (Tufekci, 2017, pp. 154-156). As a consequence, Twitter users would learn about police violence and protests in Ferguson while Facebook users would not.
The issues so far discussed have led to a renewed effort to develop digital methods designs for the study of collective phenomena as much as possible in an 'open land style', namely in cross-platform research designs. In practical terms, the call for cross-platform digital methods studies suggests focusing on 'which content is co-linked, inter-linked and/or cross-hashtagged' (Rogers, 2019, p. 219), that is, on tracing how content items are relevant to (i.e., circulate across) different platforms (see Burgess & Matamoros-Fernández, 2016; Driscoll & Thorson, 2015; d'Andrea & Mintz, 2019).
Evidence produced by cross-platform studies is indeed still affected by 'platform bias', namely by the inaccessible side of device cultures (e.g., algorithmic recommendation systems; API rate limits, data filtering and overall functioning), a bias that should be either addressed with 'practical precautions' (Venturini et al., 2018) or studied as part of the collective phenomenon being investigated (Marres, 2015). Pearce et al. (2020), for instance, highlight how digital methods research based on text-based queries often overlooks visual cultures (Leaver et al., 2020; Serafinelli, 2018), which are now central to social media practices and collective phenomena more broadly. Ultimately, through Pearce et al.'s (2020) work, the call for cross-platform research has stretched to incorporate a quest for multimodal techniques of data collection and analysis.
In sum, contemporary conceptualisations of digital methods suggest that these methods can best help us investigate collective phenomena via crossing platforms' 'walled gardens' and taking into account each platform's user culture as informed by its multimodal local vernacular. We argue that digital methods' potential for the analysis of collective phenomena is actually broader.
Platforms: hybrid assemblages or socio-cultural artifacts?
The earliest bulk of 'Platform Studies' research framed platforms primarily in computational terms, as infrastructures 'that allow developers to work creatively on them' (Bogost & Montfort, 2009), and can be 'reprogrammed and therefore customized by outside developers – users' (Andreessen, 2007). In their cautious defence against accusations of technofetishism, Bogost and Montfort (2009) defined platforms and their investigation as 'about the connection between technical specifics and culture'. A decade later, however, while ultimately refining a digital methods approach for the analysis of collective phenomena, Pearce and colleagues still pictured platforms in a way that has much more to do with 'technical specifics' than 'culture': 'social media platforms are hybrid assemblages of users, algorithms, and data (among other things)' (2020, p. 164, emphasis added). The label 'hybrid assemblages' probably best summarises the way platforms are often addressed in digital methods studies interested in collective phenomena: as disconnected from local contexts, cultures, and places.
Looked at from the outside, natively born digital data are indeed hybrid assemblages: lists of algorithmically defined and/or personalised trending objects, recommended users, news feeds. But while perfecting their way to follow these and other aspects of the medium, can digital methods also start incorporating more of the contextual 'culture side' of platforms, that is, more of their users' lived experience of collective phenomena through and 'inside' platforms (see, for instance, Vicari & Murru, 2020)? Miller and Slater's words still best summarise what we are pointing at here:
we are not simply asking about the 'use' or the 'effects' of a new medium: rather, we are looking at how a specific culture attempts to make itself a(t) home in a transforming communicative environment, how they can find themselves in this environment and at the same time try to mould it in their own image. (2000, p. 1)
Recent developments in the field of platform studies have indeed pointed at the limitations of looking at platforms, and at the data flowing on them, through the lens of 'digital universalism' (Chan, 2013) and/or 'data universalism' (Milan & Treré, 2019), namely by presenting technological affordances and data practices as universally valid while primarily drawing on Western or 'Global North' contexts. These limitations have been highlighted by research interested in the 'regionality of platforms' (Steinberg & Li, 2017), that is, in the way platforms, with their uses, cultures and values, are shaped by contextual factors (see, among others, Plantin & De Seta, 2019; Wang & Lobato, 2019; Willems, 2020). Willems, for instance, has introduced the concept of 'relational affordances' to shed light on the need to overcome the limitations of current academic debates that 'focus on the intrinsic features of technology […], thereby neglecting the way in which broader environments and contexts shape the use of technology' (2020, p. 4).
Similarly, critical research in the area of big data is pointing at the way mainstream readings of datafication tend to annihilate contextual heterogeneity, entirely overlooking cultural specificities (Milan & Treré, 2019).
Talking to these critiques and drawing from critical digital studies, we suggest that digital methods investigations of collective phenomena should address platforms as socio-cultural artifacts rather than hybrid assemblages, namely as 'products located within pre-established circuits of discourse and meaning' (Lupton, 2014, p. 610). In doing so, we contribute to the emerging scholarship that situates the digital methods paradigm within cultural research (e.g., Caliandro & Gandini, 2016).

Digital methods for cross-cultural research: understanding sustainable food consumption in Brazil and South Africa
Bridging methodological debates focused on cross-platform studies of collective phenomena and epistemological reflections on the socio-cultural specificities of digital data and platforms as part and parcel of wider social phenomena, we advance considerations on how digital methods could help us grasp the contextual dimension of collective events, issues and causes. We draw from a multidisciplinary project focused on middle class understandings of sustainable food consumption in contexts different from the 'Global North'. The project's ultimate goal was that of providing insight into influences and practices in 'developing countries' (UN, 2022) where there is robust evidence of growing middle classes 2 (contexts whose growth is often framed as both an economic resource and a threat to environmental sustainability). The project was a collaboration between Brazilian, Chinese, South African, and UK universities. Our specific work package ran from 1 July 2019 to 30 June 2020 and aimed at collecting and exploring social media data to help delimit and evaluate influences in the wider context of food-related discourses in Brazil and South Africa. 3 We originally aimed at focusing on Facebook and Twitter for their cross-country comparability, data accessibility and respective affordances. The strengths and weaknesses of this choice will be addressed below.
The next sections are not meant to present the project or our work package's findings, but rather to reflect on the way we generated and explored social media data in the context of cross-cultural research. We specifically reflect on issues and choices related to (1) accessing social media data relevant to the broader process (i.e., food discourses) and the specific contexts (i.e., Brazilian and South African middle classes) being investigated, (2) the affordances of quantitative data collection strategies, (3) the relationship between big and small data, between macro, meso, and micro levels of investigation, and between 'quanti' and 'quali' research practices and (4) the affordances of 'quali' techniques of data analysis. In sharing these reflections, we aim at a twofold contribution. First, we offer a potential road map, among others, for researchers designing cross-cultural research of collective phenomena. Second, we aim to spark digital methods debates in line with epistemological interpretations of social media platforms and data as socio-cultural artifacts rather than hybrid assemblages.

Research design: developing 'quanti-quali' research trajectories
When it comes to unpacking design strategies, digital methods theorisations draw upon the 'quali-quantitative oligopticon', which suggests that the study of digital traces can deliver a vision of collective phenomena that 'spans from the tiniest micro-interaction to the largest macro-structure' (Venturini & Latour, 2009, p. 99). Contemporary interpretations of this vision (e.g., Rogers, 2019, p. 211; Venturini et al., 2015) suggest that digital methods designs are actually better suited to follow 'quanti-quali' practices by progressing from quantitative to qualitative methodological steps: after exploiting the potential of quantitative (i.e., automated) techniques for data access, collection, and handling, they can turn to qualitative practices to produce thick data analyses. Given the cultural focus of our work package, in our design we did follow a quanti-quali trajectory. The following sections and the summary provided in Figure 5 (below) detail each step of this trajectory.
Data access via query design: from people to platforms
Most digital methods research starts with query design. This mainly translates into designing keyword-based platform interrogations whose results will provide insight into trends relevant to the research focus. To 'tease out differences and distinct hierarchies of societal concerns across cultures', Rogers (2019, p. 38) suggests identifying an 'ambiguous query' (namely, a query based on a keyword without explicit political leaning), translating it into native languages, and applying each translated version to the relevant platforms or, in the case of search engines, to their local version (e.g., Google local domain). Applying this process to data collections from mainstream social media platforms, however, assumes the homogeneity of keyword relevance across cultures and collapses language and socio-cultural communities. For instance, according to this process, to study influences on middle class understandings of sustainable food consumption in South Africa and Brazil, we could use a number of relevant politically neutral keywords translated into Portuguese and English to query social media platforms. This, however, would generate extremely hybrid digital datasets: our data would indeed be related to sustainable food consumption and in the native languages of Brazil and South Africa (among many other places), but they would neither all be necessarily relevant to Brazilians and South Africans 4 nor systematically resonate with the middle classes in the two countries.
To address these limitations, in our work package we gathered information about celebrities, organisations/campaigns and keywords/hashtags locally seen as influential in individuals' understanding and choices related to food consumption. To make sure our information resonated with the local publics we were investigating (i.e., middle classes in Brazil and South Africa), data gathering was developed via a three-phase process: project members from Brazil and South Africa (i.e., Prof. Rita Afonso, Geetika Anand, Cristine Carvalho, Dr Kim Coetzee, Dr Shari Daya, Dr Megan Lukas, Dr Luiza Sarayed-Din and Rebecca Whitehead) carried out interviews with stakeholders (phase 1) and with members of the local middle classes (phase 2). Finally, they provided their local insight to further guide our query design (phase 3). Table 1 provides an example of how we organised this initial process: columns B-D respectively report the celebrities, organisations/campaigns and keywords/hashtags mentioned in any of the three phases described above. Column E reports information about any medium mentioned by the interviewee(s) in relation to the items in columns B-D. Columns F-H report information about the source of the information populating columns B-D (e.g., interview code or team member initials). Column I indicates the social media platforms to which the items of columns B-D were deemed relevant by the country teams.
This three-phase process was extremely important, not least because it soon showed that our initial plans to focus on Facebook and Twitter as key data sites did not resonate with the contexts we aimed to investigate. Both the interviews with stakeholders and middle-class users and our conversations with the country teams suggested that Facebook was indeed very central to the social media strategies of organisations and very popular with users in both Brazil and South Africa. Meanwhile, Twitter, tempting as it was to research due to data accessibility, was often a secondary part of the social media strategy of the activist organisations interviewed in phase 1 and rarely mentioned unprompted by consumers in phase 2 interviews in either country. Instead, Instagram soon emerged as a popular platform, especially in Brazil. As a consequence, we incorporated Instagram in our research design. In this paper, we specifically draw on our work with Instagram and Facebook data. The two resulting lists of celebrities, organisations and keywords (126 items for Brazil and 85 items for South Africa) were then tested for relevance on Instagram and Facebook. More specifically, we (1) turned keywords into hashtags and tested their use on Instagram and (2) used the names of celebrities, organisations and campaigns to identify relevant official Facebook pages. The choice to focus on Instagram hashtags and Facebook pages was inspired by two main reasons. First, we chose digital objects (i.e., hashtags, pages) relevant to each platform. Second, given that our work addressed a social question rather than a medium one, we were not bound to compare elements (e.g., celebrity pictures) across the two platforms but rather to collect and investigate content relevant to food consumption from the two platforms.
Finally, we decided to rely on Instagram to collect visual content (i.e., images) and on Facebook to access verbal content (i.e., user comments) because this strategy, while helping the manageability of our datasets, bore 'medium sensitivity' (Rogers, 2019, p. 221), that is, it allowed us to collect content where it is most easily produced and accessed (i.e., images on a photo and video-sharing platform and user comments on a multimodal platform that can accommodate long verbal texts).
Ultimately, for each country, starting from our original lists (Table 1), we identified 15 active Instagram hashtags and 15 Facebook pages run by celebrities or organisations. These items became our queries for data collection on the two platforms (Table 2). This preliminary phase allowed us to identify a first important difference between the Brazilian and South African contexts in relation to middle class understandings of sustainable food consumption: international influences are likely to have a very different impact in the two countries. In fact, both the resulting Instagram hashtags and Facebook pages that were to become the 'seeds' for our data collection showed a much stronger international influence in South Africa than in Brazil, with two thirds of the South African queries centring on an international hashtag (e.g., #weightloss) or page (e.g., Gordon Ramsay). The only international query for Brazil was based on British chef Jamie Oliver's Facebook page (which also appeared in the South African list).
Would we have formulated the same list of queries had we 'followed the medium' from scratch? What difference did it make that our queries were initiated by interview participants and project members in South Africa and Brazil rather than designed by us as language sensitive 'ambiguous queries' (Rogers, 2019, p. 38) chosen on the basis of engagement metrics? As a matter of fact, without the preliminary information gathering process described in this section, it would have been extremely difficult to design culturally sensitive queries because platform APIs provide filtering options that offer little more than language filtering and metrics do not tell us from where user engagement originates (outside the platform). In fact, it is exactly APIs and metrics that hybridly assemble users, content and practices. Ultimately, our final list of queries led us to access social media content that might not all have been produced by Brazilian and South African publics but that was certainly accessed by and that potentially influenced those very publics.

Data collection: capturing big data
As discussed above, following social media companies' tightening of their platforms' data access, automated data collection from Instagram and Facebook is not as straightforward and accessible as it used to be prior to the Cambridge Analytica scandal. Being unable to conduct a live and ongoing data collection over the course of our 12-month sample period, we carried out three Instagram and two Facebook snapshots. The three Instagram snapshots covered images posted on the platform in August 2019, December 2019, and April 2020, respectively. To generate each snapshot we used Instagram Scraper, a tool developed by the Digital Methods Initiative to collect Instagram posts on the basis of username or hashtag queries. During each snapshot, we collected 100 post URLs and relevant metadata (i.e., author ID, timestamp, post URL, media URL, number of comments and likes) for each hashtag query.
The two Facebook snapshots covered user comments responding to posts published by the administrators of the selected pages in October 2019 and April 2020, respectively. To generate these snapshots, we exploited the functions of two web scrapers. We first used the Web Data Research Assistant (WDRA) (Web Science Institute, 2022) to scrape posts and relevant metadata (i.e., author ID, timestamp, post URL, number of comments, reactions, and shares) based on our page queries. Then, for each Facebook page we filtered the top 10 posts by number of comments. 5 Finally, using the commercial tool Export Comments, we scraped the comments responding to those posts.
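The filtering step described above (keeping, for each page, the 10 posts with the most comments) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the field names `page`, `post_url` and `num_comments` and the sample records are hypothetical, and they do not reproduce the export format of WDRA or Export Comments.

```python
from collections import defaultdict


def top_posts_by_comments(posts, n=10):
    """Group posts by page and keep the n most-commented posts per page."""
    by_page = defaultdict(list)
    for post in posts:
        by_page[post["page"]].append(post)
    return {
        page: sorted(items, key=lambda p: p["num_comments"], reverse=True)[:n]
        for page, items in by_page.items()
    }


# Hypothetical scraped metadata (field names and values are illustrative only).
posts = [
    {"page": "page_a", "post_url": "a/1", "num_comments": 12},
    {"page": "page_a", "post_url": "a/2", "num_comments": 340},
    {"page": "page_a", "post_url": "a/3", "num_comments": 57},
    {"page": "page_b", "post_url": "b/1", "num_comments": 5},
]

# n=2 keeps the toy example short; the design described above used n=10.
selected = top_posts_by_comments(posts, n=2)
for page, items in selected.items():
    print(page, [p["post_url"] for p in items])
```

Ranking on a different metric (e.g., reactions or shares) would only require changing the sort key, which is one way to see how the choice of metric shapes what later counts as 'leading' content.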
In sum, for each country, our automated data capture exercise produced 45 (hashtag-based) collections of 100 Instagram posts and 30 (page-based) corpora of Facebook user comments. Our final data sets comprised 4,500 Instagram pictures and 206,363 Facebook user comments relevant to the Brazilian context and 4,500 Instagram pictures and 168,870 Facebook user comments relevant to the South African one.
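As a quick check on how these totals follow from the design, the arithmetic can be laid out as below. All figures come from the text above; the variable names are ours and purely illustrative.

```python
# Design parameters reported in the text (per country).
HASHTAGS_PER_COUNTRY = 15
PAGES_PER_COUNTRY = 15
INSTAGRAM_SNAPSHOTS = 3
FACEBOOK_SNAPSHOTS = 2
POSTS_PER_HASHTAG_QUERY = 100

# Per-country outputs of the data capture exercise.
instagram_collections = HASHTAGS_PER_COUNTRY * INSTAGRAM_SNAPSHOTS
instagram_posts = instagram_collections * POSTS_PER_HASHTAG_QUERY
facebook_corpora = PAGES_PER_COUNTRY * FACEBOOK_SNAPSHOTS

# Comment counts as reported for each country.
facebook_comments = {"Brazil": 206_363, "South Africa": 168_870}

# Totals across the two countries.
total_instagram_posts = instagram_posts * len(facebook_comments)
total_facebook_comments = sum(facebook_comments.values())

print(instagram_collections, instagram_posts, facebook_corpora)
print(total_instagram_posts, total_facebook_comments)
```

The two totals (9,000 Instagram visuals and 375,233 Facebook comments) correspond to the rounded figures given in the abstract.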

Data handling as the beginning of the analytical journey: turning big data small
The quanti-quali spectrum of digital methods research is clearly exemplified in Pearce et al.'s (2020) work, which proposes visualisation techniques to move from macro- to meso-level analyses of social media visuals, namely, to gradually move from quanti (i.e., big/thin) to quali (i.e., small/thick) methodological steps. As our focus was on providing insight into socio-cultural understandings, we aimed to develop a methodological approach moving between meso- and micro-levels of analysis. Doing so, however, required making our data small to expand the 'quali' potential of our study. Drawing on the meso-level approach to social media visuals introduced by Pearce et al. (2020), we selected the 10 most liked images of each Instagram collection and, using Adobe Photoshop, we stacked them so that they blended into a single composite image. Composite images are useful to convey a quick impression of emerging features in small collections of visuals. We ordered the original images based on like metrics, with the most liked one on top of the stack, hence more visible. Ultimately, for each Instagram snapshot we generated 15 hashtag-based composite images. Figures 1 and 2 show the first Instagram snapshots respectively for Brazil and South Africa. Following the same rationale applied to Instagram visuals, for each Facebook snapshot we used Voyant Tools, a web-based text reading and analysis environment (Sinclair & Rockwell, 2016), to generate a word cloud with the 50 most frequently used words in each of the 15 page-based corpora of user comments. In the clouds, the bigger the word, the more frequently it appeared in the corpus. Figures 3 and 4 show the first Facebook snapshot, respectively for Brazil and South Africa.
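The word-frequency ranking that underlies a word cloud of this kind can be approximated with a short Python sketch. It is not Voyant Tools' implementation: the tokenisation rule, the stopword list and the sample comments are all assumptions made for illustration only.

```python
import re
from collections import Counter


def top_words(comments, n=50, stopwords=frozenset()):
    """Return the n most frequent words across a corpus of comments."""
    counts = Counter()
    for comment in comments:
        # Runs of Unicode letters, lowercased, so accented words
        # (e.g., in Portuguese) are kept whole.
        tokens = re.findall(r"[^\W\d_]+", comment.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Hypothetical mini-corpus and stopword list (illustrative only).
comments = ["Love this recipe", "This recipe is great", "Great recipe!"]
ranking = top_words(comments, n=2, stopwords=frozenset({"this", "is"}))
print(ranking)
```

In a word cloud, each count returned here would be mapped to a font size. Which words are treated as stopwords is itself an interpretive choice, and arguably a language- and context-sensitive one.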
In sum, for each country, our data handling returned:
• 45 composite images, each based on Instagram visuals tagged with a food hashtag at one of three time points in our sample period.
• 30 word clouds, each based on Facebook comments posted on the official page of a celebrity or organisation relevant to food consumption at one of two time points in our sample period.
In our design, data handling constituted the beginning of the analytical journey because, while turning big data small, it allowed us to identify, visualise, and start our interpretive reflections on content with top engagement and use metrics. This process was informed by our goal to focus on leading influences on middle class understandings of food consumption. Had our research question been different (e.g., with a focus on niche influences), this initial analytical process would have used other filtering criteria (e.g., low engagement and use metrics).

'Quali' analytical practices
While the 'quanti side' of digital methods research is often self-evident and extensively problematised, not least given the computational nature of its data capture techniques (e.g., scraping; API interrogations), it is hard to find an equally comprehensive discussion of its 'quali side'. With rare exceptions (e.g., Caliandro & Gandini, 2016), the literature that draws attention to the qualitative steps within digital methods projects often focuses more on how to turn big data small (e.g., Pearce et al., 2020, pp. 173-174; Rogers, 2019, p. 211) than on how to analytically approach small data with qualitative techniques that 'follow the medium'.
As discussed in the previous section, our research design gradually progressed from quanti to quali: from the big data sets generated in the query-based (i.e., quanti) data capture phase, we used metrics (i.e., quanti) filtering to handle data and redirect our empirical focus to small data sets, as this would enable thicker (i.e., quali) analytical steps. Given that our aim was to provide insight into influences on local understandings of food consumption, we developed a methodological design whose quali side focused on disassembling data from the hybrid assemblages returned in the data capture and handling steps and recombining these data within and across their platforms of origin. This allowed us to interpret digital traces as part of the two socio-cultural puzzles characterising middle class understandings of food consumption in Brazil and South Africa respectively. To do so, we relied on netnographic techniques, namely, qualitative techniques that seek 'to understand the cultural experiences that encompass and are reflected within the traces, practices, network and systems of social media' (Kozinets, 2019, p. 14). These techniques allowed us to develop a data-centric and inductive approach to explore and compare contextualised understandings of sustainable food consumption based on the digital traces returned by our data capture and data handling phases.
Our data visualisations (i.e., composite images and word clouds) constituted the analytical entry point: they helped us see where exactly in our data sets we should start our netnographic 'immersion', that is, where we had to 'dive deeply into the cultural pools of others, and not merely skim along their surfaces' (Kozinets, 2019, p. 140). In fact, our 'quali' analytical work first translated into going back and forth from the composite images and word clouds to the original images, comments or posts (see circular arrows in Figure 5). This allowed us to track how specific digital traces originated from broader understandings of, or controversies around, food consumption. For each Facebook snapshot, for instance, this process required the following steps:
1) Tracing of prominent terms
a) identifying a prominent term in a word cloud;
b) retracing the use of this term in the corpus of the relevant celebrity Facebook page.
2) Reflective (and circular) reading
a) reading the specific thread(s) of nested comments where a prominent term was frequently used within the corpus;
b) linking these thread(s) back to the original celebrity post(s) from which they originated;
c) checking metadata relevant to these threads (e.g., engagement metrics, timestamps).
3) Theme identification
a) coding themes emerging in relation to a prominent term within the corpus;
b) assessing theme relevance across the country's Facebook corpora (i.e., how and how often does a theme emerge across all the country's Facebook corpora?) (see Note 6).
With single-platform narratives having been mapped in this reflective process, we then went on to connect data across the two platforms (see horizontal connecting arrow at the bottom of Figure 5), for instance to assess how and to what extent a theme (e.g., controversy) emerging in the Facebook page explored in the first circular step also surfaced within a hashtag community on Instagram.
This second step allowed us to gradually build bigger fractions of the puzzles and to distinguish between platform bias and contextual, socio-cultural specificities. The contribution from team members living in Brazil or South Africa, or having worked there, was again key in this phase: their interpretations gave us a privileged 'cultural entrée' (Kozinets, 2019) into our data sets, namely they often provided us with the means to identify cultural subtleties like subtexts or domestic forms of intertextuality.
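The mechanical parts of the tracing steps listed above (steps 1b, 2a, 2b and 3b) can be sketched in a few lines of Python. This is a hedged illustration rather than the workflow used in the study: the record keys `post_id`, `thread_id` and `text`, the function names `trace_term` and `theme_spread`, and the corpus labels are all hypothetical, and the interpretive work of reading threads and coding themes remains a manual, qualitative task.

```python
from collections import Counter
import re


def trace_term(term, comments):
    """Rank comment threads by how often a prominent term occurs in them.

    comments: iterable of dicts with the hypothetical keys 'post_id',
    'thread_id' and 'text'. Each result keeps the originating post, so a
    thread can be linked back to the post it responds to.
    Returns a list of ((post_id, thread_id), count), most frequent first.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = Counter()
    for comment in comments:
        occurrences = len(pattern.findall(comment["text"]))
        if occurrences:
            hits[(comment["post_id"], comment["thread_id"])] += occurrences
    return hits.most_common()


def theme_spread(theme, coded_corpora):
    """Report in how many corpora a manually coded theme appears.

    coded_corpora: dict mapping a corpus label to the set of theme labels
    that researchers assigned to it during qualitative coding.
    Returns (number of corpora, sorted list of their labels).
    """
    present = sorted(name for name, themes in coded_corpora.items() if theme in themes)
    return len(present), present
```

Because `theme_spread` only needs labelled corpora, the same check could in principle be run over corpora from both platforms when moving to the cross-platform step.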
In the following section, we provide an example of the way we enacted the two analytical steps described here.

Tracing the politics of food consumption in Brazil
As mentioned in the previous section, we used our composite images and word clouds to guide the qualitative analytical journeys through our different sets of data relevant to Brazil or South Africa. The first part of each journey developed intra-platform (circular arrows in Figure 5) while the second progressed across platforms (horizontal arrow in Figure 5). In this section, we draw from the Brazilian data to provide a synthetic example of how the two parts of the journey developed. In this specific example we start from the Facebook data sets, but the circular step we are describing here could equally start from, or run in parallel in, any platform (e.g., Instagram) data sets.
Step 1: intra-platform (circular) analytical step
Bela Gil is a Brazilian celebrity chef, one of the most commonly mentioned in the interviews conducted in the early phases of SCArFEthics in relation to organic food and sustainable consumption. Gil is a key figure in Brazilian food discourses, active in both legacy media programmes (de Oliveira Santos & de Souza, 2020) and online promotional work through her website, Facebook page (included in our study) and Instagram, TikTok and Twitter accounts. The first word cloud in Figure 1 derives from Gil's first corpus, which is based on comments left on her Facebook page in October 2019. Given that the cloud shows the prevalence of 'governo' (government) and 'povo' (people), we decided to track the use of these words in the corpus to gain a better understanding of their relevance. We traced comments back to an 18 October post, in which Gil praised volunteers for helping to clean up a beach after the 2019 oil spill on the Brazilian Northeastern coastline (Figure 6).
While reading through the nested comments responding to Gil's statement, we learned that this post had given rise to a discussion in which an anti-government front referred to failures of the federal government in relation to the northeast region of Brazil, with the specific local issue of pollution being foregrounded. The overall corpus formed by the post's nested comments, however, also showed the presence of a pro-government front, where individuals complained that people were attacking the government because they had lost their 'stewardships' (seemingly an attack on Gil) or claimed that government-funded NGOs were also cleaning up the beach, implying that Gil was misrepresenting the issue. This quick netnographic wandering prompted us to reflect on the way the influence of politically active celebrities like Bela Gil might make Brazilian middle class understandings of food consumption closely intertwined with wider and possibly highly polarised political debates.
We decided to further explore the presence of an explicitly political dimension in the Facebook data relevant to Brazil by exploring other word clouds. For instance, we focused on the first word cloud based on comments left on the Facebook page of the Movimento do sem Terra (Landless Workers' Movement) (Figure 1). There, we noticed that the then-imprisoned former left-wing Brazilian president 'Lula' (Lula da Silva) dominated the cloud along with 'lulalivre' (Free Lula) and 'Bolsonaro' (Jair Bolsonaro), Brazil's current right-wing president. This prompted us to check the use of these words in the relevant Facebook corpus and replicate the circular journey that we had developed for 'governo' and 'povo' in the case of Bela Gil's first corpus.
Step 2: cross-platform (horizontal) analytical step
Having traced several elements in the Facebook data sets that prompted reflections on and further exploration into the influence of wider political debates on middle class understandings of food consumption in Brazil, we shifted our focus to the Instagram data sets and explored the Brazilian composite images. We soon noticed that in the first Instagram snapshot, Movimento do sem Terra re-emerged in one of the layers (Figure 7) of #semagrotoxico (#pesticidefree)'s composite image, directly linking the political Movement to pesticide-free agriculture and to former president Lula (see the 'Lula' cap worn by the individual in the picture), seemingly reconfirming trends seen in the Facebook data.
Overall, the digital traces discussed so far prompted us to reflect with SCArFEthics team members who had worked in Brazil upon how politically engaged celebrities and movements in the country relate to food consumption in conjunction with narratives relevant to wider societal issues. This dialogic reflection led us to explore social media practices not directly related to messages sent by public figures and, possibly, more specifically focused on food-related topics. In fact, we noticed that #Segudasemcarne (#MeatfreeMondays)'s first composite image also showed a message with highly political potential (Figure 8) directly related to food consumption: farming brings deforestation. We also noticed that, in the third Instagram snapshot, among the layers of the #comidaconsciente (#consciousfood)'s composite image, the Covid-19 crisis was being framed in relation to animal exploitation (Figure 9), again a representation with political potential.
Our exploration of composite images could then progress again with a new (circular) analytical step, retracing the additional data (and metadata) relevant to each image back in the context where it had been originally posted.
Overall, the reflections generated via these two analytical steps derive from the platform specifics (e.g., digital objects and engagement metrics) that we used to identify engagement and influence, namely, to 'follow the medium' and turn big data small. These platform specifics, however, were used to follow and explore pathways initiated by interview participants and constructed online by platform users, in a scenario where social media platforms can be seen as blending hybrid assemblages and socio-cultural artifacts. We do not exclude that our own work might have been affected by a number of limitations. For example, we could have expanded the range of terms and images included in the (Facebook-based) word clouds and the (Instagram-based) composite images respectively, or we could have incorporated multi-modal data from each platform. This, however, would not have dramatically changed the quanti-quali practices of our research design.

Conclusion
Digital methods are attracting increasing interdisciplinary attention as means and techniques to approach medium and social research. These methods, however, are often used to address platforms as 'hybrid assemblages' of users, data, and infrastructures, either skimming over the connection 'between technical specifics and culture' (Bogost & Montfort, 2009) or investigating culture as in 'user cultures'. In the former scenario, scholarly work primarily focuses on the technical end of the connection, developing tools and techniques to extract data in the complex and increasingly hostile gateway system of platform APIs (Plantin et al., 2018) and refreshing existing analytical techniques (e.g., social network analysis, controversy analysis, interface methods) to mine and make sense of these data. Studies focusing on user cultures instead draw attention to the way different social media environments host and enable a range of practices where, for instance, distinct and sometimes unique vernaculars seem to emerge (Gibbs et al., 2015; Mayr & Weller, 2017; Pearce et al., 2020). With rare exceptions, however, it is hard to find scholarly efforts aiming to reconnect these dynamics to the world beyond the platform, that is, to anything happening independently of platforms' affordances.
In this paper we argue for the importance of bridging digital methods literature focused on the study of collective phenomena (Pearce et al., 2020; Rogers, 2019) and epistemological reflections on the socio-cultural specificities of digital data (Milan & Treré, 2019) and platforms (Chan, 2013). In doing so, we stress the need to advance emerging debates that address and problematise the 'quali' side of digital methods research in cultural and cross-cultural investigation of social phenomena.

Notes
1. These restrictions clearly affected Facebook (Rieder, 2015), Instagram (Allen, 2016) and Twitter (Walker et al., 2019) data access.
2. Middle classes are here understood as middle income populations. The project drew upon an understanding of 'middle class' as an identity performatively produced through consumption habits and practices (Kravets & Sandikci, 2014).
3. The overall project focused on Brazil, China, and South Africa. These countries were chosen because of their differing institutional contexts and similar rise in middle income populations over the past 20 years (Kochhar, 2015). Our work package only focused on Brazil and South Africa, primarily because of the challenges in accessing data from Chinese social media platforms.
4. As a matter of fact, most social media data are not geolocation-annotated. While data science studies have been developing inference techniques to address this, results are still limited (Jurgens et al., 2015).
5. For the second snapshot we switched to reaction metrics as at that point the Web Data Research Assistant no longer provided comment metrics.
6. Given that our Instagram snapshots had visuals as analytical entry points, in the case of Instagram data, the three steps listed above required a social semiotic approach (e.g., Rose, 2016, pp. 106-146). In practice, rather than focusing on prominent terms, we drew our attention to prominent elements, or 'signifiers', in the visuals (e.g., modes of picturing food, presence of humans and non-humans in the pictures).