BWDAT: A research tool for analyzing the consumption of VOD content at home

Highlights • BWDAT is a reliable tool that facilitates the study of viewing experience on VOD platforms.• Collects users’ physiological data and users’ interactions with Netflix interface.• Non-intrusive and easy to use, successfully used in long-term projects with more than 200 users.• Includes a graphical display of the viewing sessions to help researchers visualize the data.• Includes an automatic report generator and data exporter for multiple platforms.


Introduction
Media consumption is changing rapidly in tandem with computing technology. The widespread integration of video-on-demand (VOD) services into consumer smart-TVs, computer applications, smart-devices apps, and pay-TV providers illustrates the increasing importance of this media distribution modality. Retaining viewers for as long as possible, engaging them in presumed addictive behaviors, and a consequent "vicious cycle" was, according to some scholars, the ultimate goal of linear television (Berger, 1978). An objective that seems to persist in the streaming era with the re-emergence and popularization of bingewatching (for an introduction to the concept see Jenner, 2015).
Broadly defined as the consumption of multiple episodes of a show in one sitting, the term binge-watching 1 has frequently been paired to the word of "addiction" in the press (Pierce-Grove, 2016). Furthermore, and as pointed out by Flayelle et al. (2020, p. 45), there is a "widespread assumption in the literature that binge-watching has addictive qualities". In this vein, international scholars have explored the effects that TV consumption has on viewers' social life and health-related behaviors (Basterra-Gortari et al., 2014;Exelmansa & Van den Bulckb, 2017;Shirakawa et al., 2016;Van den Bulck, 2000). In the era of Internet Streaming Content and in the exceptional current pandemic times when people are encouraged (or even forced) to spend more time at home, this area of research is more relevant than ever. For instance, the consumption of VOD content significantly increased during the COVID-19 outbreak given the recommendation by most countries to stay at home. Data shows that the time spent consuming streaming TV and video in the weekend of March 13-14, 2020 grew by more than 40% in countries like Austria and Spain. 2 To study binge-watching, international scholars 3 have relied on online surveys (Riddle, Peebles, Davis, Xu, & Schroeder, 2017;Starosta, Izydorczyk, & Lizinczyk, 2019) in order to better understand viewers' motivations and habits, and physiological effects of binge-watching. Others have used interviews and focus groups (Steiner & Xu, 2018) or the combination of both (Flayelle, Maurage, & Billieux, 2017;Panda & Pandey, 2017) to learn about the main binge-watching traits and college students' intentions towards binge-watching. Despite the limitations of selfreported data as a unique source of information to understand people's TV consumption (Mandryk & Inkpen, 2004) as it "represents a summary of the whole experience" elaborated "after exposure" (Sukalla, Bilandzic, Bolls, & Busselle, 2015), this still remains today's dominant approach.
Recently, a few researchers started to collect quantitative data such as log files of actions from streamers and Smart TVs, and browser history data (Bentley, Silverman, & Bica, 2019;Schweidel & Moe, 2016;Trouleau, Ashkan, Ding, & Eriksson, 2016) to classify viewing behaviours. Despite their contributions, further limitations were highlighted. For example, relying on participants sharing the historic data with the researcher may lead to data manipulation (e.g., erasing of some data entries). Moreover, these approaches do not register interactive actions such as pause and rewind. BWDAT makes an effort to overcome the above mentioned limitations of current binge-watching studies and tools.
In fields such as cinema, the performing arts and video games, analyzing users' experiences through the combination of both objective and subjective data has a noteworthy trajectory (Cox, Bianchi-Berthouze, & Jennett, 2020;Reeves, Lang, Kim, & Tatar, 1999). For instance, in controlled environments, Sukalla et al. (2015) and Heiselberg and Bjorner (2018) have combined both psychophysiological data (e.g., heart rate, skin conductance, facial EMG, eye tracking, electroencephalogram activity) and self-reported data to explore narrative engagement and audience's responses to Danish Broadcasting Corporation shows, respectively. Mixed methods approaches allow the researcher to validate self-reported data (Sukalla et al., 2015) and tackle known issues of under and over reporting information about, for instance, time spent online (Araujo, Wonneberger, Neijens, & de Vreese, 2017;Scharkow, 2016) and phone usage (Boase & Ling, 2013). Moreover, such mixed approaches allow to take into consideration both the conscious and unconscious reactions of TV audiences (Heiselberg & Bjorner, 2018). Despite these benefits, there is a paucity of research on binge-watching that adopts this holistic approach, partially due to the lack of non-intrusive and open-access tools that facilitate it (Castro, Rigby, Cabral, & Nisi, 2019). BWDAT has the potential to help researchers to fill in this notorious gap.
In this paper, we argue that real-time collection of users' data in natural environments and the combination of objective and subjective data can help researchers to delve deeper into the binge-watching phenomenon, and position BWDAT as an enabling tool in these regards. We also believe that the widespread use of wearable devices equipped with increasingly accurate sensors is opening up new opportunities to collect physiological data in the wild (e.g., measuring the autonomic nervous systems and stress levels) and match psychological traits to this data.
The aim of this article is to report on the design and evaluation of the technical implementation of the BWDAT 4 tool. BWDAT is a nonintrusive and low-cost analytical research software that facilitates a holistic understanding of binge-watching in an uncontrolled environment remotely (e.g., the home). More specifically, the tool facilitates the collection of data about the pre-viewing, the viewing and the postviewing of Netflix experiences, on viewers' computers. This is achieved by combining, in a single tool, the collection of users' self-reported data with their physiological data, and their interactions with a VOD interface. With the design of this tool we aim to contribute to the existing debates about binge-watching and its multifaceted effects on people by allowing researchers to collect more thoroughly and efficiently data on VOD consumption, as well as tackling old and newly emerging research questions on the topic. BWDAT was designed, tested and successfully validated as a support tool for an investigation about excessive media use on Netflix, a research project carried out at University of Fribourg (Switzerland). Furthemore, BWDAT was created to facilitate longitudinal cross-national comparative studies on the topic of VOD consumption and on binge-watching, in particular.

Design goals
The specific goals of BWDAT data-driven approach can be summarized as follows: (a) The development of an application that records users' interactions (e.g., pauses, forward) with VOD interfaces (currently supporting Netflix) and that assists in the synchronization of the physiological data with the content watched. This synchronization happens in such a way that the researcher can clearly pinpoint, for instance, the exact scene when a user performs a specific action (e.g., pauses the video). (b) The development of a flexible smartwatch app that accurately collects physiological data (i.e., heart rate and inertial data) during VOD consumption at home. We opted for these two kinds of data because the heart rate reflects emotional activity (Mandryk & Inkpen, 2004), and wrist inertial data allows for a better interpretation of the heart rate due to the impact that movement has on the cardiovascular system. Furthermore, these physiological signals are the most popular and accurately registered by mainstream wearables. (c) The design of a graphical interface that integrates and displays all the data collected by the different devices, to ensure its accurate interpretation. Furthermore, a set of data analysis reports were developed to integrate the BW sessions' data, and help identify its errors.

Tools for data collection
BWDAT was integrated by a custom made browser extension, to function on Google Chrome. 5 The version introduced here was designed for Netflix and collects up to eleven user actions on the interface during a Moreover, the browser extension allows for certain actions (e.g., closing Netflix) to trigger specific questionnaires created ad hoc by the researchers according to their research questions or hypothesis. To make sure that the participants filled the questionnaires after the session, and/ or in case they accidentally closed the questionnaire before completing it, a pinned fix tab was added to the browser with all the questionnaires' URLs, for later retrieval. The App developed in Java, using Android Studio, collects heart rate (HR), gyroscope and accelerometer data. By using a high abstraction level approach, the App is currently supported by 26 different smartwatch models. The screen displays the smartwatch ID and a start and stop button (see Fig. 1) to initiate and finalize the physiological data collection process. The current values are sent to the server every second. Fig. 2 is a diagram of the different steps that need to be taken to use BWDAT. Fig. 3 represents the final architecture of BWDAT.
To develop and evaluate the technical quality of BWDAT, we used an iterative and incremental approach which involved several cycles of evaluations and validations. After a successful stress test, simulating the transmission of 20 smartwatches simultaneously to the database for 10 min, two trials and a long-term study were conducted. These three phases are described in detail in the next section.

Phases and participants
The first and second trial lasted one week. Participants' involvement in these two trials was totally voluntary, and curiosity about the project was their main motivation to participate. Prior to the two 7-day trials, participants completed a pre-study questionnaire to collect demographic and viewing habits data and provided informed consents about their participation. Table 1 below synthesizes both trials specific goals, participants' requirements and profiles, and the stimulus used.
Finally, after the refinements and iterations induced by the two trials findings, we considered the prototype ready for testing for its original purpose. BWDAT was hence used by a team of researchers for a long term study on binge-watching and its effects on sleep patterns. Our goal with that study was to evaluate the software performance on a larger scale, using more participants with a diverse demographic, and to compare its performance under 2 different environments (bingewatching and non-binge-watching mode). 37 participants (11 male and 26 female) between 18 and 52 years of age (M = 26.87, SD = 5.37) living in (Switzerland) were recruited. The research study divided participants into two groups (A and B). Participants in Group A (n = 21) were asked to watch three full episodes of a program of their own choice in a row. Participants in Group B (n = 16) were asked to only watch one episode in one sitting for three separate nights. All viewing sessions happened at home at night. The period of data collection lasted from June 2019 to September 2019 (4 months). The research team offered a 40 CHF compensation for participating in the study.

Procedure and devices used
Two days before trial 1 and 2, a researcher met the participants and

#1
-To identify software installations and usability problems -To evaluate the tool for at least 2 h, the average bingewatching session identified by both Rigby, Brumby, Cox, and Gould (2018)   explained the trials aims, presented the ethical issues and participants' rights, and provided each participant with a Polar m600 smartwatch. 7 For trial 1, a participant used a Moto 360 smartwatch to confirm the flexibility of the App. For trial 2, the usage of the smartwatch was not mandatory, since the main aim was to verify the improvements to the browser extension. In all cases, participants followed the instructions displayed on the smartphone and smartwatch to connect and synchronize the devices. The use of their own smartphones did not raise any concerns among participants since it was clearly explained that the app would not invade or collect any of their other personal data. After receiving their unique user code on a private WhatsApp message, participants installed the browser extension following a URL shared through a WhatsApp group created to facilitate the communication between participants and the researcher during the trial. Once the browser extension was installed, a login page prompted the users asking them to insert their unique code.
In the long-term study, a total of nine Polar m600 smartwatches and nine mobile devices (5 Nokia Smartphones and 4 Samsung Tablets Galaxy Tab E) were used. Participants were provided with both a smartwatch and a smartphone that were already paired. This difference in regards to trial 1 and 2 is related to the ethical issues the project needs to meet (e.g., not to request the participants to use their personal Google accounts). Ethics approval for the study was granted by the Ethics Review Board of University of Fribourg.
All participants of the long-term study used their personal computers to watch content on Netflix. They received an email with their personal ID code, the URL to the browser Extension, and the URLs to the different questionnaires (pre-and post-study questionnaires, and the ones participants needed to complete before and after each session). To better understand the engagement of participants with each session's content, the browser displayed a pre-session and post-session questionnaire containing, for instance, questions related to uses and gratifications and the PANAS scale (see Watson, Clark, & Tellegen, 1988).

Evaluation metrics
Interfaces and scripts were developed (using JavaScript, HTML and CSS), first, to identify the errors during the trials, and secondly, to provide the best interface for the data analysis and performance evaluation. The software offered an option to download data to be exported to other platforms (e.g., R Studio or Excel), allowing BWDAT to be used in other contexts and purposes. To frame our evaluation of the BWDAT tool, we distinguish between three working definitions of users' sessions: (a) Valid session: A session during which a participant consumes media content and which starts with "Session started" (opening a Netflix tab on the browser) and ends with "Session ended" (closing the Netflix tab). (b) Semi-valid session: A session during which a participant consumes media content but the browser extension does not detect the "closing" of the Netflix tab. (c) Empty session: A session during which the participant did not consume any type of media content, despite the session being bounded by the "Session started" and "Session ended" actions.   Fig. 6. Graphical distribution in time of the total number of actions collected during Trial 1 (Output from BWDAT). 7 We evaluated several devices combining reliability, cost, ease of development and autonomy, and compatibility with Wear OS 2.0 (https://wearos.goog le.com) operating system, which allows for standalone applications. The Polar m600 smartwatch Android-based wearable was selected. A technical test was run with Polar m600, by comparing the data collected from 4 users with Empatica E4, giving an average mean absolute error of 5.31%. According to Nelson and Allen (2019), an average absolute error below 10% serves as an indicator of an acceptable heart rate accuracy.
To evaluate the data collection accuracy, we took into consideration: (1) the coherence of the sequence of actions of the Netflix (e.g., some Play actions are followed by a Pause of the same episode) and (2) the accuracy of the concept of session implemented by the tool (i.e., that the participant manages to close the tab when the session is over). To evaluate the performance of the smartwatch app, we looked into the coverage and data transmission of each valid session and semi-valid session.

Trial 1
During the seven-day trial period, the installation of both the smartwatch app and the browser extension ran smoothly. No problems were encountered or reported. The only remark is that some smartwatches took more time to update to Wear OS 2.0 due to the low battery levels of some of these devices at the moment the participant was installing the app. During the seven days, the researcher used the WhatsApp group to remind the participants about the importance of maintaining the devices charged.
During the trial period, a total of 112 sessions were registered from the 10 participants. A graphical interface classified and provided the HR coverage of all the sessions, as shown in Fig. 4. Each session and its correlated information can be represented individually (see Fig. 5).
From all the sessions (see Table 2), 46.4% (n = 52) qualified as valid sessions, based on the definitions provided in Section 2.5. Data from the empty sessions reveals that between 1 and 11 (M = 5.4) empty sessions were registered in the very beginning of the study, when participants were logging in to the browser extension for the first time, or getting used to the Netflix platform (i.e., those who have never had a Netflix account before the trial). Therefore, the percentage of empty sessions registered is interpreted as a natural behaviour of users opening and closing the Netflix site, rather than the tool's error.
In 12.5% (n = 14) of the 112 sessions (6 of the 58 non-empty sessions [10.34%]), the action "closing Netflix tab" was not registered. At the end of the study we debriefed participants about these semi-valid sessions. Users explained that they did not close the Netflix tab immediately after the content was over.
Instead, they closed the tab a few hours later (e.g., after falling asleep) or they forgot that they were being monitored, leaving Netflix open and closing it days after, in places with no internet connection. Moreover, users described their experience with the tool as "natural" and "non-intrusive" of their viewing experiences.
In total, from all sessions, 1,382 actions were registered along the week (see Fig. 6). The analytics interface facilitated the researchers in the analysis of the data and it allowed them to detect Netflix usability patterns. In 94.83% (n = 55) of the sessions, participants paused, rewind and/or forwarded the video, which highlights the importance of accurately registering these user actions to be able to effectively synchronize the content watched with the HR and inertial data.
Several issues emerged when some participants switched from one episode to another of the same program. For instance, in seven sessions with content, empty actions with no episode associated were registered when switching between episodes, meaning that the software was not fast enough to read the current content. Consequently, it mistakenly stamped the episode with the time frame of the previous one. In some cases, participants refreshed the Netflix page. In these cases, the first action registered by the browser extension was "empty content" followed by "Play" when the content was loaded. All the identified issues (e.g., actions with missing content and several empty sessions with no distinction between login and browsing) in trial 1 were fixed before trial 2 took place.
The 10 sessions of trial 1 containing the requested movie had an average of 89.10% of CC and an average of 89.17% CT. From those, 2 had a lower CC (25.78% and 65.93%) while all the other sessions had more than 99.3% CC.
In this first trial, the most extended session registered was 6 h long, as the participant fell asleep while binge-watching. The Polar m600 battery lasted long enough for the required and desired time stamps to be recorded. Also the participant with the moto m360 smartwatch had  100% data coverage of the content during the session, confirming the flexibility of the tool to serve on more than one smartwatch model.

Trial 2
Similarly to trial 1, all participants were briefed and filled in a prestudy questionnaire. No issues aroused while installing the tools. Because six participants installed the app a few days after receiving the smartwatch, the watch took more time to update to 2.0 OS due to the smartwatch battery levels being depleted after a few days of inactivity. Moreover, all participants installed the browser extension successfully. In sum, the participants acknowledge that the procedure was easy to follow.
The HR coverage (see Table 3) low performance was justified because participant 6 did not use the smartwatch when watching Netflix, but only the browser extension, in 7 out of the 9 sessions.
Most participants reported to have had a stressful week and not clearly remembering the instructions when performing the viewing session. In those sessions when the browser extension did not register the closing Netflix tab action, we considered the end of the session as the last action (pause or stopped watching) performed by the participant. For instance, participant 7 had the most extended session, of 44 h, with a DC 1.42%, but a CC of 94.31%.
Despite the results in the DC, the browser extension improvements performed more accurately (compared to the previous version) when users were changing from one episode to the next one, as well as in the recording of the episode length. If we analyze the sessions with HR data collected, when the participants watched The Haunting of Hill House (Ep. 1, Season 1), the official stimulus of this trial, they had a 98.86% CC and DT. One of the users (participant number 2) did not record HR data while watching the episode.
Overall, this trial highlighted a fault in the design of the study, where the usage of the smartwatch was not declared mandatory, and it decreased the HR coverage evaluation results. This failure implied some loss of data. After this second trial, the browser extension was updated adding a "search" action which allows the researcher see what content the user browsers before engaging in the consumption of a show.
An alarm system was also implemented in the browser extension to verify if any physiological data was collected five seconds before the  Fig. 9. Actions recorded by session interface (Output from BWDAT).

J.A. Cordeiro et al.
beginning of a Netflix session. If no data was transmitted, a blocking pop-up message asks the user to verify the smartwatch. Finally, the browser extension was adapted, with minor changes, to the Mozilla Firefox browser, proving its flexibility and scalability from browser to browser.

Long-term study
After the issues identified and refined through trial 1 and 2, we estimated we had reached a stable and complete version of the tool to make it available to the researchers (currently open source on Github). At this time (May 2019) a team from University of Fribourg was involved in conducting the long term study about the effects of bingewatching on night sleep. We took this opportunity to test the tool in a real in-the-wild study, performed by colleagues of another institution.
The study was designed as a A/B study, where two groups of bingewatchers performed different activities and their data was compared at the end of the study. Participants in Group A had to watch three episodes in a row at night, before going to sleep, while Group B watched one episode per night before going to sleep for 3 nights in a row. Researchers from University of Fribourg intended to compare the effect of binge versus non binge-watching on sleep patterns.

Group A
The 21 participants performed a total of 95 viewing sessions during the study. Of these 95, 27 were valid, 32 semi-valid and 36 empty sessions. Given the focus of the study on night sleep, from those sessions, we only considered the sessions with various episodes watched on a night schedule, and with HR data, making a total of 21 valid sessions and 4 semi-valid sessions (n = 25) (see Table 4).
Several transmission failures, during data acquisition with the smartwatch and the server, impeded the system to record the information in the DB. The gaps of information lasted between 1 s to 2 min (M = 1 min and 13 s). All the sessions with less than 81% coverage stopped at a certain point in the session, which never recovered until the end, see Fig. 7 as an example. This way, some information was lost without the participants being aware of it. This issue was solved in the release of a new version (June 2020) by implementing an alarm system that verifies the data every 5 mins.
Some issues emerged in sessions with low physiological data coverage. Participant 9, which had a 62.55% DC, reported interruption and problems with the device. Participant 16, with a 34.38% DC, reported internet connection problems. Participant 18 answered the phone, which may have led to only an 80% DC, and participant 37 reported technical problems with the smartwatch battery, leading to a 90% DC and not 100%.

Group B
Overall participants completed a total of 105 sessions during the study. Of the 105 sessions, 61 were valid, 3 semi-valid and 41 empty (see Table 5).
Two interfaces (see Fig. 8 and Fig. 9) were implemented to analyze and download all the contents and actions of each session (see Fig. 10).
For group A and B alike, we considered as valid those sessions that include one or two watched episodes at night, before sleeping, with HR data and a post-session questionnaire associated (n = 51). The content watched had an average length of 42m23s, while the sessions lasted only 39m33s because of the skipping credits feature, and some cases of multiple forwarding while watching. Each night, the percentage of data coverage increased, possibly showing the users getting better at using the technology.
Regarding the browser extension, the only challenge was making sure that participants close the Netflix tab right after finalizing the viewing session. However, in those cases that they forgot to do so, we consider the last pause or stop playing action registered in that session as the end of the session. This approach worked well. In total, we registered 2,755 actions. A script was then developed to easily find errors in the data collected. For example, when the server received several actions at the same time this derived in some actions being registered in the wrong order.
After this long-term study the same version of BWDAT (V1.4.14, purple in Fig. 10), was used by the same research group at University of Fribourg for further studies. Data was collected from more than 200 users (n = 203) during one week, one order of magnitude above the previous study (Cabral et al., 2020). Finally, we debriefed the University of Fribourg researchers' experience with the tool, and we received positive feedback. According to the researchers reports, these further studies run smoothly and the tool performed well without requiring any upgrades or changes in the software. The researchers also confirmed that the right data was captured to explore their hypothesis about bingewatching.
The main positive features highlighted by them were the automatic reports generated by BWDAT, which allowed the data analysis of thousands of data entries in a simplified way (Figs. 4 and 8), and the clear data visualization (Figs. 5 and 6). Researchers also praised BWDAT accuracy and flexibility with several languages and time zones (e. g. collecting Netflix actions of a Korean movie being watched in the USA (e.g., action "Clicked: ")). From evaluation results and the extensive feedback received, the first author refined the browser extension and in September 2020 a new BWDAT Chrome Extension version was released (V 1.6.2). New features included a smartwatch's low battery alert and a next episode form display option (if pre-defined, when the viewer moves to a new episode a form is displayed in a new tab). To easily identify to which viewing session each questionnaire belongs to, the session ID was added to the questionnaire  URL. Table 6 summarizes the main issues identified in the iterative design of the tool and the status of those issues.

Discussion
Inspired by the benefits described by authors such as Heiselberg and Bjorner (2018) about the combination of objective and subjective data to analyze viewers' behaviours, we created BWDAT. This paper reports on the design and technical evaluation framework of BWDAT, a tool created to facilitate the study of the viewing experience on VOD platforms − more precisely, on Netflix − in an uncontrolled environment and for long periods of time. To the best of our knowledge, this is the first open-access tool designed to collect both subjective and objective data for research on the use and consumption of VOD services According to our tool evaluation results and researchers feedback, BWDAT provides researchers with a reliable instrument able to accurately synchronize users' interactions on Netflix with, on the one side, the physiological data generated by the user (and collected through a smartwatch) and, on the other side, synchronising such data with the timeline of the content watched. All of this with a minimal level of intrusiveness and supervision, and positive user experience.
Here we have described the iterative and incremental development and testing of BWDAT, explaining the main design decisions and technical obstacles and providing a framework to evaluate the reliability of these tools. BWDAT has already been used successfully in long-term (several months) research studies geared towards the understanding of different facets of Netflix content consumption. The tool also provides an analytics interface for the real-time data and automatically sends usage data to a back end server, which can be used for exporting the data to statistical packages to perform further analytics. Moreover, during the testing of BWDAT we designed an evaluation framework that provides a baseline for testing VOD experiences in-the-wild research tools.
The tool was tested over long periods and reported robust, high performance and availability (uptime) during sessions where participants spend several hours watching one specific unit of content (e.g., a movie, a documentary) as well as when binge-watching TV series. For instance, watching the movie requested on trial 1 obtained 89.10% coverage, and for the series episode on trial 2, 98.86% of content coverage and data transmission. Furthermore, the current version supports studies with more than 200 weekly users, proving its reliability and the success of the approach taken. These are positive and promising results regarding the robustness and reliability of the tool.
BWDAT was useful in exposing high-level user behavior that is essential for understading the binge-watching mode of viewing (e.g., viewers' actions on Netflix interface, and the synchronization of the content watched with the viewers' physiological data). The results from our trials demonstrate how the BWDAT tool and our framework can be easily used in a wide range of media research projects conducted not only in the wild, as the long-term study has proved, but also in a controlled environment. In the later scenario, the tool is flexible enough to incorporate data from sensors that measure, for instance, viewer's temperature or electroencephalogram activity. Concerning the analytics interface, the synchronization of all the objective data in a single interface facilitates its interpretation by the research teams (e.g., what scenes increased viewer's HR?). Similarly, the interfaces for action and data coverage analysis show that providing all the different data in the same platform enables automation scripts to analyze thousands of data entries in a matter of seconds.

Future work and limitations
Hitherto, BWDAT has been used by various teams of researchers to investigate various issues connected to binge-watching, such as to explore if binge-watching leads to a stronger immersion into the narration, and if binge-watchers develop further parasocial relationships than appointment viewers. However, BWDAT can also help explore how a user's interaction with the Netflix interface during a viewing session relates to self-reported levels of transportation (see Green & Brock, 2000). It can also be used to measure how the environment (controlled versus uncontrolled) of a media-related study affects viewing behaviours, or to explore if viewers' interaction with an interface changes according to the content watched (e.g., in terms of genre and format of the media product). Although we applied BWDAT in the context of binge-watching at home, the intentional flexibility of its design guarantees its use in the more general domain of the consumption of streaming content in any type of environment where a laptop and Internet connection are both present (e.g., at work, whilst commuting medium/long distances in public transportation).
Despite the novelty of the BWDAT tool, we have identified several limitations that suggest paths for future work. First, a backup system on both the smartwatch and the browser extension should be developed to ensure that data is still gathered when the Wi-Fi connection is lost. Second, the technology used does not offer data about the potential multitasking behaviour of the viewer. Despite its intrusiveness, the usage of cameras (Rigby, Brumby, Gould, & Cox, 2017) could be a potential solution. Third, detecting if the users fall asleep (from the HR levels or body stillness) could help classify users' behaviors. Fourth, considering the predominance of the TV set as the main screen to consume TV content at home, BWDAT should be adapted for smart TVs. The same applies to the consumption of VOD content on smartphones. Finally, the tool could be upgraded to register other types of physiological data (e.g., skin temperature) directly collected by newer generations of wearables or by uploading data from those devices into the analytical tool using .xlsx (Excel) or .csv files.

Conclusion
This paper presents BWDAT, a novel, reliable and flexible tool for the study of VOD consumption at home. We argue that the combination of subjective data and objective data collected in a non-intrusive fashion has the potential to challenge the existing knowledge regarding the effects of binge-watching. In this article, we described the design and technical evaluation of the tool to conclude that BWDAT works with standard devices and software, and provides researchers with an easy to use and deploy platform to holistically study the behaviour patterns of consumers of VOD media content.
We have explored the technical challenges and limitations of developing tools for in-the-wild studies and provided an evaluation framework for VOD content media consumption in general.
The combination of physiological data with user interaction and content description provides a powerful mechanism for research teams working at the intersection of this new interdisciplinary area and allowing for a more comprehensive and complete understanding of not only binge-watching behaviours but VOD viewing habits and experiences in general.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.