Electronic Data Collection in Health Research: Shared Experiences from the Field, South Africa

Background: mHealth and electronic data collection (EDC) systems have expanded rapidly in developing countries. We conducted a synthesis of the experiences of researchers in resource-limited African settings who have used electronic (mobile) systems to facilitate data collection in large-scale research. Methods: We synthesise the experiences of researchers and users engaged in studies using electronic data collection conducted by the South African Medical Research Council (SAMRC): (1) a cross-sectional national survey of 9679 mother-infant pairs measuring the effectiveness of the Prevention of Mother to Child Transmission (PMTCT) programme, using low-cost Nokia mobile phones; (2) a school-based randomised controlled trial to prevent gender-based violence among teenagers (N=3755), using iPod Touch; (3) a longitudinal community-based study on International Alcohol Control among 2000 adults and 1000 adolescents, using LAVA tablets; (4) a retrospective descriptive survey on injury mortality, using entry-level Nokia phones to interview 22,733 participants using questionnaires. Results: Electronic Data Collection (EDC) necessitates systematic set-up and testing of the system, training and daily support of data collectors, and appropriate matching between the data collector's age and ability and the tool's complexity and size. Some of the risks noted in the four research studies conducted in resource-limited settings were delayed uploading of data due to limited or no network coverage, loss of devices (e.g., cell phone or iPod Touch), increased training time for older users, typing errors, and the challenge of keeping batteries charged while conducting fieldwork.
The benefits noted included automated skip patterns and mandatory fields, which reduced errors and allowed early detection of potential errors; user-friendly interfaces; access to real-time data for monitoring of fieldwork, enabling simultaneous feedback to staff and management; no need for data capturers; reduced printing and storage costs; and a shorter time from completion of data collection to the generation of a cleaned final data set for analysis. Conclusion: The benefits of using electronic (mobile) systems for data collection appear to outweigh the risks in resource-limited settings. Given the continuously changing information and technology age, electronic mobile technologies are becoming a popular data collection tool. Like any other technological tool, electronic systems can be improved to overcome some of the risks highlighted in this report.


Background
The use of Electronic Data Collection (EDC) methods in health research has grown rapidly in recent years. Whilst printed forms are the conventional method of data collection, there is a body of literature on the various types of electronic data collection systems used in the field of health care. The major advantages of EDC include the ability to enter, review and monitor data collection, evaluate study status and analyse data in real time. A variety of devices are suitable for electronic data collection, the most popular being mobile phones, Personal Digital Assistants (PDAs), iPod Touch and tablets. In the past 20 years, electronic methods of data collection have been developed on handheld devices such as PDAs and, more recently, on mobile phones [1]. A study evaluating PDA and paper-based data collection reported that sixty-two percent of the participants perceived that the PDA-based questionnaire took less time to complete, and participants preferred using a PDA instead of paper for data collection [2]. Literature findings further suggest that smartphones may be more suitable than low-end mobile phones for data collection, as smartphones have larger screens and can more easily accommodate complex functions (such as wireless uploading and downloading, touch-screen typing, and photo or video capturing) [1]. In a health behaviour assessment study, participants were asked to provide feedback on their experiences using iPod Touch. Ninety-nine percent of the participants reported an overall positive experience, and the majority of the patients did not believe the iPod Touch negatively affected their interactions with their doctor [3]. The authors of that study further stated that the iPod Touch was a promising device to assist behaviour change in a diverse population of varying age groups, genders, ethnicities and health status.
Whilst literature findings indicate a positive approach towards electronic data collection systems, there is limited literature on the shared experiences, benefits, risks and challenges reported by researchers and users of electronic methods for data collection.
This paper aims to synthesise the experiences of researchers at the South African Medical Research Council who have used mobile/electronic systems to facilitate data collection in studies of varying designs, conducted in resource limited settings, in an attempt to provide contextual guidance for future users of mobile technology for data collection.

Methods
Project PIs and project managers (n=10) were invited to a discussion forum for users of electronic data collection. Attendees were requested to share their experiences (benefits, risks, challenges) with mHealth in their respective research projects. These included four large-scale studies using electronic data collection conducted by the South African Medical Research Council (SAMRC): (1) a cross-sectional national survey of 9679 mother-infant pairs measuring the effectiveness of the Prevention of Mother to Child Transmission (PMTCT) programme, using low-cost Nokia mobile phones; (2) a school-based randomised controlled trial to prevent gender-based violence among teenagers (N=3780), parents (N=1000) and educators (N=800), using iPod Touch; (3) a longitudinal community-based study on International Alcohol Control among 2000 adults and 1000 adolescents, using LAVA tablets; (4) a retrospective descriptive survey on injury mortality, using entry-level Nokia phones to interview 22,733 participants using questionnaires.

Results
The results are presented by research project. Table 1 summarises the key elements of the systems presented in this report.

Project I: South African Prevention of Mother-to-Child Transmission of HIV Programme Evaluation
The SAMRC Health Systems Research Unit conducted large-scale national Prevention of Mother to Child Transmission (PMTCT) surveys. The aim of the surveys was to assess the impact of the programme to prevent Mother To Child Transmission of HIV (MTCT), measured at 4-8 weeks of age, in three sequential, annual, nationally representative surveys in all nine South African provinces (2010, 2011-12 and 2012-13) [4]. In partnership with Mobenzi, a South African company that specialises in designing and implementing mobile technologies for mobile data collection systems (http://www.mobenzi.com/researcher/home), platforms were developed for the data collection and management of these surveys, conducted between 2010 and 2013, and of one national prospective observational cohort study (2012-14) derived from the third survey. The mobile data collection system managed data collection from over 10 000 mother-infant pairs in each of the three national facility-based surveys. Mobenzi developed two platforms to support the 2012-2013 survey with the added longitudinal cohort: Mobenzi Researcher supported the data collection component of the study, and Mobenzi Outreach supported the operational workflow of the project.
The system was designed with internal quality control checks and restrictions: limits were placed on certain fields to avoid 'out of range' values, compulsory questions could not be skipped, and extremely important information was entered twice, with the system recognising discrepancies immediately and prompting for re-entry. Electronic skip patterns or loops were set in place based on maternal responses. The system was used mainly by retired nurses (employed as data collectors), who received one day of training on how to use the technology and 5 additional days of training on other aspects of the study protocol. Real-time daily telephonic support was needed to manage field-based crises and to ensure appropriate functioning of the technology. Many of the challenges arose when data collectors removed the mobile phone SIM card or did not check for daily updates. To manage contingencies, data collectors carried a limited number of hard copy questionnaires. These were used if the cell phone malfunctioned in the field, i.e. software application errors, inability to retrieve the survey questionnaire, the survey questionnaire freezing while data collection was in progress, or completed surveys only partially uploading onto the web-based console [5]. The web-based interface created a platform for trained data collectors to undertake real-time data entry during the interview. Additionally, real-time review of submitted data enabled supervisors to monitor fieldwork progress, to track the quality of data collected and to performance-manage the data collectors. Raw data reports were also readily available for feedback to stakeholders and programme managers. Table 1 presents more detail about this use, lessons learnt, benefits and risks of using electronic (mobile) technology for data collection in large national surveys.
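The quality control checks described above (range limits, compulsory questions, double entry and skip patterns) can be illustrated with a minimal Python sketch. This is not the Mobenzi implementation, which is not described in this report; the `Question` class, the `validate` function and the example field names and limits are all hypothetical and serve only to show how such rules might be expressed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Question:
    """One form field with its checks (hypothetical structure, for illustration)."""
    name: str
    required: bool = True                  # compulsory questions cannot be skipped
    min_value: Optional[float] = None      # lower limit against out-of-range values
    max_value: Optional[float] = None      # upper limit against out-of-range values
    double_entry: bool = False             # critical fields are keyed twice
    show_if: Optional[Callable[[dict], bool]] = None  # skip pattern on earlier answers


def validate(question, answers, first, second=None):
    """Return a list of error messages for one entry; an empty list means accepted."""
    # Skip pattern: a question that does not apply is neither asked nor checked.
    if question.show_if is not None and not question.show_if(answers):
        return []
    if first is None:
        return [f"{question.name}: answer required"] if question.required else []
    errors = []
    low, high = question.min_value, question.max_value
    if (low is not None and first < low) or (high is not None and first > high):
        errors.append(f"{question.name}: {first} is out of range")
    # Double entry: both keyed values must agree, otherwise prompt for re-entry.
    if question.double_entry and first != second:
        errors.append(f"{question.name}: entries differ, please re-enter")
    return errors


# Hypothetical fields; the names and limits are assumptions, not the survey's own.
FORM = [
    Question("infant_age_weeks", min_value=4, max_value=8, double_entry=True),
    Question("ever_breastfed"),
    Question("weeks_breastfed", min_value=0, max_value=8,
             show_if=lambda a: a.get("ever_breastfed") == "yes"),
]
```

In this sketch, `validate(FORM[0], {}, 12, 12)` returns a single out-of-range message, while `validate(FORM[2], {"ever_breastfed": "no"}, None)` returns an empty list because the skip pattern hides the question.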

Project II: Randomised Controlled Trial on Teen Violence Prevention
The SAMRC Gender and Health Research Unit conducted a randomised controlled trial in 2014-2015 with teenagers in Grade 8, to develop and test a multi-faceted school-based intervention to prevent Intimate Partner Violence (IPV) among teenagers [6,7].
A standard self-complete questionnaire was loaded onto an iPod Touch. The iPod Touch allowed for confidentiality, as data were entered anonymously and findings were reported with anonymity. The device allowed a person completing a questionnaire to go back one screen, but once a questionnaire was completed it could not be viewed again on the device. Thus, it was impossible for someone who viewed the iPod Touch to determine the information stored in its files, even if multiple interviews were done on the same device on one day. Data were uploaded from the iPod Touch via a wireless network to a web-based, password-protected system. The study team were able to attract their target group easily: teenagers were fascinated by the use of mobile technology and were keen to participate in the study. The study team could reach many participants and administer the questionnaires in groups, with only a few data collectors responsible for guiding and monitoring the process. Table 1 provides more details, including lessons learnt, benefits and risks. The iPod Touch was also attractive to parents who were acquainted with mobile technology through smartphones, but was a challenge to those who were not familiar with such devices, and the latter group took more time to complete the survey.

Project III: International Alcohol Control Household Survey
The Alcohol Tobacco and Other Drug Research Unit used LAVA tablets in a community based survey aimed at (i) measuring alcohol consumption among populations in Tshwane, South Africa; (ii) documenting alcohol policy relevant behaviours, including place and time of purchase, prices paid, and exposure to and salience of alcohol marketing; and (iii) determining the impact of restrictions in alcohol marketing on alcohol consumption and policy-related mediating variables [8].
LAVA tablets were used to collect household survey data from two samples: adolescents (aged 16 and 17 years) and adults (aged 18-65 years). The use of tablets was prompted by the fact that the questionnaire was extremely complex, consisting mostly of 'loops within loops'. Participants had 17 different options regarding alcohol consumption locations (including 'other'), and researchers asked them to report on 13 different types of alcohol consumed at these 17 locations. This resulted in a vast number of variables, and a pen and paper survey was therefore not considered feasible. Development of the software was an expensive exercise, and it was extremely difficult to find a software developer because of the complexity of the questionnaire. This resulted in delays in the development of the software and in the commencement of the actual fieldwork. Problems experienced in the field with data collection using the tablets included broken or lost tablets, system failures at data collection points and the challenge of keeping tablet batteries charged. However, despite these challenges, the use of tablets during data collection was advantageous. In addition to confidentiality and anonymity, the use of tablets prevented loss of data, since data were uploaded in real time to a central server, where they were stored within a protected database. The data collectors were trained in the use of the tablet as well as in data collection processes, and reported that the use of tablets was less cumbersome than traditional approaches. Benefits cited included not having to carry papers and stationery around, and the electronic questionnaires eased concerns about not being able to account for all questionnaires.
While concerns in respect of fieldworker safety were taken into consideration at study conception, the data collectors reported that they did not feel threatened in the field while carrying the tablets. Table 1 summarises the lessons learnt, benefits and risks whilst using mobile-based systems as a data collection tool.
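To give a sense of why the 'loops within loops' design ruled out paper, the following minimal Python sketch enumerates the nested variables. The counts of 17 locations and 13 alcohol types come from the description above; the placeholder labels and the two items asked per location-alcohol pair (`freq` and `qty`) are assumptions for illustration only, since the actual questionnaire items are not listed in this report.

```python
from itertools import product

N_LOCATIONS = 17       # from the text, including 'other'
N_ALCOHOL_TYPES = 13   # from the text

# Placeholder labels; the real location and beverage names are not given here.
LOCATIONS = [f"loc{i:02d}" for i in range(1, N_LOCATIONS + 1)]
ALCOHOL_TYPES = [f"alc{j:02d}" for j in range(1, N_ALCOHOL_TYPES + 1)]


def nested_variables(items_per_pair=("freq", "qty"), locations=LOCATIONS):
    """Name one variable per (location, alcohol type, item) combination.

    The outer loop runs over locations and the inner loop over alcohol types.
    """
    return [
        f"{loc}_{alc}_{item}"
        for loc, alc in product(locations, ALCOHOL_TYPES)
        for item in items_per_pair
    ]


# Every location paired with every alcohol type: 17 * 13 = 221 combinations.
pairs = len(nested_variables(items_per_pair=("any",)))

# An electronic form only needs to loop over the locations a participant reports.
reported = nested_variables(locations=["loc03", "loc17"])
```

Even a single item per pair yields 221 variables, and each additional item asked within the inner loop adds another 221. On a tablet, the inner loop can be restricted to the locations a participant actually reports, which a fixed paper form cannot do.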

Project IV: Injury Mortality Survey
The SAMRC Burden of Disease Research Unit conducted an Injury Mortality survey that aimed to establish the cause-specific incidence of fatal injury for the year 2009. The specific objectives were: (i) to describe fatal injury incidence rates in South Africa for 2009 by age, sex and cause; (ii) to describe the metro and non-metro profiles of fatal injuries; and (iii) to compare the provincial profiles of fatal injuries [9]. The study was designed as a retrospective descriptive study, utilising routine data collected through post-mortem reports and ancillary documentation, including police reports and hospital records, that appear in case folders. All folders and registers for patients who died an unnatural death during 2009 were reviewed across South Africa. Mobile phones were used to collect data via the web-based service provider Mobenzi. The Mobenzi team converted the questionnaire into a mobile phone application incorporating the screening logic, skips and control flow capabilities offered by Mobenzi Researcher for entry-level handsets. The unit conducted a survey of more than 22,000 non-natural deaths that presented to mortuaries. The mobile phone-based questionnaire was used to collect demographic information from post-mortem reports, including the age, sex and race of the deceased. The primary source data were entered on site via the custom-designed software for mobile telephones and submitted to a central web-based data platform, which enabled ongoing monitoring of data collection activities, quality control and data cleaning by the national coordinator and project manager. Mobenzi provided airtime monitoring and recharge services, which included both scheduled recharges and ad hoc recharge requests. Table 1 summarises the lessons learnt, benefits and risks whilst using this mode of EDC system.

Discussion
The use of EDC systems in research affords researchers new opportunities to enhance survey and questionnaire-based research. Collecting data through structured surveys with structured response options, coupled with the ability to view data in real time, makes an EDC system a desirable option for data collection. As with all technological devices, the use of electronic devices like mobile phones, tablets, PDAs and iPod Touch for data collection has its risks and benefits.
The users who contributed to this paper shared their experiences, lessons, benefits and risks when using technology for data collection. They highlighted cost-saving benefits, including reduced paper, printing and storage needs and no employment of data capturers. Other benefits included a reduction in data entry time and data capturing errors, real-time data viewing, real-time and automated data quality checks, mandatory fields and skip patterns that reduce incomplete surveys and missing data, and enhanced data security. In terms of risks, they highlighted software development costs, initial hardware acquisition costs, potential replacement costs if devices are stolen (especially desirable devices like iPod Touch and tablets), and the lack of source data, as survey responses are either verbal or entered directly onto the device by the participant. The benefits, however, appear to outweigh the risks.
The benefits highlighted by iPod Touch users are consistent with the findings of a feasibility study using iPod Touch for a health behaviour assessment survey, in which the majority of the participants found the iPod Touch easy to use and the questions easy to understand, and in which clear recording of responses and a minimal response error rate were reported [3]. The benefit of reduced paper and storage costs is also consistent with findings reported in studies that used mobile phones to collect data in a household survey [10,11], and a study that reviewed challenges in mobile phone-based data collection systems highlighted cost reduction as a benefit [1,2,12]. The availability of real-time data and data quality control are also highlighted in various literature sources as major advantages of using EDC for data collection [1,2,5,10-14]. A study comparing EDC with standard paper-based data capture found that, regardless of the type of device, the major advantage of EDC is that data become available without delay after collection in the field [1].
The potential risks of EDC highlighted in Table 1 are consistent with findings in a household survey that reported that the drawback of using a paperless system is that there are no paper questionnaires to review in the event of problems detected in the [...] and paper based questionnaires in Nigeria and Benin-Republic reported the absence of source information to which one can revert in the event of a problem or detected errors, and identified this as a limitation [15]. They found that it was not possible to ascertain the source of the differences for the variables which showed a weaker agreement: whether it came from the information given by the study participants during the interview or from the entry of the information by the interviewers.
Mobile phones, iPod Touch and tablets are relatively costly devices. They are desirable items, and their compact size means they can be easily stolen. In research settings, replacement costs can escalate, especially for iPod Touch and tablets, which are costlier to replace than some mobile phones, particularly lower-end non-smart phones. The risks of these replacement costs highlighted in Table 1 are consistent with findings of a study in China that compared two methods for a Maternal, Newborn and Child Health (MNCH) household survey. The authors report that the use of smartphones had some drawbacks, including that data can become corrupted when the device is damaged and that replacement costs are relatively high when the device is lost or damaged [1].
There is a growing trend in the use of EDC systems, and ongoing advancements in mobile technology make mobile phones a popular data collection tool. The synthesis of researchers' experiences in this paper highlights the potential gaps, possible limitations, benefits and risks, which will assist other researchers when engaging with EDC systems for data collection. It will also assist researchers to make informed decisions, based on lessons, benefits and risks, when selecting appropriate tools to facilitate data collection.

Conclusion
Mobile-based systems as a data collection tool are less time consuming and more user friendly: they provide access to real-time data, facilitate early detection of potential errors, can have a built-in error detection component, reduce the need for data collectors to carry large and heavy field bags, and reduce printing and storage costs. Electronic systems can also be used to manage and monitor the day-to-day operations of a study and to provide simultaneous feedback to staff and management as well as almost instantaneous reporting. Data are aggregated on a web-based central database, enabling quicker processing, analysis and dissemination of data.
The benefits of using electronic (mobile) systems for data collection appear to outweigh the risks involved. Given the continuously changing information and technology age, mobile phones and other portable electronic devices are becoming a preferred data collection tool. Like any other technological tool, electronic systems can be improved to overcome some of the risks highlighted in this report.

Declarations

Ethics and consent to participate
This paper is a review synthesis; thus, no patients were recruited. All authors willingly contributed insights and perspectives.

Consent to publish
All authors have granted consent to publish. No patient-level data are included; thus, no consent to publish is needed from patients and this section is 'not applicable'.

Authors' Contributions
YS: She took a major role in drafting the first version of the paper, in reviewing the contributions to this paper and in writing the submitted paper, and approved the final version.
DJ: She took a major role in reviewing this paper, made substantial contributions to writing the submitted paper and approved the final version.
AG: She contributed to writing the submitted paper and approved the final version.
PM: She contributed to writing the submitted paper and approved the final version.
SS: He contributed to writing the submitted paper and approved the final version.
EN: She contributed to writing the submitted paper and approved the final version.
NHB: She contributed to writing the submitted paper and approved the final version.
AEG: She took a major role in reviewing this paper, made substantial contributions to writing the submitted paper and approved the final version.

Availability of Data and Materials
All data arise from the documented experiences of the authors and data collectors. No patient data were used for this paper. No raw data/materials are used for this paper. Thus, this is not applicable.