The Use of Machine Translation for Outreach and Health Communication in Epidemiology and Public Health: Scoping Review

Background: Culturally and linguistically diverse groups are often underrepresented in population-based research and surveillance efforts, leading to biased study results and limited generalizability. These groups, often termed “hard-to-reach,” commonly encounter language barriers in public health (PH) outreach material and information campaigns, reducing their involvement with the information. As a result, these groups are challenged by 2 effects: the medical and health knowledge is less tailored to their needs, and at the same time, it is less accessible to them. Modern machine translation (MT) tools might offer a cost-effective solution to PH material language accessibility problems.
Objective: This scoping review aims to systematically investigate current use cases of MT specific to the fields of PH and epidemiology, with a particular interest in its use for population-based recruitment methods.
Methods: PubMed, PubMed Central, Scopus, ACM Digital Library, and IEEE Xplore were searched to identify articles reporting on the use of MT in PH and epidemiological research for this PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews)–compliant scoping review. Information on communication scenarios, study designs, and the principal findings of each article was mapped according to a settings approach, the World Health Organization monitoring and evaluation framework, and the service readiness level framework, respectively.
Results: Of the 7186 articles identified, 46 (0.64%) were included in this review, with the earliest study dating from 2009. Most of the studies (17/46, 37%) discussed the application of MT to existing PH materials, limited to one-way communication between PH officials and addressed audiences. No article specifically investigated the use of MT for recruiting linguistically diverse participants to population-based studies. Regarding study designs, nearly three-quarters (34/46, 74%) of the articles provided technical assessments of MT from 1 language (mainly English) to a few others (eg, Spanish, Chinese, or French). Only a few (12/46, 26%) explored end-user attitudes (mainly of PH employees), whereas none examined the legal or ethical implications of using MT. The experiments primarily involved PH experts with language proficiencies. Overall, more than half (38/70, 54%) of the summarizing statements presented mixed and inconclusive views on the technical readiness of MT for PH information.
Conclusions: Using MT in epidemiology and PH can enhance outreach to linguistically diverse populations. The translation quality of current commercial MT solutions (eg, Google Translate and DeepL Translator) is sufficient if postediting is a mandatory step in the translation workflow. Postediting of legally or ethically sensitive material requires staff with adequate content knowledge in addition to sufficient language skills. Unsupervised MT is generally not recommended. Research is lacking on whether machine-translated texts are received differently by addressees, as is research on MT in communication scenarios that warrant a response from the addressees.


Background
Public health (PH) and epidemiology are increasingly challenged by decreasing response proportions, in general, and an underrepresentation of culturally and linguistically diverse (CALD) communities, in particular [1]. Such underrepresentation increases the risk of biased estimates and, therefore, might limit the generalizability of findings in population-based research [2-7]. Ultimately, it might hinder the inclusion and involvement of these communities in disease prevention, surveillance efforts, and emergency response. In PH outreach and information campaigns, reaching CALD populations often poses greater difficulties than reaching other groups. As a result, these groups are challenged by 2 effects: the medical and health knowledge is less tailored to their needs, and at the same time, it is less accessible to them. These effects will only increase in importance as migration owing to globalization, global conflict, and economic inequalities increasingly shapes our societies toward multiculturality.
Using personalized recruitment material is an effective approach to engage individuals from CALD groups in population-based studies [8]. The choice of language matters because language barriers often result in their disengagement with PH initiatives [9-11]. If recipients are not able to comprehend transmitted information in the first place, they cannot react to it or provide an informed response [12]. Inclusive outreach approaches in PH study material, such as simplifying technical language or using multilingual cover letters, have been proven to improve access to information, foster meaningful participation, and reduce study nonresponse [13-15]. PH officials and researchers often struggle to effectively reach and engage all target audiences evenly. Although sufficient knowledge about the cultural composition of the target populations may be available (ie, the necessity to use particular languages), budget limitations usually restrict how many professional translations can be prepared and used for PH communication and outreach efforts to start with. A further complication with printed outreach material is that the number of different language versions that can be sent out in a single letter is physically limited, but the preferred language of an individual often is not known; therefore, it is difficult to conduct targeted outreach with specific language versions tailored to each recipient.
The use of machine translation (MT) technology poses a potential solution to overcome language hurdles in multilingual populations and improve effective material dissemination. As a computerized system, MT is able to automatically translate text or speech from 1 source language to multiple output languages [16]. In clinical settings, the technology has already been used to lower language barriers and facilitate services independently of the spoken language of the physician [17]. In the context of PH and epidemiology, MT could also be used to increase outreach by providing cross-lingual access to information and supporting PH staff to optimize material translation workflows.

Prior Work
To our knowledge, there are 4 recent systematic reviews that cover aspects of the use of translation technologies in medical and clinical settings. In 2018, Dew et al [18] published a review on how the development of MT technology could be useful to assist one-way communication among individual stakeholders. In 2020, Frampton et al [19] systematically mapped digital tools for the recruitment and retention of participants in randomized controlled trials. Although the authors did not specifically address MT or similar language technologies, one of their main takeaways was that few studies address the use of such tools to support underserved groups. A year later, Thonon et al [20] published a review on the use of mobile apps to facilitate dialogue between health care professionals and CALD individuals with low language proficiency levels. In 2022, Vieira [21] published a review with a focus on the use of MT in medical and legal settings as 2 separate cases of translations of highly specialized vocabulary. The paragraphs devoted to medical settings mostly focused on one-to-one communication examples, mainly corroborating the findings of Dew et al [18].
In addition to these systematic reviews, other studies have assessed the use of MT in different health settings. Panayiotou et al [22] provided a methodical evaluation of 15 Apple iPad-compatible language translation apps to facilitate conversations between health care providers and patients in Australia; aside from its geographically bounded context, the study centers on native mobile apps for one-to-one communication. Nurminen and Koponen [23] outlined several applications of MT for increasing information accessibility in humanitarian settings (eg, an armed conflict, a natural disaster, or an epidemic), including a paragraph devoted to discussing community-based health, as well as safety and security information. Although relevant to PH, the overview neither specifically reviews other contexts nor identifies patterns in the literature regarding the state of readiness of MT for PH settings.
These earlier publications are mostly confined to reporting literature on the use of MT for real-time bilingual person-to-person communication. The technology is mainly studied as an on-premise solution to support medical service provision in spoken interactions between specific groups of patients (eg, tourists, refugees, or expatriates) and health care staff (eg, general practitioners, caregivers, or paramedics) [24-28]. Only a few of the articles explore the use of multilingual translation tools for disseminating PH information to specific target audiences [29] or for population-wide health initiatives [30].

The Goal of This Study
The objective of this scoping review was to systematically map the use of MT for conducting PH outreach, with a particular focus on population-based recruitment methods. As a first step, we identify the information exchange scenarios in which MT technology is used to facilitate essential PH operations in different health and care settings. Second, we provide an overview of the types of study designs and research instruments for monitoring and evaluating the use of MT in these cases. Third and last, we synthesize the reported findings, benefits, and risks in relation to technical, socioeconomic, and ethicolegal technology readiness levels.

Search Strategy and Selection Criteria
This scoping review was preregistered on the Open Science Framework on February 11, 2022 [31], and conducted in accordance with the updated guidance on scoping reviews of the JBI Manual for Evidence Synthesis [32] as well as the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist (Multimedia Appendix 1 [33]) [34,35]. This scoping review exclusively includes peer-reviewed original research describing and assessing the use and suitability of MT for written texts for the purpose of improving collective outreach as well as the response and involvement of participants in the fields of epidemiology and PH, regardless of the specific target interventions or health areas involved. Given the technical nature of the research topic, peer-reviewed conference papers were also included. In addition, articles reporting guidelines or consensus statements concerning the use of MT in PH settings were included. The scoping review considers only studies written in English and published from 2007 onward, a year after the launch of the first fully web-based MT system and the publication of the first reference framework for MT quality assurance (the EN15038 standard) [36]. Studies of individual care or counseling settings (eg, practitioner and patient) were excluded because, in these settings, MT is used for spoken two-way communication. Textbox 1 presents the eligibility criteria.
Textbox 1. Eligibility criteria for the scoping review.

• Article type

After deduplication and the application of the exclusion criteria (ie, language not English, publication before 2007, and non-peer-reviewed articles), both authors (PSH-E and SR) independently screened the titles and abstracts of all remaining records using the R packages revtools [37] and metagear [38], which provide tools for semiautomatic deduplication and title or abstract screening. Disagreements were discussed and resolved by reaching a consensus. If necessary, full texts were consulted.

Data Extraction, Synthesis, and Analysis
Data extraction was conducted using a standardized data extraction template to extract bibliographic characteristics, health information exchange scenarios, research objectives and corresponding study designs, and technical characteristics of the MT tools used, as well as to identify the principal findings in the selected articles.
Health information exchange scenarios were assessed using a settings approach to health promotion [39]. We extracted and classified data regarding the (1) transmitters and recipients of translated materials, (2) types of translated materials, (3) types of MT systems and the source and target languages studied, and (4) nature of the use of MT in PH procedures as unsupervised (ie, without editing efforts) or supervised (ie, combined with editing efforts).
Research objectives were assessed according to the World Health Organization (WHO) monitoring and evaluation (M&E) framework [40], which is useful to map the research and development of digital health technologies according to their stage in the innovation maturity life cycle. We then classified the articles as either monitoring studies or evaluation studies. We considered monitoring studies to be those involving research on the technical quality and stability of MT (eg, technology assessments and comparative experiments) and evaluation studies to be those reporting on the appraisals of the technology-based interventions over time (eg, usability, affordability, and economic cost-effectiveness studies), as well as implementation research for integrating developed systems within broader PH workflows.
To assess the principal findings, we extracted sentences reporting quantitative and qualitative outcomes from the results sections. Following the service readiness level framework of evidence proposed by Hughes et al [41], we organized the statements as concerning technical, socioeconomic, or ethicolegal readiness levels of MT technology. On the basis of a manual sentiment analysis, we then determined the tone of each statement and classified it as positive, negative, or neutral.
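The tallying step behind such a manual classification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records below are hypothetical placeholders rather than statements extracted in this review, and the function names `tally` and `share` are invented for the example.

```python
from collections import Counter

# Hypothetical, hand-labeled statements (placeholders, NOT data from the review).
# Each record is (readiness dimension, tone assigned by a human rater).
labeled_statements = [
    ("technical", "positive"),
    ("technical", "neutral"),
    ("technical", "negative"),
    ("socioeconomic", "positive"),
    ("ethicolegal", "negative"),
    ("technical", "neutral"),
]

DIMENSIONS = ("technical", "socioeconomic", "ethicolegal")
TONES = ("positive", "negative", "neutral")


def tally(statements):
    """Count statements per (dimension, tone) pair, rejecting unknown labels."""
    counts = Counter()
    for dimension, tone in statements:
        if dimension not in DIMENSIONS or tone not in TONES:
            raise ValueError(f"unknown label: {dimension!r}, {tone!r}")
        counts[(dimension, tone)] += 1
    return counts


def share(counts, tone):
    """Return the fraction of all statements that carry the given tone."""
    total = sum(counts.values())
    matching = sum(n for (_, t), n in counts.items() if t == tone)
    return matching / total if total else 0.0


counts = tally(labeled_statements)
for dimension in DIMENSIONS:
    # One row per readiness dimension, one column per tone.
    print(dimension, {tone: counts[(dimension, tone)] for tone in TONES})
print(f"neutral share: {share(counts, 'neutral'):.0%}")
```

Keeping the dimension and tone vocabularies as fixed tuples means that a mistyped label raises an error instead of silently creating a new category, which matters when the labels are entered by hand.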

Search Outcomes
Conducted on January 31, 2022, and updated on March 3, 2023, the search yielded a total of 7186 records, of which 2934 (40.83%) were removed (1596/2934, 54.4% duplicates and 1338/2934, 45.6% not meeting the eligibility criteria). A review of the titles and abstracts of the remaining 4252 records resulted in 56 (1.32%) being selected for a full-text screening. Of these 56 articles, 10 (18%) were removed for not meeting the study design criteria, not specifically addressing the research question, or providing duplicate information from another included paper (Multimedia Appendix 3), and 46 (82%) were included in the systematic scoping review (Figure 1; Multimedia Appendix 4 [29,...]).
The tested MT software tools were either freely available on the web from commercial technology vendors or were in-house built systems created by the research teams themselves (Multimedia Appendix 5). Regarding commercial vendors, Google Translate was the most used translation engine (... [73,86], among others). All these systems were used as domain-agnostic systems and were not pretrained on specific language corpora. All articles regarding in-house built systems (9/46, 20%) [61,63,69,71,76,79,80,82,84] presented a prototype demonstration of domain-specific MT systems specifically trained on PH-related and medical vocabulary. The studies comparing these systems against each other (4/9, 44%) [71,79,80,84] advocate for using in-house built systems for shorter text with medical terminologies in long-term projects, whereas off-the-shelf systems may be used for more general information. The evidence does not clearly favor 1 translation engine over another. Instead, it suggests that the choice among systems depends on the language pairs and the vocabulary domain used in the material. Provided that the texts are not exclusively reliant on specific terminologies, domain-agnostic solutions are equally suited for handling short-text translations.
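The record counts reported for the selection process can be cross-checked with simple arithmetic. The following Python sketch recomputes each stage from the figures stated in the text; the variable names and the helper `pct` are illustrative choices and not part of the review's tooling.

```python
def pct(part, whole, digits=2):
    """Percentage of `part` in `whole`, rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)


# Counts as reported in the text.
identified = 7186           # records returned by the database search
duplicates = 1596           # removed as duplicates
ineligible = 1338           # removed for not meeting the eligibility criteria
full_text = 56              # selected for full-text screening
excluded_full_text = 10     # removed at the full-text stage

# Derived counts for each stage of the flow.
removed = duplicates + ineligible
screened = identified - removed
included = full_text - excluded_full_text

print(f"removed before screening: {removed} ({pct(removed, identified)}%)")
print(f"titles and abstracts screened: {screened}")
print(f"full texts assessed: {full_text} ({pct(full_text, screened)}%)")
print(f"included: {included} ({pct(included, identified)}% of all records)")
```

Deriving the later counts from the earlier ones, rather than typing them in, makes any inconsistency between stages show up as a mismatch with the published numbers.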

Study Designs According to the WHO M&E Framework
In accordance with the WHO M&E framework [40], we identified 6 types of research designs across the selected articles (Table 1).

Table 1.
Study type and research design | Studies, n (%)
Monitoring studies: functionality and stability of MT (a) at predefined levels of quality |
1. MT technology assessments: studies assessing MT quality, functionality, and performance | 23 (50)
2. Technology stability standards: studies proposing standards or criteria for MT quality assurance | 3 (7)
3. Prototype demonstrations: studies reporting on the development and design of an in-house built MT-based system | 8 (17)
Evaluation studies: MT technology in health-related settings |
4. Usability studies: studies addressing end-user attitudes, perceptions, and responses when using the prototype system and assessing how easily end users can interact with the system | 4 (9)
5. Economic evaluations: studies addressing accessibility, availability, or affordability of the system | 4 (9)
6. Implementation research: studies around the implementation of MT technology within a broader (public) health system architecture | 4 (9)
(a) MT: machine translation.
The monitoring studies adopted standard MT evaluation methods to measure the quality of MT output across various samples of health information material. Most of these studies focused on studying MT quality in terms of structural accuracy (28/34, 82%) [29,43,45,46,48,...]. Some of the studies [61,62,69,71,79,80,82,84] assessing structural accuracy supplemented their findings with standard automatic evaluation methods to verify the quality of MT output in comparison with the output of professional human translators. Flesch-Kincaid grade level scores and content analysis techniques were used to measure the readability levels and meaning preservation of the translated sentences. In a few of the articles (6/34, 18%) [50,51,57,58,66,72], MT was also evaluated in terms of the risk severity of mistranslation (ie, the degree of negative impact on the patient's health outcome because of a wrong translation). Studies investigating postediting (4/34, 12%) [29,59,77,86] or back translation (2/34, 6%) [66,74] focused on identifying error patterns or measuring the amount of time saved, whereas pre-editing (2/34, 6%) [61,62] was investigated to understand the ability of MT to handle PH jargon and medical terminologies.
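For readers unfamiliar with the readability measure mentioned above, the Flesch-Kincaid grade level is computed as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The Python sketch below is a minimal illustration and not the procedure used in any of the included studies: the syllable counter is a crude vowel-group heuristic assumed here for simplicity, and the formula is calibrated for English text only.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fk_grade_from_counts(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw word, sentence, and syllable counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fk_grade(text):
    """Estimate the Flesch-Kincaid grade level of an English text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)


sample = "Wash your hands often. Stay home if you feel sick."
print(f"Estimated grade level: {fk_grade(sample):.1f}")
```

Because the syllable heuristic overcounts words with silent vowels, scores from this sketch should be read as rough estimates; dedicated readability tools use pronunciation dictionaries or more refined rules.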

Principal Findings
In our scoping review, we sought to systematically identify and map existing peer-reviewed literature on the use of MT for population-based outreach, with a particular interest in its use for recruiting participants for PH and epidemiological research. None of the included articles (n=46), published between 2009 and 2023, tested MT for recruiting participants to population-based studies or in scenarios where a response from addressees is expected. Research on the use of MT for PH activities is still in its early stages, primarily concentrating on assessing the technical readiness for one-way written communication between PH officials and addressed audiences. The majority of information transmitters (ie, the end users of MT) were PH professionals in PH departments and research, clinical and hospital staff, or staff at international and national health organizations. PH materials translated with MT were predominantly official guidelines and educational resources, simplified medical information, or PH promotional material. The intended target audiences (ie, the receivers of translated material) were the wider population (both offline and seeking information on the world wide web), patient groups, or professionals in PH and clinical settings. Nearly three-quarters (34/46, 74%) of the articles reported monitoring studies, with the remaining quarter (12/46, 26%) reporting evaluation studies.

Research on the Use of MT for PH Activities Is Still Nascent
The current focus of research is mostly concentrated on understanding the extent to which machine-translated output is reliable and stable enough for translating specific sample texts, while placing less emphasis on the feasibility of its use in real-world settings. Published study types mostly provided technical maturity assessments of MT (eg, in exploratory research, experimental proofs of concept, and implementation research studies).
A handful of studies (9/46, 20%) [61,63,69,71,76,79,80,82,84] reported ongoing research in the development of in-house software, pretrained on specific vocabulary. These systems were reported to outperform off-the-shelf models (eg, Google Translate and DeepL Translator), particularly when translating shorter text with specialized terminologies, such as those used in medical guidelines or prescriptions. The fact that the technology is evolving and can now be trained in PH and biomedical vocabulary sheds light on future possibilities to meet the needs of staff working with more complex PH material. However, the current state of evaluations on the advantages and disadvantages of the off-the-shelf systems over internally developed models does not yet allow PH researchers to model the best use of both systems during specific stages of material production. Provided that PH material does not heavily rely on domain-specific vocabulary, off-the-shelf MT solutions are sufficiently reliable in terms of translating shorter text. These systems are predominantly free to use and easily adaptable to a translation workflow, whereas proprietary models are relatively costly to develop, maintain, and scale to new vocabularies.
The literature tends to focus on evaluating the accuracy of supervised translations from the language of the working staff or researchers (typically English) to 1 or a few languages (in most cases, Spanish, Chinese, or French). The observed inclination to study English as a source can be attributed to the origin of the selected articles in this review. For most of the studies (19/46, 41%) [29,50-52,56,57,59-63,66,69,72,74,77,79,80,85], the target audiences of interest were large linguistically diverse communities residing in predominantly English-speaking countries (eg, the United States, the United Kingdom, and Australia). Future studies could also aim to cover underrepresented languages beyond those of the largest linguistically diverse groups and continue exploring cases to support linguistically diverse PH staff. For now, a few of these studies (6/46, 13%) [29,59,66,74,77,86] tested MT in light of postediting efforts. As user-friendly MT applications become more accessible to the public and professionals, we can reasonably assume that the focus of MT research in PH might shift from generating texts with MT to generating texts that are optimized for MT, that is, the emphasis might shift from technical accuracy and postediting efforts to pre-editing of texts.
A limited number of articles (21/46, 46%) [29,42,44-47,49,53,59,60,64,65,67,70,75-78,80,83,85] investigated the societal acceptance of MT, mainly by surveying the attitudes of PH staff toward its adoption, formulating new concepts, and studying current practices and standards. The selected studies point to the conclusion that PH staff are enthusiastic and open to adopting MT in their workflows. Almost half (10/21, 48%) of the studies reported positive attitudes toward the potential cost-effectiveness of using MT to increase public access to PH information. However, the technology has not been routinely adopted by PH departments owing to safety concerns, the loss of control over content, and the unquantified variability of the quality of translation between languages. There is a need to further identify relevant stakeholders for implementing and deploying MT, as well as to test proposed solutions in controlled environments with the end users of translated material.
Most of the experiments (31/46, 67%) were based on expert focus groups and surveying PH professionals, whereas only a few (3/46, 7%) explored end-user interactions, preferences, and perspectives in real-world settings. However, without real-world studies conducted outside laboratory settings and in field experiments, the user experience of the technology remains largely unknown. Only a few studies (8/46, 17%) [42,47,49,53,64,65,67,78] tested the usability and acceptability of MT in community settings. Future studies could explore, for example, end-user interactions with machine-translated text in daily life settings, while also continuing to survey PH professionals in digital environments and capturing their attitudes toward use and adoption, as well as measuring the actual information uptake by groups targeted with machine-translated materials compared with nontranslated materials alone. Moreover, no article focused solely on the legal or ethical aspects of the use of MT for PH purposes. However, some of the studies (13/46, 28%) [50-52,57,58,64,66,68,72,76,83,85,86] did provide a generic consideration of ethical compliance aspects as part of their discussions. To the extent that these concerns were addressed, 2 (4%) of the 46 studies called attention to the fact that the commercial vendors' algorithms are not transparent to researchers and staff. Investigating MT from an ethical perspective, such as its impact on the digital divide, and establishing standards for its adoption also remain pending in light of PH equity goals and the risk of harmful errors.
The literature only covers the use of MT for communicating in PH settings that do not warrant a response from addressees. Most of the studies (27/46, 59%) [29,42,43,49-53,57-60,62,64-67,70,72,75-80,82,83] focused on the use of MT for translating simple text in flyers, instructions, and general information sheets from 1 language into a selected few. Nearly all of the articles (44/46, 96%) [29,...,80-86] did not discuss cases where the technology was used to communicate with several linguistically diverse populations at once. Only 2 (4%) [75,79] of the 46 studies introduced the use of MT for emergency preparedness and outreach prompted by the COVID-19 pandemic. These cases remain examples of unidirectional communication between PH staff and addressed audiences who are not expected to provide a response in return.
One possible reason why MT has not been used for recruitment in population-based research may be that there is limited utility in providing translations of PH material into languages that are not spoken or read by researchers or field staff or in recruiting participants who cannot interact with the languages in which the study is offered. Conversely, if studies are offered in multiple languages, they are usually prepared with research instruments and personnel pre-equipped with the skills to meet the language diversity of the study population. It is therefore rather unlikely that MT would be necessary for translating recruitment materials in the first place.
However, there are scenarios in which MT may prove beneficial in population-based recruitment; for example, in studies on children and adolescents, the actual study participants often speak the language of the country fluently, but their legal guardians, who have to consent to their children's participation, might not be proficient in the language. Providing them with study information and consent forms in their preferred language might help them to understand what is asked of them and their children and, therefore, increase the probability that they will provide consent. However, for such purposes, ensuring a certain translation quality is crucial to meet ethical and legal requirements, but, as mentioned before, this review did not find much evidence of research regarding this problem. Furthermore, providing multilingual invitations could also help PH employees to understand the demand for different languages at the population level. If addressees could be enabled to report their preferred languages back to PH staff, the collected data might be used to adapt ongoing or future studies to provide additional language support. Alternatively, addressees could be informed that participation is possible, contingent on being accompanied by a translator.
Finally, even if it is not possible to add each language preferred by potential study participants, using MT tools for PH study invitations would ensure that more addressees understand the content of the invitation letters, which, given their official appearance, might otherwise leave them uncertain regarding missing out on something important or even undermine trust in PH departments and reduce participation in future studies or initiatives.

Limitations
Our findings should be considered in light of several limitations. First, this review is limited to publications addressing the use of MT either as part of the research question or as a key point of discussion in the publications. It cannot be ruled out that MT might already be used as a routine tool, and therefore, its use is not reported in peer-reviewed papers. Second, we used an interpretative sentiment analysis to classify the principal findings for each article based on the extraction of selected statements. This exercise, although systematic and with the intention of objectivity, is prone to the authors' interpretation of enthusiasm regarding the specific dimensions of digital technology maturity. Finally, the search was limited to articles published only in English, which might bias the results toward studies examining MT from or into English. There is also a possibility that articles published before 2007 could contain information relevant to the research question. However, because the technology has evolved rapidly in the last 2 decades, prior information is likely to be outdated and no longer applicable to current standards.

Conclusions
Using MT in epidemiology and PH can enhance outreach to linguistically diverse populations. The translation quality of current off-the-shelf systems, such as Google Translate or DeepL Translator, is sufficient if postediting is a mandatory step in the translation workflow. Postediting of legally or ethically sensitive material requires staff with adequate content knowledge in addition to sufficient language skills. When preparing texts for translation, it is advisable to use shorter sentences and specifically mark domain-specific vocabulary for possible postediting. Unsupervised MT is generally not recommended. Research on whether machine-translated texts are received differently by addressees is lacking, as well as research on MT in communication scenarios that warrant a response from the addressees.

Figure 1. Flow diagram of the search and study selection process following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines.

Figure 2 provides a Sankey diagram visualizing PH information exchanges supported with the use of MT technology between groups of transmitters and receivers across the selected articles (Multimedia Appendix 5 [29,...]).

Figure 2. Public health information exchange scenarios: transmitters and receivers of public health information and the types of public health materials.

Figure 3. Appraisal of study results. (A) Positive, negative, and mixed findings on the use of machine translation (MT) in public health settings by type of study and technology readiness dimensions. (B) Aggregate of final statements (N=70) by technological readiness levels across the 46 selected articles. IMP: implementation research; ELR: ethicolegal readiness; SER: socioeconomic readiness; TR: technical readiness; TSS: technology stability standards.