Use of Patient-Reported Outcome Measures and Patient-Reported Experience Measures Within Evaluation Studies of Telemedicine Applications: Systematic Review

Background: With the rise of digital health technologies and telemedicine, the need for evidence-based evaluation is growing. Patient-reported outcome measures (PROMs) and patient-reported experience measures (PREMs) are recommended as an essential part of the evaluation of telemedicine. For the first time, a systematic review has been conducted to investigate the use of PROMs and PREMs in the evaluation studies of telemedicine covering all application types and medical purposes. Objective: This study investigates the following research questions: in which scenarios are PROMs and PREMs collected for evaluation purposes, which PROM and PREM outcome domains have been covered and how often, which outcome measurement instruments have been used and how often, does the selection and quantity of PROMs and PREMs differ between study types and application types, and has the use of PROMs and PREMs changed over time. Methods: We conducted a systematic literature search of the MEDLINE and Embase databases and included studies published from inception until April 2, 2020. We included studies evaluating telemedicine with patients as the main users; these studies reported PROMs and PREMs within randomized controlled trials, controlled trials, noncontrolled trials, and feasibility trials in English and German. Results: Of the identified 2671 studies, 303 (11.34%) were included; of the 303 studies, 67 (22


Background
With the rise of digital health technologies and telemedicine services, the need for evidence-based evaluation is growing [1]. Over the past years, several evaluation guidelines that address study types, outcomes, and patient perspectives, among other requirements, have been published [2-7]. The two best-known and most commonly used evaluation guidelines are the Model for Assessment of Telemedicine (MAST) applications [2] and the evidence standards framework for digital health technologies of the English National Institute for Health and Care Excellence (NICE framework) [3]. They have been used in several evaluation studies over the years [1,8-10].
Focusing on outcomes, MAST provides the following elements as part of a multidisciplinary evaluation of telemedicine applications: clinical effectiveness, patient perspective, safety, economic aspects, organizational aspects, and sociocultural, ethical, and legal aspects [2]. The patient's perspective is evaluated by patient-reported outcome measures (PROMs), such as health-related quality of life (HRQoL) or behavioral outcomes, the latter being relevant when focusing on the domain of clinical effectiveness. In addition, patient-reported experience measures (PREMs) should be a part of the evaluation to assess satisfaction and acceptance, understanding of information, confidence in the treatment, ability to use the application, and empowerment [2]. The NICE framework provides minimum evidence standards and best practice standards for the evaluation of digital health technologies according to the degree of the treatment. Among them are, for example, the demonstration of effectiveness, use of behavior change techniques, and economic aspects. It also recommends the assessment of patient-centered outcomes in complex digital health technologies and specifically states that many of these outcomes should be measured using PROMs [3]. This demonstrates the importance of PROMs and PREMs in the context of evaluation studies of telemedicine applications.
The US Food and Drug Administration refers to PROMs as "any reports coming directly from patients about how they function or feel in relation to a health condition and its therapy, without interpretation of the patient's responses by a clinician, or anyone else" [11]. These reports are ideally collected using validated outcome measurement instruments (OMIs), which are regarded as cost-effective, efficient, and scalable, especially in the early stages of development of an innovative intervention [1]. In addition, PROMs are classified according to generic, disease-specific, and target group-specific OMIs [12].
OMIs that quantify the experience, satisfaction, acceptance, or quality of care from the patients' perspective are called PREMs. The goal of PREMs is to measure and report whether the provided care meets the expectations of the patients. Thus, PREMs are an indicator of patient centeredness and service quality in health care [13].
In the past, PROMs and PREMs have been used to evaluate the effectiveness and quality of care achieved when implementing telemedicine applications. Reviews of evaluation studies regarding telemedicine applications showed that single outcome domains such as HRQoL and psychological outcomes were used for specific use cases, such as inflammatory bowel disease management [14], adherence, self-efficacy, and self-management for medication management [15]. PREMs were used, for example, to measure satisfaction with knee pain management [16].
In summary, PROMs and PREMs have been recommended and already used for the evaluation of telemedicine applications. However, to the best of our knowledge, no systematic review exists to date that investigates the characteristics of the use of PROMs and PREMs in evaluation studies of telemedicine applications irrespective of application type and medical purpose.
It is still not known which outcome domains and OMIs have been used in evaluation studies, how often they have been used, and whether their selection and frequency differ by the characteristics of the telemedicine application and the chosen study type. Our systematic review was conducted to close this research gap.

Objectives
This review aims to investigate the following research questions: 1. In which scenarios have PROMs and PREMs been collected for evaluation purposes? 2. Which PROM and PREM outcome domains have been covered and how often? 3. Which OMIs have been used and how often? 4. Did the selection and quantity of PROMs and PREMs differ between study types and application types? 5. Has the use of PROMs and PREMs in evaluation studies changed over time?
Furthermore, we will assess the extent to which the results can be transferred to use cases that have been derived from frequent combinations of application types and medical purposes.

Systematic Literature Research
To identify relevant articles, we conducted an electronic database search on MEDLINE and Embase. On the basis of the Population, Intervention, Comparison, Outcome, Studies scheme, inclusion and exclusion criteria were defined (Textbox 1). The search string (Multimedia Appendix 1) was based on 2 previous studies. The part dealing with the assessment of telemedicine applications is based on a review by Arnold and Scheibe et al [4], which aimed to identify standards for the evaluation of telemedicine applications. The part of the search string covering PROMs and PREMs is based on the PROM Group Construct and Instrument Type Filters of the University of Oxford [17]. This search string has already proven itself in the design of other reviews [18,19]. The search query was performed on April 2, 2020.

Development of Data Extraction Matrix and Used Taxonomies
A matrix was developed as the basis for data extraction. The studies were categorized by (1) [21,22]. This taxonomy was chosen because of its development based on empirical data, which allows its use in quantifying and statistically analyzing the characteristics of telemedicine applications. This taxonomy differentiates between 6 different application types: (1) teleconsultation, a process of providing health care from health care providers to patients over a distance [23]; (2) telediagnostics, a process where a disease is identified over a distance [24]; (3) teleambulance or tele-emergency, a process where emergency care is assisted or data are collected during an emergency over a distance [25]; (4) telemonitoring, a process of data collection over a distance for the purpose of medical decision-making [23,26,27]; (5) telerehabilitation, a process of data collection over a distance for the purpose of coping with the long-term consequences of a disease or an impairment [28]; and (6) digital self-management, a process to promote responsibility for one's own health and to encourage health literacy [29,30]. The classification into application types is intended to be the basis for subsequent subgroup analyses and has already been proven useful for this purpose in other systematic reviews evaluating telemedicine interventions [31,32].
All studies have been reviewed for the use of PROMs and PREMs; both could be represented by established and potentially validated OMIs, which were used frequently in nontelemedicine trials, or OMIs developed especially for the study in question. The OMIs were checked to verify whether they were established instruments or had been developed specifically for a study (SELF_PROM and SELF_PREM). The availability of a validation study served as an indicator of an established instrument. The psychometric properties of the OMIs were irrelevant for the classification into established and self-developed measures, as assessing the quality of the instrument was not within the scope of the review. The assignment of the OMIs to the individual outcome domains took place in an iterative process. In the first step, paraphrases were freely assigned to the OMIs. In the second step, the paraphrases were collected, mapped, and the corresponding categories were developed by the reviewers (AK and SH). The preliminary work of the Core Outcome Measures in Effectiveness Trials initiative provided the framework for the development of categories [33] but was supplemented by additional domains or modified where required. This was necessary, as the Core Outcome Measures in Effectiveness Trials initiative's taxonomy does not sufficiently describe and categorize PREMs to fit the purpose of this review; thus, they had to be developed inductively from the collected and mapped paraphrases. Furthermore, categories were assigned to either the PROM or PREM areas. In the third step, OMIs were assigned to the previously defined outcome domains. To ensure objectivity in the assignment of outcome domains, the reviewers wrote a codebook in advance (Table 1).

Data Extraction
The developed matrix provided the basis for subsequent data extraction. The extraction of paper characteristics and information concerning study type, medical purpose, and application type was performed by 1 reviewer (AK) because of the limited risk of misinterpretation. A total of 2 reviewers (AK and SH) independently performed the assignment of OMIs to PROM and PREM outcome domains based on the developed codebook. In case of any disagreement, assignments were discussed and resolved by consent. The complete data extraction matrix can be found in Multimedia Appendix 1.

Statistical Analysis
For the descriptive analysis, absolute and relative frequencies, mean values, and SDs were calculated for the individual outcome domains and for PROMs and PREMs. The calculations were performed once for all included studies as a whole and also individually for all study and application types. Correlation analyses according to Pearson for metric data and Kendall tau-b for ordinal data were performed to check the strength of dependencies.
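The two correlation measures named above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis script: the function names and the small per-study data set are hypothetical, and only the Python standard library is assumed.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation for two equally long metric series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def kendall_tau_b(x, y):
    """Kendall tau-b for ordinal data; the denominator corrects for ties."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # pair tied in both variables: not counted
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x_only)
        * (concordant + discordant + ties_y_only)
    )
    return (concordant - discordant) / denom


# Hypothetical per-study data (5 studies), for illustration only.
proms_per_study = [1, 2, 3, 4, 5]   # number of PROMs used
prems_per_study = [2, 2, 1, 1, 0]   # number of PREMs used
evidence_level = [3, 3, 2, 1, 1]    # ordinal; 1 is the highest level

r_prom_prem = pearson_r(proms_per_study, prems_per_study)
tau_prom_level = kendall_tau_b(evidence_level, proms_per_study)
print(round(r_prom_prem, 2), round(tau_prom_level, 2))
```

Tau-b is used here instead of the simpler tau-a because evidence levels take only a few distinct values, so many study pairs are tied on that variable.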
To examine the transfer of results to individual subgroups, 3 use cases were selected from frequent combinations of medical purpose and application types. For this purpose, the frequent outcome domains and study types were determined and descriptively compared with the overall results.

Study Selection
Overall, the electronic search resulted in 2671 hits. Of the 2671 studies, 2136 (79.97%) studies were included in the title and abstract screening after removing duplicates. A total of 2 reviewers (AK and LH) performed this step. AK screened all the papers, and LH screened a sample to validate AK's screening. The match between the reviewers was 82.3%, which, according to the AMSTAR 2 (A Measurement Tool to Assess Systematic Reviews) guidelines [34], legitimizes the examination of only a sample by a second reviewer. Of the 2136 papers, 627 (29.35%) papers were selected for full-text screening, which could be conducted by 1 reviewer (AK) because of the strictly formulated inclusion and exclusion criteria. Of the 627 papers, 303 (48.3%) papers were included in the review (Figure 1). A complete list of all inclusions can be found in Multimedia Appendix 2.
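The percentages in this paragraph are each relative to the preceding screening stage. A short sketch recomputing them from the reported counts may help readers follow the flow; the stage labels and the helper name `retention` are illustrative choices, not terms from the review.

```python
# Counts as reported in the Study Selection paragraph.
stages = [
    ("database hits", 2671),
    ("after duplicate removal", 2136),
    ("selected for full-text screening", 627),
    ("included in the review", 303),
]


def retention(stages):
    """Return (label, count, percent of the previous stage) for each later stage."""
    rows = []
    for (_, previous_n), (label, n) in zip(stages, stages[1:]):
        rows.append((label, n, round(100 * n / previous_n, 2)))
    return rows


rows = retention(stages)
# Share of all database hits that ended up in the review.
overall = round(100 * stages[-1][1] / stages[0][1], 2)

for label, n, percent in rows:
    print(f"{label}: {n} ({percent}%)")
print(f"included of all hits: {overall}%")
```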

Telemedicine Scenarios
All included studies (n=303) were categorized according to their medical purpose in terms of the ICD-10 chapter and the telemedicine application type (Table 2). The most common ICD-10 chapters were I for diseases of the circulatory system (51/303, 16.8%), C for neoplasm (47/303, 15.5%), and F for mental and behavioral disorders (44/303, 14.5%). Studies that could not clearly be assigned to a chapter were summarized under the term other (40/303, 13.2%). These studies were usually telemedicine applications from the fields of primary prevention, aging, and well-being.

Use of Outcome Domains
In total, 339 different OMIs were used in 1114 cases in the included studies (n=303). The OMIs were classified into 89.4% (303/339) PROMs and 10.6% (36/339) PREMs (Figure 2). Measurement instruments, which were developed especially for the individual study and were not listed in databases for PROMs and PREMs, were summarized in SELF_PROM or SELF_PREM. Measurement instruments for general satisfaction with the entire medical treatment process were summed up under the term SAT for satisfaction, which belongs to the field of PREMs and includes various forms of Likert scales, visual analog scales, and other self-developed constructs. Considering all studies, PROMs (881/1114, 79.08%) were used more frequently than PREMs (233/1114, 20.92%). The correlation analysis indicated that with an increasing number of PROMs, the number of PREMs decreased (r=−0.23; Figure 3). Across all studies, 21

Outcome Measurement Instruments
The most commonly used PROM OMIs were the HRQoL OMIs EuroQol five-dimension scale [35]  The PREM OMIs that were most commonly used were the Client Satisfaction Questionnaire-8, which measures treatment satisfaction [38], in 4% (12/303) of studies and the System Usability Scale, a usability OMI in the technology domain [39], in 2% (6/303) of studies. The third most frequently used PREM OMIs were the Patient Assessment of Chronic Illness Care, which also measures treatment satisfaction [40], and the Telehealth Acceptance Measure [41], each used in 1% (3/303) of studies.

Chronological Trends in the Use of PROMs and PREMs
The included studies were clustered into 5-year groups for analysis of the evaluation practice development over time (Figure 5). The year 2020 was not included in the analysis, as data were only available for the first 4 months of that year. The number of included studies increased above average over the years. The share of RCTs doubled every 5 years until 2014 and then dropped from 68.5% (50/73) to 43.7% (73/166) from 2014 to 2019. The number of telemedicine studies has steadily increased over time. However, the number of studies reporting PROMs and the number of studies reporting PREMs increased more strongly than the number of MEDLINE hits.

Subgroup Analysis: Application Type
Subgroup analysis for application type was conducted to cluster the technologies described in the studies according to their intended medical purpose and to explore differences in the evaluation approaches. On average, more PROMs were applied in studies focusing on telerehabilitation (mean 3.82, SD 2.60) and digital self-management (mean 3.51, SD 2.51) than on teleconsultation (mean 2.63, SD 2.41), telemonitoring (mean 2.24, SD 1.92), and telediagnostics (mean 1.00, SD 2.00). The application of PREMs was distributed evenly across all application types (range of mean values 0.50-1.06). Figure 8 shows the mean values of the PROMs and PREMs used by application type and compared with the mean values of all studies. The values for all the application types and outcome domains can be found in Table 5.
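The group-wise means and SDs reported above can be reproduced from per-study counts along the following lines. The records are hypothetical and the sample SD (n−1 denominator) is an assumption, as the review does not state which SD formula was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def describe_by_group(records):
    """Map each group label to (mean, sample SD) of its per-study counts."""
    groups = defaultdict(list)
    for group, count in records:
        groups[group].append(count)
    return {
        group: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for group, values in groups.items()
    }


# Hypothetical (application type, number of PROMs used) records.
records = [
    ("telediagnostics", 0),
    ("telediagnostics", 0),
    ("telediagnostics", 0),
    ("telediagnostics", 4),
    ("telerehabilitation", 2),
    ("telerehabilitation", 6),
]

summary = describe_by_group(records)
for group, (m, sd) in summary.items():
    print(f"{group}: mean {m:.2f}, SD {sd:.2f}")
```

The hypothetical telediagnostics group illustrates how an SD larger than the mean, as reported for that application type, can arise when a small group contains one study with several PROMs and the rest with none.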

Subgroup Analysis: Study Type
The second subgroup analysis was conducted based on the study type to evaluate the use frequency of PROMs and PREMs in different types of studies and the levels of evidence they were associated with. Of the 303 studies, 67 (22.1%) feasibility studies, 70 (23.1%) noncontrolled trials, 20 (6.6%) controlled trials, and 146 (48.2%) RCTs were identified. The study design served as an indicator of the evidence level of the studies [5]. The evidence level was determined according to the guidelines of the Oxford Centre for Evidence-based Medicine [47]. Study types with evidence level 3, such as feasibility studies (mean 1.66, SD 1.66) and noncontrolled trials (mean 1.66, SD 1.64), used fewer PROMs than controlled trials with evidence level 2 (mean 2.65, SD 2.72) or RCTs with evidence level 1 (mean 4.12, SD 2.36). An opposite trend was observed for PREMs. The values for PREMs in order of increasing evidence level were as follows: feasibility study (mean 1.22, SD 0.87), noncontrolled trial (mean 1.00, SD 1.14), controlled trial (mean 0.70, SD 0.86), and RCT (mean 0.46, SD 0.71). The correlation analysis for the relationship between the number of PROMs or PREMs and the evidence levels resulted in r=−0.50 for PROMs and r=0.34 for PREMs (Figure 3). Table 6 lists the complete distribution of outcomes by study type.
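As a plausibility check, the reported study-type counts and means can be combined in a short sketch. It assumes that the reported means are counts of OMI uses per study, which the text does not state explicitly; the variable names are illustrative.

```python
# Values as reported in the text:
# (number of studies, evidence level, mean PROMs, mean PREMs).
study_types = {
    "feasibility study": (67, 3, 1.66, 1.22),
    "noncontrolled trial": (70, 3, 1.66, 1.00),
    "controlled trial": (20, 2, 2.65, 0.70),
    "RCT": (146, 1, 4.12, 0.46),
}

total_studies = sum(n for n, _, _, _ in study_types.values())

# Share of each study type among all included studies, in percent.
shares = {
    name: round(100 * n / total_studies, 1)
    for name, (n, _, _, _) in study_types.items()
}

# Study-weighted sums approximate the total number of PROM and PREM uses.
# Assumption: each mean is the average number of OMI uses per study.
prom_uses = sum(n * proms for n, _, proms, _ in study_types.values())
prem_uses = sum(n * prems for n, _, _, prems in study_types.values())

print(total_studies, shares)
print(round(prom_uses, 1), round(prem_uses, 1))
```

On these inputs the weighted sums come to roughly 882 and 233, close to the 881 PROM and 233 PREM uses reported under Use of Outcome Domains; the small gap is consistent with rounding of the reported means.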

Use Cases
Three use cases were formed to check the results for transferability and were based on common combinations of medical purpose and application type. The use cases were telemonitoring for cancer diseases (21/303, 6.9%), teleconsultation for mental and behavioral disorders (22/303, 7.3%), and telerehabilitation for cardiovascular diseases (21/303, 6.9%). Although the total number of studies on telemonitoring for diseases of the circulatory system was 22, we chose to cover the widest possible range of characteristics within the presented use cases. Therefore, we opted for telemonitoring for cancer diseases and telerehabilitation for cardiovascular diseases, although these have lower numbers.
A descriptive analysis of the distribution of PROMs and PREMs and their outcome domains was also conducted. Again, the ratio of PROMs was different from that of PREMs (Figure 9). Similarly, the proportion of PREMs in the use case of telemonitoring for cancer diseases with evidence level 3 was higher than in the other 2 use cases with evidence level 1. HRQoL and emotional function were found to be the most frequently used outcome domains in all 3 cases (Table 7). Only the third most frequent outcome, satisfaction, was case-specific; it accounted for half of the cases. The results of the entire sample could be transferred to the 3 use cases, which could be an indication of the transferability of the review results to specific use cases.

Summary and Discussion of Main Findings
The aim of this systematic review was to empirically examine the characteristics of PROM and PREM use in evaluation studies of telemedicine applications. Owing to the large number of possible combinations of application types (n=6) and medical purposes (n=24), there was great heterogeneity in the evaluation studies. Of the 144 possible combinations, 51 (35.4%) were identified in this study. However, we were able to answer the research questions.
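The coverage figure above follows from simple counting: 6 application types times 24 medical purposes give 144 possible combinations. A minimal sketch, with a hypothetical helper name and hypothetical example records, shows how such coverage could be derived from study-level data.

```python
def combination_coverage(observed_pairs, n_application_types, n_medical_purposes):
    """Return (observed, possible, percent) for application type x purpose pairs."""
    observed = len(set(observed_pairs))
    possible = n_application_types * n_medical_purposes
    return observed, possible, round(100 * observed / possible, 1)


# Hypothetical records: one (application type, ICD-10 chapter) pair per study.
example_pairs = [
    ("telemonitoring", "I"),
    ("telemonitoring", "C"),
    ("teleconsultation", "F"),
    ("telemonitoring", "I"),  # repeated combination, counted only once
]
example = combination_coverage(example_pairs, 6, 24)

# With the count reported in the text (51 distinct observed combinations),
# using 51 distinct placeholders in place of the actual pairs.
reported = combination_coverage(range(51), 6, 24)
print(example, reported)
```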
PROMs dominated the evaluation of telemedicine applications. In total, 80% (4/5) of OMIs were PROMs, and only in 14% (1/7) of studies was no PROM used. On the other hand, PREMs were used in less than half of the studies, and hardly any of these PREMs were adapted to telemedical care. The lack of telemedicine-specific OMIs was apparently compensated for by the use of self-developed OMIs. This could indicate that the existing OMIs could not be applied because of the great heterogeneity of the telemedicine-specific use cases, did not collect the desired outcomes, or were simply not known to the evaluation team. The review by Hajesmaeel-Gohari and Bahaadinbeigy [48] in 2021 examined the use of validated telemedicine-specific OMIs in the form of PREMs for the evaluation of telemedicine service quality. The review was able to identify 59 different PREMs, of which only the 10 most frequent were mentioned. Our review was able to identify 70% (7/10) of the most frequent PREMs. However, the frequency distributions of the OMIs used do not match between the two reviews, as Hajesmaeel-Gohari and Bahaadinbeigy [48] identified a higher number of PREMs because of a more specific search strategy. They concluded that the use of PREMs for the evaluation of the quality of telemedicine applications should be obligatory and needs to be expanded, which also requires the development of further specific OMIs [48].
The quantity of PREMs decreased with an increasing number of PROMs; that is, a negative correlation (r=−0.23) was observed. One explanation for this correlation could be that the number of OMIs and outcome domains was kept as low as possible. In the sample, the median was 3 OMIs and outcome domains per study. However, the number of outcome domains per study varied (SD 2.36). As the OMIs are constructs of several items, depending on the instrument, this can range from a handful to several dozen items; the total number of items should be taken into account when selecting the OMIs [48]. Furthermore, the study participants or patients should not be overwhelmed by the total number of OMIs and included items as this could lead to incomplete answers or even dropout [49].
The number of telemedicine studies that collected PROMs and PREMs increased on average over time (Figures 5 and 6). In addition, the proportion of high-evidence studies, especially RCTs, also increased (Figure 6). It was shown that in years with a high proportion of high-evidence studies, the ratio of PROMs was considerably higher than the ratio of PREMs, as described above. This could be caused by the wider recognition and implementation of PROMs and PREMs [50,51], as can be seen in Figure 7, where the growth rate of studies using PROMs and PREMs is far higher than the growth rate of telemedicine papers in MEDLINE. The trend toward the increased use of PROMs and PREMs is also evident in several medical disciplines, such as oncology [52] and orthopedics [53], as well as in studies for regulatory purposes for medical devices [54].
In addition, guidelines that recommend the use of PROMs and PREMs published in recent years (eg, MAST 2012 [2] and NICE framework 2019 [3]) could have promoted the increased use of PROMs and PREMs over the years. These guidelines also recommend the use of high-evidence study designs. Again, an increased use of RCTs has been noticed since the publication of these guidelines.
Regardless of the telemedicine evaluation tools used, variations can be found between countries regarding the state of PROM and PREM implementation, types of data use, conditions and therapeutic areas, and challenges and success factors for PREM and PROM use [55]. Hence, regional and cultural aspects must be taken into account when developing, translating, and implementing PROMs, especially if they are measured using electronic tools [56]. Furthermore, these aspects have to be considered when evaluating PROM and PREM scores and comparing them between different countries.
The ratio of PROMs to PREMs also depended on the study type and evidence level. Although in low-evidence studies the frequency of PREMs was almost equal to the frequency of PROMs, it decreased with increasing evidence level. At the same time, more outcomes were recorded at high evidence levels (Figure 6). This could be related to the development cycle of telemedicine technologies [5]. Using evidence level as a surrogate parameter for the maturity stage of the application, feasibility studies and proof-of-concept studies increasingly require information on the usability and acceptance of the technology in addition to the clinical effectiveness. On the other hand, PREMs played almost no role in clinical trials with high evidence levels. PROMs clearly dominated in RCTs in relative and absolute numbers. This is also reflected in the Khoja-Durrani-Scott framework for eHealth evaluation [6]. Khoja et al [6] subdivided the development cycle of an eHealth application into 4 phases. The framework recommends focusing on typical PREM domains, such as usability, user-friendliness, and acceptance in the early phases of development. In later phases, evaluation should focus on health outcomes, such as quality of life and health impact, although these should also be recorded in the early phases. The design and evaluation framework for digital health interventions by Kowatsch et al [5] goes one step further and specifies the outcomes as well as the required study designs for each phase. With each phase, the evidence level of the study designs increases, and the focus of the outcomes changes according to the needs. The first phase, the preparation phase, includes feasibility and acceptability studies to determine the ease of use and adherence. In the optimization phase, the first evidence of effectiveness, expected benefits, and satisfaction with the quality of the application should be measured.
In the later phases, that is, the evaluation and implementation phases, the success of the implementation of digital health applications should be monitored. The fact that the selection of the evaluation design and outcomes should be made according to the stage of development and should have an appropriate level of evidence has also been pointed out by the MAST model [2] and the evaluation principles of Arnold and Scheibe et al [4]. The correlation of the PREMs (τ=0.35) and the PROMs (τ=−0.45) with evidence level indicates that evaluation was performed as described in the guidelines for maturity stage-based evaluation.
A key milestone in the implementation of PROMs and PREMs in evaluation studies of telemedicine interventions was set by Germany in 2020 with the Digital Care Act. One significant innovation is that the costs for the use of so-called digital health applications will be reimbursed by statutory health insurance [7,57]. As a result, since October 2020, around 90% of the population is entitled to a wide range of mobile health applications in the areas of telerehabilitation, telemonitoring, and digital self-management [57]. Another significant innovation is that the assessment of eligibility for reimbursement does not depend exclusively on the medical benefits, which, among clinical and epidemiological outcomes, could be assessed by PROMs, such as HRQoL, but also on the so-called patient-relevant improvement of structure and processes, which are mainly assessed by PROMs and PREMs. Examples of patient-relevant improvement of structure and processes are coping with difficulties in everyday life because of illness, facilitating access to care, health literacy, patient autonomy, reduction of therapy-related expenses, and burdens for patients and their relatives [7]. Medical benefits and patient-relevant improvements of structure and processes are now of equal importance in the approval process of digital health applications, and only one of the outcomes has to be more effective than standard care [7]. This represents a significant increase in the importance of PROMs and PREMs in evaluation studies of telehealth applications. The reason for including patient-relevant improvement of structure and processes as an outcome in evaluation studies was that digital health applications are considered to improve patient self-efficacy [58] and health-related behaviors, such as adherence [59] and health literacy [60]. In our review, 31.7% (96/303) of the included studies assessed the effects on adherence to medication or other therapies, and 10% (30/303) assessed health literacy.
The Danish MAST does not mention the measurement of health-related behavior changes [2]. Within the NICE framework, originally developed in the United Kingdom, applications with the purpose of improving health-related behaviors are assigned to their own group [3]. However, neither the MAST nor the NICE framework explicitly recommends capturing adherence or health literacy for all types of applications. Health literacy is not only an outcome but it is also a critical precondition for the successful use of telemedicine by the patient in addition to digital literacy. To ensure the appropriate use of the technology and the assessment of PROMs and PREMs, proper training and guidance of the users is of at least equal relevance, according to the literature [56,61-64]. Therefore, health literacy should not only be included in the evaluation merely for reasons of measuring effectiveness; it is also a possible factor influencing purposeful and successful telemedicine use by the patients [58,60,65,66]. In summary, future developments will show to what extent and in which way innovations from Germany will affect the use of PROMs and PREMs in evaluation studies of telemedicine applications.

Strengths and Limitations
One limitation of the study is that the medical purpose was classified by the ICD-10 chapters, all of which only describe a group of diseases and not the disease itself [20]. Chapter I, for example, covers circulatory diseases, which include congenital heart defects, strokes, and aneurysms, all of which differ in etiology, symptoms, and therapy. There was a similar degree of heterogeneity in telemedicine applications. A more detailed distinction between user groups, setting, technical execution, and other criteria exists in the taxonomy used as a basis for the subgroup analysis, but this was not considered in our review [21]. The same applies to the analysis of single OMIs. The problem of heterogeneity is not an issue inherent only to this study. In their paper published in Nature in 2020, Guo et al [1] pointed out that the different types of interventions, medical purposes, and outcomes can lead to limitations in reviews of digital health interventions in general.
Another limitation was the large number of possible combinations of medical purpose, application, and study type. Nevertheless, several patterns were identified to answer the research questions, and the results of the entire sample could be transferred to use cases; thus, the influence of heterogeneity was not as great as initially assumed.
Another limitation might be that only 1 reviewer performed full-text screening. In the context of classical systematic reviews for the purpose of evidence synthesis of effectiveness or risk factors, screening by 2 reviewers is mandatory to minimize beta error. The approach of our review, on the other hand, was different. We intended to use the methodology of a systematic literature search to generate data for quantitative analysis. Owing to the 627 studies to be screened, an increased beta error in the form of missing studies seemed acceptable to us for reasons of research economics. As we wanted to conduct a plain descriptive analysis of the data with a total of 303 included studies, we did not consider the validity of the result to be compromised.
The strength of the review is that, to the best of our knowledge, this is the first systematic review investigating the characteristics of PROM and PREM use in evaluation studies of telemedicine applications covering all application types and medical purposes.
Reviews do exist for specific use cases; however, these usually do not cover all outcomes. Instead, they focus on selected outcomes for the purpose of evidence synthesis or do not focus exclusively on PROMs and PREMs [14-16,48,67-71].
Preliminary excerpts of the review results were presented to an expert audience of health care scientists at a conference in October 2020 [72].

Implications for Future Research
High heterogeneity reflected by the multitude of OMIs used per outcome domain and a lack of standardization poses a challenge to the selection of PROMs [70,71] and PREMs. New developments and updated versions of existing guidelines for the evaluation of telemedicine could contribute to further standardization in the selection of outcome domains and OMIs [73].
The use case analysis indicated that the most common outcome domains were HRQoL and emotional function, which could be the first starting point for further efforts. Equally, user satisfaction and usability [48] as well as health literacy and adherence [7] should be taken into account, although these outcome domains were not frequently surveyed in our review.
Further investigation will be required to reveal how the use of PROMs and PREMs for the evaluation of telemedicine will evolve over the next few years and if the trends observed in this review will persist.
In addition, upcoming studies will have to investigate how a greater consideration of PROMs and PREMs in German approval and reimbursement procedures for digital health applications will affect the future use of PROMs and PREMs in evaluation studies of telemedicine applications.

Conclusions
In recent years, there has been an increasing number of studies, particularly high-evidence studies, that use PROMs and PREMs to evaluate telemedicine services. Despite the great heterogeneity of telemedicine interventions and the associated evaluation approaches, several conclusions can be drawn. PROMs have been the focus of evaluation studies. With the increasing maturity stage of telemedicine applications and higher evidence levels, the use of PROMs has increased. PREMs played a role, especially in the initial phases of application development, with low-evidence study designs. In this case, PREMs were primarily used to test the usability and acceptance of the application. Regardless of the findings, telemedicine-specific PREMs should be used more frequently and in a standardized manner to continuously evaluate telemedicine service quality, both during and after implementation.
The distribution of the outcome domains showed that only HRQoL and emotional function were assessed in almost all studies. Simultaneously, health literacy as a precondition for using the application adequately, alongside proper training and guidance, has rarely been reported. At the level of the OMIs, it was shown that many different OMIs were used for each domain. Further efforts should be pursued for the standardization of PROM and PREM collection in evaluation studies of telemedicine applications.