Laughter-inducing therapies: Systematic review and meta-analysis

Rationale: Laughter-inducing therapies have been applied more regularly over the last decade, and the number of scientific reports of their beneficial effects is growing. Laughter-inducing therapies could be cost-effective treatments for different populations, either as a complementary or as a main therapy. A systematic review and meta-analysis of these therapies across different populations and outcomes has not yet been performed, but is needed to examine their potential benefits. This research aims to broadly describe the field of laughter-inducing therapies and to estimate their effect on mental and physical health for a broad range of populations and conditions. Method: A systematic review of the field was undertaken, followed by a meta-analysis of RCTs and quasi-experimental studies. The systematic review included intervention studies, one-session therapies, lab studies and narrative reviews to provide a broad overview of the field. The meta-analysis included RCTs or quasi-experimental studies that assessed multi-session laughter or humor therapies compared to a control group, performed on people of any age, healthy or with a mental or physical condition. English and non-English articles were searched using PubMed, Web of Science, EBSCO and EMBASE. Search terms included laugh(ing), laughter, humo(u)r, program, therapy, yoga, exercise, intervention, method, unconditional, spontaneous, simulated, forced. Studies were classified as using humor (‘spontaneous’ laughter) or not using humor (‘simulated’ laughter). Results: This systematic review and meta-analysis suggests that (1) ‘simulated’ (non-humorous) laughter is more effective than ‘spontaneous’ (humorous) laughter, and (2) laughter-inducing therapies can improve depression. However, overall study quality was low, with substantial risk of bias in all studies.
With rising health care costs and an increasing elderly population, there is potential for low-cost, simple interventions that can be administered by staff with minimal training. Laughter-inducing therapies show promise as an addition to main therapies, but more methodologically rigorous research is needed to confirm this promise.


Introduction
There is some evidence that laughter has physical, emotional, and social benefits (Bennett et al., 2014; Mora-Ripoll, 2011; Yim, 2016). However, scientific research is still at an early stage when it comes to empirically determining the therapeutic value of laughter. Various physiological and psychological effects of laughter have been anecdotally reported, e.g. decreasing pain, strengthening immune function, mitigating stress and improving social support (Bennett and Lengacher, 2006; Martin, 2001; Mora-Ripoll, 2011). Laughter is presumed to decrease levels of stress hormones, and is theorized to buffer the effects of stress on the immune system and thus elevate mood (Bennett and Lengacher, 2009). Current literature broadly distinguishes between 'spontaneous' (humorous) and 'simulated' (non-humorous) laughter. Following the saying that laughter is the best medicine, both have been applied in humor or laughter therapies over the past twenty years to improve health and well-being. These therapies have been tried in populations ranging from children to the elderly, with a broad range of targeted outcomes such as mental health, cancer, diabetes, migraine, and other chronic conditions. Despite many studies on myriad forms of laughter therapies for different patient populations and healthy individuals, there has as yet been no comprehensive systematic review or meta-analysis to assess whether these applications are effective. In this study, we focus on laughter-inducing therapies that are compared to control groups (no treatment or attention control). We also investigate outcome differences based on how laughter was induced, either through 'spontaneous' or 'simulated' laughter.
Laughter-inducing therapies come in different formats. Humor therapies typically include laughter exercises with humor, such as humorous videos or clowns (Low et al., 2013; Szabo, 2003; Tan et al., 2007). Laughter therapy without humor typically includes exercises such as clapping, dancing, and vocalizing laughter-like sounds such as "hoho-hahaha", but can also include elements not involving laughter, such as breathing and relaxation exercises (Yim, 2016). Laughter Yoga is a specific example of non-humorous laughter-inducing therapy; it is encouraged to be practiced in groups and involves laughter exercises, clapping, and yoga exercises such as breathing and relaxation (Cokolic et al., 2013; Farifeth et al., 2014; Miles et al., 2016; Nagendra et al., 2007; Yazdani et al., 2014). Overall, most of these studies report an improvement in physical or mental health compared to a control group. The control group typically receives usual care, no intervention, or an attention control condition in which, e.g., plants are watered or craft projects are made (Cai et al., 2014; George and Jacob, 2014; Jung et al., 2009; Kim et al., 2015; Low et al., 2013). Reported improvements, such as decreased depression, pain, and stress hormones, and improved mood and life satisfaction, are discussed briefly below.
Previous reviews have attempted to summarize the field. McCreaddie and Wiggins (2008) reviewed the direct and indirect links between humor and health, specifically in nursing applications. They found that research designs were methodologically weak and could not provide strong evidence that humor has a positive effect on health. Furthermore, most included studies were correlational and lacked the randomized controlled design needed to infer causality.
Mora-Ripoll (2011) conducted a narrative literature review of both 'simulated' and 'spontaneous' laughter therapies, and concluded that there is some evidence that 'simulated' (non-humorous) laughter has positive effects on health, whether compared to control groups (waiting list or no intervention), to other experimental groups (exercise therapy), or without a comparison group (uncontrolled interventional studies). He also noted that there are practically no contraindications to laughter, and that few adverse effects were reported. These conclusions were based on a combination of randomized, interventional, observational, and non-randomized studies. Bennett et al. (2014) conducted a narrative review of laughter and humor therapy specifically for patients undergoing dialysis. They concluded that laughter and humor therapies have positive effects compared to control groups (no intervention or active control) on immunity, pain, sleep quality, respiratory function, depression, and anxiety, all of which are relevant for patients undergoing dialysis. Bennett and colleagues concluded that non-humorous laughter therapies, such as Laughter Yoga, are suitable for dialysis patients. However, they noted that it is unclear whether these health benefits are sustained in the long term. Again, these conclusions were based on a combination of randomized and non-randomized studies.
Gonot-Schoupinsky and Garip (2018) conducted a systematic review of laughter and humor interventions for adults over 60 years old. They concluded that these interventions appear to enhance well-being, but that there is insufficient evidence that the laughter itself is causing the enhancement, as there are a range of confounding factors. They note that participant laughter has to be isolated and measured to build evidence for these interventions.
Overall, all four reviews concluded that sufficient evidence exists to suggest health benefits of laughter and humor, but that clear guidelines for laughter and humor therapies are necessary. In two reviews (Bennett et al., 2014;Mora-Ripoll, 2011), 'simulated' (non-humorous) laughter was mentioned as specifically interesting for laughter therapies, due to its applicability to many different populations and settings.
Against a backdrop of increasing healthcare expenditure, there is a potential for simple, cheap, broadly applicable, and easily implementable therapies such as laughter therapies as a cost-effective addition to regular healthcare. Therefore, there is a need for a systematic review and meta-analysis of laughter-inducing therapies to explore their potential efficacy and effectiveness. Moreover, the existing (narrative) reviews are due for an update with evidence from recently published laughter-inducing interventions.
The goal of this article is to provide a systematic review of the literature on laughter-inducing therapies. Specifically, we aimed to summarize the state of the literature on laughter-inducing interventions for a broad range of (mental) health outcomes. We focus on laughter-inducing therapies that are compared to control groups (no treatment or attention control) and, following previous literature, look specifically at how laughter was induced: with or without humor. Sufficient outcome data were retrieved to perform a meta-analysis of randomized and quasi-experimental laughter-inducing therapy trials.

Search strategy
All searches in PubMed, EMBASE, Web of Science (WoS) and EBSCO were performed in October 2016 and continuously updated until December 2017 using the following query (for WoS and EBSCO): TITLE: (laugh* OR humo$r) AND TITLE: (program* OR therapy OR yoga OR exercise OR intervention OR method OR unconditional OR spontaneous OR simulated OR forced) NOT TITLE: ("aqueous humor" OR "aqueous humour" OR "vitreous humor" OR "vitreous humour"). Exact search terms can be found in Online Supplement Appendix I. Initial searches for laughter-inducing therapies in the grey literature were performed from July until October 2016 to identify at least 10 benchmark articles for the complete database search. Google Scholar was searched with the search terms: Laughter, Laughter yoga, Unconditional laughter, Simulated laughter, Laughter therapy, Laugh, Laughing, Laugh therapy, Review. Eligible studies on the first 10 Google Scholar pages were added. References from eligible studies and review articles were checked for further articles. Authors of included articles and experts in the field were contacted for additions. Journals in which eligible studies were published were hand-searched for more studies. This resulted in a list of 13 benchmark studies for the complete search in the databases mentioned above, and also provided additional search terms (Beckman et al., 2007; Bennett et al., 2003; Chang et al., 2013; Cho and Oh, 2011; De La Fuente and Gonzalez, 2010; Dolgoff-Kaspar et al., 2012; Foley et al., 2002; Hirosaki et al., 2013; Hsieh et al., 2015; Ko and Youn, 2011; Mora-Ripoll, 2011; Nagendra et al., 2007; Neuhoff and Schaefer, 2002; Raja and Sundari, 2014; Sakai et al., 2013; Shahidi et al., 2011; Yazdani et al., 2014). Of all studies included in the systematic review, one article was in Spanish, one in German and 31 in Korean. These articles were translated with Google Translate.
One of the authors, who reads German and Spanish, verified the translations against the original texts and found them sufficient for extracting the necessary data. The Korean-to-English translations were first checked against the English-language abstracts provided. The authors and a Korean translator then reviewed the translations to determine whether the data could be extracted. Although Google Translate rendered the grammar poorly, we concluded that sufficient data could be extracted, as most tables with statistics were understandable and the translator clarified mistranslations.
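To make the Boolean title query given above concrete, the following Python sketch approximates its logic as a local title filter, for example for re-screening an exported list of records. It is an illustration under stated assumptions, not the query that was actually run: the regular expressions are our own translation of the wildcards (`laugh*` as `laugh\w*`, `humo$r` as `humou?r`), and the databases apply their own stemming and lemmatization rules, so results will not match exactly.

```python
import re

# Hypothetical local approximation of the WoS/EBSCO title query.
# "laugh*" -> any word starting with "laugh"; "humo$r" -> "humor" or "humour".
LAUGHTER_TERMS = re.compile(r"\b(laugh\w*|humou?r)\b", re.IGNORECASE)
THERAPY_TERMS = re.compile(
    r"\b(program\w*|therapy|yoga|exercise|intervention|method|"
    r"unconditional|spontaneous|simulated|forced)\b",
    re.IGNORECASE,
)
# NOT clause: drop ophthalmology hits on "aqueous/vitreous humo(u)r".
EXCLUDED_TERMS = re.compile(r"\b(aqueous|vitreous)\s+humou?r\b", re.IGNORECASE)


def title_matches(title: str) -> bool:
    """Return True if a title satisfies (laughter terms) AND (therapy terms) NOT (excluded terms)."""
    return (
        LAUGHTER_TERMS.search(title) is not None
        and THERAPY_TERMS.search(title) is not None
        and EXCLUDED_TERMS.search(title) is None
    )


if __name__ == "__main__":
    titles = [
        "Effects of laughter therapy on depression in older adults",
        "Humour and coping in nursing students",
        "Aqueous humor outflow after laser therapy",
    ]
    for t in titles:
        print(title_matches(t), t)
```

Because plain regular expressions do no stemming, a title containing only "therapies" would be missed here even though a database might match it; the sketch is meant for understanding the query, not for replacing it.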

Eligibility criteria
We included studies in the systematic review involving (a) people of any age, healthy or with a mental or physical health condition, (b) undergoing laughter-inducing therapy, (c) compared to any form of control or comparison group (e.g. waiting list control or control group therapy), and (d) assessed with any mental or physical health-related outcome. Case-control studies were excluded. From this selection, studies were included in the meta-analysis if they (e) had a control group and (f) used multiple intervention sessions.
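The two-stage inclusion logic can be expressed as a pair of predicates. In this minimal Python sketch, the `Study` record and its field names are hypothetical stand-ins for the extracted study information. We read criterion (c) as permissive, since the review also contains uncontrolled studies, so only the meta-analysis stage actually requires a control group.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical summary of the fields needed to apply criteria (a)-(f)."""
    laughter_inducing: bool   # (b) the intervention induces laughter
    health_outcome: bool      # (d) any mental or physical health outcome
    design: str               # e.g. "RCT", "quasi-experimental", "case-control"
    has_control_group: bool   # (e)
    n_sessions: int           # (f)


def eligible_for_review(study: Study) -> bool:
    # (a) allows any age and health status, so no population filter is applied.
    return (
        study.laughter_inducing
        and study.health_outcome
        and study.design != "case-control"
    )


def eligible_for_meta_analysis(study: Study) -> bool:
    # The meta-analysis is a subset of the review: add criteria (e) and (f).
    return (
        eligible_for_review(study)
        and study.has_control_group
        and study.n_sessions > 1
    )


if __name__ == "__main__":
    pilot = Study(True, True, "uncontrolled", False, 8)
    rct = Study(True, True, "RCT", True, 8)
    print(eligible_for_review(pilot), eligible_for_meta_analysis(pilot))
    print(eligible_for_review(rct), eligible_for_meta_analysis(rct))
```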

Study selection
Eligibility assessment was performed independently in a blinded, standardized manner by two reviewers (CNvdW and RK), screening first on title, and then on abstract and full text. In the title and abstract phases, disagreement on whether to include a paper meant that the paper was assessed in more detail at the next (abstract or full-text) level. Final disagreements between reviewers at the full-text level were resolved by discussion until consensus was reached (initial kappa at full-text screening = 0.62; final kappa after discussion = 1). Disagreements concerned whether articles should be included in the systematic review if they contained (1) background information on laughter but no report of an experiment, or (2) experiments with just one laughter session. We agreed to exclude explorative, correlational and one-session studies.
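Cohen's kappa, reported above for full-text screening, corrects the raw agreement between two reviewers for the agreement expected by chance from each reviewer's own include/exclude rates: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below computes it from two lists of decisions; the example decisions are invented for illustration and do not reproduce the reported kappa of 0.62.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if p_e == 1:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical full-text decisions for four articles.
    reviewer_1 = ["include", "include", "exclude", "exclude"]
    reviewer_2 = ["include", "exclude", "exclude", "exclude"]
    print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.5
```

Here the reviewers agree on three of four articles (p_o = 0.75), but chance alone would give p_e = 0.5, so kappa is only 0.5.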

Data extraction
Study information on publication language, study design, participants, type of intervention, study outcomes, number of sessions, overall duration of treatment, and humorous/non-humorous therapy was extracted by one reviewer (CNvdW) and independently cross-checked by another reviewer (RK) using a predefined, standardized data extraction spreadsheet. Inconsistencies in data extraction were resolved by referring to the source study until both reviewers reached consensus.

Therapy classification
Therapies were classified based on the way laughter was induced, as described in the methods section of the primary study. Interventions were classified as 'using humor', 'not using humor' or 'unknown', operationalized as follows. An intervention was classified as 'using humor' ('spontaneous laughter') if a humorous stimulus, such as humor, jokes or humorous videos, was mentioned as part of the therapy. Interventions were classified as 'not using humor' ('simulated laughter') if it was specifically stated that only Laughter Yoga (which is non-humorous by definition) or non-humorous laughter was used, or when all elements of the intervention were clearly described and none of them involved humor. Interventions were classified as 'unknown' when the content of the 'laughter therapy' was not specifically described, so that it could not be definitively classified as humorous or non-humorous. Interventions that combined non-humorous and humorous laughter were classified as 'using humor', so that non-humorous laughter therapies could be contrasted with therapies that include humorous laughter, either alone or mixed with non-humorous laughter. In case of disagreement between the reviewers, the text was revisited for a stimulus inducing laughter to determine whether humor was used. All disagreements were due to overlooked information, and after careful inspection of the text all disagreements were resolved.
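The classification rule above is a short decision procedure, and the order of the checks matters: any humorous stimulus takes precedence, so mixed interventions end up as 'using humor'. The Python sketch below encodes that order. The keyword set `HUMOR_STIMULI` and the two boolean flags are hypothetical simplifications of what the reviewers judged from each methods section.

```python
from typing import AbstractSet

# Hypothetical keyword list standing in for "a humorous stimulus was mentioned".
HUMOR_STIMULI = {"humor", "jokes", "humorous videos", "clowns"}


def classify_intervention(
    elements: AbstractSet[str],
    fully_described: bool,
    explicitly_non_humorous: bool,
) -> str:
    """Apply the three-way humor classification to one intervention.

    elements: components named in the primary study's methods section.
    fully_described: all intervention elements are clearly reported.
    explicitly_non_humorous: the study states that only Laughter Yoga or
        non-humorous laughter was used.
    """
    if elements & HUMOR_STIMULI:
        # Any humorous stimulus wins, so mixed interventions count as humorous.
        return "using humor"
    if explicitly_non_humorous or fully_described:
        return "not using humor"
    return "unknown"


if __name__ == "__main__":
    print(classify_intervention({"laughter yoga"}, False, True))
    print(classify_intervention({"clapping", "humorous videos"}, True, False))
    print(classify_intervention({"laughter therapy"}, False, False))
```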

Outcome measures
All mental and physical health outcome measures were included in the systematic review to give a full overview of the field, for two reasons: (1) no systematic review had been performed so far, and (2) the relatively low number of studies made this task feasible. In the meta-analysis, only mental health outcomes were used, because the diversity of physiological outcomes would have made pooled results difficult to interpret.

Analytic approach and data synthesis
Due to the methodological differences in the data set, a two-step approach to data synthesis was used. First, a systematic review was performed on all included studies to provide an overview of the field (see Sections 3.2-3.5). We also included quantitative results in this overview, such as effect sizes and mean differences with 95% confidence intervals. Adding quantitative results avoids relying on simple vote counting, which could omit important information about the replicability of results, effect sizes, and differences in mean values between the experimental and control groups. Moreover, it adds contextual information on clinical relevance, for example where outcomes are statistically significant but the difference is so small as to be clinically negligible. The Cohen's d effect sizes were corrected either by using the pooled pretest standard deviation to standardize the difference in pre-post mean changes, d_ppc2 (Morris, 2008), or by computing Hedges' g for each group separately and subtracting them afterwards to correct for different sample sizes and pretest values, d_Korr (Klauer, 2001).
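The two corrected pre-post-control effect sizes can be written out explicitly. The Python sketch below follows Morris (2008) for d_ppc2: the difference in mean change between the treatment (T) and control (C) groups is divided by the pooled pretest SD and multiplied by the small-sample correction c_P = 1 - 3 / (4(n_T + n_C - 2) - 1). For d_Korr we assume one common reading of Klauer (2001), in which each group's change is standardized by that group's own pretest SD with a Hedges correction on n - 1 degrees of freedom; the exact variant used in the review may differ. Both functions keep the direction of the raw scores, so for symptom scales such as depression a negative value means a larger reduction under treatment (the review appears to report such reductions as positive effect sizes).

```python
import math


def hedges_correction(df: int) -> float:
    """Approximate small-sample correction factor J = 1 - 3 / (4 df - 1)."""
    return 1 - 3 / (4 * df - 1)


def d_ppc2(pre_t, post_t, sd_pre_t, n_t, pre_c, post_c, sd_pre_c, n_c):
    """Morris (2008) pre-post-control effect size using the pooled pretest SD."""
    sd_pre = math.sqrt(
        ((n_t - 1) * sd_pre_t**2 + (n_c - 1) * sd_pre_c**2) / (n_t + n_c - 2)
    )
    change_diff = (post_t - pre_t) - (post_c - pre_c)
    return hedges_correction(n_t + n_c - 2) * change_diff / sd_pre


def d_korr(pre_t, post_t, sd_pre_t, n_t, pre_c, post_c, sd_pre_c, n_c):
    """Difference of within-group corrected changes (assumed Klauer-style variant)."""
    g_t = hedges_correction(n_t - 1) * (post_t - pre_t) / sd_pre_t
    g_c = hedges_correction(n_c - 1) * (post_c - pre_c) / sd_pre_c
    return g_t - g_c


if __name__ == "__main__":
    # Hypothetical depression scores: treatment drops 6 points, control 1 point.
    args = (20.0, 14.0, 4.0, 10, 20.0, 19.0, 4.0, 10)
    print(round(d_ppc2(*args), 3))  # -1.197
    print(round(d_korr(*args), 3))  # -1.143
```

With equal group sizes and pretest SDs the two estimates are close; they diverge mainly when the groups differ in size or in pretest variability, which is the situation the corrections are meant to address.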
Second, for a subset of quasi-experimental studies and randomized controlled trials, a meta-analysis was performed to estimate separate pooled effect sizes for the two study designs. Randomized and quasi-experimental studies were not pooled together but analyzed as separate data sets (see Section 3.6) (Verde and Ohmann, 2015). For the randomized studies, the outcome measures were the standardized effect sizes between treatment and control groups, calculated as Hedges' g, which resembles Cohen's d but corrects for small-sample bias (Hedges and Olkin, 1985). Studies with insufficient information for calculating effect sizes in Comprehensive Meta-Analysis were excluded if sufficient additional information could not be retrieved from article appendices, additional publications from the same data set, or by contacting study authors. Subgroup analyses were conducted using the procedures implemented in Comprehensive Meta-Analysis, with a mixed-effects model (studies pooled within subgroups using a random-effects model, and differences between subgroups tested using a fixed-effects model). We calculated the I² and T² statistics to assess the relative and absolute heterogeneity of effect sizes within subgroups. A common interpretation is that an I² value of 0% indicates no observed heterogeneity, and higher values indicate more within-subgroup heterogeneity (25%: low, 50%: moderate, 75%: high; Higgins et al., 2003). A higher I² means that a larger proportion of the observed variance reflects real differences between studies rather than sampling error, which can indicate underlying differences between the pooled studies. This makes the pooled effect size difficult to interpret, as it becomes difficult to distinguish the observed effect size from the true population effect size (Borenstein et al., 2017). Comprehensive Meta-Analysis version 2.2.057 was used for all analyses.
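To illustrate the quantities named above, the following Python sketch computes Hedges' g between treatment and control groups at post-test and pools a set of effect sizes with a random-effects model, reporting I² and T² derived from Cochran's Q. We use the DerSimonian-Laird (method-of-moments) estimator of T² as a common textbook choice; this is an assumption for illustration, and the sketch is not a reimplementation of Comprehensive Meta-Analysis, whose options and output may differ.

```python
import math
from typing import List, Tuple


def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c) -> Tuple[float, float]:
    """Bias-corrected standardized mean difference and its variance."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (m_t - m_c) / sd_pooled
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    return j * d, j**2 * var_d


def random_effects(effects: List[float], variances: List[float]):
    """DerSimonian-Laird pooling: returns (pooled, 95% CI, I^2 in %, T^2)."""
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2, tau2


if __name__ == "__main__":
    g, v = hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20)
    print(round(g, 3))  # 0.98
    print(random_effects([0.0, 1.0], [0.1, 0.1]))
```

In the second call, two equally precise studies disagree sharply, so Q exceeds its degrees of freedom, I² is 80% and T² is 0.4, which widens the pooled confidence interval compared with a fixed-effect analysis.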

Assessment of risk of bias
RK and CNvdW independently rated the risk of bias in each study using the Cochrane Collaboration's Risk of Bias Assessment Tool. Studies were rated for adequate sequence generation, allocation concealment, selective outcome reporting, masking of assessors, and whether incomplete outcome data were addressed (Higgins et al., 2011). Differences in the assessment of bias were resolved through discussion.
Since treatment allocation was obvious in most studies, blinding of participants was not assessed. Blinding of outcome assessment was not considered a major risk of bias, as most outcomes were self-report measures; nevertheless, it was assessed. Risk of bias is presented graphically in Fig. 1, and per-study results can be found in Online Supplement Table 1. In the meta-analysis, publication bias was examined by inspecting a funnel plot of primary outcome measures, and the trim-and-fill procedure was used to correct for publication bias by imputing studies presumed missing due to unpublished negative or null findings (Duval and Tweedie, 2000). Egger's intercept test (Egger et al., 1997) and Begg and Mazumdar's rank correlation test (Begg and Mazumdar, 1994) were performed as additional tests of publication bias.
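Egger's test regresses each study's standardized effect (effect / SE) on its precision (1 / SE). Without small-study effects the fitted line should pass near the origin, so an intercept far from zero signals funnel-plot asymmetry. The minimal Python sketch below fits that ordinary least-squares regression with the standard library and returns the intercept, its standard error and t statistic, to be compared with a t distribution on k - 2 degrees of freedom. It illustrates the test only; it is not the Comprehensive Meta-Analysis routine and does not implement trim-and-fill or Begg and Mazumdar's test.

```python
import math
from typing import List, Tuple


def egger_test(effects: List[float], std_errors: List[float]) -> Tuple[float, float, float]:
    """Egger's regression test: returns (intercept, SE of intercept, t statistic)."""
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's test needs at least three studies")
    x = [1 / se for se in std_errors]                    # precision
    z = [y / se for y, se in zip(effects, std_errors)]   # standardized effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all studies have the same standard error")
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / s_xx
    intercept = z_bar - slope * x_bar
    residuals = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    s2 = sum(r**2 for r in residuals) / (k - 2)  # residual variance
    se_intercept = math.sqrt(s2 * (1 / k + x_bar**2 / s_xx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t


if __name__ == "__main__":
    # Hypothetical data in which smaller studies (larger SE) report larger effects.
    effects = [0.15, 0.35, 0.70, 0.90, 1.30]
    std_errors = [0.10, 0.15, 0.25, 0.30, 0.45]
    print(egger_test(effects, std_errors))
```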

Risk of bias
Overall, reporting quality was low or very low, and crucial information to assess risk of bias was often missing from the studies. Therefore, most studies were rated at unclear risk of bias on the five domains assessed. Since none of the assessed studies had been preregistered in a trial registry, we could not evaluate whether outcomes were switched or selectively reported; hence, the risk of bias assessment for these elements used only information available in the study report. Thus, if a study reported using an outcome measure in the methods section but did not report the corresponding result in the results section, it was rated as being at high risk of bias. This is a rather lenient interpretation of the selective outcome reporting criterion as used in the Cochrane Collaboration's Risk of Bias Assessment Tool (Higgins et al., 2011). Fig. 1 graphically depicts the overall risk of bias within studies, and the individual assessments are presented in Online Supplement Table 1.

Systematic review
We screened 98 potentially relevant articles and ultimately included 86 studies in our systematic review and 29 in our meta-analysis. Reasons for exclusion from the systematic review were: no intervention (k = 7), full text unavailable (k = 3), or text not translatable using Google Translate (k = 3). Reasons for exclusion from the meta-analysis were: only one intervention session (k = 12), no control group (k = 14), a mixed-therapies intervention (k = 1), or a review article (k = 3). The process of study selection is shown in Fig. 2. As a result, the systematic review includes low-quality studies, such as studies without a control group, with a small sample size, or with a very high risk of bias, but gives a more complete overview. The meta-analysis only includes quasi-experimental studies and randomized controlled trials.

Study characteristics
The 86 articles in this systematic review comprise three review articles and 83 studies on laughter therapy with humor, without humor, or for which it is unknown whether humor was used. Of the 83 intervention articles, 14 were randomized controlled trials and 41 were quasi-experimental pre-test post-test studies with a control group. The other 31 studies were pilot studies, field studies without a control group, or had an unknown study design. These study details are presented in Online Supplement Table 2.
Studies originated from the Americas (USA, Colombia), Asia and the Middle East (China, Iran, India, Hong Kong, Korea, Taiwan, Thailand, Japan), Europe (Germany, Slovenia, Switzerland, United Kingdom), Africa (South Africa) and Australia. Outcome measures varied broadly, from mental health and well-being (e.g. agitation, anxiety, cognitive function, coping responses, depression, laughter, life satisfaction, mood, pain, quality of life, resilience, self-efficacy, self-esteem, stress) to physical health (e.g. blood glucose level, blood pressure, body weight, fatigue, heart rate, immune function, insomnia, pulmonary function, sleep quality). Laughter was induced through: clapping, dancing, facial muscle exercises, laughing with a clown, laughter exercises (e.g. laughing 'big' and 'small', giving each other applause, smiling), laughter yoga, and watching (self-selected) humorous videos. Detailed study characteristics, such as study aim, outcome measures, assessment points, limitations, and results are presented in Online Supplement Table 3. The study results will be discussed below.

Systematic review: broad overview of all included studies
When reviewing the literature, we identified three broad subgroups. The first is based on the three most common outcome measures reported in the studies: depression, anxiety and stress. The second is based on the elderly, the population studied most often. The third consists of populations and outcome measures that do not otherwise come to light in the meta-analysis or systematic review, but that we consider important for a full overview of the field.

Subgroup 1: most common outcomes (depression, stress, and anxiety)
Depression. Depression was an outcome measure in 31 studies (see Table 1). In 26 of these, depression decreased significantly after laughter-inducing therapy. Beyond statistical significance, we compared effect sizes and mean differences for all studies. The average corrected effect size for all studies is d_ppc2 = 0.85 (d_Korr = 0.80). The average effect size for randomized controlled trials only is d_ppc2 = 0.57 (d_Korr = 0.63). We also distinguished between the ways laughter was induced in the therapy, either with or without humor. The average effect size for humorous therapies is d_ppc2 = 0.43 (d_Korr = 0.40), and for non-humorous therapies d_ppc2 = 1.14 (d_Korr = 1.187). After removal of an extreme outlier (Kim, 2010), the average for non-humorous therapies dropped to d_ppc2 = 0.73 (d_Korr = 0.78). Overall, the studies show a medium to large average effect size, and non-humorous therapies show an effect size roughly twice as large as humorous therapies.
We then looked at the 'replicability' of results. To do so, we plotted the mean differences and 95% confidence intervals of studies that measured depression with the Beck Depression Inventory (BDI) and the Geriatric Depression Scale (GDS), respectively (see Figs. 3 and 4). These [...] (Fig. 3). Further, the significant outcome of Cai et al. (2014) was not clearly replicated in the other two studies, both of which had nonsignificant outcomes (...; Cho and Oh, 2011). From this plot, we cannot conclude whether laughter-inducing therapy has an effect on depression. For the GDS, a reduction in depression is likely, as seven out of 10 studies have their 95% confidence interval below zero (see Fig. 4). However, only four out of 10 studies show an expected reduction in GDS score of four or more points (GDS range 0 to 30); the other six show a reduction of zero to two points. Even where such a reduction is statistically significant, it might not be clinically relevant; based on this plot, only a small to medium reduction in GDS score can be expected. There is no clear reduction for CES-D and CSDD scores (see Table 1). For the remaining results, there are not enough studies with similar outcome measures to plot a comparison.
Stress. Stress was an outcome measure in 19 studies (see Table 2). In 18 of these, stress decreased significantly after laughter-inducing therapy. Stress was measured subjectively as perceived stress or objectively as cortisol level. The average effect size for all studies is d_ppc2 = 0.58 (d_Korr = 0.60), and for randomized controlled trials only d_ppc2 = 0.51 (d_Korr = 0.56). We could not compare effect sizes between humorous and non-humorous therapies, as only non-humorous therapies were available for this outcome measure. The average effect size for non-humorous therapies only is d_ppc2 = 0.66 (d_Korr = 0.55).
Next, we looked at the possible 'replication' of stress outcomes. The confidence intervals of the perceived stress measures show evidence of a reduction in stress (see Table 2): the upper bounds of four out of five studies stay below zero, but because different measures were used we cannot conclude that these results replicate. For cortisol, the mean differences provide no evidence that laughter-inducing therapy is beneficial (see Fig. 5). Six of the seven mean differences point in the favorable direction (and were reported as statistically significant), but they lie between zero and minus two, and in all cases the upper bound of the confidence interval crosses zero. Although most studies report statistically significant reductions in cortisol levels, these differences are unlikely to be of meaningful clinical benefit.
Anxiety. Anxiety was an outcome measure in 15 studies (see Table 3). In 14 of these, anxiety decreased significantly after laughter-inducing therapy. The average effect size for all studies is d_ppc2 = 0.81 (d_Korr = 0.92), and for randomized controlled trials d_ppc2 = 0.98 (d_Korr = 1.04). A further distinction between humorous and non-humorous therapies could be made: the average effect size for humorous therapies is d_ppc2 = 0.51 (d_Korr = 0.53), and for non-humorous therapies d_ppc2 = 1.00 (d_Korr = 1.19). As with depression, the non-humorous studies have an effect size about twice as large as the humorous studies.
The possible replication of anxiety outcomes is plotted in Fig. 6. The results are inconclusive: all mean differences show a positive effect, but only two upper confidence bounds stay below zero. Again, the vote-counting result is not borne out by the analysis of the mean differences.
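The 'replication' check used in this subgroup reads each study's mean difference against its 95% confidence interval and, for the GDS, against a clinical threshold of four points. The Python sketch below shows that reading for post-test means; for non-randomized designs the review used the difference between pre-post changes instead. The normal-approximation interval with an unpooled standard error, and treating the four-point threshold as a parameter, are our assumptions, since the exact interval method is not stated here.

```python
import math
from typing import Dict, Tuple


def mean_difference_ci(m_t, sd_t, n_t, m_c, sd_c, n_c, z=1.96) -> Tuple[float, float, float]:
    """Post-test mean difference (treatment - control) with a normal-approximation CI."""
    diff = m_t - m_c
    se = math.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)
    return diff, diff - z * se, diff + z * se


def read_result(diff: float, upper: float, clinical_threshold: float = 4.0) -> Dict[str, bool]:
    """Read one study as in the replication plots (lower scores = fewer symptoms)."""
    return {
        "reliable_reduction": upper < 0,                     # whole CI below zero
        "clinically_relevant": diff <= -clinical_threshold,  # e.g. >= 4 GDS points
    }


if __name__ == "__main__":
    # Hypothetical GDS post-test scores (range 0-30).
    diff, lo, hi = mean_difference_ci(10.0, 2.0, 50, 12.0, 2.0, 50)
    print(diff, round(lo, 3), round(hi, 3))  # -2.0 -2.784 -1.216
    print(read_result(diff, hi))
```

This hypothetical study shows the pattern described for the GDS: its interval lies entirely below zero, yet the two-point reduction falls short of the four-point threshold.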

Subgroup 2: elderly people
In 21 studies, a positive effect of laughter or humor on the mental and physical health of the elderly was found. Most studies defined 'elderly' as 65 years or older (Bains et al., 2014; Brodaty et al., 2014; Cha and Hong, 2013; Hsieh et al., 2015; Jung et al., 2009; Kim and Lee, 2012; Ko and Youn, 2011; Lee and Eun, 2011; Lee et al., 2013; Park, 2013; Song et al., 2013; Tse et al., 2010; Walter et al., 2007), and some as 60 years or older (George and Jacob, 2014; Hirosaki et al., 2013; Konradt et al., 2013; Sohn, 2010; Shahidi et al., 2011); one study in this subgroup examined middle-aged women between 40 and 60 years old (Cha and Hong, 2013), and two studies had unspecified samples (Lee and Young, 2011; Song et al., 2011). In general, the reported effects were improved sleep quality, improved mood, increased life satisfaction, decreased depression and decreased pain. For most of these outcome measures, only one to three studies investigated the effect. We chose to focus on the three most common outcome measures: stress, depression, and anxiety.
Depression. Eight studies reported that laughter-inducing therapy significantly decreased depression in the elderly; two used 'spontaneous' (humorous) laughter, three used 'simulated' (non-humorous) laughter, and for three the type is unknown (George and Jacob, 2014; Hsieh et al., 2015; Jung et al., 2009; Ko and Youn, 2011; Konradt et al., 2013; Lee and Eun, 2011; Lee et al., 2013; Young, 2011; Shahidi et al., 2011). In three studies, humor therapy did not have a significant effect on depression (Hirosaki et al., 2013; Low et al., 2013, 2014; Park, 2013). Two of these are the largest studies in this review (Low et al., 2013, 2014). Based on vote counting alone, most studies show that laughter-inducing therapy improves depression in the elderly (eight out of 11 studies). Table 1 also shows average effect sizes for humorous versus non-humorous therapies. For humorous therapies, the average effect size is d_ppc2 = 0.79 (d_Korr = 0.73), while for non-humorous therapies it is d_ppc2 = 0.53 (d_Korr = 0.52). The positive effect of laughter-inducing therapy on depression in the elderly and the larger effect size for non-humorous therapies are in accordance with the results for all studies and populations in subgroup 1.
Stress. The effect of laughter on cortisol in the elderly is mixed. One study reported decreased cortisol, which it described as beneficial for memory (Bains et al., 2014), whereas two studies found no such decrease (Cha and Hong, 2013; Hsieh et al., 2015). This is in line with the inconclusive cortisol results across all studies in subgroup 1.
Anxiety. Only one study examined the effect of laughter-inducing therapy on anxiety in the elderly, and it found no effect; no comparison could be made with other studies.
Table 3. Effect sizes and mean differences for studies with anxiety as outcome (for RCTs, mean difference at post-test; for other designs, difference between pre-post mean differences).

Subgroup 3: other important populations and outcome measures
Cancer patients. Eight studies reported that laughter therapy had a significant positive effect on the mental and physical health of cancer patients. More specifically, laughter therapy decreased perceived stress and improved mood (Choi et al., 2010; Farifeth et al., 2014; Kim et al., 2009, 2015). Laughter therapy also decreased anxiety (Han et al., 2011; Kim et al., 2009; You and Choi, 2012). Mixed results were found for pain and immunological response. Laughter therapy decreased pain in one study (You and Choi, 2012), but not in another (Choi et al., 2010). The immunological response improved in one study (Sakai et al., 2013), but Cho and Oh (2011) found no effect on immunity. The improvement of mood and perceived stress in cancer patients seems convincing: three of the four studies were RCTs, and half of the risk-of-bias criteria were mostly rated as low risk, with the other half rated as unclear. For pain and immunological response, no clear conclusion can be drawn at this point.
Healthy adults. Thirteen studies reported a positive effect of laughter-inducing therapy on mental, social or physical health in (healthy) adults. Seven of these studies used 'spontaneous' laughter (Bennett et al., 2003; Berk et al., 2014; Buchowski et al., 2007; Lowis, 1997; Lee and Ji, 2011; Park, 2010; Szabo, 2003; Szabo et al., 2005), three studies used 'simulated' laughter (Nagendra et al., 2007; Wagner et al., 2014; Yazdani et al., 2014) and two studies were unclear (Jung and Park, 2012; Oh et al., 2011). These studies report that laughter or humor can reduce stress and increase natural killer (NK) cell activity (Bennett et al., 2003), improve mental health (Park, 2010; Yazdani et al., 2014), increase coping humor (Lowis, 1997), show EEG correlates in the beta and gamma bands, increase energy expenditure and activate abdominal and back-lifting muscles (Buchowski et al., 2007; Wagner et al., 2014), and improve mood and reduce anxiety or stress (Szabo, 2003; Szabo et al., 2005; Jung and Park, 2012; Lee and Ji, 2011; Nagendra et al., 2007). Although Oh et al. (2011) found that laughter therapy can decrease serum cortisol, it had no effect on coping or stress response in their study. The positive findings on stress reduction and energy expenditure are reasonably convincing, as they come from lab studies with a low risk of bias; however, these studies used small samples. The other results are weakened by a medium to high risk of bias, mainly due to non-randomization. Three studies examined the effect of humor on employees and found that it can increase self-efficacy (Beckman et al., 2007), increase enjoyment at work and reduce stress levels, or increase positive mood and reduce blood pressure (Nagendra et al., 2007). These results are not very convincing because the studies either lacked a control group (Beckman et al., 2007), had a high drop-out rate, or found no significant changes in stress when a stronger study design was used (Nagendra et al., 2007).

Fig. 3. Mean differences in depression (BDI) with 95% confidence intervals.

C.N. van der Wal and R.N. Kok, Social Science & Medicine 232 (2019) 473-488

Children and teenagers. Four studies reported that laughter significantly improved the social health of children and teenagers. More specifically, laughter improved social support and life satisfaction, improved self-efficacy and social competence (Koo, 2010), reduced stress and depression (Koo and Kim, 2013), and improved self-esteem and coping skills (Choi and Cho, 2011) in children and teenagers. For all four studies, the type of laughter used is unknown. These results should be interpreted with caution due to the unclear risk of bias in three studies and the small sample size and lack of randomization in the remaining study.
Physical health: pain. Nine studies examined the effect of laughter-inducing therapy on pain (Choi et al., 2010; Herschenhorn, 1995; Kessler et al., 2010, 2012; Kim et al., 2010; Ko and Youn, 2011; Lee and Eun, 2011; Tse et al., 2010; You and Choi, 2012; Yu and Kim, 2009). Six of the nine studies reported a decrease in pain after laughter-inducing therapy. Moreover, studies reported that laughter can improve fatigue in patients with arthrosis, or decrease anxiety, depression, pulse rate and blood pressure in military patients with low back pain. One study did not find significant effects on quality of life, headache, fatigue, or general health in women with chronic migraine (Sahai-Srivastava et al., 2014). The relationship between laughter intensity and pain in women with rheumatoid arthritis has also been investigated, but no results are reported in the article (Herschenhorn, 1995). Again, most of these findings are unconvincing due to a high risk of bias or small sample sizes.
Female health and wellbeing: postpartum stress & infertility. Three studies reported that laughter therapy can decrease infertility stress and anxiety and increase laughter (Jung and Park, 2012), improve the immune response in postpartum women (Ryu et al., 2015), and decrease fatigue and serum cortisol levels in breastfeeding postpartum women (Shin et al., 2011). These studies are at high or unclear risk of bias and are therefore not very convincing.

Results summary systematic review
In conclusion, there is a reasonably convincing trend indicating that 'simulated' (non-humorous) laughter has a more positive effect on depression and anxiety than 'spontaneous' (humorous) laughter. This is based on effect sizes twice as large for 'simulated' as for 'spontaneous' laughter-inducing therapies, both in subgroup 1 (all populations) and in subgroup 2 (the elderly). Furthermore, laughter-inducing therapies seem to improve depression and perceived stress. This is based on the replication of a decrease in depression (measured with the GDS) and in perceived stress (measured with differing instruments). The majority of the lower bounds of these confidence intervals stay below zero, indicating a positive effect, though this may not be clinically relevant. For cortisol levels, the results remain inconclusive. Finally, laughter-inducing therapy also seems to improve mood, perceived stress and depression in cancer patients, infertile women, adults and children. This conclusion should be interpreted with caution, as the systematic review included a high number of low-quality studies (no control group, small N, or no results because only a conference abstract was available) and studies with a high risk of bias. Therefore, although a summary of these studies suggests an overall positive effect of these therapies, their methodological shortcomings make it hard to critically interpret the positive results. As most of the studies included in the systematic review used nonrandomized designs, had very small sample sizes or were otherwise at high risk of bias, many of these positive results could be spurious findings, statistical artefacts such as regression to the mean, or the result of nonspecific factors.
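Regression to the mean can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: a hypothetical population in which each person has a stable true symptom score plus independent measurement noise at each occasion, no intervention effect whatsoever, and enrollment restricted to people with elevated pretest scores. All parameters (means, standard deviations, cutoff, seed) are arbitrary illustrative values, not taken from any included study.

```python
import random
import statistics


def simulate_uncontrolled_prepost(n_pop=10_000, cutoff=12.0, seed=1):
    """Simulate a pre-post 'study' with NO treatment effect at all.

    Each person has a stable true score plus independent measurement noise at
    each occasion. Only people scoring above `cutoff` at pretest are enrolled,
    mimicking recruitment of participants with elevated symptoms.
    Returns (mean pretest, mean posttest) of the enrolled group.
    """
    rng = random.Random(seed)
    pre_scores, post_scores = [], []
    for _ in range(n_pop):
        true_score = rng.gauss(10.0, 3.0)
        pre = true_score + rng.gauss(0.0, 3.0)
        post = true_score + rng.gauss(0.0, 3.0)  # no intervention effect
        if pre > cutoff:
            pre_scores.append(pre)
            post_scores.append(post)
    return statistics.mean(pre_scores), statistics.mean(post_scores)


pre_mean, post_mean = simulate_uncontrolled_prepost()
print(f"pretest {pre_mean:.2f} -> posttest {post_mean:.2f}")
```

Because enrollment selects people whose pretest scores were partly inflated by noise, the posttest mean drops even though nothing was done; an uncontrolled pre-post design would misread this drop as a treatment effect, whereas a randomized control group would show the same drop.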
In practical terms, laughter-inducing therapies seem to be applicable in a wide range of settings and for many different populations, including severely or terminally ill, disabled and healthy people, employees, the elderly, adults and children. Moreover, there are almost no contraindications, which makes them safe for many people to try. It seems feasible for practitioners to teach laughter-inducing therapies after completing a laughter teacher training, provided they are experienced in leading group therapy or other types of group sessions (e.g., improvisation, singing, dancing, sports, yoga), as group dynamics seem to be important to stimulate and manage.

Meta-analysis
In total, 29 studies (Flow diagram 2) reported sufficient information, such as outcome statistics and number of participants, to be eligible for the meta-analysis, with a total of n = 1986 participants (n = 976 in the intervention groups and n = 1010 in the control groups; on average n = 68 participants per study, range 20 to 398). Of these, n = 894 participants came from randomized controlled trials (n = 430 in the intervention groups and n = 464 in the control groups).
Of all mental health outcomes, depression, anxiety, and stress were reported most often in the included studies. Depression outcomes included validated and commonly accepted measures of depression (e.g., BDI and GDS), but also ad-hoc self-report measures. We therefore report depression outcomes both pooled across all measures and, as a subgroup, for validated measures only (see Table 4). For anxiety outcomes, too few validated measures were used to make this distinction. Too few studies (randomized or quasi-experimental) were available to pool effect sizes for stress-specific or quality-of-life outcomes.
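For readers unfamiliar with the between-group effect size used in the meta-analysis, the sketch below computes Hedges' g from posttest means and standard deviations, using the standard bias-corrected standardized mean difference and its usual large-sample variance, together with a normal-theory 95% confidence interval. This is a generic illustration: we assume these textbook formulas, the software and options used for the actual analyses may differ, and the function names and input values are hypothetical.

```python
import math
from statistics import NormalDist


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardized mean difference (group 1 minus group 2).

    Returns (g, variance of g) using the usual large-sample approximation.
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d


def ci95(estimate, variance):
    """Normal-theory 95% confidence interval."""
    half_width = NormalDist().inv_cdf(0.975) * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical posttest depression scores (lower is better): passing the
# control group as group 1 makes a positive g indicate benefit.
g, v = hedges_g(20, 6, 30, 17, 6, 30)
print(f"g = {g:.2f}, 95% CI {ci95(g, v)}")
```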
Results are presented for depression and anxiety outcomes for all types of laughter-inducing therapies. Randomized and quasi-experimental studies are compared, and the results are integrated with those of the systematic review (qualitative synthesis). See Figs. 7 and 8 for forest plots of these results and Table 4 for detailed quantitative results.
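Pooling study-level effect sizes into a single estimate, as summarized in forest plots, can be sketched with a random-effects model. The text does not specify the estimator; we assume the DerSimonian-Laird method purely for illustration, as it is a common default when heterogeneity is expected. The hypothetical function also returns tau² (the between-study variance) and I² (the share of observed variation attributed to heterogeneity rather than chance), the quantities behind the high heterogeneity noted in this review.

```python
import math
from statistics import NormalDist


def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects meta-analysis.

    Returns a dict with the pooled effect, its 95% CI, tau^2 and I^2 (%).
    """
    k = len(effects)
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    z = NormalDist().inv_cdf(0.975)
    i2 = (max(0.0, (q - (k - 1)) / q) * 100) if q > 0 else 0.0
    return {"pooled": pooled, "ci": (pooled - z * se, pooled + z * se),
            "tau2": tau2, "i2": i2}


# Hypothetical study-level Hedges' g values and their variances.
result = pool_random_effects([0.1, 0.9, 0.2, 1.2], [0.04, 0.04, 0.04, 0.04])
print(result)
```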

Results summary meta-analysis
Overall, laughter interventions showed medium to large between-group effect sizes for depression and anxiety outcomes, which were the only outcomes reported often enough to be pooled meaningfully. These results are similar to those of the systematic review. Although the effect sizes are comparable to, e.g., those found for various psychotherapeutic interventions for depression (see e.g. Barth et al., 2013), this does not mean that laughter therapies are as effective as established therapies, because these effect sizes cannot be compared directly.

Publication bias
Publication bias was assessed using Duval and Tweedie's trim-and-fill procedure. The procedure indicated considerable publication bias, imputing 10 missing studies, which lowered the pooled effect size from g = 0.47 (95% CI 0.13 to 0.81) to g = 0.28 (95% CI 0.08 to 0.53). However, in the presence of high heterogeneity, as in this sample, the trim-and-fill procedure may give unreliable results (Peters et al., 2007). Additionally, both Egger's test (intercept = 3.67, p < 0.0001) and Begg and Mazumdar's rank correlation test (Kendall's tau = 0.51, p < 0.0001) were significant, indicating funnel plot asymmetry consistent with publication bias. Nevertheless, these results must be interpreted with caution, as they do not warrant firm conclusions about the presence or absence of publication bias.
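The logic of Egger's test can be sketched as an ordinary least-squares regression of each study's standardized effect (g divided by its standard error) on its precision (1 divided by the standard error); an intercept far from zero signals that smaller, less precise studies report systematically different effects. The sketch below computes the intercept and its t statistic only; a p-value would come from a t distribution with k − 2 degrees of freedom (for example via SciPy's `scipy.stats.t.sf`), which is omitted to keep the example dependency-free. The function name and data are hypothetical, and the authors' software may implement the test differently.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (y / se) on precision (1 / se) by OLS.
    Returns (intercept, t statistic of the intercept, degrees of freedom).
    """
    z = [y / s for y, s in zip(effects, std_errors)]
    x = [1 / s for s in std_errors]
    k = len(z)
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    residuals = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    sigma2 = sum(r**2 for r in residuals) / (k - 2)  # residual variance
    se_intercept = math.sqrt(sigma2 * (1 / k + x_bar**2 / sxx))
    return intercept, intercept / se_intercept, k - 2


# Hypothetical data in which less precise studies report larger effects.
effects = [0.2, 0.25, 0.35, 0.6, 0.8, 1.0]
std_errors = [0.1, 0.15, 0.2, 0.3, 0.4, 0.5]
print(egger_test(effects, std_errors))
```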

Strengths and limitations
To our knowledge, this review is the first to systematically retrieve and assess the literature on laughter interventions. Systematic reviews and meta-analyses are necessarily limited by the quality of the included primary studies. We tried to mitigate this limitation by including quantitative measures in the systematic review, rather than simply categorizing results as statistically significant or not. Vote-counting can paint a positive picture, whereas taking sample sizes, effect sizes and mean differences into account can show a different picture and provide meaningful context, including information on clinical relevance. We therefore included quantitative measures whenever they were reported in the article.
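The difference between vote-counting and weighing studies by their precision can be shown with a small hypothetical example: several small studies with large, just-significant effects and one large study with a near-zero effect, a pattern loosely resembling the contrast between many small trials and one large multi-site trial. The sketch counts significant studies and, for comparison, computes a simple inverse-variance (fixed-effect) weighted mean; both functions and all numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist


def vote_count(effects, std_errors, alpha=0.05):
    """Number of studies with a two-sided significant positive effect."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return sum(1 for y, s in zip(effects, std_errors) if y / s > z_crit)


def inverse_variance_mean(effects, std_errors):
    """Fixed-effect (inverse-variance weighted) mean effect and its SE."""
    w = [1 / s**2 for s in std_errors]
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return mean, math.sqrt(1 / sum(w))


effects = [0.9, 0.9, 0.9, 0.9, 0.05]      # hypothetical Hedges' g values
std_errors = [0.4, 0.4, 0.4, 0.4, 0.1]    # four small studies, one large one
print(vote_count(effects, std_errors), "of", len(effects), "studies significant")
mean, se = inverse_variance_mean(effects, std_errors)
print(f"inverse-variance mean g = {mean:.2f} (SE {se:.2f})")
```

Vote-counting reports four out of five "positive" studies, while the weighted mean is far smaller than the typical small-study effect because the single precise study carries most of the weight.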
In this review, the main limitations are the low quality and high risk of bias of the included studies. Many studies in the meta-analysis had a very small sample size (average n = 68), with one distinct outlier: a multi-site randomized controlled trial (Low et al., 2013, 2014; n = 398). Twelve studies had 20 or fewer participants per condition; these could perhaps more accurately be described as pilot studies, which are unsuitable for reliable effect size estimation (Kraemer et al., 2006). This limits the conclusions that can be drawn from the systematic review, where generally favorable results were found; in the meta-analysis, it is reflected in the wide confidence intervals around the pooled effect size estimates and the high heterogeneity. Moreover, the results of the systematic review rely largely on nonrandomized studies. This means that the results of these studies, mostly statistically significant, could be due to nonspecific treatment effects, spurious findings, regression to the mean, uncorrected multiple testing, suboptimal analytic strategies, and the other limitations inherent to nonrandomized studies.
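Why studies with 20 or fewer participants per condition yield unreliable effect size estimates can be made concrete by computing the approximate 95% confidence interval of Hedges' g for different group sizes. The sketch assumes two equal-sized groups, a hypothetical true effect of g = 0.5 (medium size) and the usual large-sample variance approximation; the function name is illustrative.

```python
import math
from statistics import NormalDist


def g_ci_half_width(g, n_per_group):
    """Approximate 95% CI half-width of Hedges' g for two equal-sized groups."""
    n1 = n2 = n_per_group
    var = (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2))
    return NormalDist().inv_cdf(0.975) * math.sqrt(var)


for n in (10, 20, 64, 200):
    hw = g_ci_half_width(0.5, n)
    print(f"n = {n:>3} per group: g = 0.50, 95% CI [{0.5 - hw:.2f}, {0.5 + hw:.2f}]")
```

With 20 participants per group, the interval runs from roughly −0.13 to 1.13, spanning no effect through a very large effect, so a single study of this size says little about the true magnitude.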
None of the included trials were pre-registered in a publicly accessible trial registry, as required by the Declaration of Helsinki and recommended by the ICMJE guidelines, which made it impossible to check whether outcome measures were omitted or switched. Some studies used mixed therapies, which makes it difficult to conclude whether the results are attributable to the laughter or humor component of the therapy, or to nonspecific treatment effects. Dismantling studies could provide more insight into this. Moreover, a number of outliers in the meta-analysis reported improbably high effect sizes, which could have distorted the effect size estimates (e.g., Kim, 2010: g = 4.19, although its very small sample size (n = 24) limits its influence on the analyses). Conversely, the study of Low et al. (2013) could be considered a negative outlier in the meta-analysis, as its effect size was very low but its sample size very high. Almost all pooled outcomes showed high heterogeneity, even in subgroup analyses. This can be explained by large variability between studies, especially among the 'simulated' (non-humorous) laughter studies. Many of the included studies were non-English journal articles from Asian countries, which could lead to language bias in the results, although much of the data could be extracted from non-English papers using Google Translate. We would encourage authors to also make English-language reports available. Finally, although great care was put into the search terms and into covering the field of laughter and humor therapies, we limited our search to the databases most commonly used for meta-analyses of health outcomes: WOS, EBSCO, EMBASE, and PUBMED. Additionally, there is a lack of shared vocabulary among researchers, which leads to a wide range of phrases, terms, and definitions being used; in turn, this suggests a research field that has yet to reach maturity.

Practical implications
Laughter-inducing therapies could be a valuable complementary, or in some cases even main, therapy in different settings, but the lack of high-quality studies currently precludes recommending their use in clinical practice other than on a complementary, patient-preference basis. The results of the systematic review suggest that laughter therapies are acceptable in a wide range of settings and for a broad range of patient groups; this is a good starting point for further research. The included studies indicate that the therapy requires a trainer who can deliver it in group sessions. Participants and staff can also practice laughter on their own between sessions. Multiple studies suggest that it is also acceptable in serious or terminal conditions, for example for cancer patients waiting for chemotherapy or for terminally ill patients (Farifeth et al., 2014; Kessler et al., 2010, 2012; Kim et al., 2015). Laughter-inducing therapy can also be applied in populations with reduced mobility, as it can be done lying down or sitting (watching humorous videos or doing laughter exercises). Non-humorous laughter in particular could be well suited to elderly or cognitively impaired populations, as it does not rely on, e.g., verbal skills such as wordplay.
Laughter has been shown to produce effects on many levels: emotional, psychological, behavioral, and biological. Laughter-inducing therapies show promise as an addition to main therapies or medication, and should be investigated further, especially as an adjunct therapy for somatic diseases with a psychological component such as depression or anxiety. Furthermore, to our knowledge, there are few conceivable contraindications, which makes it safe for most people to practice. The social aspect of laughing together may help alleviate societal problems such as loneliness, bullying at schools, or aggression in general. The intensity, and therefore the effectiveness, of laughter during sessions might depend on the experience of the teacher or therapist and on group dynamics. However, more research is needed to establish the importance of a possible 'therapeutic alliance', as is commonly measured in psychotherapeutic interventions.
We encourage practitioners to experiment with and test laughter-inducing therapies, and suggest co-creation activities to help develop and protocolize these therapies. For example, in elderly care homes, residents could be consulted to assess their preferences. Our results suggest that humor is not necessary and that 'laughing about nothing' seems to work as well, so individual taste or sense of humor may not be relevant, although some people might initially feel embarrassed laughing about 'nothing' in a group.
In the context of rising health care costs and an increasing elderly population, there is potential for low-cost, simple interventions that can be administered by staff with minimal training. Unfortunately, the seemingly great potential of laughter-inducing therapies has not yet materialized, perhaps partly due to a lack of evidence of effectiveness. Should effectiveness be shown, how could laughter be implemented in healthcare? We recommend 'simulated' (non-humorous) laughter rather than 'spontaneous' (humorous) laughter, as the systematic review shows a more consistent effect and a larger effect size for 'simulated' laughter. Furthermore, 'simulated' laughter does not require cognitive processing of humor (e.g., understanding a joke or funny story) and can be performed seated or in a hospital bed. Laughter-inducing therapies can be performed in group and private settings, preferably with a teacher experienced in group dynamics and laughter exercises.

Theoretical and research implications
Although the evidence for laughter-inducing therapies is growing, methodological rigor is still lacking and the quality of evidence is either low or very low. Future research should focus on adequately powered, pre-registered randomized controlled trials with large samples, longer therapy durations (e.g., 10 or more sessions over more than 5 weeks) and follow-up measurements. Also, interventions should be developed in which the level or 'dosage' of laughter can be measured. Some efforts in this direction have been made; for example, Mora-Ripoll (2011) suggests the diaphragm electromyogram as a precise measure of laughter. We also see possibilities for a web- or smartphone-based intervention in which laughter is measured with facial recognition and voice analysis using integrated smartphone sensors. Great advances have recently been made in automatically recognizing (genuine) laughter through social signal processing and face and voice recognition (Dibeklioğlu et al., 2015; Dupont et al., 2016), which could provide a useful measure of whether the posited positive effects of the therapies are indeed mediated by laughter itself.
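What "adequately powered" implies can be estimated with the standard normal-approximation sample size formula for comparing two independent groups. The sketch below is a rough guide under stated assumptions: a two-arm, individually randomized trial, a two-sided alpha of 0.05, 80% power, and no dropout. It compares a medium effect (d = 0.5) with the trim-and-fill adjusted estimate reported above (g = 0.28). Because laughter therapies are typically delivered in groups, a real trial may also need to inflate these numbers for clustering, which is not modeled here. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-sided, two-sample test.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    rounded up; an exact t-test calculation gives slightly larger numbers.
    """
    z = NormalDist()
    numerator = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return math.ceil(numerator / d**2)


for effect in (0.5, 0.28):
    print(f"d = {effect}: about {n_per_group(effect)} participants per group")
```

Under these assumptions, detecting d = 0.5 needs roughly 63 participants per group, and detecting g = 0.28 needs roughly 201, far more than most of the included studies enrolled.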
We propose that diverse laughter and humor trainers come together to find a consensus on what guidelines for laughter-inducing therapies should be created, and whether a protocol for a 'standard' laughter therapy treatment can be established. Such a protocol should include information on which exercises should be included in training, how many sessions are needed to derive benefit, how long these sessions should be, and other basic information. Furthermore, it is unclear what the necessary qualification for teachers or therapists would be, and what the 'minimal' dose per session should be. Standardization of treatments would make it easier to independently reproduce results and to systematically research and improve the therapies. This can then be validated in large-scale randomized controlled trials.

Conclusions
The aim was to systematically estimate the efficacy or effectiveness of laughter-inducing therapies. This was done by conducting a systematic review and meta-analysis of the effects of 'spontaneous' (humorous) and 'simulated' (non-humorous) laughter-inducing therapies on physical and mental health outcomes. Firstly, the systematic review found a reasonably convincing trend indicating that 'simulated' (non-humorous) laughter has a more positive effect on depression and anxiety than 'spontaneous' (humorous) laughter. This was based on effect sizes twice as large for non-humorous therapies as for humorous therapies. Secondly, the systematic review and the meta-analysis found a similar pattern for the effect of laughter-inducing therapies on depression. Both found a positive effect on depression, with similar medium effect sizes ranging from Hedges' g = 0.48 to 0.65 and d_ppc2 = 0.51 to 0.58. The systematic review also found a convincing replication of depression outcomes (measured with the GDS). Furthermore, laughter-inducing therapies seem to improve perceived stress. The systematic review showed that cortisol levels and pain could potentially be reduced by laughter-inducing therapy, but these results remain inconclusive. Although a summary of these studies suggests an overall positive effect of these therapies, their methodological shortcomings make it hard to critically interpret the positive summary results. The systematic review included a high number of low-quality studies (no control group, small n, or no results reported in conference abstracts) and studies with a high risk of bias.
Future directions for this research field are: performing randomized, pre-registered, controlled trials of a standardized form of laughter therapy with sufficient sample sizes; ideally testing the cost-effectiveness of laughter-inducing therapies in clinical settings with trained staff; developing interventions in which the precise 'dosage' of laughter can be measured and managed; and investigating the most effective (and minimal) number and frequency of therapy sessions.

Author contributions
Study conception and design: NW. Acquisition of data: NW and RK. Analysis and interpretation of qualitative data: NW. Analysis and interpretation of quantitative data: NW and RK. Drafting of manuscript: NW and RK. Critical revision: NW and RK.

Funding
Part of the research has been performed under the European Union's Horizon 2020 research and innovation programme: Marie Sklodowska-Curie grant agreement No 748647. No specific other funding was obtained for this manuscript. The authors declare no conflicts of interest, financial or otherwise.