Secondary transmission of SARS-CoV-2 during the first two waves in Japan: Demographic characteristics and overdispersion

Objectives: Super-spreading events caused by overdispersed secondary transmission are crucial in the transmission of COVID-19. However, the exact level of overdispersion and the demographic and other factors associated with secondary transmission remain elusive. In this study, we aimed to elucidate the frequency and patterns of secondary transmission of SARS-CoV-2 in Japan. Methods: We analyzed 16,471 cases reported between January 2020 and August 2020. We generated the distribution of the number of secondary cases and estimated the dispersion parameter (k) by fitting a negative binomial distribution in each phase. The frequencies of secondary transmission were compared by demographic and clinical characteristics by calculating odds ratios with logistic regression models. Results: We observed that 76.7% of the primary cases did not generate secondary cases, with an estimated dispersion parameter k of 0.23. The demographic patterns of primary-secondary case pairs differed between phases, with 20–69 years being the predominant age group. Higher proportions of secondary transmission were observed among older individuals, symptomatic patients, and patients with 2 days or more between onset and confirmation. Conclusions: This study estimated the frequency of secondary transmission of SARS-CoV-2 and characterized the people who generated secondary transmissions.


Introduction
Epidemiologic findings on COVID-19 have accumulated at an unprecedented rate. A key finding is the heterogeneity in the number of secondary transmissions, which is often characterized as overdispersion. Super-spreading events are essential factors contributing to the sustained transmission of SARS-CoV-2 ( Endo et al., 2020 ;Xu et al., 2020 ), similar to severe acute respiratory syndrome (SARS) ( Leo et al., 2003 ) and other infectious diseases ( Kucharski and Althaus, 2015 ;Lau et al., 2017 ;Wong et al., 2015 ). A study in Hong Kong revealed that only 19% of infected persons with COVID-19 generated 80% of all transmissions. In contrast, 69% of cases did not lead to secondary transmissions, indicating substantial transmission heterogeneity ( Adam et al., 2020 ). In addition, a study in Korea reported temporal changes in the level of transmission overdispersion during different epidemic periods ( Lim et al., 2021 ). A study conducted in Japan at the beginning of the COVID-19 outbreak, which examined 110 cases, including 11 clusters, found that most infected individuals did not generate any secondary transmission ( Nishiura et al., 2020b ). The characteristics of clusters at the early stage of the outbreak were also analyzed in Japan ( Furuse et al., 2020b ). On the basis of these findings, Japan developed a cluster-based approach to COVID-19 management that focuses on identifying and preventing clusters (super-spreading events) to suppress transmission ( Oshitani, 2020 ). Vaccination is progressing rapidly in many countries, and real-world data indicate that the impact of COVID-19 could be significantly reduced by vaccination ( Haas et al., 2021 ). However, reduced vaccine efficacy against the delta variant ( Lopez Bernal et al., 2021 ) and waning vaccine immunity resulting in breakthrough infections have recently been reported ( Rosenberg et al., 2021 ).
Furthermore, vaccine coverage remains low in many countries, especially in low- and middle-income countries ( Ritchie et al., 2021 ). Therefore, developing more effective public health measures to suppress transmission is critical. To achieve this, the exact level of overdispersion and the demographic characteristics of individuals likely to generate secondary transmissions need to be defined. It is also necessary to determine whether the level of overdispersion changes across the different phases of COVID-19 transmission.
This study aimed to analyze the frequency and demographic patterns of secondary transmission of SARS-CoV-2 during the first 2 waves in Japan using the data of individual cases available from local governments in Japan. We also analyzed the characteristics of cases that generated secondary transmissions to define the factors associated with secondary transmission.

Data collection
The Japanese government classified COVID-19 as a designated infectious disease in January 2020. Consequently, physicians are legally required to report all confirmed cases to local public health centers ( Furuse et al., 2020a ). Public health nurses at these centers interview confirmed cases according to the National Institute of Infectious Diseases guidelines ( Imamura et al., 2021a ). Using standardized forms, they collect demographic data, clinical information, and the history of high-risk activities, such as travel to affected areas, contact with confirmed COVID-19 cases, and visits to high-risk venues, covering the 14 days before symptom onset. Local governments compile the data on all confirmed cases, and most release this information online daily. The released data differ between local governments, but most include age, gender, identification number, and epidemiologic link. We collected and collated these data for the analysis.

Definition of phases
We analyzed the data of confirmed cases between January 15, 2020 and August 31, 2020. This period included 2 waves, which were divided into 4 phases on the basis of the epidemic curve: the increasing (phase 1: January 15 to April 13) and decreasing (phase 2: April 14 to May 24) phases of the first wave and the increasing (phase 3: May 25 to August 6) and decreasing (phase 4: August 7 to August 31) phases of the second wave ( Figure 1 ). The levels of interventions differed between the phases. During phase 1, the government implemented limited control measures, such as school closures and cancellations of mass gathering events. In response to the surge in cases, the government declared a state of emergency for 7 prefectures, including Tokyo and Osaka, on April 7, 2020, and extended it to all prefectures on April 16, 2020. During the state of emergency, the government requested people to stay at home unless necessary, and compliance remained high until the state of emergency was lifted in all prefectures on May 25, 2020 (phase 2). The number of newly confirmed cases dropped in May; however, transmission continued in nightlife areas of metropolitan cities such as Tokyo. The number of cases increased in June and July and peaked on August 7, 2020 (phase 3) ( Nagata et al., 2021 ). Although the government did not declare another state of emergency at that time, specific interventions targeting nightlife areas were implemented, and newly confirmed cases decreased until August 31, 2020 (phase 4).
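The phase boundaries above can be encoded as a simple date lookup. The Python sketch below is illustrative only (the authors worked in R); `PHASES` and `phase_of` are hypothetical names, and which date (onset, confirmation, or reporting) assigns a case to a phase is an assumption left to the caller, because the text does not specify it.

```python
from datetime import date
from typing import Optional

# Inclusive phase boundaries taken from the definitions in the text.
PHASES = [
    (1, date(2020, 1, 15), date(2020, 4, 13)),  # first wave, increasing
    (2, date(2020, 4, 14), date(2020, 5, 24)),  # first wave, decreasing
    (3, date(2020, 5, 25), date(2020, 8, 6)),   # second wave, increasing
    (4, date(2020, 8, 7), date(2020, 8, 31)),   # second wave, decreasing
]


def phase_of(day: date) -> Optional[int]:
    """Return the phase (1-4) containing `day`, or None outside the study period."""
    for phase, start, end in PHASES:
        if start <= day <= end:
            return phase
    return None
```

For example, `phase_of(date(2020, 7, 1))` falls in phase 3, and any September 2020 date returns `None`.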

Definition of primary and secondary cases
We defined primary and secondary cases as follows. Among epidemiologically linked cases, the primary case was the case with the earliest onset date, provided that the interval between the onset dates of the primary and secondary cases was less than 15 days, on the basis of the serial interval reported in previous studies ( Nishiura et al., 2020a ). If the onset date was unavailable or the case had no symptoms, we used the confirmation or reporting date instead. If 2 or more cases shared the same earliest onset date, the case with the earlier confirmation date was defined as the primary case. For clusters comprising several cases with primary exposure reported at a common event or venue, we considered the case with the earliest onset date as the primary case and other cases as secondary cases if their onset date was ≤7 days after the onset of the primary case. If the onset date was unknown or the cases in a cluster were asymptomatic, we regarded those cases as secondary cases if their confirmation date was ≤7 days after the confirmation of the primary case.
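The assignment rules above can be sketched as follows. This Python version is an illustration, not the authors' implementation; `Case`, `reference_date`, and `split_cluster` are hypothetical names. It assumes that confirmation dates are compared whenever either case lacks an onset date, a detail the text leaves open. `max_gap_days=7` mirrors the common-event rule, and `max_gap_days=14` would mirror the "less than 15 days" rule for other linked cases.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Case:
    case_id: str
    confirmed: date
    onset: Optional[date] = None  # None if asymptomatic or onset date unknown


def reference_date(case: Case) -> date:
    # Fall back to the confirmation date when no onset date is available.
    return case.onset if case.onset is not None else case.confirmed


def split_cluster(cases: List[Case], max_gap_days: int = 7) -> Tuple[Case, List[Case]]:
    """Pick the primary case and the secondary cases within one cluster."""
    # Earliest onset (or fallback) date wins; ties go to the earlier confirmation.
    primary = min(cases, key=lambda c: (reference_date(c), c.confirmed))
    secondary = []
    for c in cases:
        if c is primary:
            continue
        if c.onset is not None and primary.onset is not None:
            gap = (c.onset - primary.onset).days
        else:
            # Assumption: compare confirmation dates when either onset is missing.
            gap = (c.confirmed - primary.confirmed).days
        if 0 <= gap <= max_gap_days:
            secondary.append(c)
    return primary, secondary
```

Because the window is a parameter, the same function covers both the 7-day cluster rule and the longer linkage window.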

Inclusion and exclusion criteria
We analyzed the frequency of secondary transmission in primary cases reported between January 15, 2020, and August 31, 2020. We also included secondary cases identified in September 2020 if their corresponding primary cases occurred by August 31, 2020.
We only included primary cases with no identified exposure to calculate the frequency of secondary transmission for 2 reasons. First, it was difficult to identify the primary and secondary cases for the cases in a cluster that could have had more than 1 generation of transmission. Second, individuals with a contact history with the previously identified cases were likely to have practiced precautionary measures, such as self-quarantine, before testing positive for COVID-19.
Although the data released by the local governments were consistent to a certain degree, some variations were observed, especially regarding contact history, primarily because of privacy concerns. Therefore, we restricted the analysis by prefecture and phase according to the level of detail in contact history reporting. Specifically, we used 2 criteria to evaluate the level of the released contact history: 1) the proportion of cases with an identified source of infection (cut-off 1) and 2) the proportion of cases with a disclosed source of infection among cases with an identified source of infection (cut-off 2). We used ≥25% for cut-off 1 and ≥75% for cut-off 2 in the principal analysis (Supplementary Figure 1) and different cut-off values in the sensitivity analysis. We also excluded cases among international arrivals to avoid underestimating the frequency of secondary transmission, because arrivals were required to quarantine for 14 days.
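The two cut-offs can be expressed as a filter over prefecture-phase units. In this Python sketch, `passes_cutoffs` and `eligible_units` are hypothetical names, and the per-unit counts (all cases, cases with an identified source, cases with a disclosed source) are assumed to have been tabulated beforehand. The defaults match the principal analysis, and other values can be passed for the sensitivity analyses.

```python
from typing import Dict, List, Tuple

Unit = Tuple[str, int]  # (prefecture, phase)


def passes_cutoffs(n_cases: int, n_identified: int, n_disclosed: int,
                   cut1: float = 0.25, cut2: float = 0.75) -> bool:
    """Apply both contact-history cut-offs to one prefecture-phase unit.

    cut-off 1: share of cases with an identified source of infection.
    cut-off 2: share of identified sources that were disclosed.
    """
    if n_cases == 0 or n_identified == 0:
        return False
    return (n_identified / n_cases >= cut1) and (n_disclosed / n_identified >= cut2)


def eligible_units(counts: Dict[Unit, Tuple[int, int, int]],
                   cut1: float = 0.25, cut2: float = 0.75) -> List[Unit]:
    """Return the (prefecture, phase) units whose reporting meets both cut-offs."""
    return sorted(unit for unit, (n, ident, disc) in counts.items()
                  if passes_cutoffs(n, ident, disc, cut1, cut2))
```

With hypothetical counts such as `{("PrefA", 3): (200, 80, 70), ("PrefB", 3): (500, 100, 60)}`, only `("PrefA", 3)` passes, because PrefB identifies a source for just 20% of its cases.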

Statistical analysis
We calculated the number of secondary cases per primary case and fitted a negative binomial distribution to the observed distribution by maximum likelihood. We calculated the reproductive number ( R ) from the mean of the fitted negative binomial distribution and quantified transmission heterogeneity with the corresponding dispersion parameter ( k ), as described elsewhere ( Lloyd-Smith et al., 2005 ). Furthermore, we used a Markov chain Monte Carlo method to obtain the joint and marginal density estimates for R and k .
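The maximum-likelihood step can be sketched as follows. This is an illustrative Python reimplementation, not the authors' R code, and it covers only the point estimates, not the MCMC step. It uses the negative binomial parameterized by mean R and dispersion k. For any fixed k, the MLE of the mean is the sample mean, so the sketch fixes R at the mean and maximizes the profile likelihood over log k by golden-section search. `nb_loglik`, `fit_nb`, and `simulate_offspring` are hypothetical names, and the simulated data are synthetic.

```python
import math
import random
from typing import Dict, Tuple


def nb_loglik(counts: Dict[int, int], R: float, k: float) -> float:
    """Negative binomial log-likelihood (mean R, dispersion k).

    counts maps a number of secondary cases y to the number of primary cases with y.
    """
    total = 0.0
    for y, n in counts.items():
        total += n * (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
                      + k * math.log(k / (k + R)) + y * math.log(R / (k + R)))
    return total


def fit_nb(counts: Dict[int, int], log_k_lo: float = -8.0, log_k_hi: float = 8.0,
           tol: float = 1e-8) -> Tuple[float, float]:
    """Maximum-likelihood (R, k) for an offspring distribution."""
    n = sum(counts.values())
    R = sum(y * m for y, m in counts.items()) / n  # MLE of the NB mean = sample mean
    if R <= 0:
        raise ValueError("no secondary cases observed; k is not identifiable")
    f = lambda t: -nb_loglik(counts, R, math.exp(t))  # profile NLL in log k
    g = (math.sqrt(5) - 1) / 2
    a, b = log_k_lo, log_k_hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return R, math.exp((a + b) / 2)


def simulate_offspring(R: float, k: float, n: int, seed: int = 0) -> Dict[int, int]:
    """Draw n offspring counts from a gamma-Poisson (negative binomial) model."""
    rng = random.Random(seed)
    counts: Dict[int, int] = {}
    for _ in range(n):
        lam = rng.gammavariate(k, R / k)  # individual reproductive number
        # Knuth's multiplication method for a Poisson draw (fine for small lam).
        y, p, limit = 0, rng.random(), math.exp(-lam)
        while p > limit:
            p *= rng.random()
            y += 1
        counts[y] = counts.get(y, 0) + 1
    return counts
```

For instance, `fit_nb(simulate_offspring(0.47, 0.23, 20000, seed=2020))` should recover values close to the simulated R and k. Confidence intervals or posterior densities would need an extra step, such as profile likelihood or the MCMC approach mentioned in the text.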
We analyzed the number of secondary cases stratified by phase, age, gender, presence of symptoms at confirmation, and days from onset to confirmation. Using logistic regression models, we calculated odds ratios (ORs) by 1) comparing cases that caused secondary cases with cases that did not cause secondary transmission and 2) comparing cases that generated ≥5 secondary cases with cases that generated 1-4 secondary cases. We used a cut-off of ≥5 because super-spreading events are generally considered to involve transmission to more than 4-6 people. In each stratified group, the category with the largest number of cases was selected as the reference: phase 3, age 20-29 years, men, symptomatic at confirmation, and 2 days between onset and confirmation. All analyses were performed using R (R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/ ).
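For a single binary comparison (for example, one age group against the reference group), the OR from a logistic regression equals the cross-product ratio of the 2×2 table, and its Wald interval uses the standard error sqrt(1/a + 1/b + 1/c + 1/d). The Python sketch below (`odds_ratio_wald` is a hypothetical name) shows only this unadjusted case. The adjusted ORs in Figure 4 require a multivariable logistic regression, which is not shown. For comparison 1), the "outcome" is generating ≥1 secondary case. For comparison 2), it is generating ≥5 secondary cases, restricted to cases with ≥1.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval (95% by default).

    a, b: comparison group with / without the outcome
    c, d: reference group with / without the outcome
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; use a continuity correction or an exact method")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

With hypothetical counts of 10/30 comparison-group cases and 5/45 reference-group cases having the outcome, `odds_ratio_wald(10, 20, 5, 40)` gives an OR of 4.0 with a 95% CI of roughly 1.20 to 13.3.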

Overdispersion of secondary transmission
We collected data, including demographic and epidemiologic information, on 67,761 confirmed COVID-19 cases reported in Japan between January 15, 2020, and August 31, 2020, from local government websites. Among them, 46,481 cases had no identified exposure. From these 46,481 cases, we excluded the 30,010 cases reported by prefectures during the specific phases in which contact history information was limited. The remaining 16,471 cases were included as primary cases in the principal analysis of secondary transmission. We found that 76.7% (12,638/16,471) of the cases did not generate secondary transmissions ( Figure 2 ). In contrast, 21.9% (3,604/16,471) of the cases generated 1-4 secondary transmissions, and 1.4% (229/16,471) generated ≥5 secondary transmissions. Fitting the negative binomial distribution yielded an R of 0.47 (95% CI, 0.45-0.49) and a k of 0.23 (95% CI, 0.22-0.25). The expected proportion of cases accounting for 80% of all COVID-19 transmission was 13.3% (95% CI, 12.8%-13.9%). Although k was low for all age groups throughout the entire period, R varied by age group and phase. In particular, R for those aged 0-19 years was low in phases 1 and 2 (0.24 and 0.20, respectively) before increasing in phase 3 (0.45). In individuals aged ≥70 years, R fell from phase 1 to phase 2 (from 0.83 to 0.39) and increased from phase 3 to phase 4 (from 0.49 to 0.70) ( Table 1 ). When we excluded cases associated with health care and other facilities, k increased in all phases except phase 3, and R decreased, particularly in those aged ≥70 years. The change was minimal in phase 3, suggesting a limited contribution of health care and other facilities during that phase. The overall patterns of R and k across phases remained the same after excluding cases associated with health care and other facilities (Supplementary Table 2).
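How a low k translates into a small share of cases accounting for most transmission can be illustrated under the continuous model of Lloyd-Smith et al. (2005). In that model, individual reproductive numbers follow a gamma distribution with shape k and mean R. In units z = νk/R, the share of transmission from cases below a threshold is the regularized incomplete gamma P(k + 1, z), and the share of cases above it is 1 − P(k, z). Under this model the answer does not depend on R. The Python sketch below (`reg_lower_gamma` and `prop_responsible` are hypothetical names) is an assumption-laden illustration: it ignores the discreteness of offspring counts, so it need not reproduce the 13.3% reported above, whose exact computation is not specified here.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def prop_responsible(k: float, share: float = 0.8) -> float:
    """Fraction of cases accounting for `share` of transmission (gamma model).

    Assumes individual reproductive numbers ~ Gamma(shape k); independent of R.
    """
    target = 1.0 - share  # transmission share from the least infectious cases
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(k + 1, hi) < target:  # bracket the threshold z
        hi *= 2
    for _ in range(200):  # bisection on P(k + 1, z) = target
        mid = (lo + hi) / 2
        if reg_lower_gamma(k + 1, mid) < target:
            lo = mid
        else:
            hi = mid
    return 1.0 - reg_lower_gamma(k, (lo + hi) / 2)
```

As a check, k = 1 (exponentially distributed individual reproductive numbers) gives about 0.4385. Smaller values of k give smaller fractions, which is the sense in which a low k concentrates transmission in few cases.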
For the sensitivity analysis, we calculated k using different inclusion criteria and found that it did not exceed 0.5 in any phase under any of the criteria (Supplementary Table 3).

Changes in demographic patterns of primary and secondary cases
The demographic characteristics of the primary and secondary cases differed considerably by phase ( Figure 3 ). In phase 1, the most common transmissions were from males aged 50-59 years to females in the same age group and between males aged 20-29 years. In phase 2, the most common transmissions were also from males aged 50-59 years to females in the same age group; however, transmissions from males aged 60-69 years to females in the same age group increased. In phase 3, transmission between males aged 20-29 years predominated. In phase 4, the proportion of transmissions from various age groups, especially from older adults of both genders, increased. When age and gender were analyzed separately, most primary cases were aged 20-39 years or 40-69 years. However, the proportion of patients aged 40-69 years was highest in phases 1 and 2 (55.0% and 58.0%, respectively), whereas the proportion of patients aged 20-39 years was highest in phase 3 (57.4%). In phase 4, the proportions of those aged 20-39 years and 40-69 years were approximately equal (38.2% and 38.3%, respectively). Regarding gender, males were more likely to be primary cases throughout the entire period (60.7%) (Supplementary Tables 4 and 5).

Characteristics of cases generating secondary transmissions
Compared with phase 3, the proportion of cases generating secondary transmissions was lower in the other phases, whereas there was no significant difference in the proportion of cases generating ≥5 transmissions between phases 1 and 3 ( Figure 4 , Supplementary Table 7). Those aged 40-49 through 80-89 years were more likely to cause secondary transmission than those aged 20-29 years. However, there was no significant difference between age groups in transmission to 5 or more people. Children aged 0-9 years were less likely to generate secondary transmissions but were more likely to generate ≥5 secondary transmissions; however, this finding was not statistically significant. Cases without symptoms at confirmation were less likely to generate secondary transmissions. Cases diagnosed within 2 days from onset were significantly less likely to generate secondary transmissions, although the likelihood of generating ≥5 secondary transmissions was lower in those diagnosed later. We did not find any gender-related differences in the frequency of generating secondary transmission.
Table 1. The estimated effective reproductive number ( R ), overdispersion parameter ( k ), proportion of infectious cases responsible for 80% of all cases, and proportion of cases that did not generate any secondary transmissions, stratified by phase and age group.

Discussion
We found that just over three-quarters of the included cases did not generate any secondary transmissions during the entire study period ( Table 1 ), which is in line with previous reports from other countries ( Adam et al., 2020 ;Laxminarayan et al., 2020 ;Sun et al., 2021 ). All sensitivity analyses showed relatively low values of k ( < 0.5) in all phases, suggesting a high probability of extinction for most transmission chains ( Lloyd-Smith et al., 2005 ). In contrast, a study in Korea showed that k was larger in the later epidemic period ( Lim et al., 2021 ). There are 2 possible reasons for this difference between Japan and Korea. First, Japan implemented its cluster-based approach from the beginning, which may have shaped k in all phases. Second, in Korea, religious gatherings, which may have contributed to overdispersion in the early stages, were banned in later phases. We believe that Japan's cluster-based approach focusing on super-spreading can suppress transmission because most infected individuals do not contribute to transmission, and sustained transmission is unlikely to occur without super-spreading events ( Endo et al., 2020 ;Sneppen et al., 2021 ).
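The link between a low k and the extinction of transmission chains comes from branching-process theory. With negative binomial offspring (mean R, dispersion k), the probability that a chain started by one case dies out is the smallest root in [0, 1] of q = G(q), where G(s) = (1 + (R/k)(1 − s))^(−k) is the probability generating function. The Python sketch below (`extinction_probability` is a hypothetical name) finds that root by fixed-point iteration from q = 0. One caveat applies: when R ≤ 1, as in the estimates above, every chain dies out in this simple model regardless of k. The effect of a small k is most visible when R > 1.

```python
def extinction_probability(R: float, k: float, iterations: int = 10_000) -> float:
    """Probability that a chain seeded by one case eventually dies out.

    Offspring ~ negative binomial(mean R, dispersion k). Iterating q <- G(q)
    from q = 0 converges monotonically to the smallest fixed point in [0, 1].
    """
    q = 0.0
    for _ in range(iterations):
        q = (1 + (R / k) * (1 - q)) ** (-k)
    return q
```

For example, with R = 2 and k = 1 the fixed point is exactly 0.5. Lowering k to 0.23 at the same R raises the extinction probability well above the value for a nearly Poisson offspring distribution (large k).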
Our analysis estimated R to be < 1 in all periods, even during the increasing phases (phases 1 and 3). This differs from the effective reproduction number estimated from the epidemic curve because our estimate reflects only the average number of secondary transmissions observed from primary cases with no identified exposure.
Our data indicated that COVID-19 transmission in Japan during the first 2 waves was driven mainly by individuals aged 20-69 years; 80% of primary cases causing secondary transmission belonged to these age groups. However, the predominant age group among primary cases shifted from middle-aged adults (40-69 years) in phase 1 to younger adults (20-39 years) in phase 3. Studies in other countries have shown a similar demographic transition ( Monod et al., 2021 ;Oster et al., 2020 ). Although primary cases generating secondary transmissions were most numerous among those aged 20-39 years, the likelihood of generating secondary transmission was higher among those aged 40-89 years ( Figure 4 and Supplementary Table 6). A higher rate of secondary transmission in older age groups has been reported in other countries, which may be due to the high viral load in these age groups ( Hu et al., 2021 ;Jones et al., 2021 ;To et al., 2020 ). In addition, we found that the absence of symptoms at confirmation was associated with a low secondary transmission rate, as reported in other studies ( Buitrago-Garcia et al., 2020 ;Heavey et al., 2020 ;Sayampanathan et al., 2021 ). People in older age groups are more likely to develop typical clinical symptoms, such as fever and cough ( Davies et al., 2020 ). As suggested by a modeling study ( Chen et al., 2021 ), this may be another reason for the higher secondary transmission in older age groups.
Our data showed that most transmissions occurred between younger age groups in earlier phases of each wave (phases 1 and 3) and that increased transmission between different age groups occurred in later stages (phases 2 and 4). This resulted in more cases among older individuals ( ≥70 years) during the later stages (phases 2 and 4).
Our data also indicated that children aged 0-9 years comprised only a small proportion of primary cases (4%). However, R for those aged 0-19 years was higher in phases 3 and 4 than in phases 1 and 2, indicating that transmission from this age group increased during this period. These changes were possibly caused by the reopening of schools in phase 3 after the universal closure of schools during almost all of phases 1 and 2. Nevertheless, our data showed that children did not play a major role in increasing community transmission even after schools reopened. We also found that secondary transmission was less common in children aged 0-9 years. A low transmission rate of COVID-19 in children has been reported in our previous study and in other studies ( Zhu et al., 2021 ). These data are consistent with other studies that have not found children to be major drivers of community transmission of SARS-CoV-2 ( Davies et al., 2020 ).
Figure 3. Relationship between primary and secondary cases by age group and gender for each phase. The color scale shows the proportion of observed total transmission pairs in each phase. Note that the color scale for phase 3 differs from that for phases 1, 2, and 4. F, female; M, male.
Identifying and monitoring the age group that principally drives transmission is vital for implementing more targeted interventions. Although our data clearly indicated that individuals aged 20-39 years had a predominant role in transmission during the second wave in Japan, implementing effective measures targeting these young adults is challenging; they are less likely to develop typical or severe symptoms ( Davies et al., 2020 ), which may result in underdiagnosis. Notably, some seroepidemiologic studies have shown greater under-ascertainment among children and young adults ( Havers et al., 2020 ;Pollán et al., 2020 ;Yoshiyama et al., 2021 ). Moreover, implementing health promotion strategies to induce behavioral change is particularly challenging in this age group ( Kim and Crimmins, 2020 ), and data from other countries indicate that it is also difficult to achieve high vaccine coverage in this age group ( Murphy et al., 2021 ). Our data also showed that males, particularly those aged 20-59 years, played a larger role in transmission, even though we observed no difference by gender in the frequency of generating secondary transmission.
The dominant role of males in COVID-19 transmission has been reported in other studies ( Galasso et al., 2020 ); biological, social, and behavioral factors likely contribute to this difference ( Capraro and Barcelo, 2020 ). In the later phases of each wave, there were more transmissions from younger age groups to older individuals, as well as more older-to-older transmissions. This is consistent with a previous study that reported community cluster outbreaks among older adults in Japan ( Furuse et al., 2021 ). Because most severe cases occur in older people ( Driscoll et al., 2021 ), interventions focusing on preventing infection in older age groups are essential to minimize the impact of COVID-19.
Figure 4. OR and aOR comparing a) the proportion of cases that generated ≥1 secondary transmission with that of cases with no secondary transmission and b) the proportion of cases that generated ≥5 secondary transmissions with that of cases that generated 1-4 secondary transmissions. ORs were calculated by phase, age group, gender, presence of symptoms, and days from onset to confirmation, with phase 3, those aged 20-29 years, male gender, symptomatic at confirmation, and within 2 days from onset to confirmation as the reference for each category. OR, odds ratio; aOR, adjusted odds ratio (adjusted for phase, age, presence of symptoms at confirmation, and days between onset and confirmation).
When cases were detected within 2 days after illness onset, they were less likely to cause secondary transmission ( Figure 4 ). A previous study showed that infectiousness peaked around illness onset and that the likelihood of isolating SARS-CoV-2 decreased a few days after illness onset ( Wölfel et al., 2020 ). Our data indicate that the early detection and isolation of cases are useful in suppressing transmission, as reported previously ( Pung et al., 2020 ). However, delaying case isolation beyond 5 days after illness onset would not substantially increase the chance of secondary transmission.
Our study has several limitations. First, because we used contact tracing data in the public domain, the ascertainment of cases, contacts, and clusters might have affected the results. In particular, before May 29, 2020, testing was performed only when a close contact presented symptoms; later, testing of all close contacts was recommended regardless of symptoms. Therefore, transmission pairs may have been underestimated, especially during phases 1 and 2. During phases 3 and 4, transmission pairs may also have been underestimated, but for a different reason: the high case numbers increased the workload of public health centers and reduced contact tracing. Under-reporting generally leads to an underestimation of R but an overestimation of k , especially when super-spreading events are missed ( Blumberg and Lloyd-Smith, 2013 ;Endo et al., 2020 ). Because a higher k indicates less dispersion, the true overdispersion is, if anything, greater than our estimates suggest, so our conclusion of substantial overdispersion would remain unchanged after accounting for these biases. Second, the data released by the local governments were inconsistent, particularly the data on close contacts, and might be subject to biases depending on the quality of the interviews. We addressed this as much as possible by limiting the analysis to cases from municipalities where the quality of data on epidemiologic links was considered adequate. Still, under-ascertainment of transmission chains was likely; as mentioned earlier, this would generally lead to an underestimation of R and an overestimation of k . In addition, we excluded many cases from prefectures without sufficient information. In particular, all cases in Tokyo were excluded; however, other prefectures with large populations, such as Aichi, Osaka, and Fukuoka, and many small prefectures were included (Supplementary Table 1, Supplementary Figure 4), suggesting that our study has some generalizability to all of Japan.
Third, the primary and secondary cases assigned by our criteria may not reflect true infector-infectee pairs. However, the results would likely not have changed substantially with different definitions of primary and secondary cases or different inclusion criteria; the consistency of our findings across multiple sensitivity analyses supports the robustness of this study. Fourth, because our study period included only the first and second waves of the outbreak in Japan, it does not reflect the transmission characteristics of newly emerged SARS-CoV-2 variants, such as the alpha and delta variants. These variants are now predominant in community transmission of SARS-CoV-2 in Japan, potentially changing transmission patterns from those observed in this study. A Korean study reported no significant difference in the likelihood of super-spreading events between the delta and pre-delta periods ( Ryu et al., 2022 ). However, empirical evidence from country-level data on the delta variant remains limited, and future studies are warranted.
In conclusion, this study estimated the frequency of secondary transmission of SARS-CoV-2 in Japan and observed overdispersion throughout the first and second waves. We also revealed temporal changes in the predominant population in the transmission chain and the characteristics of people who generated secondary transmissions; further interventions should be designed on the basis of these characteristics.

Conflict of Interest
The authors declare no competing interests.

Ethical Approval statement
This study did not require ethical approval because the data were public, anonymous, and collected as part of the outbreak response.

Funding Source
The work was supported in part by the Emerging/Re-emerging Infectious Diseases Project of Japan from the Japan Agency for Medical Research and Development, AMED under Grant Number JP19fk0108104 . The work was also supported by the Health, Labour and Welfare Policy Research Grants, Research on Emerging and Re-emerging Infectious Diseases and Immunization (Grant number 20HA2007), and Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number JP21K19624. Y.K.K. received a scholarship from Takeda Science Foundation.