Estimation of epidemiological parameters and ascertainment rate from early transmission of COVID-19 across Africa

Country-reported case counts suggested a slow spread of SARS-CoV-2 in the initial phase of the COVID-19 pandemic in Africa. Owing to inadequate public awareness, unestablished monitoring practices, limited testing and stigma, the true number of cases may have been extensively under-ascertained, especially at the beginning of the novel epidemic. We developed a compartmentalized epidemiological model to track the early epidemics in 54 African countries. Data on the reported cumulative number of cases and daily confirmed cases were used to fit the model over the period in which few or no major national interventions were yet in place in each country. We estimated that the mean basic reproduction number is 2.02 (s.d. 0.7), with a range between 1.12 (Zambia) and 3.64 (Nigeria). The mean overall report rate was estimated to be 5.37% (s.d. 5.71%), with the highest 30.41% in Libya and the lowest 0.02% in São Tomé and Príncipe. An average of 5.46% (s.d. 6.4%) of all infected cases were severe cases and 66.74% (s.d. 17.28%) were asymptomatic ones. The estimated low reporting rates in Africa suggested a clear need for improved reporting and surveillance systems in these countries.


Introduction
The ongoing coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) 1 , first emerged in late December 2019 in the city of Wuhan, Hubei province, mainland China 2 . The virus rapidly spread to neighboring cities and provinces of China and was thereafter reported in Europe, America, and other continents. As a result, the Director-General of the World Health Organization (WHO) declared COVID-19 a public health emergency of international concern (PHEIC) on January 30, 2020 3 , and subsequently a global pandemic on March 11, 2020 4 . As of May 5, 2022, the COVID-19 pandemic had affected more than 220 countries and territories, with a dramatically high toll (more than 513 million cases and 6 million deaths) 5 .
However, the spread of COVID-19 has been, and still is, quite uneven. Since the first case of COVID-19 in Africa was reported in Egypt on Feb 14, 2020 6 , the increase of new infections in African countries has been relatively modest compared to the rest of the globe, suggesting a slow spread of SARS-CoV-2 in Africa [7][8][9] . By May 5, 2022, forty-seven countries in the WHO African region had been affected, with more than 8 million cases and over 170,000 deaths 10 , far fewer than the over 216 million cases and 1.9 million deaths in Europe, 153 million cases and 2.7 million deaths in the Americas and 57 million cases and 700,000 deaths in South-East Asia 5 . Some features of SARS-CoV-2 itself make case detection challenging, including the existence of asymptomatic infections which are nonetheless capable of transmitting the pathogen 11,12 . Moreover, research suggested that clinical symptoms are manifested in a smaller fraction of infections among the young population 13 . Therefore, in countries with younger age structures like those in Africa (life expectancy of sub-Saharan Africa in 2019 is 61.63 years, compared to 75.40 years for East Asia, 73.94 for Europe and Central Asia and 73.78 for North America and Middle East 14 ), more cases tend to go undetected. In addition, given limited testing and public health resources, inadequate public awareness, cultural stigmatization, self-medication and the use of complementary/alternative medicine, and monitoring practices that were not yet well established in the initial outbreak on the continent, the true number of cases might have been largely under-estimated. According to WHO, only one in seven cases was being detected in Africa 15 .
COVID-19 transmission in Africa has been studied from several angles. The basic reproduction number in the initial phase was estimated for selected African countries using a mechanistic model with a Bayesian inference framework 16 , phenomenological models 17 , and the exponential growth rate 18 ; the time-varying effective reproduction number and infection attack rate were also estimated 19 ; the effect of different lockdown and control strategies on COVID-19 transmission in African and West African countries was assessed 20,21 ; the spatio-temporal dynamics of COVID-19 within the first 62 days on the continent were investigated 22 ; and the preparedness and vulnerability of African countries against imported cases were evaluated 23 . Under-ascertainment of COVID-19 infections has also been estimated in many locations around the world by various methods [24][25][26][27] . However, such estimation and description for and among African countries remain lacking.
In this study, we reconstructed the initial transmission of SARS-CoV-2 in 54 African countries, before major mitigation interventions were in place, using a deterministic mathematical model that accounts for under-reporting and for various levels of severity of infection. Reporting rates and critical quantities characterizing COVID-19 transmission within each country were estimated; together they constitute an epidemic profile for Africa for the initial stage of COVID-19.

The dynamics of COVID-19 in the early stage, when major mitigation measures were not in place, can be characterized by a simple susceptible-exposed-infectious-recovered (SEIR) model with the infected state further stratified into three compartments: severely infected (I_s), mildly infected (I_m) and asymptomatically infected (A). The mathematical model we developed to describe the initial dynamics of COVID-19 with under-reporting is thus based on the augmented SEIR model presented in (1), with the corresponding illustration diagram depicted in Figure 1.

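\begin{equation}
\begin{aligned}
S' &= -\beta S\,\frac{\rho_s I_s + I_m + \rho_a A}{N},\\
E' &= \beta S\,\frac{\rho_s I_s + I_m + \rho_a A}{N} - \sigma E,\\
I_s' &= (1-p_a)\,p_s\,\sigma E - \gamma_s I_s,\\
I_m' &= (1-p_a)(1-p_s)\,\sigma E - \gamma_m I_m,\\
A' &= p_a\,\sigma E - \gamma_a A,\\
R' &= \gamma_s I_s + \gamma_m I_m + \gamma_a A.
\end{aligned}
\tag{1}
\end{equation}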
In (1), S, E, I_s, I_m, A, and R are the numbers of susceptible, exposed, severely infected, mildly infected, asymptomatically infected, and removed individuals. We assumed that exposed people do not transmit the pathogen [28][29][30] , and that severely infected individuals are not so critically ill as to be removed from community activities by hospitalization or isolation, but carry a higher viral load. Therefore people in I_s, I_m and A are all capable of transmitting the disease, but with different levels of infectiousness. These levels were specified by ρ_s and ρ_a, denoting the increased infectiousness of severely infected individuals and the decreased infectiousness of asymptomatically infected individuals relative to mildly infected ones. A fraction p_a of exposed individuals was assumed to develop asymptomatic infections; among the rest, a further fraction p_s develop severe infections while the remainder become mildly infected. As for the other parameters, N is the constant total population size; β is the per contact transmission probability of mild cases; σ is the disease progression rate from latency to infectiousness, with 1/σ being the mean incubation duration; γ_s, γ_m, and γ_a are the recovery rates of the different types of cases, with reciprocals equal to the corresponding mean infectious periods. It was also assumed that infection-acquired immunity lasts longer than the time period under study, so re-infection was not considered.
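The compartment structure described above can be sketched in a few lines of Python. This is only a reading of the verbal description (mild cases as the reference infectiousness, a fraction p_a becoming asymptomatic, a fraction p_s of the rest becoming severe), not the authors' implementation; the dictionary keys, the fixed-step Runge-Kutta integrator and all numerical values are illustrative assumptions.

```python
def seir_rhs(state, p):
    """Time derivatives of (S, E, I_s, I_m, A, R) for the augmented SEIR model."""
    S, E, I_s, I_m, A, R = state
    # Mild cases are the reference; rho_s and rho_a rescale severe/asymptomatic.
    infectious = p["rho_s"] * I_s + I_m + p["rho_a"] * A
    new_infections = p["beta"] * S * infectious / p["N"]
    onset = p["sigma"] * E  # exposed individuals becoming infectious
    return [
        -new_infections,
        new_infections - onset,
        (1 - p["p_a"]) * p["p_s"] * onset - p["gamma_s"] * I_s,
        (1 - p["p_a"]) * (1 - p["p_s"]) * onset - p["gamma_m"] * I_m,
        p["p_a"] * onset - p["gamma_a"] * A,
        p["gamma_s"] * I_s + p["gamma_m"] * I_m + p["gamma_a"] * A,
    ]


def rk4_step(state, p, dt):
    """One classical fourth-order Runge-Kutta step of size dt (in days)."""
    def shifted(k, h):
        return [x + h * dx for x, dx in zip(state, k)]
    k1 = seir_rhs(state, p)
    k2 = seir_rhs(shifted(k1, dt / 2), p)
    k3 = seir_rhs(shifted(k2, dt / 2), p)
    k4 = seir_rhs(shifted(k3, dt), p)
    return [x + dt * (a + 2 * b + 2 * c + d) / 6
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)]


def simulate(state0, p, days, steps_per_day=10):
    """Return the daily states from day 0 to day `days` inclusive."""
    state, daily = list(state0), [list(state0)]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, p, 1.0 / steps_per_day)
        daily.append(state)
    return daily


# Illustrative placeholder values only; none are estimates from this study.
params = {"N": 1_000_000, "beta": 0.6, "sigma": 1 / 5, "rho_s": 1.25,
          "rho_a": 0.5, "p_a": 0.6, "p_s": 0.2,
          "gamma_s": 1 / 10, "gamma_m": 1 / 7, "gamma_a": 1 / 7}
start = [params["N"] - 10, 10, 0, 0, 0, 0]  # ten exposed individuals at t = 0
trajectory = simulate(start, params, days=30)
```

Because the six derivatives sum to zero, the total population stays at N along the trajectory, which is a convenient sanity check on any implementation of the model.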
The basic reproduction number of system (1) was calculated using the next-generation matrix approach 31 and is given in (2), where the first term accounts for the average number of secondary infections coming from severe infections, the second term for those from mild infections and the last for those from asymptomatic infections. Through (2), β can be expressed in terms of R_0 and the other parameters, as in (3). To incorporate under-reporting, we assumed that a fraction r_s of severe cases and a fraction r_m of mild cases were reported and that no asymptomatic cases were reported, which is often the case in an outbreak of a novel infection. The reporting rates r_s and r_m were held constant over time in light of the relatively short period considered in this study, though they may well vary across stages of transmission. With these changes, we obtained a new system of equations (4)-(5) based on (1). This system is similar to the systems formulated in earlier studies [32][33][34] . In (4), C_I and C_K denote the true and reported cumulative numbers of cases. The derivation was completed by observing that C_I'(t) = σE(t), so that the equation for E(t) can be rewritten using the equation for S(t) as E'(t) = −S'(t) − C_I'(t). Integrating both sides and using the initial conditions S(0) = N and C_I(0) = E(0) = 0 yields E(t) = N − S(t) − C_I(t), which gives system (5).

Electronic copy available at: https://ssrn.com/abstract=4131409

System (5) is supplied with non-negative initial values. In (5), R is the number of cases that had already recovered before the start of official reporting and C_K0 is the number of reported cumulative cases on the first day of reporting.
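The relation between R_0 and β described above can be illustrated with a short sketch. It assumes the three-term form implied by the text (a severe, a mild and an asymptomatic contribution, each weighted by the fraction of infections of that type, its relative infectiousness and its mean infectious period); the function names and the numerical values are hypothetical and are not taken from the study.

```python
def r0_per_unit_beta(p):
    """Sum of the severe, mild and asymptomatic contributions to R_0 / beta."""
    severe = p["rho_s"] * (1 - p["p_a"]) * p["p_s"] / p["gamma_s"]
    mild = (1 - p["p_a"]) * (1 - p["p_s"]) / p["gamma_m"]
    asymptomatic = p["rho_a"] * p["p_a"] / p["gamma_a"]
    return severe + mild + asymptomatic


def basic_reproduction_number(beta, p):
    """R_0 for a given transmission parameter beta (assumed form)."""
    return beta * r0_per_unit_beta(p)


def beta_from_r0(r0, p):
    """Invert the same relation, so that R_0 can be the fitted parameter."""
    return r0 / r0_per_unit_beta(p)


# Illustrative placeholder values only.
example = {"rho_s": 1.25, "rho_a": 0.5, "p_a": 0.6, "p_s": 0.2,
           "gamma_s": 1 / 10, "gamma_m": 1 / 7, "gamma_a": 1 / 7}
r0 = basic_reproduction_number(0.6, example)
beta = beta_from_r0(r0, example)
```

Parameterizing the fit by R_0 rather than β, as the text does, keeps the sampled quantity directly interpretable and lets its prior bounds be set on the reproduction-number scale.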

Data
We collected daily reported cases and cumulative reported cases of COVID-19, which are publicly available from Our World in Data 35 . Based on data availability, we considered 54 African countries: Algeria, Angola, Benin, Botswana, Burkina Faso, Burundi, Cameroon, Cape Verde, Central African Republic, Chad, Comoros, Congo, Côte d'Ivoire, Democratic Republic of the Congo, Djibouti, Egypt, Equatorial Guinea, Eritrea, Eswatini, Ethiopia, Gabon, Gambia, Ghana, Guinea, Guinea-Bissau, Kenya, Lesotho, Liberia, Libya, Madagascar, Malawi, Mali, Mauritania, Mauritius, Morocco, Mozambique, Namibia, Niger, Nigeria, Rwanda, São Tomé and Príncipe, Senegal, Seychelles, Sierra Leone, Somalia, South Africa, South Sudan, Sudan, Tanzania, Togo, Tunisia, Uganda, Zambia and Zimbabwe. Each country's total population size N in 2020 and the initial number of reported cases C_K0 were also collected from the same source 35 .
The first COVID-19 case was reported on a different date in each country, and the countries' reactions to the pandemic in terms of mitigation measures were also quite different. Hence, the initial phase of the COVID-19 outbreak considered in this study differs by country in its dates and in the number of data points included. The starting point was chosen to be the date on which the first case was reported, whereas the ending point was chosen to be the last date with a Country Stringency Index ≤ 50 from the Oxford COVID-19 Government Response Tracker 36 . Five out of 54 countries (9.3%) have missing Stringency Index data (Burundi, Comoros, Equatorial Guinea, Guinea-Bissau, and São Tomé and Príncipe). In 36 countries (66.7%) this phase lasted fewer than 14 days, which means that these countries either imposed strict interventions within two weeks of their first officially reported case or had begun to do so before their first case occurred. To have enough data points for fitting, and to take into account the delayed compliance with government policies in reality, a minimum of 14 days was used instead for countries with no index data or too short a period. All country-specific parameter values are summarized in Supplementary Materials.
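The window-selection rule above can be sketched as follows. The sketch makes two assumptions that the text does not settle: the end point is read as the last day before the Stringency Index first exceeds the threshold, and "a minimum of 14 days" is read as 14 daily data points including the day of the first case. The function name and the input format (a mapping from date to index value) are hypothetical.

```python
from datetime import date, timedelta


def initial_phase(first_case, stringency, threshold=50.0, min_days=14):
    """Return (start, end) of the fitting window for one country.

    `stringency` maps dates to Stringency Index values; pass None or an
    empty mapping when a country has no index data.
    """
    floor_end = first_case + timedelta(days=min_days - 1)
    end = None
    for day in sorted(d for d in (stringency or {}) if d >= first_case):
        if stringency[day] > threshold:
            break  # first strict day: the initial phase ended the day before
        end = day
    # Fall back to the minimum window for missing data or short phases.
    if end is None or end < floor_end:
        end = floor_end
    return first_case, end


# Hypothetical country: first case on 1 March 2020, strict measures from day 20.
first = date(2020, 3, 1)
index = {first + timedelta(days=i): (30.0 if i < 20 else 80.0) for i in range(40)}
window = initial_phase(first, index)
```

A country whose index passes 50 within the first two weeks, or has no index at all, receives the 14-day window, matching the fallback described in the text.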

Fitting model to case data
The model (4)-(5) was fitted to the reported cumulative and daily cases of COVID-19 for each country, taken from Our World in Data 35 , during the beginning of the epidemic. The number of daily reported cases is denoted by I_K and was calculated from our model as given in (6). We used the Markov chain Monte Carlo (MCMC) method with the Delayed Rejection Adaptive Metropolis (DRAM) algorithm, non-informative uniform priors and the sum of squares (SS) given in (7) to fit the model, where t_data are the chosen time points for each country, and C_K,data and I_K,data are the observed cumulative and daily cases corresponding to t_data. The algorithm was implemented through the MCMC toolbox for MATLAB 37 with 5000 iterations and the proposal adapted every 500 iterations for each country. Since milder cases are usually less transmissible owing to less coughing 38 , we assumed a decreased transmissibility of asymptomatic cases (ρ_a ≤ 1) but an increased transmissibility of severe cases (1 ≤ ρ_s ≤ 1.5). The values of the fixed parameters, and the initial, minimum and maximum values used in the MCMC simulations for the estimated parameters, are listed in Table 1. Posterior distributions were then estimated for the true basic reproduction number, the report rates of mild and severe cases, the fractions of severe and asymptomatic cases, and the infectiousness of severe and asymptomatically infectious individuals relative to mild infectives, along with the initial values.
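To make the fitting procedure concrete, the sketch below pairs a sum-of-squares objective with a plain random-walk Metropolis sampler under uniform priors. It is deliberately simpler than the method used in the study: there is no delayed rejection and no proposal adaptation, so it is not DRAM, and the unweighted sum over cumulative and daily cases is an assumption about the form of the objective. The exponential toy model, the error variance `sigma2` and all numerical values are hypothetical stand-ins.

```python
import math
import random


def sum_of_squares(theta, t_data, cum_data, daily_data, model):
    """Squared misfit of modelled cumulative and daily cases to the data."""
    cum_pred, daily_pred = model(theta, t_data)
    return (sum((m - d) ** 2 for m, d in zip(cum_pred, cum_data))
            + sum((m - d) ** 2 for m, d in zip(daily_pred, daily_data)))


def metropolis(ss, theta0, bounds, n_iter=5000, rel_step=0.02, sigma2=1.0, seed=1):
    """Random-walk Metropolis with uniform priors on the box `bounds`."""
    rng = random.Random(seed)
    theta, ss_cur = list(theta0), ss(theta0)
    chain = []
    for _ in range(n_iter):
        proposal = [x + rng.gauss(0.0, rel_step * (hi - lo))
                    for x, (lo, hi) in zip(theta, bounds)]
        # Outside the prior support the posterior is zero, so reject.
        if all(lo <= x <= hi for x, (lo, hi) in zip(proposal, bounds)):
            ss_new = ss(proposal)
            # Gaussian errors: log acceptance ratio is -(SS_new - SS_cur) / (2 sigma2).
            log_ratio = -(ss_new - ss_cur) / (2.0 * sigma2)
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                theta, ss_cur = proposal, ss_new
        chain.append(list(theta))
    return chain


def toy_model(theta, t_data):
    """Hypothetical stand-in: exponential growth of reported cases."""
    c0, g = theta
    cum = [c0 * math.exp(g * t) for t in t_data]
    return cum, [g * c for c in cum]


t_data = list(range(14))
cum_data, daily_data = toy_model([5.0, 0.15], t_data)  # synthetic "observations"
bounds = [(0.0, 20.0), (0.0, 0.5)]


def ss(theta):
    return sum_of_squares(theta, t_data, cum_data, daily_data, toy_model)


chain = metropolis(ss, [10.0, 0.1], bounds)
```

In practice `toy_model` would be replaced by a solver for the under-reporting model, and posterior summaries would be computed from the chain after discarding an initial burn-in portion.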

Results
The densities of the posterior distributions of the estimated parameters are given in Supplementary Materials. From these, we further calculated the mean estimates of the reproduction number of reported cases (denoted by R_0K), the fraction of mild cases among all cases (denoted by pm ), the fraction of severe cases among all cases (denoted by ps ) and the overall report rate (denoted by r_all) as in (8).
Table 2 summarizes the mean values of these parameters for all countries. The countries are grouped into regions for organizational purposes.
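The derived fractions and the overall report rate can be illustrated with a small sketch. It assumes the natural reading of the model description: the fractions of severe and mild cases among all infections (written pm and ps in the text) follow from p_a and p_s, and the overall report rate is the report rate of each type weighted by its share of all infections, with asymptomatic cases unreported. The function name and the input values are hypothetical, and the reproduction number of reported cases is left out because its exact form is not recoverable from the text.

```python
def derived_quantities(p_a, p_s, r_s, r_m):
    """Shares of each infection type and the implied overall report rate."""
    frac_severe = (1 - p_a) * p_s        # severe cases among all infections
    frac_mild = (1 - p_a) * (1 - p_s)    # mild cases among all infections
    # Asymptomatic cases are assumed unreported, so they add nothing here.
    r_all = r_s * frac_severe + r_m * frac_mild
    return {"frac_severe": frac_severe, "frac_mild": frac_mild,
            "frac_asymptomatic": p_a, "r_all": r_all}


# Illustrative placeholder inputs, not estimates from Table 2.
summary = derived_quantities(p_a=0.6, p_s=0.25, r_s=0.4, r_m=0.1)
```

With these placeholder inputs the three shares are 10% severe, 30% mild and 60% asymptomatic, and the overall report rate is 7%, which shows how a large asymptomatic share pulls the overall rate well below the report rates of severe and mild cases.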
Cumulative and daily reported cases together with the predicted values obtained from our model are presented in Figure 2. The red and green circles are the cumulative and daily reported cases of COVID-19, respectively, while the mean predicted values are shown as red and green lines and the colored bands represent the 95% high-density intervals (HDI) of prediction. Judging from Figure 2, the mean fits of reported cumulative cases C_K and new cases I_K to the data are generally good for each country. Notice that in this initial phase, reported cases grew exceptionally fast in Egypt, Algeria, South Africa, Ghana, Mauritius, Burkina Faso and Rwanda, while case numbers were not updated and stayed at a low level in Togo and São Tomé and Príncipe. In countries such as Benin, Botswana, Burundi, Cabo Verde, Central African Republic, Comoros, Congo, Eswatini, Gambia, Guinea, Lesotho, Liberia and Mauritania, plots of cumulative cases showed a stepwise shape, suggesting discontinuous and unstable testing or reporting practices.
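The text does not say how the 95% high-density intervals were computed. One common sample-based approach, sketched here as an assumption, takes the narrowest window that contains the requested share of the sorted posterior samples; the function name and the example data are hypothetical.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Hypothetical predictions for one day: 100 regular values and one outlier.
predictions = list(range(100)) + [1000]
low, high = hdi(predictions)
```

Unlike an equal-tailed interval, this construction is free to exclude a single extreme value on one side, which is why it suits skewed posterior predictions of case counts.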
In Figure 3, the predicted mean true cumulative cases C_I (black line) with 95% HDI (shaded area) are plotted against the reported cumulative cases C_K (blue line for modeled values and blue stars for data). Each day's estimated true cumulative cases were further decomposed into severe cases (darker red bars), mild cases (lighter red bars), asymptomatic cases (lighter green bars) and cases that had already recovered before official reporting started (darker green bars). The number in the top right corner indicates the percentage of cases reported cumulatively by the end of this initial phase. Across countries, the predicted true numbers of cumulative cases are far above the reported ones at all times, with Sudan and Gambia reporting the largest cumulative shares (27.03% and 22.19%, respectively), while most countries reported less than 5%. Severe cases made up only a very small fraction of all predicted cases, but a more significant proportion in Libya, Gambia, Madagascar, Uganda and some other countries. Meanwhile, asymptomatic infections are clearly the majority. Exceptionally, in Seychelles the already recovered cases constitute the largest portion.
We estimated that the mean basic reproduction number R_0 is 2.02 (SD 0.70), with a range between 1.12 (Zambia) and 3.64 (Nigeria), whereas the mean basic reproduction number for observed cases R_0K was estimated to be 0.17 (SD 0.17), with a range between 0 (São Tomé and Príncipe, Seychelles, Tanzania, South Sudan, Mozambique, Liberia, Togo) and 0.68 (South Africa). The basic reproduction numbers for all infections and for observed cases are plotted for all countries in Figure 4. In every country, R_0K is less than R_0 under the assumption that asymptomatic cases were not reported and only a proportion of mild cases were reported. Countries with a high R_0, such as Nigeria, Cameroon, Niger, Democratic Republic of the Congo and Lesotho, may have a surprisingly low R_0K because a majority of infections were asymptomatic or mild and/or the report ratio of mild infections was very low. Moreover, countries with R_0K < 1, which would suggest no outbreak within the territory, may actually have a true R_0 well above 1, indicating a definite outbreak. Thus, neglecting under-reporting will likely cause a large underestimation of the basic reproduction number and, furthermore, of the severity and magnitude of the localized epidemic.
We estimated that the mean overall daily report rate r_all is 5.37% (SD 5.71%) among all countries, with the highest 30.41% in Libya and the lowest 0.02% in São Tomé and Príncipe. The estimated mean report ratio of severe infections r_s was 38.21% (SD 12.71%), with the highest 62.23% in Cameroon and the lowest 5.33% in Togo, and the estimated mean report ratio of mild infections r_m was 13.74% (SD 14.45%), with the highest 47.83% in Benin and the lowest 0% in Mauritania. Reporting ratios of all kinds for all countries are plotted in Figure 5. Figure 6 summarizes the overall report rate across countries. Notice that in most countries the report rate of severe infections was higher than that of mild infections, except in Uganda, Benin and Ethiopia, where the ratios nevertheless did not differ much. Also, countries with a relatively high or low report rate of severe infections tend to have a correspondingly high or low overall report rate, but those with a very high report rate of mild cases, such as Cameroon, Zimbabwe and the Central African Republic, may still exhibit a low overall rate because a high percentage of their cases are asymptomatic.
As for the proportions of different types of infections, the mean estimates were 5.46% (SD 6.40%) for severe infections, 27.80% (SD 15.39%) for mild ones and 66.74% (SD 17.28%) for asymptomatic ones. Proportions for all countries are plotted in Figure 7. This figure suggests that subclinical infections generally make up the majority of all COVID-19 cases, but specific proportions vary from location to location. The mean estimates of the relative infectiousness of severely and asymptomatically infected individuals were 1.25 (SD 0.02) and 0.44 (SD 0.20), with values for all countries plotted in Figure 8. The variation among countries for severe infection was comparatively small, suggesting a characteristic intrinsic to SARS-CoV-2 rather than one subject to demographic or geographical change. The relative infectiousness of mild infection was lower in a small number of countries such as Morocco, South Africa, Eswatini, Cameroon and Guinea-Bissau.

Discussion
By fitting a simple deterministic mathematical model, which distinguishes various levels of infection and accounts for level-specific reporting rates, to case data for each African country, we estimated crucial epidemiological parameters characterizing the early COVID-19 outbreak and thereby provided a country-wise epidemic profile for the continent. Our findings suggest that the basic reproduction numbers were much higher than when only clinically reported cases were taken into consideration, and that the average overall case reporting rate was rather low in this early stage of localized outbreak.
Africa is unique in its younger population structure. Younger people have been shown to be less susceptible to SARS-CoV-2 13 , implying limited spread and a less severe epidemic 17,46 . They have also been found to be less likely to show clinical symptoms than older people 13 , which further suggests lower detection rates in countries with predominantly young populations, as in Africa. Additionally, in an early stage of a novel transmissible disease outbreak,
observed data to have an estimation of the under-ascertainment rate and the true epidemic magnitude, can therefore help to some extent unveil the hidden infections and thereby better evaluate the disease severity and cope with it more appropriately.
Our estimate of the mean overall reporting rate of 5.37% for Africa is consistent with estimates for the early stage of COVID-19 in other countries. In Wuhan, China, where the very first SARS-CoV-2 cases were reported, the case ascertainment rate was estimated to be 5.0% (95% CI, 3.6-7.4) 47 . Another estimate, of the documentation rate for 375 cities in China before the initial travel restrictions, was 14% (95% CI, 10-18) 27 . In a modeling study of 92 nations across the world, the estimated ratio of true to reported cumulative cases through the end of 2020 was 7.03 (10%-90% percentile, 3.2-18), corresponding to a cumulative reporting ratio of 14.22% (10%-90% percentile, 5.56-31.25) 24 . Our estimate is also consistent with a seroprevalence study: less than 1% of all infected cases were reported according to a fall 2020 seroprevalence study in Juba, South Sudan 48 , compared with our estimated overall reporting rate of 0.06% for South Sudan in the beginning stage. Our estimate that, on average, more than half (66.74%) of all infections were asymptomatic is in agreement with asymptomatic infection rate estimates of 46% (95% CI, 18.48-73.60) 49 , 50% in 92 countries 24 and 40%-45% in cohorts from various locations 50 . Similarly, the asymptomatic ratio displays small variations across nations 24 . Our estimate of 44% for the relative infectiousness of the asymptomatic population also roughly agrees with the estimate of 55% (95% CI, 46-62) in China 27 . Nonetheless, our more detailed model provided extensive estimates of the reporting ratios of various levels of infection, the detailed proportions of these infections among all cases and their corresponding relative infectiousness. Since some of these quantities reflect intrinsic characteristics of SARS-CoV-2, the estimates could also be used for understanding or modeling transmission outside Africa.
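The conversion used above between a ratio of true to reported cases and a reporting ratio is a simple reciprocal, which the following lines make explicit. The function name is hypothetical; the inputs are the figures quoted in the paragraph.

```python
def reporting_ratio_percent(true_to_reported):
    """Reporting ratio in percent implied by a true-to-reported case ratio."""
    return 100.0 / true_to_reported


central = round(reporting_ratio_percent(7.03), 2)   # central estimate
lower = round(reporting_ratio_percent(18), 2)       # largest ratio, lowest reporting
upper = round(reporting_ratio_percent(3.2), 2)      # smallest ratio, highest reporting
# The same reciprocal applied to the 5.37% overall rate estimated here.
infections_per_reported_case = round(100.0 / 5.37, 2)
```

Read in the other direction, the mean overall reporting rate of 5.37% corresponds to roughly 19 true infections for every reported case.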
The reporting ratio was assumed to be constant throughout the initial phase of the outbreak in our study, which is acceptable given the short average duration of this phase. It should be acknowledged, however, that testing and reporting practices vary greatly as an outbreak goes on. Therefore, for a better estimation of under-reporting over a longer period, time-dependent rates are necessary, as in studies that also use mechanistic models 26,27 . An inversion method can also be of use for such estimation, without the need to assume a specific functional form for the ascertainment rate over time. Techniques other than mechanistic mathematical modeling have also been used to estimate the level of under-ascertainment, such as the ratio of a baseline case fatality rate (CFR) to the delay-adjusted CFR 51 , which gives a time-dependent under-ascertainment rate rather than a constant one.

Figure 5. Reporting rates r_m of mild infections (green bars), r_s of severe infections (yellow bars) and overall reporting rates r_all of all infections (red bars). The meta mean over all countries is 13.74% (SD 14.45%) for r_m, 38.21% (SD 12.71%) for r_s and 5.37% (SD 5.71%) for r_all.
One major limitation of this study was the short duration of, and the great data uncertainties within, the period considered. African countries implemented their public health and social measures (PHSMs) very early: 36 out of 50 (72%) countries implemented their first stringent PHSM a mean of 15 days before reporting their first case, and the remaining 14 countries did so a mean of 9 days after their first case 52 . Thus the initial stages were generally quite short for all countries considered here. Although studying only the initial stage of the COVID-19 outbreak has the advantage that there were no major mitigation interventions yet, which allows simpler epidemiological models, it also has a great disadvantage in the reliability of the data. As mentioned before, case reporting and data collection were not yet well established, making the data a relatively poor representation of the routine surveillance and reporting practice of each country.
Few studies have tried to provide a complete COVID-19 epidemic profile for Africa. Our work here might give some insight into the characteristics of the transmission of SARS-CoV-2 on this continent and allow a comparison among nations. Our more detailed model might also provide an extended estimation of the under-ascertainment of cases, thereby offering material for a more accurate assessment of the true epidemic size in Africa. Nevertheless, the estimated low reporting rates in Africa suggested a clear need for improved reporting and surveillance systems in these countries.

Figure 1.An illustration of the compartmentalized mathematical model for the early transmission of COVID-19.

Figure 2. Mean model prediction and data for cumulative reported cases (red line and red circles) and daily reported cases (green line and green circles) in each country for the initial time period with no major mitigation at national level.Colored areas show 95% high density intervals (HDI) of prediction.

Figure 3. Estimated mean true cumulative cases (black line) with modeled values and data for reported cumulative cases (blue line and blue stars) in each country for the initial time period with no major mitigation at national level. Colored bars show the percentage of severe (darker red), mild (lighter red), asymptomatic (lighter green) and already recovered cases before the reporting started (darker green). Grey shaded areas show 95% high density intervals (HDI).

Figure 4. Mean basic reproduction number R_0 for all infections (green bars) and mean basic reproduction number R_0K for observed cases (red bars). The meta mean over all countries is 2.02 (SD 0.70) for R_0 and 0.17 (SD 0.17) for R_0K.

Figure 6. Estimated overall COVID-19 case reporting rate r_all for 54 countries in Africa.

Figure 8. Relative infectiousness of severe infections ρ_s (orange bars) and asymptomatic infections ρ_a (yellow bars). The meta means over all countries are 1.25 (SD 0.02) and 0.44 (SD 0.20), respectively.

Table 1. Fixed parameters and estimated parameters with initial, minimum and maximum values for the MCMC run.

Table 2. Mean estimation of parameters for each country.