Deficiency in civil registration and vital statistics reporting in remote areas: the case of Sabah, Malaysia

Malaysia has a well-established civil registration system dating back to the 1960s. Birth registration is virtually complete at the national level. However, the quality of civil registration in some remote areas is doubtful, as evidenced by the abnormally low birth and death rates in several districts. This study focuses on identifying districts in Sabah where the reporting of births seems problematic. Sabah is the least developed state in Malaysia, and it is sparsely populated, despite being the second most populous state in the country. Sabah’s civil registration lags behind the other states, to the extent that birth and death statistics were not reported for the state in the vital statistics report for the period 2000 to 2009. A 2016 study found that death registration was almost 100% in every state except Sabah (88%). The plausible reasons behind the ultra-low birth rate reported in several remote districts in Sabah include misreporting of the place of occurrence as the usual residence, delayed reporting, non-coverage, ignorance of the law, inaccessibility, presence of a large number of migrants, miscommunication, and errors in data entry. The under-reporting of births may have serious consequences, such as misallocation of resources and deprivation of services to those affected. In line with the transformative promise of “leaving no one behind,” the Sustainable Development Goals urge all countries to strive to improve data quality for planning; this includes complete birth registration, so that development programs can reach target groups more effectively.

imposes a heavier penalty on those who fail to register birth or death events, to a maximum fine of Malaysian Ringgit (RM) 20,000 or 3-year imprisonment, compared to the previous penalty of a maximum of RM250 or 12 months in prison.
One of the Sustainable Development Goals (SDGs) targets is to provide legal identity for all, through birth registration, by 2030 (SDG target 16.9). Birth data are needed to measure and achieve universal health coverage (SDG target 3.8) and to enable policymakers and planners to target persons in need of health care, education, and other services. Rahman et al. (2019) found that more than 110 low- and middle-income countries have deficient civil registration and vital statistics (CRVS) systems. Within each country, this deficiency tends to be more profound in rural areas and among the poorest segment. Efforts must be made to improve CRVS systems for the effective delivery of health and social development programs (World Health Organization [WHO], 2013).
The Department of Statistics, Malaysia (DOSM) is the focal point in the compilation and reporting of SDG Indicators Development at the national level. It is also a team member of an Inter-agency Expert Group in the Southeastern Region. Over the years, DOSM has developed its expertise in collecting and disseminating official statistics with increased use of technology (National Population and Family Development Board [NPFDB], 2018).
In a rapid assessment of CRVS in 2011, the DOSM found that Malaysia has achieved the minimum percentage for both birth and death registrations. A 2016 study on underreporting of deaths concluded that death registration is almost 100%, except for Sabah (88.0%) (NPFDB, 2018). However, the completeness of birth registration at the national level or even at the state level may not reflect the actual fertility level in smaller geographical areas.
There is a sizable literature on the population of Malaysia at the national level (Arshat et al., 1988; Chander et al., 1977; Hugo, 2011; Khoo, 2005; Leete, 1989, 1996, 2007; NPFDB, 2016; Saw, 2007; Sidhu & Jones, 1981; Tey et al., 2015, 2020). The few spatial demographic analyses revealed wide differentials in demographic dynamics, health behavior, and health care utilization across districts and smaller geographical areas (Abd Majid et al., 2019; Abdul Rashid, 2017; Azreena et al., 2016; Hazrin et al., 2013; Ling et al., 2014; Masron et al., 2012; Md Bohari et al., 2019; Tey et al., 1985). Among the earliest assessments of the quality of vital registration data are Hirschman and Tan's (1971) evaluation of mortality data in vital statistics and Shamsuddin and Lieberman's (1998) work on linking death reports with birth and death certificates. Under-registration of births and completeness of birth registration have been widely studied in other countries (Brito et al., 2017; Duryea et al., 2006; Garenne et al., 2016; Lima et al., 2018; Makinde et al., 2016; Nannan et al., 2019). However, no study has been carried out to evaluate the completeness of birth registration in Malaysia since the seminal work of Saw (1964) on the under-registration of births in Peninsular Malaysia (then known as Malaya) before independence in 1957, and that of Leete and Kwok (1986) on the estimation of fertility in East Malaysia.
This study seeks to evaluate the completeness of birth registration by examining birth statistics for small geographical areas obtained from vital statistics reports. The reasons for the exceedingly low birth rate in a few districts are explored. This study focuses on Sabah, where registration of vital events, especially in small geographical areas, is more problematic. Sabah is the least developed state in Malaysia. A sizable proportion of Sabah's population lives in remote areas. Sabah has a very low population density of 53 persons per square kilometer; this figure is as low as four to seven persons per square kilometer in two districts (DOSM, 2020c, d). The lack of infrastructure hampers service delivery to remote areas, which is probably an important factor in the under-registration of births and deaths. Sabah's vital statistics system has lagged behind other states, and it is the only state for which the DOSM did not report birth and death statistics from 2000 to 2009. The findings of this study provide a strong case for a national study to assess the reliability and accuracy of birth registration at the district level and in smaller geographical areas. It is hoped that the findings from this study will be used to improve civil registration and reporting systems. Comprehensive data from the vital registration system can be used to assess the quality of data obtained from population censuses and household surveys.

Background
Malaysia comprises 13 states and three federal territories. Sabah, Sarawak, and the federal territory of Labuan are located on the northern part of Borneo Island, separated from Peninsular Malaysia by the South China Sea (Fig. 1). Malaysia's population was estimated at 32.5 million in 2020. The population grew at 2.5% per year over the second half of the twentieth century, but the rate of population growth decelerated rapidly to just 0.4% in 2020. Fertility has been below the replacement level since 2013. Non-citizens make up about 10% of the total population. Among the citizens, Malays and other Indigenous populations, collectively known as the Bumiputeras or sons of the soil, made up 69.3%, Chinese 22.8%, Indians 6.9%, and others 1.0% (DOSM, 2020a). All Malays are Muslims, while most Chinese and Indians are Buddhists and Hindus, respectively (DOSM, 2011a). There is a sizable number of Christians among some of the other Bumiputera, Chinese, and Indians. The United Nations (UN) classifies Malaysia as a very high Human Development Index (HDI) country (HDI = 0.81), and the World Bank classifies it as an upper-middle-income country. Sabah comprises 25 administrative districts (Fig. 2). Its population was estimated at 3.9 million in 2020. The Kadazan-Dusun, Murut, and Bajau are the three main ethnic groups. The high population growth of 4-5% per year in the three decades from the 1970s through the 1990s decelerated to around 2% during the 2000s. The population stopped growing in 2020. Non-citizens make up about 29% of the state population (DOSM, 2020a); they are concentrated in a few districts, such as Kinabatangan (77.8% of

Methodology
DOSM has published national vital statistics data annually since the formation of Malaysia in 1963. The data are cross-classified by a few variables, such as state, ethnicity, and age of the mother. Information such as the number of live births and the crude birth rate (CBR) at the district level has been available since 2015, but the total fertility rate (TFR) is available only down to the state level.
This paper begins with a description of fertility at the state and district levels, followed by the bivariate analysis of CBR with selected variables. The indirect estimates of TFR were computed based on the child-woman ratio (CWR) method, using details from the population pyramid proposed by Hauer and Schmertmann (2020). Since there is no information on recent births in the census, the indirect approach was used to estimate the TFR at the district level.
The CWR is a crude measure of fertility from census data. It is used to estimate the fertility level for small geographical areas, where information on the number of births is unavailable. Scatter plots of CWR and CBR for the 10 Association of Southeast Asian Nations (ASEAN) countries and Malaysian states show high correlation coefficients of 0.963 and 0.705, respectively, indicating the close correspondence between these two measures, as shown in the appendix. Using data from vital registration, the number of births and infant and toddler deaths over the 5 years preceding the census was estimated by taking the sum of these events in 2006 to 2009 and the average number for the years 2005 and 2010, yielding 2,430,048 births and 20,112 infant and toddler deaths, respectively, at the national level (Table 1). The 2010 population census enumerated a total of 2,426,957 persons aged under 5 years (or 2,447,069, if infant and toddler deaths are taken into account) for Malaysia as a whole. The cumulative number of births over 5 years preceding the census corresponded very closely (99.3%) with the enumerated population aged under 5 years at the national level.
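The consistency check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the short annual series is invented purely to show the windowing step, and only the national totals are taken from the text. Averaging the two boundary years presumably reflects the mid-2010 census date, which is an interpretation and not something the text states.

```python
def events_before_census(annual_events, census_year):
    """Events in the 5 years before a mid-year census: the four full
    calendar years plus the average of the two boundary years."""
    full_years = range(census_year - 4, census_year)  # e.g. 2006..2009
    boundary = (annual_events[census_year - 5] + annual_events[census_year]) / 2
    return sum(annual_events[year] for year in full_years) + boundary


def completeness(registered_births, enumerated_under5, infant_toddler_deaths):
    """Registered births as a share of the under-5 population that would
    have been enumerated had none of the children died."""
    return registered_births / (enumerated_under5 + infant_toddler_deaths)


# Hypothetical annual series, used only to show the windowing step.
toy_series = {2005: 100, 2006: 100, 2007: 100, 2008: 100, 2009: 100, 2010: 120}
print(events_before_census(toy_series, 2010))  # 510.0

# National totals quoted in the text (Table 1 and the 2010 census).
national = completeness(2_430_048, 2_426_957, 20_112)
print(f"{national:.1%}")  # 99.3%
```

The same ratio can be computed for any state or district for which both registered births and the enumerated under-5 population are available.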
The vital statistics report for the years 2005-2009 did not show Sabah's births and deaths, probably due to incomplete registration. Hence, vital events for Sabah during this period were derived by taking the difference between the national data and all other states in the country (Table 1). The estimated number of births in Sabah was about 27% (100-73.3%) lower than the under-5 population that was enumerated during the census, but the under-count at the national level was rather negligible at 0.7% (100-99.3%). The large discrepancy between these figures in Sabah indicates substantial under-registration of births in the state. Confining the analysis to the citizen population reduces the under-estimation to only 2.4% (100-97.6%) for Sabah. The problem in Sabah was therefore mainly due to the non-reporting or under-reporting of vital events among non-citizens, especially in remote districts, as highlighted in a recent paper by Tey et al. (2021).
Because the published report of the 2010 population census does not provide the age-sex distribution of the population at the district level, the CWR was derived from the 2% sample, which is reasonably representative of the total population. For instance, the sample data show that 11% of the population were under 5 years, compared to 10.9% in the published report (DOSM, 2011a).
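The residual derivation and the shortfall percentages quoted above amount to simple arithmetic, sketched here in Python. The function names and the state counts are hypothetical and serve only to illustrate the steps; the only figure taken from the text is the 73.3% correspondence for Sabah.

```python
def residual_state_total(national_total, other_state_totals):
    """Derive an unreported state's events as the national total minus
    the sum of all other states."""
    return national_total - sum(other_state_totals)


def shortfall_percent(registered, enumerated):
    """Percentage by which registered births fall short of the
    enumerated under-5 population."""
    return 100.0 * (1.0 - registered / enumerated)


# Hypothetical counts, for illustration only.
print(residual_state_total(1_000_000, [400_000, 350_000, 200_000]))  # 50000

# A 73.3% correspondence, as reported for Sabah, implies this shortfall.
print(round(shortfall_percent(73.3, 100.0), 1))  # 26.7
```

A residual obtained this way inherits any registration gaps in the national total, so it is best read as a lower bound on the number of events that actually occurred.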
The 2010 population census took place from 6 July to 22 August 2010. Detailed information on the demographic, social, and economic characteristics was available for the 13 states and three federal territories in Malaysia. A multi-modal data collection method was used, including face-to-face interviews and self-enumeration (drop-off and pick-up, and e-Census), based on the de jure approach. Detailed information on the census methodology can be obtained from the DOSM published census report (DOSM, 2011b).
TFR is the sum of the age-specific fertility rates (ASFR) for women aged 15 to 49 multiplied by the width of the age interval, n (typically the 5-year age group). This equation can be written as

\[ \mathrm{TFR} = n \sum_{x=15}^{45} \frac{B_x}{W_x} \qquad (1) \]

where \(B_x\) is the number of births to women aged x to x + n, \(W_x\) is the number of women aged x to x + n, and the sum runs over the age groups starting at x = 15, 20, ..., 45 when n = 5. Since the general fertility rate (GFR) is defined as the total number of births per woman aged 15-49 years, the GFR can be fitted into the TFR function and multiplied by the width of the interval (the 35 reproductive years) to approximate the TFR. Hence, Eq. (1) can be rewritten as follows:

\[ \mathrm{TFR} \approx 35 \times \frac{\sum_{x} B_x}{\sum_{x} W_x} \qquad (2) \]

Equation 2 extrapolates the TFR as the ratio of the sums for the entire reproductive years (15-49 years) to construct an approximation of the TFR. With the assumptions of no infant mortality and no migration among women of childbearing age and their dependent young aged 0-4 years in the past 5 years, the CWR may be substituted into the term representing the GFR in Eq. (2) and used to estimate TFR as follows:

\[ \mathrm{iTFR} = 7 \times \frac{C}{W} \qquad (3) \]

where C is the number of children aged 0-4 years, and W is the number of women enumerated at childbearing ages (15-49 years). This derivation was named the implied TFR (iTFR), as it reflects the implied TFR present in the population's age structure. The multiplier of 7 is obtained by assuming that child mortality is approximately 0 over the first n years of life, so that C/5 approximates the annual number of births, and that women are uniformly distributed over 35 years of reproductive ages (35/5).
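The implied TFR described above, 7 times the child-woman ratio, is straightforward to compute from census counts. The following Python sketch is illustrative only: the function name, parameter names, and the district counts are hypothetical, while the multiplier of 35/5 = 7 follows the text.

```python
def implied_tfr(children_0_4, women_15_49, reproductive_span=35, child_age_span=5):
    """Implied TFR: the child-woman ratio scaled by 35/5 = 7, assuming
    negligible child mortality, no migration, and a uniform age
    distribution of women across the reproductive years."""
    multiplier = reproductive_span / child_age_span
    return multiplier * children_0_4 / women_15_49


# Hypothetical district: 1,200 children aged 0-4 and 3,000 women aged 15-49.
print(round(implied_tfr(1_200, 3_000), 2))  # 2.8
```

Keeping the two spans as parameters makes the origin of the multiplier explicit and allows the same function to be reused with a different child age range.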
In checking the multiplier of 7 in Malaysia's context, state fertility schedules are examined for which the true TFR is known. Hauer and Schmertmann (2020) computed the average TFR over the five previous years for each country c and time t, that is, the mean of \(\mathrm{TFR}_{c,t-k}\) over those years, denoted \(\mathrm{TFR}^*_{ct}\). In this paper, the approximation is based on the average fertility rates computed for each state between 2010 and 2014, as the actual TFR for Sabah was not available before 2010. The empirical \(\mathrm{TFR}^*_{ct}\) is then divided by the observed CWR, \(C_{ct}/W_{ct}\), to compute the multiplier under the assumption of negligible child mortality. The iTFR formula assumes that the multiplier equals 7. Based on the state-level fertility data in Malaysia, the multiplier was within 10% of 7 (6.3-7.7) in 37.5% of state-years; this was much lower than the 88.6% reported by Hauer and Schmertmann (2020) using the Human Fertility Database (HFD). The plausible reason could be the small number of states in Malaysia (16 states, including three federal territories) compared to 1,804 fertility schedules in the HFD. Women of reproductive age may be clustered in high- or low-fertility age groups, and women's age distribution should thus be considered in the approximation. The extended TFR (xTFR) is estimated by adjusting for the non-uniform distribution of women within reproductive ages (Hauer & Schmertmann, 2020). Using the proportion of women aged 25-34 years among those who are aged 15-49 years (denoted by \(\pi_{25-34}\)) as a predictor in a simple regression with the state data in Malaysia produces the approximation \(\mathrm{TFR}^*_{ct} / (C_{ct}/W_{ct}) \approx 10.26 - 11.64\,\pi_{25-34}\). The estimated intercept and slope coefficients were only slightly different from those in Hauer and Schmertmann (2020) (10.65 and −12.55, respectively). The equation used to compute xTFR is written as follows:

\[ \mathrm{xTFR} = (10.26 - 11.64\,\pi_{25-34}) \times \frac{C}{W} \qquad (4) \]

Equations (3) and (4) are used to compute iTFR and xTFR, respectively, at the state and district levels, using the 2% sample of the 2010 population census.
Table 2 shows the CBR and TFR by state in Malaysia between 2010 and 2019. Pulau Pinang has had the lowest CBR since 2010, while Putrajaya has had the highest. Putrajaya is the seat of the federal government, with many working women in the reproductive age groups. The 2019 statistics showed that the CBR in Perak, Pulau Pinang, Sabah, Sarawak, and Selangor was considerably lower than the national level of 15.0. Malaysia's TFR has been below replacement level since 2013, and it continued to decline to 1.78 in 2019. Sabah had the second lowest TFR of 1.41 children per woman, behind Pulau Pinang (1.28).
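The multiplier check and the age-structure adjustment described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names and the example counts are hypothetical, and the default coefficients are the Malaysian regression estimates quoted in the text (10.26 and −11.64), on the assumption that these, rather than the Hauer and Schmertmann (2020) values of 10.65 and −12.55, enter the xTFR calculation. Either pair can be passed explicitly.

```python
def empirical_multiplier(tfr_star, children_0_4, women_15_49):
    """Ratio of the observed average TFR to the child-woman ratio."""
    return tfr_star / (children_0_4 / women_15_49)


def share_within_band(multipliers, target=7.0, tolerance=0.10):
    """Share of multipliers lying within +/- tolerance of the target."""
    inside = [m for m in multipliers if abs(m - target) <= tolerance * target]
    return len(inside) / len(multipliers)


def extended_tfr(children_0_4, women_15_49, women_25_34,
                 intercept=10.26, slope=-11.64):
    """xTFR: the child-woman ratio times a multiplier that depends on the
    share of women aged 25-34 among women aged 15-49.
    Default coefficients are ASSUMED to be the Malaysian estimates."""
    pi_25_34 = women_25_34 / women_15_49
    return (intercept + slope * pi_25_34) * children_0_4 / women_15_49


# Hypothetical multipliers for four state-years.
print(share_within_band([6.5, 7.2, 8.1, 5.9]))  # 0.5

# Hypothetical district: 1,200 children, 3,000 women, 900 of them aged 25-34.
print(round(extended_tfr(1_200, 3_000, 900), 4))  # 2.7072
```

With these coefficients the multiplier equals roughly 7 when about 28% of women aged 15-49 are aged 25-34; a larger share lowers the multiplier, so xTFR falls below iTFR in districts where women are concentrated in the prime reproductive ages.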

Results
The CBR in Sabah declined from 14.7 per thousand population in 2015 to 13.5 in 2019, compared to 15.0 at the national level. Table 3 shows that the CBR was highest in Pitas (ranging between 20.9 and 24.0) and lowest in Kinabatangan (with a range of 3.5 to 4.5). Besides Kinabatangan, the CBR in five other districts was well below the state average: Putatan (6.8), Tongod (9.4), Beluran (10.2), Tawau (10.4), and Sandakan (10.6).
The abnormally low CBR recorded in Kinabatangan may be due to the under-registration of births, which can be explained by the very high proportion of non-citizens (77.8%) in the district. Beluran and Tongod had a large proportion of non-citizens, and these two districts also lack health facilities for childbirth. The low CBR in Sandakan and Tawau can be partly explained by the higher proportion of ethnic Chinese, who had an ultra-low TFR of 0.6 at the state level in 2018 (DOSM, 2020e). Kota Kinabalu reported a relatively high CBR, given its level of development. This anomaly could be due to the concentration of health facilities in Sabah's capital city, and some of the births to women from other districts might have been reported under Kota Kinabalu. The districts located nearer to the borders of Indonesia (Nabawan, Sipitang) and the Philippines (Kudat, Pitas, Kota Marudu) had a slightly higher CBR than other districts in Sabah.
Five of Sabah's districts were among the 10 districts with the lowest CBR in Malaysia in 2019 (Table 4). These figures are far below the average CBR for middle-income countries, which stands at about 18 per thousand population (UN, 2019). The exceedingly low level of CBR suggests a high likelihood of under-registration of births. Moreover, the low CBR in the five districts could be due to reporting by place of occurrence instead of residence. The wide discrepancy in the ranking of the CBR and CWR, shown in Table 4, demonstrates the probable under-registration of births in the very low-CBR districts in Sabah and Sarawak.
The results show that despite the time lag of 9 years, the CWR is indicative of the fertility level for these districts. For instance, in Peninsular Malaysia, where birth registration is complete, there is a close correspondence between CBR and CWR in Kampar and Timur Laut.

Explaining CBR differentials across districts
Differentials in the fertility rate across the Sabah districts may be attributed to several factors, such as the proportion of non-citizens, number of health facilities, and income level. The associations between CBR and these variables are examined with the use of scatter plots. Figure 3 exhibits an inverse relationship between the percentage of non-citizens and CBR by district in Sabah. In 2019, the CBR ranged from 3.5 in Kinabatangan to 23.1 in Pitas. The proportion of non-citizens varied widely across the districts, from 3.1% in Kota Marudu to 77.8% in Kinabatangan. Kota Marudu had the second highest CBR in the state and the lowest proportion of non-citizens, while Kinabatangan had the lowest CBR and highest proportion of non-citizens. Most of the districts with a high proportion of non-citizens had a low CBR. Because some non-citizens and those who married non-citizens may choose not to register their children's births, under-registration of births is highly probable in the districts with a large presence of non-citizens.
Another problem is reporting vital events according to the place of occurrence instead of residence. Figure 4 demonstrates a positive relationship between the number of maternal community health clinics (also known as rural clinics) and CBR by district in Sabah. In 2019, the number of maternal community health clinics ranged from 0 in [...] institutional deliveries must be in other districts. These births could have been registered at the place of occurrence, resulting in the under-counting of Kinabatangan births. Figure 5 reveals a negative relationship between median monthly household gross income and CBR by district in Sabah. The income level varies across districts. The 2019 statistics showed that the median monthly household gross income ranged from RM1,999 in Pitas to RM6,004 in Kota Kinabalu. Districts with a higher income tended to have a lower CBR. This finding corroborates the evidence of the negative association between fertility and income level in past studies (Becker, 1960; Jones & Tertilt, 2008; Lee & Mason, 2010).

Estimation of fertility schedules
The estimation of fertility schedules was first performed at the state level to gauge the accuracy of the estimation method by comparing the estimated figures with the reported figures. Table 5 shows that the values of iTFR differ from the TFR reported in the vital statistics report, with the absolute difference ranging from 0.01 in Kedah to 1.10 in Kelantan. The xTFR values correspond more closely with the reported TFR, with the absolute difference ranging from 0.03 each in Kedah and Negeri Sembilan to 0.92 in Kelantan. The estimated iTFR and xTFR were higher in Sabah at 2.85 and 2.70, compared to 2.35 and 2.24, respectively, for Malaysia as a whole. These compare with the reported TFR of 1.84 for Sabah and 2.14 for the country in 2010. This result provides further evidence of the magnitude of under-reporting of the fertility level in Sabah.
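The comparison between estimated and reported fertility can be made explicit with a short calculation. The Python sketch below uses only the Sabah and national figures quoted above; the dictionary layout and the function name are hypothetical, and the gaps are a rough indicator of the discrepancy rather than a formal completeness estimate.

```python
# Figures quoted in the text for 2010 (reported TFR, iTFR, xTFR).
estimates = {
    "Sabah": {"reported": 1.84, "iTFR": 2.85, "xTFR": 2.70},
    "Malaysia": {"reported": 2.14, "iTFR": 2.35, "xTFR": 2.24},
}


def gaps_from_reported(row):
    """Difference between each census-based estimate and the reported TFR."""
    return {name: round(row[name] - row["reported"], 2) for name in ("iTFR", "xTFR")}


for area, row in estimates.items():
    print(area, gaps_from_reported(row))
# Sabah {'iTFR': 1.01, 'xTFR': 0.86}
# Malaysia {'iTFR': 0.21, 'xTFR': 0.1}
```

The gap between xTFR and the reported TFR is 0.86 children per woman in Sabah against 0.10 for the country as a whole, which is the contrast the text interprets as evidence of under-reporting in Sabah.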
Across districts in Sabah, iTFR ranged from 1.51 in Kuala Penyu to 5.06 in Tongod (Fig. 6). The xTFR exhibited only slight discrepancies, and the value was slightly lower than iTFR except in Ranau, Kota Belud, Tuaran, Kota Marudu, Beaufort, Sipitang, Tenom, Nabawan, and Tambunan. It is to be noted that Tongod showed a different pattern after controlling for the distribution of women within reproductive ages, with an xTFR of 3.16 (as compared to an iTFR of 5.06). This difference may be explained by a high concentration of women of prime reproductive ages in this district, contributing to the high iTFR. It can be observed that the iTFR and xTFR in Kinabatangan and Putatan are about the same as in most other districts, despite these two districts having the lowest CBR since 2015 (Table 3). This finding further reaffirms the under-registration of births in some districts. Table 5 shows a close correspondence between actual TFR and xTFR at the national level, indicating that CWR, as reflected by xTFR, may provide a fairly accurate estimate of fertility. However, these two measures deviate quite widely for some states, including Sabah. This discrepancy suggests a deficiency in the data at the sub-national level. The reasons for the deficiency will be discussed in the next section.

Discussion and conclusion
Malaysia has well-established civil registration and reporting systems dating back to the 1960s. However, the quality of vital statistics collection and reporting in Sabah lags behind the other states, to the extent that birth and death statistics were not reported for the state in the 2000-2009 vital statistics report. The registration of deaths in Sabah was estimated at only 88% in 2016 (NPFDB, 2018). One plausible reason for the under-registration of births and deaths in Sabah is the presence of a large number of non-citizens, as in the case of Kinabatangan. Hence, vital statistics should be reported separately for citizen and non-citizen populations to facilitate an investigation of the sources of under-registration of births and deaths so that remedial actions can be taken.
Accurate data on the number of births and the birth rate in small geographical areas are needed for human resource planning and allocation of resources. Demographic data are used to develop effective development programs to provide education, healthcare services and facilities, childcare services, and amenities. Under-reporting of births will result in budgetary misallocations and may deprive people in these localities of the resources they need. A study by the United Nations Children's Fund (UNICEF) found that accurate information on birth and death events remains highly deficient in many developing countries (UNICEF, 1998). The under-reporting of births will result in sub-optimal investments in health care and education. Children with no legal identity are highly vulnerable to the risk of human trafficking and exploitation (Makinde, 2016; Makinde et al., 2017).
This study drew attention to the under-reporting of birth data in several districts in Sabah and explored the reasons behind the low CBR in these remote areas. Vital statistics reports show ultra-low CBR in some districts, which raises questions about the accuracy and completeness of vital registration and the possibility of reporting births by place of occurrence instead of place of residence. This study's findings corroborate past studies that show the deficiency in birth registration in rural areas and among poor and marginalized groups in developing countries (Bhatia et al., 2017; Li et al., 2010). The deficiency in birth registration in remote areas in Sabah may be attributed to various reasons, such as non-registration of newborns who die soon after birth, inaccessibility, and ignorance of the law and procedures, which is consistent with findings from past studies (Apland et al., 2014; WHO, 2013; Wodon & Yedan, 2019). The coverage deficiency could also be due to delayed reporting, deliberate non-reporting, miscommunication, errors in data entry, and lack of coordination among the various agencies involved in the registration and reporting of vital events, resulting in information leakage (Emery, 1990; UNICEF, 2013; WHO, 2013; World Bank & WHO, 2014).
Incomplete birth registration in remote districts in Sabah could hamper the delivery of services and the provision of basic infrastructure and amenities to target groups. It is crucial to register all births and provide legal identity for all to ensure the achievement of SDG 16.9 by 2030. Completeness and coverage of birth registration are essential to achieve other SDGs, such as SDG 1 (no poverty), SDG 4 (quality education), SDG 5 (gender equality), SDG 10 (reduced inequalities), SDG 16 (peace, justice, and strong institutions), and SDG 17 (partnerships for the goals).
Births should be reported based on both the current place of residence and place of occurrence, as it is crucial to reflect the local level's actual situation. Complete and accurate data on birth, death, marriage, divorce, and migration (from the change of address in the identity card, which should be enforced) will allow the authorities to have better and timely population data for planning purposes. Following proposals by Lima et al. (2018), more significant investments in the vital registration system, coupled with campaigns to inform the public of the importance of registering births, are needed to ensure that population data from census or other estimates are useful.
DOSM published district-level vital statistics in the 1980s but discontinued the practice in the 1990s; it was resumed only in 2015. DOSM, as the central agency responsible for the compilation of SDG indicators, should strive to collect high-quality data required for monitoring and evaluating the various programs towards achieving the SDGs.
Evaluation of the quality of data from vital registration should be carried out from time to time. DOSM has published reports on progress in compiling the SDG indicators in Malaysia since 2019, but the district-level information is scarce. Efforts have been made to combine census data with vital statistics, administrative records, and service statistics towards producing comprehensive and timely data for planning purposes. For instance, the number of enrolments at the primary level can be used as an alternative measure of the number of births, with a time lag.
The present study uncovers under-registration in remote areas in Malaysia, which up to now has received little research attention. A comprehensive study should be conducted nationwide to identify areas where civil registration is deficient. DOSM should make raw data from vital registration and censuses available to researchers for in-depth analysis.
Less-developed states in Malaysia can emulate the more-developed states in improving civil registration. Malaysia can also benefit from the experience of other countries, such as the national campaign in India in 2003 to create massive awareness among the public on the need for birth registration to bridge regional gaps (Singh, 2004). As recommended by WHO (2013), special campaigns and incentives targeting local registration authorities, along with engagement of community leaders and health workers, may be needed to improve coverage. In addition, stricter enforcement of the law should be considered, including a heavier penalty for non-compliance.

Limitations
Estimates derived from population censuses refer to fertility in the last 5 years. Migration is likely to introduce bias in census estimates, especially in districts with a substantial proportion of non-citizens. Hence, the data should be interpreted cautiously. The indirect method used to estimate fertility schedules assumed no infant mortality and no migration among women of childbearing age and their dependent young aged 0-4 years in the past 5 years, which might affect the accuracy of the estimation. The lack of health and education statistics, such as vaccination and immunization records and school enrollment, may conceal the actual level of under-registration in these districts. The present study adopted a quantitative approach, which may not provide a good understanding of the role and relevance of social context. Future research should consider a mixed-method approach by incorporating qualitative research and field visits to better understand the reasons behind the under-registration and misreporting of births and deaths in remote areas.