Identifying risk clusters for African swine fever in Korea by developing statistical models

Introduction African swine fever (ASF) is a disease with a high mortality rate and high transmissibility. Identifying high-risk clusters and understanding the transmission characteristics of ASF in advance are essential for preventing its spread in a short period of time. This study investigated the spatial and temporal heterogeneity of ASF in the Republic of Korea by analyzing surveillance data on wild boar carcasses. Methods We observed a distinct annual propagation pattern, with the occurrence of ASF-infected carcasses trending southward over time. We developed a rank-based statistical model to evaluate risk by estimating the average weekly number of carcasses per district over time, allowing us to analyze and identify risk clusters of ASF. We conducted an analysis to identify risk clusters for two distinct periods, Late 2022 and Early 2023, utilizing data from ASF-infected carcasses. To address the underestimation of risk and observation error due to incomplete surveillance data, we estimated the number of ASF-infected individuals and accounted for observation error via different surveillance intensities. Results As a result, in Late 2022, the risk clusters identified by observed and estimated number of ASF-infected carcasses were almost identical, particularly in the northwestern Gyeongbuk region, north Chungbuk region, and southwestern Gangwon region. In Early 2023, we observed a similar pattern with numerous risk clusters identified in the same regions as in Late 2022. Discussion This approach enhances our understanding of ASF spatial dynamics. Additionally, it contributes to the epidemiology and study of animal infectious diseases by highlighting areas requiring urgent and focused intervention. By providing crucial data for the targeted allocation of resources for disease management and preventive measures, our findings lay vital groundwork for improving ASF management strategies, ultimately aiding in the containment and control of this devastating disease.


Introduction
African swine fever (ASF) is a severe viral infection that causes hemorrhagic fever in pigs, with a fatality rate approaching 100% (1). According to the World Organization for Animal Health (WOAH) (2), ASF is a priority disease due to its significant health and economic repercussions for swine producers and government disease control agencies. ASF infection is caused by the African swine fever virus (ASFV), which belongs to the genus Asfivirus, and can be transmitted through both direct and indirect pathways. Direct transmission occurs through contact with the live bodies or carcasses of infected pigs, while indirect transmission happens via contact with contaminated objects, such as feed, water, and needles (3,4). Some studies have utilized Geographic Information Systems (GIS) and remote sensing technologies to analyze disease spread in wildlife based on environmental factors and spatial data, confirming their potential role in monitoring and management (5-8). These studies have indicated that ASF spread can vary according to spatial characteristics.
ASF was first documented in Kenya in 1921 (9) and became endemic in some regions of Africa. Subsequently, it spread to Europe and South America, where it was mostly eradicated. However, in 2007, the virus was introduced to Europe through Georgia (10), leading to widespread transmission. In Asia, the first case was reported in China in August 2018, followed by occurrences in other Asian countries (11-14). ASFV is typically classified based on pathogenicity into high, moderate, and low virulence. Highly virulent strains cause death within approximately 8 days, moderately virulent strains within about 20 days, and low virulence strains result in subclinical or chronic disease (4,15). Chronically infected individuals play a crucial role in the long-term persistence of the virus, making early eradication difficult (16).
In the Republic of Korea, the first confirmed case was identified in September 2019 at a pig farm in Paju, Gyeonggi Province, and it has since continued to spread, primarily in the Gyeongbuk region (17,18). Between October 9, 2019, and May 20, 2024, approximately 3,555 cases were reported, involving 40 domestic pig farms and 3,515 wild boars (19,20). Low virulence strains of ASFV have been predominantly identified in endemic regions such as Northern Europe and China. In China, the emergence of chronically infected individuals has been attributed to the production and use of illegal vaccines (21). Conversely, ASF outbreaks in the Republic of Korea have been confirmed to be caused by highly virulent strains, leading to death within 8-10 days post-infection (22,23). Based on this information, we consider that ASF-infected carcasses reveal the overall spread patterns of ASF infection in the Republic of Korea, though they may not perfectly reflect real-time infection trends. This assumption is supported by previous studies, which also utilized carcass data to identify risk clusters for ASF outbreaks in the Republic of Korea (24).
Another characteristic of ASF spread in the Republic of Korea is that direct transmission of ASF between wild boars and domestic pigs is relatively unlikely because domestic pigs are confined in enclosed pigsties within fenced buildings, such as intensive indoor housing (25). However, ASF transmission to domestic pigs is possible via objects associated with ASF-infected wild boar or human interactions (3,26). Accordingly, the Korean government implemented systematic and comprehensive interventions to prevent the spread of ASF, including reducing the density of wild boars, promptly disposing of carcasses, and installing fences around ASF-infection areas (27-29). To further preemptively block the spread of ASF between regions, extensive fencing was installed in six phases (stage 1, stage 2, …, stage 5-1, stage 5-2) from November 2019 to May 2022, spanning a total of 1,831 kilometers across 34 districts (3,17,30,31). Several studies have confirmed the importance of active preventive measures, including the installation of fences (29,32,33). Notably, a study by Lim et al. (32) showed that the third phase of national fencing decreased the infection pressure on individuals in neighboring habitats by 47% compared to the same geographical habitat. This finding supports the effectiveness of fencing in limiting wild boar movement and reducing ASF transmission. However, despite these interventions and achievements, outbreaks in pig farms and wild boar populations have not been completely controlled. The continued occurrence of ASF highlights the need for a deeper understanding of the mechanisms of disease spread, emphasizing the necessity of predicting and analyzing risk clusters to improve ASF control strategies.
To understand the mechanisms of ASF spread, many studies have been conducted on factors influencing disease transmission, including environmental and geographical factors. An analysis of the impact of environmental factors on ASF outbreaks and control revealed that the presence of roads and rivers effectively reduces the transmission rate by approximately 37% on average (34). Additionally, wild boars living at altitudes above 1,000 meters are difficult to control through hunting, and the probability of transmission is higher in certain forest areas, such as the Taebaek Mountains across the Republic of Korea and the Democratic People's Republic of Korea (32). In the present study, we aimed to analyze the patterns of ASF outbreaks using the geographical coordinates (longitude and latitude) of ASF-infected carcasses, along with spatial information such as forest area, slope, and altitude. By estimating the risk clusters for emerging ASF outbreaks, this study could provide alternative approaches for developing surveillance systems.

Materials and methods
We conducted a statistical analysis to identify risk clusters under surveillance for ASF outbreaks by investigating the continually evolving spreading patterns of ASF based on information on ASF-infected carcasses reported in the Republic of Korea from October 2019 to April 2023. Using Standard Deviation Ellipse (SDE) analysis and a rank-based method, we aimed to identify regions at risk of ASF outbreaks. Figure 1 illustrates the statistical analysis approach used to identify risk clusters. We assessed the direction and variance of the spread using the SDE to comprehend the spatial transmission patterns at each time point. Furthermore, we constructed a generalized linear model (GLM) to compute the number of wild boar carcasses in the region for each period using the designated probability distribution. We developed a statistical model using a rank-based method to evaluate ASF risk. This model estimates the average weekly number of carcasses by district over time, allowing us to analyze and identify risk clusters. To mitigate the underestimation of risk and account for observation errors caused by undetected ASF-infected individuals, we estimated the number of ASF-infected animals and incorporated observation error by considering different surveillance intensities.

Data description
The surveillance data cover wild boar carcasses infected with ASFV in the Republic of Korea from October 2019 to April 2023; their spatial distribution is illustrated in Figure 2A. The administrative boundary data in the Republic of Korea comprises approximately 250 districts and 17 regions. In this study, a "district" is defined as a city, county, or borough, while "area" refers to several cities. To analyze the trends in the spread of ASF, we collected information on the installation of extensive fencing and environmental data, including forest area (m²), elevation (altitude), and slope data by district, all classified by administrative boundaries. Extensive fencing data, sourced from the Ministry of Environment (17), indicates that 1,831 kilometers of fencing were installed across 34 districts. To interpret the effectiveness of these fencing control measures, we categorized the areas with additional fencing installed annually from 2020 to 2022 and organized this along with the ASF-infected carcass data, as shown in Figure 2B. The ASF-infected carcass data and environmental data for the Gyeonggi and Gangwon regions are summarized in Supplementary Table S1.

Estimation of ASF-infected counts from ASF-infected carcass data
ASF-infected carcass information serves as a useful indicator of infection spread and is therefore used as primary data. Many previous studies have utilized ASF-infected carcass data to understand the dynamics of ASF (18,36,37). However, due to potential underestimation of the outbreak scale caused by delays in carcass detection and the period between becoming infectious and death, ASF-infected carcass data may be insufficient to accurately reflect the actual spread of infection. Thus, we estimated the number of infected individuals (estI) over time and used this estimation as comparative data to the observed number of ASF-infected carcasses (obsC) to more accurately represent true infections that could impact actual transmission.
ASFV found in the Republic of Korea from 2019 to 2023 is typically highly pathogenic, causing death in infected pigs within a week (18,23). Infection experiments conducted in the Republic of Korea revealed that the estimated time to virus detection was 3.7-4.8 days, the incubation period was between 3.4 and 5.2 days, and the time to death was 8.9 to 9.1 days (38). Based on these findings, we assume that the time from infection to carcass discovery follows a uniform distribution with a mean of 9 days and a range between 0 and 18 days. Over the entire study period, we estimated estI by back-calculating from obsC: for each observed carcass, a delay was randomly drawn from a Uniform(0, 18) distribution and subtracted from the observation date to obtain an estimated infection date, and these dates were aggregated to give estI over time. We compared the risk clusters identified using obsC and estI to verify differences in risk assessment. This comparative approach is expected to identify risk clusters earlier and enable more proactive predictions than relying solely on carcass-based observations.
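The back-calculation described above can be sketched as follows. This is a minimal Python illustration rather than the study's implementation (the analysis itself was run in R); the function names, the rounding of delays to whole days, and the example dates are assumptions made for the sketch.

```python
import random
from collections import Counter
from datetime import date, timedelta


def back_calculate_infections(discovery_dates, max_delay_days=18, seed=0):
    """Estimate one infection date per carcass by subtracting a random delay.

    Assumption: the delay from infection to carcass discovery is
    Uniform(0, max_delay_days), rounded here to whole days.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    infection_dates = []
    for found in discovery_dates:
        delay = round(rng.uniform(0, max_delay_days))
        infection_dates.append(found - timedelta(days=delay))
    return infection_dates


def weekly_counts(dates):
    """Count events per ISO (year, week) pair."""
    return Counter(tuple(d.isocalendar())[:2] for d in dates)


# Hypothetical example: three carcasses discovered on the same day.
obs_dates = [date(2022, 11, 14)] * 3
est_dates = back_calculate_infections(obs_dates, seed=1)
est_i = weekly_counts(est_dates)  # weekly estI derived from obsC
```

Because every carcass maps to exactly one estimated infection date, the total of estI equals the total of obsC; only the timing shifts earlier by up to 18 days.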

Spatial and temporal dynamics of ASF transmission
Observational data have revealed that ASFV infections are more prevalent during the winter and spring seasons, which coincide with the wild boar breeding season (39), contrasting with lower frequencies of infections observed in summer and autumn (32). Accordingly, we have stratified the year into two distinct seasons for analysis: the "high transmission season (HT season)," spanning from November to April, and the "low transmission season (LT season)," covering the remaining months from May to October. This division allows for a systematic investigation of the temporal heterogeneity and transmission dynamics of ASF. The distinction is supported by the observation that about 80% of ASF cases in the Republic of Korea occurred during the HT season from 2020 to 2022. Additionally, we categorized the 250 districts into two groups: epidemic districts, where ASF cases have been reported, and non-epidemic districts, where no cases have been reported. This analysis particularly focused on the Gyeonggi and Gangwon regions, which account for over 80% of the carcass count.
We conducted statistical hypothesis tests to analyze temporal variations in carcass counts between HT and LT seasons and distributional differences of environmental factors such as forest area, elevation, and slope between epidemic and non-epidemic districts.
First, we employed the augmented Dickey-Fuller (ADF) test to examine the stationarity of cumulative ASF-infected carcasses during the HT and LT seasons. The ADF test assesses whether a time series is stationary; its null hypothesis is that the series has a unit root, that is, it is non-stationary and its statistical properties do not remain constant over time (40). This approach is particularly relevant for identifying temporal variations in ASF transmission rates, offering a statistical basis to assess the effects of seasonality on the spread of ASF.
Second, we applied non-parametric tests including the Ansari-Bradley and Mann-Whitney U tests to investigate distributional differences in forest area, slope, and elevation (altitude) between the epidemic and non-epidemic districts. A low p-value from the Ansari-Bradley test indicates a difference in dispersion between the two groups, while a low p-value from the Mann-Whitney U test indicates a difference in central tendency. These tests were selected because of their efficacy in managing the non-normal distribution of data, thereby enabling a robust comparison of the dispersion and median values between the two district groups (41,42).
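To make the second test concrete, the sketch below implements a two-sided Mann-Whitney U test in plain Python using the normal approximation with a tie correction and no continuity correction. It is an illustration only: the function name and the sample values are hypothetical, and in practice an established statistics library would be used instead.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values tied: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p_value


# Hypothetical forest-area values for two groups of districts.
epidemic = [6.0, 7.0, 8.0, 9.0, 10.0]
non_epidemic = [1.0, 2.0, 3.0, 4.0, 5.0]
u_stat, p = mann_whitney_u(epidemic, non_epidemic)
```

The normal approximation is rough for very small samples, where an exact test is preferable.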
Through this methodological approach, we aimed to enrich the comprehension of the spatial and temporal analysis conducted in our study, subsequently providing insights into the unique transmission characteristics of ASF in the Republic of Korea, described within specific seasonal and geographical contexts.

Statistical modeling for estimating the number of carcasses

Standard deviation ellipse
The SDE is a spatial statistical tool widely used to describe the directional trend and dispersion of geographical features in spatial distribution, utilizing the longitudinal and latitudinal locations of ASF-infected carcasses (43,44). Historically, SDE has been used to analyze spatial dispersion and directional bias in Poland (45) and to investigate the directional trend and spread of Foot and Mouth Disease (FMD) in China (46). Building on these applications, we employed the SDE method using observed carcass data to identify annual and monthly changes in occurrence regions of ASF in the Republic of Korea. This method facilitates the creation of ellipses that capture spatial characteristics such as orientation, spatial dispersion, and directional trends of ASF outbreaks.
The outcomes of the SDE method include the lengths of longitudinal and latitudinal axes, their ratio, angle, and the center point. The lengths of the longitudinal and latitudinal axes quantify the dispersion in the east-west (horizontal) and north-south (vertical) directions, respectively, based on the variance of the obsC data used for measurement. Moreover, the angle of the ellipse, determined by the longer axis, indicates the principal direction of data spread, starting from north and moving clockwise. This reveals the direction in which the spatial dispersion pattern of the data tends. The SDE ratio, a longitude-to-latitude measure, indicates the ellipse's deviation from a circular shape based on the lengths of the longitudinal and latitudinal axes (44). A ratio between zero and one indicates a vertical dispersion tendency, whereas a ratio greater than one implies a horizontal dispersion tendency. As the axis lengths approach equality, nearing a ratio of one, the ellipse tends to resemble a circle, suggesting limited propagation in a specific direction.
The dimensions of these ellipses, represented by their long and short axes, are defined by the variances in longitude and latitude of the observed carcasses. The size of the ellipse, adjustable based on the carcass count, is governed by the variance in both dimensions (47,48), and the angle of the ellipse is derived from the covariance between longitude and latitude. Using sigma (σ) to denote the standard deviation for both longitude and latitude, a single sigma (1σ) along each axis typically encompasses approximately 66.7% of all carcasses within the ellipse. Two sigma (2σ) captures about 95.5%, and three sigma (3σ) includes approximately 99.7% of the total carcasses (48,49).
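For illustration, the quantities described above (center point, axis dispersion, and orientation) can be computed from carcass coordinates with a short Python sketch. It uses a common covariance-based formulation of the SDE and treats longitude and latitude as planar coordinates; both choices, along with the function and key names, are assumptions of this sketch and may differ from the software used in the study.

```python
import math


def standard_deviation_ellipse(lons, lats):
    """Covariance-based standard deviation ellipse for point coordinates."""
    n = len(lons)
    cx, cy = sum(lons) / n, sum(lats) / n  # center point
    sxx = sum((x - cx) ** 2 for x in lons) / n  # variance in longitude
    syy = sum((y - cy) ** 2 for y in lats) / n  # variance in latitude
    sxy = sum((x - cx) * (y - cy) for x, y in zip(lons, lats)) / n
    # Eigenvalues of the 2x2 covariance matrix give the squared semi-axes.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major = math.sqrt(half_trace + root)
    minor = math.sqrt(max(half_trace - root, 0.0))
    # Angle of the major axis from east (counterclockwise), then converted
    # to an azimuth measured clockwise from north, in [0, 180).
    phi = 0.5 * math.atan2(2 * sxy, sxx - syy)
    azimuth = (90.0 - math.degrees(phi)) % 180.0
    return {
        "center": (cx, cy),
        "sd_lon": math.sqrt(sxx),
        "sd_lat": math.sqrt(syy),
        "major": major,
        "minor": minor,
        "azimuth_deg": azimuth,
    }


# Hypothetical carcass coordinates spreading toward the north-east.
sde = standard_deviation_ellipse([127.0, 127.5, 128.0, 128.5],
                                 [37.0, 37.5, 38.0, 38.5])
```

The longitude-to-latitude ratio discussed in the text corresponds to `sd_lon / sd_lat` when `sd_lat` is nonzero; scaling `major` and `minor` by 2 or 3 gives the 2-sigma and 3-sigma ellipses.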

Effect of the surveillance intensity on ASF-infected carcasses
Surveillance data on ASF-infected carcasses is collected through a government-implemented carcass collection policy, and this data is crucial for analyzing the response to the ASF outbreaks. Typically, surveillance and control policies, such as installing fences to prevent spread, are initiated in districts where ASF-infected carcasses are found (17). Considering the limited resources available for response policies, the intensity of response may vary according to the outbreak severity in different areas. It is expected that surveillance will be intensified in districts with a higher number of ASF-infected carcasses discovered. Conversely, districts with lower surveillance intensity are likely to have a higher number of undiscovered infected carcasses. This difference between the actual and reported number of infected individuals, defined as the observation error, can vary with the surveillance intensity.
To understand the impact of surveillance intensity, we conducted a scenario analysis focused on observation errors. We employed a spatial dispersion analysis of ASF spread using SDE analysis of ASF-infected carcass data collected during the observation period to delineate surveillance areas. Surveillance intensities were adjusted across different zones of the ellipse based on the standard deviation (sigma) setting. These zones included 66.7% of the data within the 1-sigma ellipse, 95.5% within the 2-sigma ellipse, and 99.7% within the 3-sigma ellipse, with surveillance intensity decreasing progressively from the 1-sigma to the 3-sigma ellipse.
We calculated the adjusted number of ASF-infected carcasses ($adjC_{i,t}$) in district $i$ at time $t$ from $obsC_{i,t}$ using observation error rates ($\varepsilon$%) determined by the intensity of surveillance. The $obsC_{i,t}$ are adjusted according to the designated observation error rate at each stage of surveillance intensity as follows:
• (Strong intensity) In 1-sigma districts, the surveillance is conducted at maximum intensity with an observation error of 0%, implying that $adjC_{i,t}$ equals the observed number: $adjC_{i,t} = obsC_{i,t}$.
• (Intermediate intensity) In 2-sigma districts, the observation error is defined by a specific factor, adjusting the observed number to estimate the total number as: $adjC_{i,t} = \left(1 + \frac{\varepsilon}{100}\right) obsC_{i,t}$.
• (Low intensity) In 3-sigma districts, the observation error is twice that of 2-sigma districts: $adjC_{i,t} = \left(1 + \frac{2\varepsilon}{100}\right) obsC_{i,t}$.
• (Not implemented) Surveillance is not implemented outside the 3-sigma ellipse, where ASF occurrences are negligible, covering only 0.3% of the observation data.
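The surveillance-intensity scenarios listed above can be expressed as a small Python sketch. It assumes that the adjustment inflates the observed count by the factor (1 + kε/100), with k = 0, 1, 2 for the 1-, 2-, and 3-sigma zones; that multiplicative form, the function name, and the example values are assumptions of this sketch rather than confirmed details of the study.

```python
def adjust_count(obs_count, sigma_zone, error_rate_pct):
    """Adjust an observed carcass count for zone-dependent observation error.

    sigma_zone: 1, 2, or 3 for the SDE zone containing the district,
                or None for districts outside the 3-sigma ellipse.
    error_rate_pct: observation error rate (epsilon) in percent.
    Assumed form: adjC = obsC * (1 + k * epsilon / 100), k = 0, 1, 2.
    """
    zone_multiplier = {1: 0, 2: 1, 3: 2}
    if sigma_zone not in zone_multiplier:
        return None  # surveillance not implemented outside the 3-sigma ellipse
    k = zone_multiplier[sigma_zone]
    return obs_count * (1 + k * error_rate_pct / 100)


# Hypothetical example: 10 observed carcasses, epsilon = 20%.
adjusted = {zone: adjust_count(10, zone, 20) for zone in (1, 2, 3, None)}
```

Under this assumed form, the same observed count is adjusted upward more strongly the farther a district lies from the center of the ellipse, reflecting weaker surveillance there.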

Analysis of the generalized linear model
We employed a GLM to estimate the weekly number of ASF-infected carcasses across 250 districts in the Republic of Korea over time. The GLM extends the linear regression model to accommodate response variables following various probability distributions, such as Poisson, negative binomial, and zero-inflated distributions, by connecting the explanatory variables with the response variable through a link function, $f(\cdot)$. In our model, we assumed the number of carcasses, $Y_{it}$, in district $i$ at time $t$ adheres to these designated distributions. The GLM to estimate the number of ASF-infected carcasses ($estC_{i,t}$) in district $i$ at time $t$ is formulated as:
$$f\left(estC_{i,t}\right) = \alpha + \beta_1\,\mathrm{Distance}_{i,t} + \beta_2\,\mathrm{Distance}^2_{i,t} + \beta_3\,\mathrm{Season}_t + \beta_4\,\mathrm{Time}_t,$$
where $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ represent the coefficients for each variable, and $\alpha$ is the intercept. The outcomes of the SDE analysis revealed spatial features that were previously unconsidered in the regression equation. The Distance variable represents an exponential decay of the Euclidean distance, thereby assigning higher transmission risk to closer districts. By incorporating these insights, we defined the Distance variable for district $i$ at time $t$ as $\mathrm{Distance}_{i,t} = \exp\left(-d_{ij}(t)\right)$.
Here, $d_{ij}(t)$ represents the Euclidean distance between district $i$ and the center point $j$ of the SDE ellipse at time $t$; $j$ is defined as the center point of the SDE ellipse using carcass data from 2022 if $t \in 2022$ or using carcass data from 2023 if $t \in 2023$. To capture any non-linear relationships between geographical distance and obsC, we also introduced the Distance² variable. The Season variable serves as an indicator variable to distinguish between HT and LT seasons, reflecting seasonal variations in ASF occurrence. Additionally, the Time variable is introduced to track the progression of weeks during the study period, starting from one and sequentially increasing, which helps incorporate temporal dynamics into our analysis (i.e., Time = 1, …, 16 for Late 2022 and Time = 1, …, 17 for Early 2023). By integrating these variables, our model captures the heterogeneity of ASF dynamics across different geographical regions and over time. This approach enhances our ability to predict ASF spread in the Republic of Korea.
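As a worked illustration of how these covariates combine, the sketch below computes the Distance variable and the expected count under a Poisson GLM with a log link. The log link, the coding Season = 1 for the HT season, the assignment of coefficients to variables, and all names are assumptions of this sketch; the intercept and Distance² values are placeholders, since they are not reported in this section.

```python
import math


def distance_feature(district_xy, center_xy):
    """Exponential decay of the Euclidean distance to the SDE center point."""
    return math.exp(-math.dist(district_xy, center_xy))


def expected_carcasses(distance, season, time, coef):
    """Expected weekly count under a log-link Poisson GLM (assumed form)."""
    eta = (coef["intercept"]
           + coef["distance"] * distance
           + coef["distance_sq"] * distance ** 2
           + coef["season"] * season  # assumed coding: 1 = HT, 0 = LT
           + coef["time"] * time)     # week index, starting at 1
    return math.exp(eta)


# Time, Season, and Distance coefficients are the values reported for the
# Poisson model; intercept and distance_sq are placeholders for illustration.
coef = {"intercept": 0.0, "distance": 25.18, "distance_sq": 0.0,
        "season": 0.648, "time": -0.005}

# Hypothetical district and ellipse-center coordinates.
dist_var = distance_feature((127.9, 37.4), (128.3, 36.9))
ht_mean = expected_carcasses(dist_var, season=1, time=1, coef=coef)
lt_mean = expected_carcasses(dist_var, season=0, time=1, coef=coef)
```

With a log link, the ratio `ht_mean / lt_mean` equals `exp(0.648)` regardless of the other covariates, which is why a coefficient can be read as a multiplicative effect on the expected count.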
This approach, supported by previous research (50), assumes that carcass counts follow designated distributions, with Poisson regression widely used in epidemiology for analyzing count data (51,52). However, the Poisson distribution's assumption of equal mean and variance may not always align with real-world data, prompting us to include models based on Negative Binomial (NB) and Zero-Inflated Poisson (ZIP) distributions. This expansion, inspired by prior research including studies on pig infection counts in Sardinia (51), aims to overcome the limitations of the Poisson model. Our analysis suggests that outcomes can vary depending on the model employed to estimate the ASF-infected carcass counts, because each distribution affects the coefficients of the independent variables differently. To facilitate this analysis, we generated the estimates from each distribution according to the estimated coefficients, thereby enabling a comparison across distributions.
This analysis was implemented using the R programming language. To fit the GLM models, we used the "glm" function from the "stats" package for Poisson distributions, the "glm.nb" function from the "MASS" package for NB distributions, and the "hurdle" function from the "pscl" package for ZIP distributions.

Identifying risk clusters of ASF outbreaks
We conducted our analysis during the HT period to identify risk clusters for two periods: September to December 2022 (Late 2022) and January to April 2023 (Early 2023). When ASF occurrence in a specific area surpasses the GLM-based estimate, it suggests either randomness or a higher than expected level of occurrence, signaling the need for heightened attention to that area. However, since surveillance data is collected based on the discovery of carcasses of infected individuals rather than the real-time number of infected cases, identifying risk clusters by simply comparing the surveillance data with the estimated counts is less effective. Instead of focusing solely on carcass counts, we compared the rankings of areas with a high risk of ASF occurrence between observed data and estimates from the GLM. Specifically, we ranked areas based on the obsC over a week to identify regions with high outbreak risk, while simultaneously ranking areas based on predicted carcass counts to determine expected risk districts. Furthermore, we used the Mann-Whitney U test, also known as the Wilcoxon rank-sum test, which is a non-parametric method for comparing the medians between two groups, to evaluate whether the actual rank is significantly higher than the estimates. Subsequently, we defined a risk cluster as one where the rank of the actual risk cluster is significantly higher than the rank of the expected risk cluster. The procedure for identifying risk clusters for ASF outbreaks, outlined in Figure 3, is as follows:
(i) Estimation of the ASF case counts using GLM: Given the observed number of ASF-infected carcasses ($obsC_{i,t}$), we estimated the number of ASF-infected carcasses ($estC_{i,t}$) in district $i$ and week $t$, where $i$ = 1, …, 250 and $t$ = 1, …, 16 for Late 2022 and $t$ = 1, …, 17 for Early 2023. Here, we assumed that ASF-infected carcass counts follow a designated distribution. To ensure the robustness of our estimates, we generated 10,000 random samples […]
[Figure 3: Outline for detecting the risk clusters of ASF outbreaks.]
Frontiers in Veterinary Science, frontiersin.org, 10.3389/fvets.2024.1416862
[…] higher than expected from the GLM at least once. Additionally, areas where more than half of the time points within the given period are significant, that is, with a rank score of 8 or higher, are designated as the most likely clusters (MLC).
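The final classification step can be illustrated with a minimal Python sketch. It assumes that a district's rank score is the number of weeks in which its observed rank was significantly higher than expected from the GLM, that a score of at least one marks a risk cluster, and that a score of 8 or higher marks an MLC; the function name, labels, and input format are hypothetical.

```python
def classify_district(significant_weeks, mlc_threshold=8):
    """Classify a district from its weekly significance flags.

    significant_weeks: one boolean per week, True when the observed rank
    was significantly higher than the GLM-based expected rank that week.
    Returns (rank score, label).
    """
    score = sum(bool(flag) for flag in significant_weeks)
    if score >= mlc_threshold:
        return score, "most likely cluster"
    if score >= 1:
        return score, "risk cluster"
    return score, "not a risk cluster"


# Hypothetical 16-week period (Late 2022) with 9 significant weeks.
flags = [True] * 9 + [False] * 7
score, label = classify_district(flags)
```

The weekly significance flags themselves would come from the rank comparison and Mann-Whitney U test described in the text, which this sketch takes as given.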

Statistical analysis of the spatial and temporal dynamics of ASF transmission
Figure 4 shows the obsC data for the spatiotemporal distribution of ASF occurrences across the Republic of Korea from November 2019 to April 2023. The figure highlights the temporal dynamics of ASF outbreaks throughout the observation period, distinguishing between the HT and LT seasons to verify ASF seasonality. Figures 4A,B provide a detailed account of the daily and cumulative number of cases, reporting 2,698 ASF cases. During the HT seasons, there were 2,139 discovered carcasses, approximately 3.8 times the 559 carcasses identified during the LT seasons, as shown in Figure 4A. This discrepancy is further highlighted by a more pronounced increase in cumulative cases during the HT season, as depicted in Figure 4B. Moreover, Figure 4C illustrates that the monthly instances during the LT season were significantly lower compared to the monthly averages of 71 in 2020, 80 in 2021, and 73 in 2022, emphasizing a more rapid spread of ASF throughout the Republic of Korea during the HT seasons.
The ADF test was used to evaluate the stationarity between the HT season and the LT season based on the cumulative number of ASF-infected carcasses. The ADF test yielded a p-value of 0.9025, exceeding the threshold of 0.05, thereby not rejecting the null hypothesis of the test, which posits that the time-series data are non-stationary. This indicates that the mean or variance of the data on the ASF spread may vary over time. Consequently, through our analysis of seasonality, we elucidated the temporal heterogeneity in the presence of ASF in the Republic of Korea.
The environmental data used to analyze spatial characteristics and potential risk factors by administrative districts include forest area data collected from the Korea Forest Service shown in Figure 5A, and elevation and slope data extracted using QGIS (version 3.26.2) shown in Figures 5B,C. Between 2019 and 2023, 81.01% of all ASF-infected carcasses were found in the Gangwon and Gyeonggi regions in the Republic of Korea. Figures 5D-F provide a monthly comparison of the forest area, elevation, and slope data between the epidemic and non-epidemic districts within the Gyeonggi and Gangwon regions. The data shows that epidemic districts exhibited significantly higher metrics in terms of forest area, elevation, and slope compared to non-epidemic districts. Correlation analysis further supported the geographical influence on the spread of ASF, as depicted in Supplementary Figure S1. There was a notable difference in the correlation coefficients between environmental factors and carcass counts when comparing all districts to the specific regions of Gyeonggi and Gangwon. For all districts, the correlation coefficients between carcass counts and elevation, slope, and forest area were 0.50, 0.46, and 0.48, respectively (Supplementary Figure S1A). However, these correlations were significantly higher in the Gyeonggi and Gangwon regions, with coefficients of 0.75 for elevation, 0.75 for slope, and 0.78 for forest area (Supplementary Figure S1B).
To quantify the distributional differences between epidemic and non-epidemic districts, we conducted two non-parametric statistical tests, the Ansari-Bradley test and the Mann-Whitney U test. A low p-value in the Ansari-Bradley test indicates a difference in dispersion, whereas a low p-value from the Mann-Whitney U test points to a difference in central tendency. The Ansari-Bradley test produced p-values of 0.371 for forest area, 0.095 for elevation, and 0.221 for slope, all above the threshold of 0.05, indicating no significant difference in dispersion for any environmental factor between the epidemic and non-epidemic districts. Conversely, the Mann-Whitney U test yielded p-values of 0.002 for forest area, 0.004 for elevation, and 0.005 for slope, all below 0.05, denoting significant differences in the median values for the factors assessed. These findings indicate that the environmental factors differ between the epidemic and non-epidemic districts in their central tendency rather than in their spread. Forest area showed the strongest evidence against the null hypothesis (p = 0.002). This analysis of environmental factors elucidated the spatial heterogeneity of ASF in the Republic of Korea and affirmed the interplay between geographical features and ASF distribution patterns.

Analyzing the southward trend of ASF using standard deviation ellipse
Figure 6A illustrates the annual directional trend and dispersion characteristics of ASF spread from 2019 to 2023, using the SDE method. Table 1 reveals that during this period, longitude-axis changes were minimal, ranging from 2.7709 to 1.3462, while the latitude-axis length exhibited a significant increase, from 0.5672 to 2.4875. This indicates a notable evolution from horizontal to vertical dispersion.
Figure 6B displays the expansion of fencing areas from 2020 to 2022. As the discovery of ASF-infected carcasses gradually moved southward, the large-scale fencing was extended towards the south to block further spread. During the observation period, the positioning of the 2023 SDE ellipse directly above the fencing installed in 2022 suggests that these barriers contributed to containing ASF's southward expansion. This representation highlights the strategic placement of extensive fences in response to the shifting dispersion patterns of ASF, underscoring their role in mitigating the geographical spread of ASF.
We extended the SDE analysis by incorporating its results into a GLM analysis, assuming a designated distribution to estimate the number of ASF-infected carcasses across various districts during the observation periods. As detailed in Table 2, the analysis, based on a Poisson distribution GLM, used the dataset from 2019 to 2022 as the training set, enabling a comprehensive evaluation of the variability in ASF spread across districts. The analysis revealed that the coefficient for the Time variable was −0.005 (i.e., $e^{-0.005} \approx 0.9950$), suggesting that the impact of Time on carcass count estimates was minimal. However, the Season variable showed an exponentiated coefficient of approximately 1.9 ($e^{0.648} \approx 1.9117$), indicating that during the HT season, the estimated carcass count was approximately double that of the LT season. The Distance variable, calculated as the negative exponent of the Euclidean distance derived from geographical coordinates, had a coefficient of 25.18. This signifies a substantial increase in carcass count as the distance decreased, confirming that closer proximity correlates with higher ASF-infected carcass detection.
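The multiplicative reading of these coefficients can be checked with a few lines of arithmetic. The sketch below assumes a log link, a Season indicator coded 1 for the HT season and 0 for the LT season, and a Distance covariate of the form exp(−d); the intercept value is hypothetical because it is not reported in this section.

```python
import math

# Coefficients quoted in the text for the Poisson GLM.
BETA_TIME = -0.005
BETA_SEASON = 0.648
BETA_DISTANCE = 25.18

def expected_carcasses(intercept, time, is_ht_season, distance):
    """Poisson mean under a log link; `intercept` is a hypothetical value."""
    linear = (intercept
              + BETA_TIME * time
              + BETA_SEASON * (1 if is_ht_season else 0)
              + BETA_DISTANCE * math.exp(-distance))  # assumed covariate form
    return math.exp(linear)

time_ratio = math.exp(BETA_TIME)      # multiplicative change per unit of Time
season_ratio = math.exp(BETA_SEASON)  # HT season relative to LT season

print(round(time_ratio, 4))    # 0.995
print(round(season_ratio, 4))  # 1.9117
```

Under these assumptions the ratio of HT to LT expected counts equals `season_ratio` whatever the intercept, and a smaller distance always yields a larger expected count because the Distance coefficient is positive.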

Identifying the risk clusters of ASF outbreaks by using rank-based method
Because obsC does not reflect the delay between infection and carcass discovery, it does not accurately represent real-time infection status. To address this, we estimated estI from obsC. Assuming a period of 9 days from infection to death, we set the infection date prior to the date of obsC using a uniform distribution with a mean of 9 days. Supplementary Figure S2 illustrates the comparison between estI and obsC over time from October 2019 to April 2023. Subsequently, we applied a rank-based method incorporating the Poisson distribution to identify ASF risk clusters based on both estI and obsC, as depicted in Figure 7. This comparative approach facilitates more efficient identification of risk clusters compared to methods that rely solely on obsC.
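The back-dating step can be sketched as follows. The text specifies only that the delay is uniform with a mean of 9 days, so the range used here (whole days from 0 to 18) is an assumption, as are the function name and the example dates.

```python
import random
from datetime import date, timedelta

def estimate_infection_dates(discovery_dates, max_delay_days=18, seed=0):
    """Shift each carcass discovery date back by a uniformly drawn delay.

    The delay is drawn from the integers 0..max_delay_days, which has a
    mean of 9 days when max_delay_days=18; this range is an assumption.
    """
    rng = random.Random(seed)
    return [d - timedelta(days=rng.randint(0, max_delay_days))
            for d in discovery_dates]

# Hypothetical discovery dates, not taken from the surveillance data.
observed = [date(2022, 9, 1), date(2022, 9, 15), date(2023, 1, 3)]
estimated = estimate_infection_dates(observed)
for obs, est in zip(observed, estimated):
    print(obs, "->", est)
```

Aggregating the shifted dates by district and week would then give a count series playing the role of estI.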
We conducted an analysis to identify risk clusters for two distinct periods, Late 2022 and Early 2023, utilizing data from ASF-infected carcasses (Figures 7A,B) and estimates of infected individuals (Figures 7C,D). Risk clusters are marked in red on the map, with districts of higher rank scores shown in deeper shades, indicating a greater risk level. Districts achieving a rank score of 8 or above are classified as MLC and bordered with light green color. Additionally, we compared districts with high carcass counts to those based on rank scores, as detailed in Table 3 and Supplementary Figure S3.
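The classification rule can be written compactly. The sketch below assumes that a district's rank score is the number of weekly time points at which its test is significant at the 0.05 level, and that any score of at least 1 marks a risk cluster; the MLC threshold of 8 follows the text. Function names and p-values are hypothetical.

```python
def rank_score(p_values, alpha=0.05):
    """Count the weekly time points whose test is significant at `alpha`."""
    return sum(1 for p in p_values if p < alpha)

def classify_district(score, mlc_threshold=8):
    """Label a district from its rank score.

    A score of 8 or above marks a most likely cluster (MLC), as in the
    text; treating any score of at least 1 as a risk cluster is an
    assumption made for this sketch.
    """
    if score >= mlc_threshold:
        return "MLC"
    if score >= 1:
        return "risk cluster"
    return "not a risk cluster"

# Hypothetical weekly p-values for one district over 16 time points.
weekly_p = [0.01, 0.20, 0.03, 0.04, 0.50, 0.02, 0.01, 0.60,
            0.04, 0.03, 0.70, 0.02, 0.80, 0.90, 0.01, 0.30]
score = rank_score(weekly_p)
print(score, classify_district(score))
```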
In Late 2022, the risk clusters identified by obsC and estI were almost identical, particularly in the northwestern Gyeongbuk region, north Chungbuk region, and southwestern Gangwon region (Figures 7A,C). No districts met the MLC criteria using obsC (Figure 7A); however, based on estI, the Mungyeong district (A10) in the northwestern Gyeongbuk region was confirmed as an MLC (Figure 7C). In Early 2023, we observed a similar pattern with numerous risk clusters identified in the same regions as in Late 2022 (Figures 7B,D). Additional risk clusters were discovered in the northeastern Gyeongbuk region, northwestern Chungbuk region, and southeastern Gangwon region, leading to an expansion of identified risk clusters and MLCs. The eastward movement of wild boars in 2022 was also noted. Using obsC, only the Wonju district (A2) in the southwestern Gangwon region was identified as an MLC. In contrast, analysis using estI identified a total of five districts, including the northeastern Chungbuk region and western Gyeongbuk region, as meeting the MLC criteria. Although the identified risk clusters were generally similar between the two data sources, rank scores estimated using estI were observed to be 2-3 points higher on average than those based on obsC. This suggests that relying solely on carcass data might underestimate the actual risk level. Furthermore, although districts with high carcass counts did not always align with the risk clusters, four out of the top five districts with high carcass counts were included as top five risk clusters based on rank score in both Late 2022 and Early 2023 (Table 3). Additionally, the district with the highest incidence was classified as an MLC. These results underscore that the identified risk clusters, which require significant monitoring, consistently included districts with high incidence rates throughout the observation period.
To enhance the robustness of our analysis, we explored outcomes based on obsC across various distributions: Poisson, NB, and ZIP, as illustrated in Figure 8. Analysis for both Late 2022 and Early 2023 revealed common risk clusters across all three models, with no districts meeting the MLC criteria. However, the rank scores of districts varied among the models. Furthermore, these clusters demonstrated statistical significance based on the rank-based method, marking them as potential risk clusters for disease outbreaks and thus primary targets for surveillance. This consistency across models underscores the robustness of our risk cluster estimates. This comprehensive method of comparing results across different distribution-based models enriches our understanding of ASF outbreak dynamics. It distinctly showcases the strength of risk cluster identification derived from varied distributional assumptions and analytical methods.
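For readers less familiar with the three count distributions, the sketch below writes out their probability mass functions. The parameterizations (a zero-inflation probability `pi_zero` for ZIP, and a mean `mu` with dispersion `size` for NB) are common conventions assumed here; the section does not state which parameterization the models used.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability of count k with mean mu (mu > 0)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def zip_pmf(k, mu, pi_zero):
    """Zero-inflated Poisson: extra probability mass `pi_zero` at zero."""
    base = (1 - pi_zero) * poisson_pmf(k, mu)
    return pi_zero + base if k == 0 else base

def nb_pmf(k, mu, size):
    """Negative binomial with mean `mu` and dispersion `size`
    (variance = mu + mu**2 / size)."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)

# Probability of a week with zero carcasses under each model (toy values).
print(poisson_pmf(0, 2.0), zip_pmf(0, 2.0, 0.3), nb_pmf(0, 2.0, 1.0))
```

With the same mean, both ZIP and NB place more probability on zero counts than the Poisson model, which is one reason rank scores can differ between models even when the flagged districts coincide.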
In previous analyses (Figures 7, 8), it was assumed that intensities of surveillance were constant across all affected areas. However, considering the limited resources, it is plausible to adjust response intensities based on observed severity in different districts. Therefore, we designated surveillance intensity based on the concentration of discovered ASF-infected carcasses as a criterion for severity. We adjusted carcass counts accordingly and used these estimates to identify risk clusters, as depicted in Figure 9.
Surveillance zones for each period were delineated using the SDE method. The surveillance area was segmented into three zones based on sigma settings, with intensities assigned as strong, intermediate, and low (Figures 9A,B). We analyzed risk clusters and MLC with an observation error rate of 25%, as shown in Figures 9C,D, and found the results generally similar to those from analyses with uniform surveillance intensity at an observation error rate of 0% (Figures 7A,B). Even when the observation error rate was increased to 50%, few notable differences were observed in the identified risk clusters (Figures 9E,F). Most districts with high carcass counts and elevated risk levels were situated in areas of strong intensity (1-sigma districts), mitigating the impact of observation errors. As a result, the top-ranked risk clusters remained stable across all settings, demonstrating that the identification of risk clusters is reliable, even with considerable observation errors in districts with lower surveillance intensity.
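One plausible way to assign a location to a sigma-based zone is to measure its distance from the ellipse center in units of the ellipse's axes. The sketch below follows that idea; the function name, the counterclockwise angle convention, and the example ellipse are assumptions, and the section does not specify how the zones were actually computed.

```python
import math

def surveillance_zone(point, center, major_sd, minor_sd, angle_deg):
    """Classify a point by how many ellipse 'sigmas' it lies from the center.

    Assumes nonzero axis lengths and an angle measured counterclockwise
    from the x axis to the major axis.
    """
    dx, dy = point[0] - center[0], point[1] - center[1]
    theta = math.radians(angle_deg)
    # Rotate the offset into the ellipse's own axes.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    scaled = math.hypot(u / major_sd, v / minor_sd)
    if scaled <= 1:
        return "strong"        # 1-sigma zone
    if scaled <= 2:
        return "intermediate"  # 2-sigma zone
    if scaled <= 3:
        return "low"           # 3-sigma zone
    return "outside"

# Hypothetical ellipse: center at the origin, major axis along x.
for pt in [(1.0, 0.0), (3.0, 0.0), (0.0, 2.5), (10.0, 0.0)]:
    print(pt, surveillance_zone(pt, (0.0, 0.0), 2.0, 1.0, 0.0))
```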
Additionally, most of the lower-ranked risk clusters were located in districts with low intensity (3-sigma districts). The comparison of the bottom five risk clusters based on rank scores is detailed in Supplementary Table S2. Across settings with observation error rates of 0, 25, and 50%, some differences in rank scores were observed. However, during the Late 2022 period, the lower-ranked risk clusters were consistently identified across all settings. In Early 2023, four out of the five districts identified at a 0% observation error rate were also included in the bottom five risk clusters at the 25 and 50% error rates. This confirms that even significant observation errors lead to relatively minor differences in identifying lower-ranked risk clusters. Consequently, despite varying error levels, no significant differences were observed in the identification of risk clusters, underscoring the robustness of the risk identification process under different surveillance intensities and error conditions.

Discussion
The continuous nationwide spread of ASF has led to large-scale pig disposals, raising concerns within pig-farming communities and imposing significant burdens on the livestock industry, public health, and the environment. The absence of vaccines and effective treatments, together with the risk of infection from ASF-infected wild animals and carcasses, makes controlling and preventing ASF challenging. To mitigate these challenges, the Korean government has installed broad fences as a control measure against the spread of ASF (3), though their effectiveness is limited to certain regions (34).
Understanding the spatial distribution of the spread, as informed by previous incidence data, can improve the effectiveness of containment strategies. The prevalent practice of intensive livestock farming in the Republic of Korea (25), which minimizes contact between farmed and wild animals, underscores the pivotal role of wild animal movements in the spatial transmission of ASF. We analyzed surveillance data on the carcasses of wild boars infected with ASFV in the Republic of Korea from October 2019 to April 2023. Our study aimed to analyze the spatiotemporal distribution of ASF in the Republic of Korea and identify the risk clusters of ASF transmission. We aimed to develop a statistical modeling approach to understand the spreading patterns of ASF and identify high-risk outbreak areas, thereby pinpointing the regions where surveillance should be intensified.
Initially, we used the SDE method to quantify the annual spatial dispersion and directional trends of ASF outbreaks based on their geographic locations (Figure 6). Subsequently, we developed a GLM that incorporates the distance of observed districts from the center of the ellipse, along with seasonality and time as explanatory variables. Employing a Poisson distribution, this model calculated the average number of carcasses. We then ranked the districts based on these calculations and compared these rankings with the actual carcass counts using a rank-based method to identify risk clusters. To more accurately reflect the real-time infection status, we implemented a comparative approach, analyzing risk cluster identification based on both estI and obsC (Figure 7). Furthermore, we enhanced the robustness of our findings by comparing results derived from Poisson, ZIP, and NB distributions to ensure that our findings were not confined by the assumptions of any single distribution model (Figure 8). Given the constraints of limited resources, it is possible to adjust response intensities based on the severity observed in different districts. In response to this, we estimated adjusted carcass counts accordingly and used these estimates to identify risk clusters (Figure 9). This comprehensive approach allowed us to identify risk clusters from various perspectives, thereby enhancing the robustness of the results.
Our study revealed several key findings regarding the spread of ASF in the Republic of Korea, highlighted by three main points. First, we noted both temporal and spatial variations in the ASF outbreaks. Our analysis showed that the number of ASF cases was approximately 3.82 times higher during the spring and winter seasons (HT season) compared to the summer and autumn seasons (LT season; Figure 4). This finding aligns with previous research (53) indicating that ASF occurrences tend to be more prevalent during relatively colder seasons in the Republic of Korea. By confirming temporal heterogeneity through the ADF test, we have statistically validated that the spread of ASF varies significantly across different seasons. Furthermore, spatial analysis indicated that 81.01% of ASF occurrences were concentrated in the Gyeonggi and Gangwon regions, highlighting significant spatial disparities between epidemic and non-epidemic districts (Figure 5). This emphasizes the impact of geographical factors on the spread of ASF. Second, we observed spatial dispersion trends between 2019 and 2020, initially showing a predominantly horizontal eastward trend, then shifting to a vertical southward dispersion in 2021 and 2022. This shift, along with the trend towards a dispersion ratio close to one by 2023, suggests a potential stabilization of ASF spread southward. The dispersion patterns correlated with the habitats of wild boars and major mountain ranges in the Republic of Korea, such as the Taebaek and Sobaek Mountains (54). This suggests that the spread of ASF is influenced by natural geography and the movement of wild boars. The observed stabilization from 2022 to 2023 is speculated to result from various control policies implemented in the Republic of Korea, including the installation of extensive fencing systems (30, 54). Third, we identified risk clusters for ASF outbreaks using statistical models, including the Poisson, NB, and ZIP models. Remarkably, the identified risk clusters aligned with the geographic locations of the Republic of Korea's major mountain ranges, as corroborated by spatial dispersion trends. Additionally, we estimated the number of infected individuals to aid in identifying risk clusters. While there were no significant differences in the districts designated as risk clusters, the number of districts classified as MLC differed. Furthermore, we compared the identification of risk clusters based on adjusted ASF-infected carcass counts across varying observation error rates (0, 25, 50%). The results confirmed that risk cluster identification exhibited no significant differences across all scenarios. The findings provide insights to enhance the effectiveness of surveillance strategies and control measures. Specifically, they suggest the potential for developing customized preventive strategies based on seasonal and regional variations, thereby improving the efficiency of resource allocation for disease control and providing evidence for targeted intensive control measures to effectively suppress the spread of ASF by identifying risk clusters. These insights underscore the utility of statistical models in enhancing preemptive actions such as wildlife population control, carcass removal, and the installation of extensive fencing, all of which are currently implemented in the country.
This study has several limitations. Due to the challenges in collecting real-time data on ASF-infected wild boars, our analysis was restricted to surveillance data derived from wild boar carcasses, as observed in previous studies (36, 37). This reliance on surveillance data may lead to potential risk underestimation compared to actual occurrences. Therefore, our approach focused on understanding dispersion trends and identifying potential risk clusters rather than predicting the exact number of carcasses in real time. Unlike previous studies that have largely focused on predicting risk clusters rather than estimating carcass numbers (53), our research incorporates environmental factors, such as forest area, distinguishes time intervals into HT and LT seasons, and analyzes risk clusters, marking a novel approach in the field.
Furthermore, the inability to determine the exact moment of infection based on the timing and location data of the carcasses limits precise outbreak analysis. Recognizing these constraints, previous research has concentrated on assessing the risk of infection. However, by applying the GLM and rank-based score methods, our study goes beyond merely identifying regions with a high number of carcasses by pinpointing areas with significantly higher occurrences than average, based on the discrepancy between actual data and model estimates. The application of various probability distributions, including Poisson, NB, and ZIP distributions, to identify risk clusters demonstrates the importance of a comprehensive analysis for understanding ASF dynamics. Additionally, we estimated ASF-infected cases from surveillance data and compared these with identified high-risk and caution areas to verify underestimation due to risk estimation based on ASF-infected carcasses. This approach highlights the significance of identifying areas with significantly higher risk potential than expected, offering a nuanced understanding of ASF spread, and contributing to more effective disease management and prevention strategies.
Another point is the enhancement of efficiency and accuracy in identifying risk clusters through the application of artificial intelligence (AI) methods. In the previous study by Orusa et al. (55), the use of AI methods in conjunction with GIS and remote sensing technologies demonstrated the potential of geospatial deep learning AI to process complex datasets and provide actionable insights for wildlife disease monitoring and management. By leveraging AI, it is possible to develop more robust and adaptable disease management strategies that can respond to the evolving dynamics of diseases such as ASF. This integration enhances the efficiency and accuracy of identifying risk clusters, thereby improving the effectiveness of targeted surveillance and control measures. Such an approach will be instrumental in advancing our understanding of ASF dynamics and enhancing our ability to implement timely and effective interventions.
Despite these limitations, the application of multiple statistical models and the integration of environmental factors, such as forest areas and seasonal variations, mark an alternative approach in the field. Our findings underscore the significant influence of temporal and spatial heterogeneity on the spread of ASF in the Republic of Korea, highlighting the intricate relationship between ASF dynamics, geographical features, and the role of wild boar habitats and movement patterns. Identifying risk clusters for targeted surveillance and control measures is crucial, contributing to more effective disease management and prevention strategies. This study offers insights into the strategic planning of surveillance and control measures, aiming for a more targeted approach for managing ASF outbreaks.

Conclusion
This study analyzed the spatial and temporal heterogeneity of ASF in the Republic of Korea using surveillance data and revealed an annual southward propagation pattern. The angles of the SDE ellipse in 2021 and 2022, 109.5758 and 193.6745, respectively, indicate an annual southward propagation pattern. However, the ratio of the SDE ellipse in 2023, which was 0.9819, indicates that the southward movement was suppressed in 2023. This could be interpreted as an effect of measures such as the installation of extensive fences in certain areas during that year. We introduced a new statistical model that allowed us to predict the average weekly number of carcasses per district. We identified risk clusters as districts whose ranks based on observed ASF-infected carcasses were significantly higher than their ranks based on the estimated ASF-infected carcasses. This study contributes significantly to the epidemiology and dynamics of animal infectious diseases by emphasizing the importance of spatially concentrated risk clusters. By providing crucial data for the efficient allocation of disease management and preventive measures, this study lays the foundation for improving ASF management strategies.

FIGURE 9 The impact of surveillance intensity and observation error on the identification of ASF risk clusters. (A,B) Surveillance zones set up using the SDE method; these zones are stratified into three levels of intensity: strong (1-sigma), intermediate (2-sigma), and low (3-sigma), to reflect the severity of ASF infection based on carcass counts. (C,D) Outcomes of the risk cluster when an observation error rate of 25% is applied. (E,F) Outcomes of the risk cluster when an observation error rate of 50% is applied. Districts marked in red indicate risk clusters, with deeper shades signifying higher rank scores. The districts with a light green colored border are identified as the most likely clusters (MLC).

FIGURE 2
FIGURE 2 Spread trend of ASF outbreaks from SDE. (A) Five different colors indicate the ellipses obtained from SDE by year. Points, representing the observed locations of carcasses for each year, are colored to match the corresponding year's ellipse. (B) Each colored area outlines the regions where metropolitan fences were installed each year, corresponding to the same color scheme used for the ellipses.

FIGURE 4
FIGURE 4 Temporal dynamics of ASF outbreaks in the Republic of Korea. (A) The daily cases of ASF-infected carcasses observed from 2019 to 2023. (B) The cumulative cases of ASF-infected carcasses observed from 2019 to 2023. (C) The observed instances of ASF-infected carcasses in the Republic of Korea are presented annually from 2020 to 2022. Bars in blue represent the LT season, while red bars represent the HT season. The red dashed line represents the average number of ASF-infected carcasses for that year.

FIGURE 5
FIGURE 5 Spatial distributions of environmental factors and their proportional distributions in epidemic and non-epidemic ASF districts. (A) Spatial distribution of forest area (in m²). (B) Spatial distribution of elevation (in m). (C) Spatial distribution of slope (in °). Each visualization represents an area approximately 667 km vertically and 542.46 km horizontally. (D-F) Proportional distribution of forest area, elevation, and slope in epidemic and non-epidemic districts, respectively. The shaded area represents the histogram of the data, and the solid line represents the density curve.

FIGURE 6
FIGURE 6 Spatial distribution of ASF-infected carcasses in the Republic of Korea. (A) Points indicate yearly observed ASF-infected carcasses from 2019 to 2023. (B) Areas where fences were newly expanded from 2020 to 2022. Red dots indicate observed ASF-infected carcasses.

FIGURE 7
FIGURE 7 Identification of risk clusters and the most likely clusters for ASF using obsC and estI. (A,B) Risk clusters based on observed ASF-infected carcass data (obsC) for Late 2022 (A) and Early 2023 (B). (C,D) Risk clusters based on estimated infection data (estI) for Late 2022 (C) and Early 2023 (D). Districts marked in red indicate risk clusters, with deeper shades signifying higher rank scores. The districts with a light green colored border are identified as the most likely clusters (MLC).

FIGURE 8
FIGURE 8 The results of a rank-based approach on different distributions. (A,C,E) show the results for the risk clusters from September to December 2022 (Late 2022), in row order according to each model. Similarly, (B,D,F) represent the estimates for the risk clusters from January to April 2023 (Early 2023), in row order according to each model. The colors of the dots signify monthly carcass counts. The districts with a light green colored border are identified as the most likely clusters (MLC).

FIGURE 9
FIGURE 9 .Then, we could determine the significance by examining the p-values.If the p-value is less than 0.05, it is concluded that the actual carcass rank is significantly higher than the estimated carcass rank.(iii) Risk clusters and the most likely clusters: The rank-based score (rank score) is defined such that if the Mann-Whitney U-test result is significant at the 0.05 level of significance, we assign a score of 1 to district i at time t.For Late 2022, there are 16 time ( ) .Subsequently, we calculated the median rank for each district based on the ranks assigned to all samples, with the median rank for district i and week t denoted as R i t ,  .In other words, the ranks generated for each sample in district comparison for district i and week t, resulting in a total of 10,000 ranks r i t , 1 ( ) , …, r i t , < points, and for Early 2023, there are 17 time points.Therefore, the maximum rank score for each district  is 16 for Late 2022 and 17 for Early 2023.Here, we defined the risk clusters as those where the rank score is greater than 1.In other words, a district with a rank score greater than 1 indicates that at least one time point was significant, indicating that the actual risk rank was

TABLE 1
Geographical information of Standard Deviation Ellipse by year.
*Ratio indicates the longitude-to-latitude ratio.

TABLE 2
The result of Poisson regression. *Signifies that the variable is statistically significant within the Poisson regression model, as indicated by the coefficient of the variable yielding a p-value below 0.05.

TABLE 3
Comparison of top five risk clusters based on data for observed ASF-infected carcass counts (obsC) and estimated ASF-infected individuals (estI) with districts of high carcass counts. A1, A2, …, A12 denote the location information for each district, as indicated in Supplementary Figure S3 (Late 2022: September to December 2022; Early 2023: January to April 2023). (•) denotes the rank score. *Most likely cluster.