Modeling nation-wide U.S. swine movement networks at the resolution of the individual premises

The spread of infectious livestock diseases is a major cause for concern in modern agricultural systems. In the dynamics of the transmission of such diseases, movements of livestock between herds play an important role. When constructing mathematical models used for activities such as forecasting epidemic development, evaluating mitigation strategies


Introduction
The swine production industry of the United States (U.S.) is one of the largest in the world, with an estimated total market value of $26.3 × 10^9 (USDA, 2019). In addition, pork and pork products constitute the largest non-vegetable agricultural export of the U.S., with a market value of approximately $7.7 × 10^9 (FAS, 2021). Of major concern for such large agricultural systems are the outbreak risks of transboundary animal diseases (Clemmons et al., 2021) or other highly transmissible infections with the potential to rapidly spread throughout a livestock population and cause large economic consequences. For instance, economic modeling estimates the annual cost of production losses in the U.S. due to porcine reproductive and respiratory syndrome at $664 million (Holtkamp et al., 2013). Another study estimates that a hypothetical outbreak of classical swine fever within the U.S. swine production system could have a direct economic impact of between $2.6 et al., 2020). Such models commonly utilize data from empirical studies, previous outbreaks, and expert opinion together with information about the underlying population of agricultural premises to capture how transmission occurs over different pathways. Examples of potential avenues of transmission include close contacts between animals, contact between animals and wildlife reservoirs (Siembieda et al., 2011), contact between people and animals (Ribbens et al., 2007), contaminated feed or other fomites (Dee et al., 2014), windborne spread with aerosols (Mikkelsen et al., 2003), and more (Clemmons et al., 2021). The relative significance of these transmission pathways varies depending on the pathogen of interest, and not all pathways are relevant in every situation. However, one additional transmission pathway that is commonly of importance regardless of pathogen is the movement of live animals between premises. Livestock shipments facilitate spread, possibly across very large distances, by risking the unintentional introduction of infected animals into naive herds where transmission can easily occur through close contact (Brooks-Pollock et al., 2015; Gates and Woolhouse, 2015). Consequently, to effectively manage long-distance livestock disease spread, a solid understanding of livestock shipment patterns is critical, and retrospective analyses of previous high-consequence outbreaks reinforce the notion that such knowledge is needed for developing data-driven, national-level policies for disease control and prevention (Ferguson et al., 2001; Keeling et al., 2001; Kao et al., 2007; VanderWaal et al., 2018). For this reason, it has become increasingly common for governmental agencies charged with animal disease prevention to maintain livestock shipment databases which can be used to analyze the premises-to-premises contact structure and inform outbreak preparedness plans, or, in the case of an outbreak, trace transmission back to its source (Caporale et al., 2001; Mitchell et al., 2005; Stanford et al., 2001; Ammendrup and Füssel, 2001). However, due to concerns about privacy, confidentiality, and allocation of costs, no comprehensive, mandatory database or tracing system for livestock shipments currently exists in the U.S. (Blasi et al., 2009; Anderson, 2010), which forces modelers to find other sources of data in order to achieve these goals.
Two major alternative sources of this kind have been used in research on domestic swine in the U.S.: shipment data requested directly from the industry, and information collected during mandatory veterinary inspection in connection with the shipments. In the U.S., swine production is heavily vertically integrated, with individual operations, tightly connected to or owned by large pork production companies, specializing in particular life stages of the animals (Reimer, 2006). These companies maintain private databases of movements for internal use, which, if willingly shared, can be used by researchers. In such cases, the shipments among the operations that are part of the production company have commonly been treated as their own isolated network (Kinsley et al., 2019; Lee et al., 2017; Passafaro et al., 2020; Machado et al., 2019; Amirpour Haredasht et al., 2017; Galvis et al., 2022; VanderWaal et al., 2020). This approach is, due to the vertical integration of the industry, easily justifiable. However, considering that each of the studies mentioned includes fewer than 2500 premises out of the more than 60,000 swine premises of the entire U.S. (USDA, 2019), it becomes clear that there is also a larger picture to consider.
The second source of swine shipment data consists of certificates of veterinary inspection (CVIs). When moving livestock across state borders for purposes other than slaughter, U.S. federal law requires that an accredited state veterinarian examines the animals for communicable disease (C.F.R, 2021). If the animals are healthy, the veterinarian issues a CVI that is stored with the individual origin and destination state authorities.
Both of these sources of swine shipment information suffer from the issues that there is no single point of access and no standardized way of storing the information kept in these databases. This means that much effort must be allocated to acquiring and processing the data (e.g. Buhnerkempe et al., 2014; Gorsich et al., 2019; Cabezas et al., 2021) before analyses can even begin. Another issue is that neither data source gives an unbiased picture of the shipment patterns. For data based on CVIs, shipments that occur within states are not captured at all; for privately owned data, only the shipments to and from premises associated with specific companies will be included. Furthermore, access to private data relies on the willingness of the owner to share it, which may be a barrier in some cases.
One way to manage the issues of incomplete data is to construct a model of the shipment network itself, informed by the data that is available, and extrapolate from that model what the unobserved contact structures look like. Few such studies that concern the U.S. swine shipment network exist, but the approach was taken by Valdes-Donoso et al. (2017), who applied machine learning techniques to scale up shipment data collected from the swine production premises within two Minnesota counties to a much larger network covering 34 counties. Similarly, Moon et al. (2019) used publicly available county-level information on the number of swine premises and the corresponding number of sales to estimate the probabilities of shipments occurring between the sub-populations of premises in different size classes within the state of Iowa.
Both of these studies focus on a subset of the entire population of the U.S. swine industry, but similar work with cattle has demonstrated that it is feasible to construct detailed synthetic shipment networks that cover the entire conterminous U.S. by scaling up shipment data gathered from CVIs. This previous work on cattle networks has resulted in the US Animal Movement Model (USAMM), a Bayesian Markov chain Monte Carlo model capable of simulating detailed cattle shipment networks with complete geographical coverage (Buhnerkempe et al., 2014; Lindström et al., 2013; Brommesson et al., 2021). Together with the U.S. Disease Outbreak Simulation model, USAMM has been used to simulate the spread of foot-and-mouth disease within the entire U.S. cattle population of over 850,000 premises (Tsao et al., 2020).
Due to lack of the necessary swine shipment data, USAMM has until recently existed only as a model for cattle shipments, but here we present a version of USAMM constructed specifically to model domestic swine shipments in the U.S. This new development is made possible through the recent collection of national-scale samples of two types of interstate swine shipment data. The first type of shipment data consists of CVIs collected from a subset of eight origin states for the years 2010-2011 (Gorsich et al., 2019), and resembles what has been used in previous model versions for cattle. The second data set consists of shipments registered according to the requirements of swine production health plans, which are agreements between swine producers and state animal health authorities that enable movement of swine across state borders without the need of a CVI (C.F.R, 2021). This development of USAMM for swine enables new possibilities of investigation into the spread, prevention and control of infectious livestock disease, where movements of animals between premises play a big role.
Here we describe the new swine USAMM version, and use the model to analyze one year of the CVI shipment data together with the swine production health plan shipment data set. Based on this analysis, 250 complete U.S. premises-to-premises swine shipment networks are simulated, which we contrast to both the training data and the additional year of CVI data in a validation process. We make these simulated networks publicly available to download from http://dx.doi.org/10.25675/10217/235130, and accessible for exploration in the form of an interactive online web interface at https://webblabb.github.io/usammusdos/shiny.html, and it is our hope that these networks will find use among the swine disease modeling research community and elsewhere. Further, to highlight how detailed information about animal shipments helps in the analysis of disease dynamics on large spatial scales, we also construct a rudimentary predictive model for the county-level prevalence of porcine epidemic diarrhea virus (PEDv) into which we incorporate the simulated networks.

Overview
A Bayesian statistical model of the data-generating process behind the U.S. domestic swine trade shipment network was constructed. The aim was to enable the up-scaling of incomplete data to predict premises-to-premises shipment networks (i.e. excluding slaughter shipments) covering the entire contiguous U.S., while capturing spatial and temporal variation. The network model was fit to data sets of non-slaughter shipments with a Markov chain Monte Carlo approach using a combination of Gibbs sampling and Metropolis-Hastings methods. Two types of whole-year, interstate shipment data were included, each with incomplete spatial coverage but with full temporal coverage across the year. The network model was defined at the scale of individual premises, but the data sets only included reliable information about the origin and destination counties, not about the exact physical origin and destination premises. Therefore the network model was constructed to make inferences about the identities of the premises involved in each shipment. This made the model more complex than it would have been if inference was at the level of counties, but at the same time enabled predictions to be made about the effect of the premises' herd sizes on the number of shipments sent and the shipments' respective sizes. Model parameters were defined spatially at the national, state, county and premises level, and temporally at the level of quarter. A subset of the parameters was defined separately for the two types of shipments to allow for diverging shipment patterns, while others were shared between the shipment types. After fitting the model to the data sets, the resulting posterior parameter distribution was used to simulate 250 complete shipment networks. The simulated networks were subsequently validated against the two training shipment data sets, as well as one set of shipments from an additional year.

Shipment data
In the U.S., domestic swine that are being moved across a state border for any purpose except slaughter require inspection by an accredited veterinarian prior to being shipped in order to ensure that the appropriate state and federal animal movement regulations are being followed. One possibility is that the veterinarian issues a certificate of veterinary inspection (CVI) detailing, among other things, the origin and destination counties, the number of swine being moved, and the shipping date. However, when shipments occur regularly between two premises within a shared swine production system, an agreement can be set up that allows interstate shipments without CVIs. This requires the premises involved to establish a program designed to detect health-related issues, such as communicable disease, within the herds through regular on-site veterinary inspections (swine production health plans, C.F.R, 2021). When such agreements are in place and ratified by animal health authorities of the states involved, shipments between the particular premises do not require CVIs, but shipment information is still collected and stored for tracing purposes. In this work we refer to shipments within production systems with swine production health plan agreements in place as health plan agreement (HPA) shipments. Three sets of shipment data were used in this study, two of which were based on collected CVI records from different years, and one consisting of records of shipments made over HPAs.
The two sets of CVI data are described in detail by Gorsich et al. (2019) and consisted of a sub-sample of all the export records for a subset of states during the years 2010 and 2011. The proportional size of the sub-sample in relation to all collected CVIs of each particular origin state $s$ was known and is denoted $p^{\mathrm{CVI}}_{s}$, and was $p^{\mathrm{CVI}}_{s} = 0.3$ for all origin states $s$ that were included. For the year 2010 the states were California, Iowa, Minnesota, North Carolina, New York, Texas and Wisconsin; for 2011 the states were the same with the addition of Nebraska. Some shipments were missing critical information such as date of occurrence or shipment size, while some had inconsistencies with how the data had been sampled, showing either that the shipment occurred within the state, or that it originated from a state not included in the sample. Such shipments were removed from the data set, and $p^{\mathrm{CVI}}_{s}$ for each state adjusted accordingly. After removing 87 such shipments from the data set, it encompassed 1,654,640 animals, 3171 shipments and counties; while after removing 87 shipments from the 2011 data set, it encompassed 1,701,604 animals, 3446 shipments and 787 counties. The 2010 CVI data was used for model fitting, while the 2011 data was used for out-of-sample validation of the model prediction. This choice was made as the addition of Nebraska in the 2011 CVI data set allows not only validation against data from another year, but also validation against a completely unobserved origin state. This provides a spatial component to the comparison which would not be possible with the 2011 data as training data.
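The text does not spell out how the sampling proportion was adjusted after records were removed. The Python sketch below shows one plausible reading, under the assumption that removed records are treated as if they had never been sampled, so that the nominal proportion is scaled by the fraction of sampled records retained. The function names and example numbers are hypothetical and are not taken from the study.

```python
def adjusted_sampling_proportion(nominal, n_sampled, n_removed):
    """Effective sampling proportion for one origin state after dropping records.

    Assumption (not stated in the source): dropped records count as unsampled,
    so the nominal proportion is scaled by the retained fraction.
    """
    if not 0.0 < nominal <= 1.0:
        raise ValueError("nominal proportion must be in (0, 1]")
    if n_sampled <= 0 or not 0 <= n_removed <= n_sampled:
        raise ValueError("need 0 <= n_removed <= n_sampled and n_sampled > 0")
    return nominal * (n_sampled - n_removed) / n_sampled


def scale_up(observed_count, proportion):
    """Expected total number of shipments implied by an observed sample count."""
    return observed_count / proportion


if __name__ == "__main__":
    # Hypothetical state: 100 sampled records at a nominal 30%, 10 of them dropped.
    p = adjusted_sampling_proportion(0.3, n_sampled=100, n_removed=10)
    print(p, scale_up(90, p))
```

Under this reading, the scaled-up total for a state is unchanged by the removal, because the retained count and the proportion shrink by the same factor.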
The HPA data set consisted of all of the interstate HPA shipments leaving California, Michigan, Iowa, New York, North Carolina and Wisconsin, as well as all the interstate HPA shipments entering Iowa in 2014. Thus, for these states we define $p^{\mathrm{HPA,out}}_{s} = 1.0$, and for all other states $s$, $p^{\mathrm{HPA,out}}_{s} = 0.0$. Similarly, for Iowa we define $p^{\mathrm{HPA,in}}_{s} = 1.0$, and for all other states $s$, $p^{\mathrm{HPA,in}}_{s} = 0.0$. As this data set did not consist of a sub-sample, additional information involving shipments to and from states not included in the sample was acquired when the origin was any of the states in the sample or the destination was Iowa. For instance, since 100% of the HPA shipments to Iowa were observed, the data included all shipments to Iowa even from states for which $p^{\mathrm{HPA,out}}_{s} = 0.0$. In total, the HPA shipment data included 15 states that were associated with incoming or outgoing (or both) shipments. For every shipment, the HPA data listed the origin and destination counties, the number of animals moved and the production system within which the shipment occurred. The production systems are enterprises consisting of multiple physical production sites connected by ownership or contractual relationships (C.F.R, 2021). HPA shipments are not allowed between premises of different production systems, meaning that the HPA shipment data consists of a number of subnetworks, each associated exclusively with one production system. In addition, the HPA data provided limited information about premises identity in that each operation listed as the origin or destination in the HPA data was associated with a unique id number. This id number could not be used to determine the exact identity of the premises in relation to the premises demography data (see below), but did allow inference about which shipments were connected to the same physical operation. We refer to these id numbers as health plan agreement (HPA) herd identifiers. In other words, while each CVI shipment is completely independent of other CVI shipments, the HPA shipments come in groups of one or more incoming and outgoing shipments, all tied to a single HPA herd identifier. In the following sections the HPA herd identifier is denoted $k$, and the production system to which it belongs is denoted $m$. For the same reasons as mentioned in the previous paragraph, as well as because of missing information about which production system they belonged to, a total of 307 HPA shipments were dropped from the original HPA data set, and the relevant $p^{\mathrm{HPA,out}}_{s}$ and $p^{\mathrm{HPA,in}}_{s}$ were corrected accordingly. The final HPA data set covered a total of 7,107,524 transported swine, 8317 shipments, 16 production systems, 2163 unique HPA herd identifiers and 3135 unique pairs of HPA herd identifiers with one or more shipments sent between them. A total of 15 states and 145 counties were represented in the HPA data.

Premises demography data
The model requires information about the swine premises population and its respective herd size (i.e. number of animals) distribution within each county. The only comprehensive nationwide U.S. swine premises data set available is provided by the National Agricultural Statistics Service census of agriculture (NASS, USDA, 2014a). This data set consists of county-level distributions of the number of swine premises in a set of fixed herd size categories. For counties with few enough premises that there is a possibility to identify specific premises, NASS censors the size data in order to protect privacy (USDA, 2014b). Therefore, instead of using the NASS premises data directly to inform the model, we used premises data predicted by the Farm Location and Animal Population Simulator (FLAPS, Burdett et al., 2015). FLAPS uses the NASS data together with various geographical variables to simulate realistic premises size distributions at the county level. FLAPS also simulates the spatial location of premises, but for USAMM, the exact location of premises is not needed, and we used FLAPS only as a tool to inform premises size distributions for counties where data was missing due to censoring. We denote the set of all individual swine premises in this data as $\mathcal{P}^{\mathrm{data}}$. In total, 62,974 premises with a joint herd size of 66,015,003 swine in 2882 counties across all 48 states of the conterminous U.S. were included in $\mathcal{P}^{\mathrm{data}}$. The data provided by NASS did not include any information about which premises are involved in HPA shipments, so the number of such premises in each county had to be inferred from the HPA shipment data, which included unique HPA herd identifiers associated with each shipment. In order to connect the premises of the demography data with these HPA herd identifiers, premises were sampled from the appropriate county's premises distribution and assigned to the HPA herd identifiers as a part of the model.
Table 1. Limits of herd size bins used in the model. The herd size assigned to a premises that falls into the respective bin is denoted $h$ and constitutes the midpoint of the interval between the lower and the upper limit rounded to the closest multiple of 5 using two significant digits. The last column indicates the number of premises in the demography data within each size bin.
The premises' herd sizes in FLAPS are based on the NASS size distributions but are simulated as exact numbers. For computational feasibility of the model, however, the herd sizes needed to be assigned to discrete size bins. For this, the bins used by the model were the same as those defined in the NASS census. However, the largest NASS size bin is defined as ≥1000, while FLAPS predicts many premises with significantly larger herd sizes. Therefore, three additional size bins were arbitrarily defined and used in the model. All size bins were defined by an interval, and the premises size used by the model was defined by the midpoint of the interval rounded to the closest multiple of 5 with two significant digits. For the largest interval, defined as [10,000, ∞) and lacking a defined midpoint, the average herd size (18,751) of all premises that fell into this interval was used for this purpose instead. A summary of the size bins used in the model, and the number of premises in each of them, is shown in Table 1.
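The binning step can be illustrated with a short Python sketch. The bin limits below are hypothetical placeholders rather than the limits of Table 1: only the open-ended top bin starting at 10,000 is taken from the text. The rounding of midpoints described above is left out for simplicity, herd sizes are assumed to be at least 1, and the function names are illustrative.

```python
from bisect import bisect_right
from statistics import mean

# Lower limits of the size bins. Hypothetical values for illustration only;
# the last bin is open-ended, i.e. [10000, infinity).
LOWER_LIMITS = [1, 25, 50, 100, 200, 500, 1000, 2500, 5000, 10000]


def bin_index(herd_size):
    """Index of the bin that a herd size (>= 1) falls into."""
    return bisect_right(LOWER_LIMITS, herd_size) - 1


def representative_sizes(herd_sizes):
    """One representative herd size per bin.

    Closed bins use the midpoint between their lower and upper limits; the
    open-ended top bin uses the mean of the premises that fall into it
    (None if it is empty).
    """
    reps = []
    for b, lower in enumerate(LOWER_LIMITS):
        if b + 1 < len(LOWER_LIMITS):
            upper = LOWER_LIMITS[b + 1] - 1
            reps.append((lower + upper) / 2)
        else:
            members = [h for h in herd_sizes if h >= lower]
            reps.append(mean(members) if members else None)
    return reps


if __name__ == "__main__":
    herds = [10, 30, 450, 12000, 26000]
    reps = representative_sizes(herds)
    for h in herds:
        print(h, "->", reps[bin_index(h)])
```

Replacing exact herd sizes by one value per bin means that quantities depending on herd size only need to be evaluated once per pair of bins rather than once per pair of premises, which is presumably the computational benefit referred to above.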

Temporal trends in the U.S. swine trade network
As the shipment and demography data sets all correspond to different years between 2010 and 2014, changes in the demography over this period have the potential to increase uncertainty in the model predictions. In order to evaluate whether this was likely to impact the results, NASS state-level survey data (USDA, 2022) for swine inventory and in-degree of swine shipments, as well as NASS census data (USDA, 2014a) for the number of swine operations, were inspected for temporal trends over the relevant time period.

Industry covariates
The effect that the origin and destination counties had on the shipment rate between premises was included in the model in the form of county-level measures related to the swine production industry. We refer to these measures as county covariates or simply covariates. They are vectors where each element represents the value of a covariate for one specific county. The selection of the covariates was made from a set of ten candidates taken from the NASS agricultural census of 2012 (USDA, 2014a) using the Quick Stats tool (https://quickstats.nass.usda.gov). To this set was added the number of swine premises in the county according to the demography data, for a total of eleven potential covariates. Two sets of covariates were selected, one to explain the effect on shipment rate when a county was the origin and one when a county was the receiver of a shipment.
The selection process started by using linear regression to find which of the potential covariates best explained the variation in incoming and outgoing swine shipments in the training data (CVI plus HPA shipments) at the county level. The one with the highest regression coefficient was picked and the remaining set was analyzed for correlation with the selected covariate. The remaining potential covariates which were found to have a correlation coefficient ≥ 0.9 were removed, as they were deemed to explain largely the same variation as the selected covariate. This process was repeated by then selecting the covariate with the second highest regression coefficient from the resulting set, analyzing the correlation between the newly selected covariate and all the remaining ones, and dropping the covariates that had a correlation coefficient ≥ 0.9. This was repeated until no more covariates could be dropped based on the correlation condition. The two sets of covariates (origin and destination effect on shipment rate, respectively) that resulted from the selection process were identical, so in the end only one common set was used for both incoming and outgoing effect on shipment rate. The final selection consisted of: the number of breeding inventory at breeding facilities (BI), the total inventory of premises with a production contract (PI), the number of operations with production contracts (PO), the number of operations with sales (SO), and total number of swine according to the demography data (NS). In order to put them on the same comparable scale and facilitate model interpretation, each covariate was standardized to have a mean of 0.0 and a standard deviation of 0.5 (Gelman, 2014). The vector of standardized covariates of each county $c$ is referred to as $\mathbf{x}(c)$ henceforth. For a full list of the initial set of covariates and further detail of how values censored in the NASS census were handled, see Appendix S1.
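The selection procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one univariate least-squares regression per candidate on the standardized covariate, ranks candidates by the absolute value of the slope, and applies the correlation threshold of 0.9 to the signed Pearson coefficient. All function names and the toy data are hypothetical, and constant covariates (zero variance) are not handled.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(x, target_sd=0.5):
    """Rescale to mean 0 and the given sample standard deviation."""
    m = _mean(x)
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / (len(x) - 1))
    return [target_sd * (a - m) / sd for a in x]


def slope(x, y):
    """Least-squares slope of y regressed on x (with intercept)."""
    mx, my = _mean(x), _mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def select_covariates(candidates, response, max_corr=0.9):
    """Greedy selection: pick the strongest candidate, drop its near-duplicates, repeat."""
    z = {name: standardize(values) for name, values in candidates.items()}
    remaining = sorted(z, key=lambda name: abs(slope(z[name], response)), reverse=True)
    selected = []
    while remaining:
        best = remaining[0]
        selected.append(best)
        # Keep only candidates that are not highly correlated with the pick.
        remaining = [n for n in remaining[1:] if pearson(z[best], z[n]) < max_corr]
    return selected


if __name__ == "__main__":
    # Toy county-level data: "b" is nearly a rescaled copy of "a".
    candidates = {
        "a": [1, 2, 3, 4, 5, 6],
        "b": [2, 4, 6, 8, 10, 12.5],
        "c": [3, 1, 4, 1, 5, 9],
    }
    shipments = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9]
    print(select_covariates(candidates, shipments))
```

In the toy data only one of the two nearly collinear candidates survives, which mirrors the stated purpose of the correlation filter: covariates that explain largely the same variation are not both retained.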

Overview
The model treats the number of shipments $N_{ijq}$ in quarter $q$ between each pair of origin premises $i$ and destination premises $j$ as either a Poisson distributed random variable in the case of CVI shipments, or as a gamma-Poisson distributed random variable in the case of HPA shipments. The size (in terms of animals sent) of every shipment is modeled as a gamma-Poisson distributed random variable. The parameters that govern these probability distributions are in turn functions of various model parameters and properties of the origin and destination premises and the counties and states that they reside in. The properties related directly to the premises included herd size (i.e. number of animals at the premises), which required that the identities of the origin and destination premises of each shipment were known. However, only the origin and destination counties were known from the data, and therefore the actual identities of the origin and destination premises of each shipment were treated as model parameters and were sampled from the available premises population of the origin and destination counties of the shipment.
S. Sellman et al.
Brief inspection of the shipment data sets revealed some extent of temporal variation in the number of incoming and outgoing shipments across the quarters of the year present in all three data sets. However, no clear trend was apparent, with the variation itself varying substantially between states and depending on whether the state was the receiver or sender of the shipment. In order to capture any systematic temporal variation present in the shipment data that was too subtle to detect by this cursory inspection, and make the driving factors behind it more clear, many of the model parameters were defined separately for each quarter of the year.
The assumption was made that the two types of shipments (CVI and HPA) were mutually exclusive in the sense that a premises sending or receiving one type will not be involved in sending to or receiving from premises associated with the other type. In the case of CVI shipments, each shipment was considered independent of any other CVI shipments, and one premises could be associated with multiple CVI shipments. The HPA shipments, however, came in subsets of one or more outgoing or incoming (or both) shipments associated with one single physical operation, given by the unique HPA herd identifier $k$ in the HPA shipment data. Which entry in the premises demography data this HPA herd identifier was associated with was unknown, but which county and production system it belonged to was known. Thus, in order to determine the origin and destination premises of HPA shipments, premises from the demography data were sampled and associated, not with single shipments, but with HPA herd identifiers during the model fitting process. It was further assumed that HPA shipments were only sent within the same HPA production system, $m$, and no shipments were sent between different production systems. For convenience we refer to those premises associated with CVI shipments as CVI premises and those associated with an HPA herd identifier as HPA premises.
As a consequence of these assumptions, one single premises within a county can in theory be the sole sender and receiver of all CVI shipments going to and from that county. This means that a county that receives or sends multiple CVI shipments is feasible within the model as long as there is at least one premises in that county that is not associated with an HPA herd identifier. However, since a premises can only be associated with either one or more CVI shipments or at most one HPA herd identifier, for counties where the number of HPA herd identifiers (plus one for counties with CVI shipments) exceeds the number of premises, no solution where all shipments are assigned to a premises will exist. With the combination of premises demography data and shipment data the model was fit to, this situation occurred for twelve counties in total. In order to be able to assign a premises to every shipment, a total of 212 additional premises, not part of the original premises demography data set, were added to these counties. These additional premises were given an unknown herd size that was treated as a model parameter and estimated during the fitting process. We refer to this set of unobserved premises as $\mathcal{P}^{\mathrm{unobs}}$, and the full set of premises, $\mathcal{P}$, is the union of the set of premises from the demography data and the set of unobserved premises, $\mathcal{P} = \mathcal{P}^{\mathrm{data}} \cup \mathcal{P}^{\mathrm{unobs}}$.
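The feasibility condition described above can be written as a small check per county. The Python sketch below is illustrative only: it assumes that the minimum number of premises a county needs is the number of HPA herd identifiers plus one if the county has any CVI shipments, and that the shortfall is the number of premises to add. Whether the 212 added premises correspond exactly to this shortfall is not stated in the text. The function name and the example counties are hypothetical.

```python
def extra_premises_needed(n_premises, n_hpa_identifiers, has_cvi_shipments):
    """Number of premises that must be added for a county to be feasible.

    Each HPA herd identifier needs its own premises, and all CVI shipments of
    a county can share a single premises that has no HPA herd identifier.
    """
    required = n_hpa_identifiers + (1 if has_cvi_shipments else 0)
    return max(0, required - n_premises)


if __name__ == "__main__":
    # Hypothetical counties: (premises in demography data, HPA herd ids, any CVI shipments)
    counties = {
        "county A": (5, 3, True),   # feasible: 4 premises required, 5 available
        "county B": (3, 3, True),   # one premises short
        "county C": (2, 6, False),  # four premises short
    }
    for name, (n_prem, n_ids, has_cvi) in counties.items():
        print(name, extra_premises_needed(n_prem, n_ids, has_cvi))
```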

Number of shipments
The number of CVI shipments $N^{\mathrm{CVI}}_{ijq}$ in quarter $q$ between each pair of origin premises $i$ and destination premises $j$ that were not associated with HPA herd identifiers was modeled as a Poisson random variable with the expected number of shipments (i.e. rate) $\lambda^{\mathrm{CVI}}_{ijq}$. For shipments between premises associated with an HPA herd identifier within a particular swine production system, $m$, the number of shipments $N^{\mathrm{HPA}}_{ijq}$ was modeled as a gamma-Poisson random variable with production system-specific shape parameter $\psi^{\mathrm{HPA}}_{m}$ and the expected number of shipments between the premises $\lambda^{\mathrm{HPA}}_{ijq}$,

$$N^{\mathrm{CVI}}_{ijq} \sim \mathrm{Poisson}\left(\lambda^{\mathrm{CVI}}_{ijq}\right) \tag{1}$$

and

$$N^{\mathrm{HPA}}_{ijq} \sim \mathrm{GammaPoisson}\left(\psi^{\mathrm{HPA}}_{m},\ \lambda^{\mathrm{HPA}}_{ijq}\right). \tag{2}$$

The rationale behind modeling the number of HPA shipments as a gamma-Poisson distribution is based on the expectation that the shipment distribution among the premises within a given production system will be characterized by high loyalty between few specific pairs, while most pairs interact very little or not at all. This can be thought about as each pair of HPA herd identifiers having its own associated random effect parameter, $u^{\mathrm{HPA}}_{ij}$, which scales the expected number of shipments as $u^{\mathrm{HPA}}_{ij}\,\lambda^{\mathrm{HPA}}_{ijq}$. However, this random effect is not modeled explicitly in the parameter estimation step, but is instead indirectly included through an overdispersion of $N^{\mathrm{HPA}}_{ijq}$ specific to each production system (via the parameter $\psi^{\mathrm{HPA}}_{m}$) which captures this variability. The means of the two distributions, $\lambda^{\mathrm{CVI}}_{ijq}$ and $\lambda^{\mathrm{HPA}}_{ijq}$, were both functions of a set of model parameters dependent on $i$ and $j$ themselves, as well as the counties and states that the premises belonged to (Eqs. (3) and (4)). In Eqs. (3) and (4), $\alpha^{\mathrm{CVI}}_{q}$ and $\alpha^{\mathrm{HPA}}_{q}$ are baseline premises-to-premises shipment rates for CVI and HPA shipments in quarter $q$. The parameters $\vartheta^{\mathrm{CVI,out}}_{s_i q}$ and $\vartheta^{\mathrm{HPA,out}}_{s_i q}$ are the effects of the origin state $s_i$ on the CVI and HPA shipment rates, while $\vartheta^{\mathrm{in}}_{s_j q}$ is the effect of the destination state $s_j$ on the shipment rates and is shared between both types of shipments. Through the parameters $\phi^{\mathrm{out}}$ and $\phi^{\mathrm{in}}$, the variable $\Phi$ controls how the shipment rates scale with the herd sizes of the two premises, where $\hat{h}$ is the logarithm of a premises' herd size, normalized by the average herd size $\bar{h}$ across all premises, $\hat{h}_i = \ln\left(h_i / \bar{h}\right)$. The counties in which $i$ and $j$ are located have an effect on the expected number of shipments sent between the premises through five county-level covariates related to the swine production infrastructure. Each county $c_i$ of a premises $i$ is associated with its own vector of covariates $\mathbf{x}(c_i)$ (departing from the usual index notation to avoid multiple levels of subscripts). The covariates are associated with two vectors of weights that control the influence of the covariates in quarter $q$, $\boldsymbol{\beta}^{\mathrm{out}}_{q}$ and $\boldsymbol{\beta}^{\mathrm{in}}_{q}$. Which one is used depends on whether the premises is the origin (out) or the destination (in) of a shipment. The total effect of the covariates enters the expressions for the expected number of shipments between two premises (Eqs. (3) and (4)) through these weights. Finally, the expected number of shipments is given a distance dependence through a monotonically decreasing dispersal kernel function of the shipment distance in meters, governed by a shape and a scale parameter. The function has its maximum value of 1.0 at a distance of 0. To give the parameters associated with the distance kernel a more straightforward interpretation we redefine the function so as to be expressed with two alternative parameters. The first of these parameters describes the distance in meters at   and  HPA    ) shipments.
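To make the difference between the two count distributions concrete, the following Python sketch evaluates their log probability mass functions. It assumes the common mean/shape parameterization of the gamma-Poisson (negative binomial) distribution, in which the variance is mean + mean²/shape. The function names are hypothetical, and the parameter values are arbitrary illustrations rather than estimates from the model.

```python
import math


def poisson_logpmf(n, rate):
    """Log-probability of n events under a Poisson distribution with the given rate."""
    return n * math.log(rate) - rate - math.lgamma(n + 1)


def gamma_poisson_logpmf(n, shape, mean):
    """Log-probability of n events under a gamma-Poisson (negative binomial) distribution.

    Parameterized by shape and mean; variance is mean + mean**2 / shape, so a
    small shape gives strong overdispersion relative to the Poisson.
    """
    return (
        math.lgamma(n + shape) - math.lgamma(shape) - math.lgamma(n + 1)
        + shape * math.log(shape / (shape + mean))
        + n * math.log(mean / (shape + mean))
    )


if __name__ == "__main__":
    mean, shape = 2.0, 0.2  # arbitrary illustration values
    for n in (0, 1, 2, 10, 20):
        print(
            n,
            math.exp(poisson_logpmf(n, mean)),
            math.exp(gamma_poisson_logpmf(n, shape, mean)),
        )
```

With the same mean, a small shape parameter puts far more probability on zero shipments and on very large counts than the Poisson does, which matches the stated expectation of a few high-loyalty pairs among many pairs that rarely or never interact.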

Shipment size
The size of each shipment (i.e., the number of animals) was modeled as a random variable from a shifted gamma-Poisson distribution defined by a shape parameter and a mean. The same approach was taken for both CVI and HPA movements, but each with a separate set of parameters. Since a shipment realistically always contains at least one animal, the distribution was shifted by one to the right in order to constrain the minimum shipment size to ≥ 1. The expected size of a CVI shipment between an origin and a destination premises in quarter q is a function of the herd sizes of the two premises (both expressed as ĥ), two herd-size scaling parameters (CVI,out and CVI,in), and an intercept. The shipment size then follows Eq. (11), and equivalently Eq. (13), with the HPA-specific parameters, if the shipment is an HPA shipment.
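The shifted gamma-Poisson distribution described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's implementation: the function names are hypothetical, and the parameterization (a negative binomial applied to the size minus one, with `mean` referring to the unshifted count so that the expected size is `mean + 1`) is an assumption; the parameterization actually used is the one given in Eqs. (11) and (13).

```python
import math


def gamma_poisson_logpmf(k, mean, shape):
    """Log-PMF of a gamma-Poisson (negative binomial) count k >= 0.

    Assumed parameterization: mean > 0 and shape > 0, with variance
    mean + mean**2 / shape.
    """
    if mean <= 0 or shape <= 0:
        raise ValueError("mean and shape must be positive")
    if k < 0:
        return float("-inf")
    p = shape / (shape + mean)
    return (math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
            + shape * math.log(p) + k * math.log1p(-p))


def shipment_size_logpmf(size, mean, shape):
    """Shifted by one: size - 1 is gamma-Poisson, so size >= 1 always."""
    return gamma_poisson_logpmf(size - 1, mean, shape)
```

Under this assumption a size of zero has probability zero, and the probabilities over sizes 1, 2, 3, ... sum to one.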

Likelihood function
Here we outline the likelihood function that was later used in the parameter estimation step. The likelihood function gives a measure of how likely the observed data are given the model parameters, and was formulated conditional on the identities of the origin and destination premises being known. This was not the case in the shipment data, but the assumption is made here, and in Section 2.7 we show how the premises identities were sampled as model parameters in order to make such a likelihood formulation possible. Denoting observations informed by the shipment data by an asterisk, we define a data point as the set of individual shipments that are observed to pass from an origin premises to a destination premises in a given quarter, and the number of shipments of a data point as the size of this set. We note that this number may be equal to zero, but that the CVI and HPA shipment data sets naturally only contain data points for which it is greater than zero. However, all the data points for which the number of shipments is zero also constitute observations in our model formulation and are included under our definition.

The probability of observing one data point with exactly the observed number of individual shipments, each with its observed size in the shipment data, is given by Eq. (14). Here, the cases represent the probability of observing a data point given two premises that both lack any association with an HPA production system (production system identifiers both equal to zero), or two premises that are part of the same HPA production system (identical, non-zero production system identifiers). Shipments between two separate production systems are not allowed in the model, which is represented by the third case. Following Eqs. (1) and (2), the first factors in the first two cases of Eq. (14), i.e., the probabilities of the observed numbers of CVI and HPA shipments, are given by the probability mass functions of the Poisson and gamma-Poisson distributions (Appendix S3), respectively. However, as only a fraction of the total number of shipments is included in the data, the mean numbers of CVI and HPA shipments sent must be multiplied by the corresponding CVI and HPA sampling proportions to correct for the probability that an interstate shipment that occurred was also actually sampled (see Section 2.2). Thus the number of shipments that are actually observed to move between two premises in a given quarter in the shipment data is modeled with these adjusted means, where the shape parameter of the gamma-Poisson distribution is the swine production system-specific shape parameter, while the unadjusted means are given by Eqs. (3) and (4). Finally, the factors that represent the probabilities that each observed shipment has exactly its observed size are given by Eqs. (11) and (13). When the observed number of shipments is zero, the product over these factors reduces to an empty product equal to the multiplicative identity (1.0).

S. Sellman et al.
Given the above, the probability of observing the entire combined CVI and HPA interstate shipment data set is the product of the probabilities of all individual data points, where the condition that the origin and destination states differ indicates that data points where both premises are in the same state are excluded, reflecting that the data sets consist exclusively of interstate shipments.
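The structure of the probability of a single data point can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the shipment-size log-probabilities are assumed to be computed elsewhere (Eqs. (11) and (13)), and the gamma-Poisson is assumed to be parameterized by its mean and shape. Passing `shape=None` corresponds to the CVI (Poisson) case and passing a shape to the HPA (gamma-Poisson) case.

```python
import math


def poisson_logpmf(k, rate):
    """Log-PMF of a Poisson count k with a positive rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def gamma_poisson_logpmf(k, mean, shape):
    """Log-PMF of a gamma-Poisson count k with positive mean and shape."""
    p = shape / (shape + mean)
    return (math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
            + shape * math.log(p) + k * math.log1p(-p))


def data_point_loglik(size_logprobs, expected, sampling_prop, shape=None):
    """Log-probability of one data point (one premises pair and quarter).

    size_logprobs: log-probability of each observed shipment's size;
                   an empty list means that no shipment was observed.
    expected:      expected number of shipments actually sent.
    sampling_prop: probability that a sent shipment appears in the data.
    """
    n_obs = len(size_logprobs)
    observed_mean = sampling_prop * expected  # mean corrected for sampling
    if shape is None:
        count_term = poisson_logpmf(n_obs, observed_mean)
    else:
        count_term = gamma_poisson_logpmf(n_obs, observed_mean, shape)
    # With no shipments the sum is empty (0.0), i.e. an empty product of 1.0.
    return count_term + sum(size_logprobs)
```

The log-likelihood of the full data set would then be the sum of such terms over all interstate premises pairs and quarters, including the pairs with zero observed shipments.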

Prior distributions
In order to facilitate the choice of prior distributions, the model parameters were, as far as possible, defined to be easy to interpret. Many prior distributions were defined using ranges given by the 2.5th and 97.5th percentiles, rather than the more commonly seen parameters (such as shape, scale, etc.). This makes the choice of prior distributions easier by reducing the choice of prior (or hyperprior) parameters to a range within which one can be fairly certain that the value of the parameter of interest lies, while still allowing more extreme values if the data strongly contradict the choice. We use the notation 2.5th and 97.5th to indicate these percentiles. For a summary of the parameters and their corresponding prior and hyperprior distributions, see Table 2.
The herd-size scaling parameters for the origin (out) and destination (in) premises, both defined on (−∞, ∞), control how the expected number of shipments between two premises scales with the premises' herd sizes. The herd sizes ĥ are the logarithms of the original herd sizes divided by their mean, so that an average-sized premises has ĥ = 0.0. We expected larger herd sizes to be associated with higher shipping rates and defined the priors for these parameters accordingly (see Table 2).

The elements of the out and in weight vectors, which control the weight of each individual swine industry covariate, were each given a normally distributed prior with 2.5th = −2.0 and 97.5th = 2.0. This gives the covariates equal possibility of having either a strong positive or a strong negative relationship with the expected number of shipments between premises.
The parameters of the distance kernel, the state-level random effects and the HPA producer-specific shape parameters were assigned hierarchical prior structures where the parameters of the prior distributions were themselves estimated by the model. Each of these hyperparameters was assigned a hyperprior, as detailed below. This approach allows the data-rich states to inform the parameters for states with little or no data via the common prior distribution.
The scale parameters of the distance kernel functions for CVI and HPA shipments represent the distance in meters at which the kernel function has dropped to half of its value at zero distance. These parameters shared a common lognormal prior distribution defined by two hyperparameters, a mean and a coefficient of variation. The between-centroid distance of the two counties farthest apart in the contiguous U.S. is approximately 4.6 × 10^6 m, which sets an absolute upper limit to the shipment distance. At the lower end, we do not expect shipments shorter than one kilometer to be common. Given these assumptions, the mean hyperparameter was given a lognormal hyperprior distribution with 2.5th = 1.0 × 10^4 and 97.5th = 4.0 × 10^6. The hyperprior of the coefficient of variation was defined in terms of the ratio between the 97.5th percentile and the median of the prior distribution. When this ratio is high, there is a large variation in the prior distribution, and consequently in the scale parameter between the various states and quarters. The reverse is true when the ratio is low. Given this, we defined the hyperprior of the coefficient of variation to be lognormal with 2.5th = 0.365 and 97.5th = 1.724, which correspond to the ratios 2 and 10.
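The percentile-based prior specification can be made concrete with a short calculation. The sketch below (function names are hypothetical) converts a pair of 2.5th and 97.5th percentiles into the log-scale mean and standard deviation of a lognormal distribution, and converts a ratio between the 97.5th percentile and the median into the coefficient of variation of a lognormal distribution. For the ratios 2 and 10 it gives values close to the 0.365 and 1.724 quoted above.

```python
import math
from statistics import NormalDist

# 97.5th percentile of the standard normal distribution (about 1.96)
Z975 = NormalDist().inv_cdf(0.975)


def lognormal_from_percentiles(p025, p975):
    """Return (mu, sigma) on the log scale for the given percentiles."""
    mu = 0.5 * (math.log(p025) + math.log(p975))
    sigma = (math.log(p975) - math.log(p025)) / (2.0 * Z975)
    return mu, sigma


def cv_from_ratio(ratio):
    """Coefficient of variation of a lognormal distribution whose
    97.5th percentile is `ratio` times its median."""
    sigma = math.log(ratio) / Z975
    return math.sqrt(math.exp(sigma ** 2) - 1.0)


mu, sigma = lognormal_from_percentiles(1.0e4, 4.0e6)
print(math.exp(mu))                           # median of the hyperprior, in meters
print(cv_from_ratio(2.0), cv_from_ratio(10.0))
```

The median implied by the percentiles 1.0 × 10^4 and 4.0 × 10^6 is their geometric mean, 2.0 × 10^5 m.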
Similarly, for the shape parameters of the distance kernel functions for CVI and HPA shipments, lognormal prior distributions defined by a mean hyperparameter and a coefficient-of-variation hyperparameter were used. The shape parameter of the kernel function is defined as the ratio between the scale parameter and the distance at which the function has fallen to 5% (see Section 2.6). To ensure a wide range of possible shapes of the kernel function, a lognormal hyperprior distribution defined with 2.5th = 2 and 97.5th = 1.0 × 10^3 was assigned to the mean hyperparameter. For the coefficient-of-variation hyperparameter, the same approach was taken as for the hyperprior of the kernel function's scale parameter, and a lognormal hyperprior distribution with 2.5th = 0.365 and 97.5th = 1.724 was used.
The random effects on the state level (the origin-state effects for CVI shipments and for HPA shipments, and the shared destination-state effect, each defined on (0, ∞)) were each assigned a gamma distributed prior with mean = 1.0 and shape hyperparameters for outgoing (out) and incoming (in) shipments, respectively. Given that the parameters themselves are unitless, this approach easily allows a state's weight in a given quarter to be interpreted in relation to an average state of any quarter (for which the parameter will be 1.0). The out and in hyperparameters were themselves given uninformative half-Cauchy distributed hyperpriors with scale = 1.0.
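As a brief worked illustration of what a gamma prior with mean 1.0 implies, write x for a state-level effect and κ for its shape hyperparameter (both symbols are chosen here for illustration and are not necessarily those of the original notation). Fixing the mean at 1.0 forces the rate to equal the shape, so the shape alone controls the spread:

```latex
x \sim \mathrm{Gamma}(\text{shape} = \kappa,\ \text{rate} = \kappa)
\quad\Longrightarrow\quad
\mathrm{E}[x] = \frac{\kappa}{\kappa} = 1,
\qquad
\mathrm{Var}[x] = \frac{\kappa}{\kappa^{2}} = \frac{1}{\kappa},
\qquad
\mathrm{CV}[x] = \frac{1}{\sqrt{\kappa}}
```

A smaller shape hyperparameter therefore corresponds to larger variation between the state-level effects, and a larger one pulls all states toward the average value of 1.0.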

Sampling of origin and destination premises
Neither the CVI nor the HPA data set contains information that allows the origin or destination of a shipment to be tied to a specific premises in the demography data set with any certainty. Instead, the exact identities (in terms of the identities in the demography data) of the origin and destination premises of each shipment were treated as model parameters and sampled from the populations of available premises in the corresponding county (which is known). The approaches used differed between CVI shipments and HPA shipments.
For each CVI shipment, both the origin premises and the destination premises were Gibbs sampled directly from the distribution of available premises, conditional on the current model parameters and the current assignment of origin and destination premises to all other shipments. This was done once every model iteration, first for the origin premises conditional on the current destination premises, and then for the destination premises conditional on the current origin premises. Any premises that was not already associated with an HPA herd identifier was considered available for sampling as the origin or destination premises of a CVI shipment, regardless of whether the premises was already associated with other CVI shipments. For a single CVI shipment occurring in a given quarter, the candidate origin premises were all premises in the county that the shipment originates from, and the candidate destination premises were all premises in the destination county. The premises were sampled with probabilities proportional to weights formed from the probability of the shipment size, given by the gamma-Poisson PMF following Eq. (11), and the expected number of CVI shipments between the candidate pair, given by Eq. (3).

The process of sampling premises associated with HPA shipments was approached differently. Premises associated with HPA herd identifiers could be associated with multiple incoming and outgoing shipments at once, so instead of sampling particular premises for individual shipments we implemented a shuffling algorithm. For each HPA herd identifier, a premises within the same county was selected with which the premises currently associated with the identifier would switch shipments, and by extension the HPA herd identifier was assigned a new premises. Given the premises currently associated with the identifier and the set of shipments connected to it, a weight was calculated for each premises within the same county (including the current premises itself), representing how likely a solution in which the two premises exchanged shipment associations was in relation to the current solution. Using these weights, a distribution conditional on the identifier and all the current shipment associations of all premises was constructed, in which each discrete outcome represented the current premises switching shipment associations with another premises. A sample was then drawn from this distribution and the shipment associations were changed according to the result. This procedure was repeated for all HPA herd identifiers every model iteration. For further details, see Appendix S4.
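Both sampling schemes above end in the same basic operation: drawing one candidate premises with probability proportional to a weight. A minimal sketch of that step is shown below. The function name is hypothetical, and working with log-weights (subtracting the maximum before exponentiating) is an implementation choice assumed here to avoid numerical underflow when weights are products of many small probabilities; it is not taken from the source.

```python
import math
import random


def sample_from_log_weights(candidates, log_weights, rng):
    """Draw one candidate with probability proportional to exp(log_weight)."""
    m = max(log_weights)
    if m == float("-inf"):
        raise ValueError("all candidates have zero weight")
    # Rescale so that the largest weight is exactly 1.0 before exponentiating.
    weights = [math.exp(lw - m) for lw in log_weights]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical example: three candidate premises in the origin county.
rng = random.Random(1)
premises = ["premises_a", "premises_b", "premises_c"]
log_w = [-1000.0, -1001.0, float("-inf")]  # the third candidate is impossible
print(sample_from_log_weights(premises, log_w, rng))
```

In a Gibbs step of the kind described for CVI shipments, the log-weights would be recomputed for every shipment from the current parameter values before each draw.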

Sampling of unobserved premises sizes
The herd size of each of the 212 unobserved premises was sampled directly, every model iteration, from a discrete probability distribution of possible herd sizes. The distribution was constructed separately for each unobserved premises, conditional on the current parameters and on the incoming and outgoing shipments associated with the premises, following Eq. (19). The possible outcomes consisted of the binned herd sizes given in Table 1, and the out and in shipment sets consisted of all the outgoing and incoming shipments currently associated with the premises, together with their destination and origin premises, respectively. The state-specific factor in Eq. (19) was a probability distribution that followed the relative herd size distribution of the observed premises in the state of the premises, and was used as a state-specific prior on the herd size. In essence, Eq. (19) expresses the probability, according to the model formulation, of observing the shipments associated with a premises if its herd size took a given candidate value, weighted by the prior on that value. For a full account of how the sampling distributions were constructed, see Appendix S5.

Estimating the joint posterior distribution and assessing model convergence
Together with the likelihood function defined in Section 2.6.4, Markov chain Monte Carlo methods were used to approximate the posterior parameter distributions. The only parameters that could be sampled directly from their conditional distributions were the individual elements of one CVI-specific parameter vector. For these parameters, a Gibbs-sampling algorithm was used to update each element every model iteration. For all other parameters, direct sampling was not possible as the conditional distributions lacked standard forms, and the Metropolis-Hastings algorithm (Hastings, 1970) was used. All proposals for the Metropolis-Hastings updates were made exclusively from univariate or multivariate normal distributions with adaptive scaling of the covariance matrices as described in Garthwaite et al. (2016).
The number of model iterations that could be run was bounded by limits imposed by the computer cluster available to the authors, which allowed at most 168 consecutive hours of run time. Within this time, 190,000 model iterations could be completed, of which the first 25,000 were discarded as burn-in. Due to high autocorrelation, this does not translate directly to the same number of independent samples from the joint posterior, and therefore multiple MCMC chains were run. To reduce the autocorrelation within each chain and reduce the final amount of data, only every 100th iteration was kept, meaning that each chain finally consisted of 1650 samples from the posterior distribution. To determine how many chains were necessary, the effective number of independent samples, n_eff, was calculated as described in Gelman et al. (2004), and additional chains were run until n_eff ≥ 250 for all parameters. To satisfy this criterion, 18 separate chains had to be run, which were then pooled into one final posterior distribution from which samples were drawn to simulate networks.
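The sample counts quoted above follow from simple bookkeeping, sketched here with a hypothetical helper function. The pooled total is not stated in the text and is simply the product of the number of chains and the per-chain count.

```python
def retained_samples(total_iterations, burn_in, thin):
    """Number of samples kept after discarding burn-in and thinning."""
    return (total_iterations - burn_in) // thin


per_chain = retained_samples(190_000, 25_000, 100)
pooled = 18 * per_chain  # all 18 chains pooled into one posterior sample

print(per_chain, pooled)
```

Because the retained samples are still autocorrelated, the pooled count is an upper bound on the effective number of independent samples for any given parameter.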
The model was implemented in C++, compiled with GCC 7.3.0 and parallelized using OpenMP. The parameter estimation and the network simulation were performed on the 32-core compute nodes of the Tetralith High Performance Computing cluster at the National Supercomputer Center (NSC) in Linköping, Sweden.

Simulation of networks
The model was used to simulate 250 networks of inter- and intrastate CVI and HPA shipments. These synthetic networks consisted of a predicted 100% sample from all states with swine premises. Each network was based on a single joint random sample from the posterior parameter distribution. The simulation procedure was to first simulate the number of CVI shipments as a Poisson random variable, with the rate given by Eq. (3), for every combination of origin and destination premises not associated with an HPA herd identifier in the given posterior sample (and consequently not associated with a production system, i.e., with production system identifiers both equal to zero) and for every quarter. Each of these shipments was then assigned a size by sampling from a gamma-Poisson distribution according to Eq. (11).
Similarly, for every combination of origin and destination premises with an associated HPA herd identifier and belonging to the same production system (identical, non-zero production system identifiers), the number of HPA shipments was also simulated from a Poisson distribution as in Eq. (2). However, instead of sampling the shipment rate (ûHPA) from a gamma distribution as in Eq. (2), a random effect unique to the particular premises pair and quarter was sampled conditional on the number of HPA shipments between the two HPA herd identifiers observed in the shipment data and on the expected number of HPA shipments given by Eq. (4). Subsequently, the number of HPA shipments between the two premises in that quarter could be sampled using this random effect. This way the assortative behavior among HPA herds is preserved, so that particular combinations of origin and destination HPA herd identifiers that are observed to have high shipment traffic in the data will still be important nodes in the simulated networks. Conversely, combinations of HPA herd identifiers that are observed to have little or no traffic in the data, despite being part of the same production system, will contribute few shipments to the simulated networks. In order to simulate shipments, this sampling requires information about the number of shipments between each pair of HPA herd identifiers in addition to the posterior distribution of the model parameters. Theoretically, the random effect could have been sampled as a model parameter in the analysis step rather than in the simulation step, and then used when simulating instead of the shipment numbers. However, as the number of possible appropriate combinations of origin and destination HPA herd identifiers is very large, and the values would have to be stored for every iteration, this approach quickly becomes unfeasible. Each HPA shipment generated this way was finally given a size by sampling from a gamma-Poisson distribution according to Eq. (13).

Validation
For the purpose of validation, the simulated shipment networks were down-sampled to match the data collection process. For CVI shipments, this meant removing within-state shipments, removing all shipments originating from a state not included in the data sample, and taking a random sample of the remaining outgoing shipments from each state, with a sampling percentage equal to 100 times the state's CVI sampling proportion. For HPA shipments, it meant taking a sample of the outgoing shipments and a sample of the incoming shipments of each state, with sampling percentages equal to 100 times the state's outgoing and incoming HPA sampling proportions, respectively. These down-sampled shipment networks were then aggregated to the level of individual counties and states and compared to the 2010 CVI and 2014 HPA training data sets.

For within-sample validation, the simulated CVI and HPA sub-networks were compared to the respective training data set at the network level using a selection of measures and properties relevant for disease spread (Dubé et al., 2011). The following measures were those used by Gorsich et al. (2019) in their original analysis of the swine CVI data: the total number of nodes, the number of undirected edges, the number of shipments, the undirected diameter of the networks, the size of the giant strongly connected component (GSCC), the size of the giant weakly connected component (GWCC), density, assortativity, transitivity and reciprocity. Together with these metrics, we also report the total number of animals moved and the mean shipment size. Additionally, for an out-of-sample validation, we compared the simulated CVI sub-networks to the additional CVI data set from 2011 in a similar manner.
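The CVI down-sampling steps can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' procedure in detail: the function name and the tuple layout of a shipment are hypothetical, and each remaining shipment is kept independently with a probability equal to its origin state's sampling proportion, which yields the stated percentage only on average.

```python
import random


def downsample_cvi(shipments, sampling_prop, rng):
    """Down-sample simulated CVI shipments to mimic the data collection.

    shipments:     iterable of (origin_state, destination_state, size) tuples.
    sampling_prop: dict mapping each state included in the data to the
                   proportion of its outgoing shipments that was sampled.
    """
    kept = []
    for origin_state, dest_state, size in shipments:
        if origin_state == dest_state:
            continue  # within-state shipments are not part of the data
        prop = sampling_prop.get(origin_state)
        if prop is None:
            continue  # origin state not included in the data sample
        if rng.random() < prop:
            kept.append((origin_state, dest_state, size))
    return kept


# Hypothetical example with two sampled states.
simulated = [("IA", "IA", 10), ("IA", "MN", 5), ("TX", "IA", 7), ("NC", "IA", 3)]
print(downsample_cvi(simulated, {"IA": 1.0, "TX": 0.0}, random.Random(0)))
```

Drawing exactly the stated fraction per state (for example with `random.sample`) would be an equally reasonable reading of the description.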
The simulated networks were also compared spatially to the two sets of training data based on three different node-level network measures: betweenness centrality, in-degree and out-degree. Betweenness centrality is defined as the proportion of shortest paths between all node pairs that pass through a focal node (Freeman, 1977). The three node-level network measures were weighted by both the number of shipments and the number of animals, and the county-level medians of the resulting six metrics across the 250 simulated networks were visualized using maps.
To provide an out-of-sample comparison, the simulated networks were also compared in a similar fashion to the additional 2011 CVI data set. The network analysis was done using the Python interface to the igraph C library (Csardi and Nepusz, 2006).

Using synthetic networks in a predictive disease model
To illustrate how the large-scale shipment networks simulated by USAMM can be useful in epidemiological applications, we analyzed the importance of shipments for explaining the prevalence of porcine epidemic diarrhea virus (PEDv) in U.S. counties. This was done by fitting a set of generalized linear models (GLMs) that included various combinations of both shipment-related and non-shipment-related county-level variables to historical PEDv prevalence data. One GLM was constructed for every possible combination of five shipment-related explanatory variables, in addition to a set of four null predictors included in all models, resulting in a total of 32 models. The dependent variable in all models was the number of observed cases (infected premises) in each combination of quarter and county, and was assumed to be binomially distributed, with the population size parameter equal to the number of premises in the county and the probability parameter fitted by the GLM (see Appendix S6 for details). These disease models were fit to a data set covering observations of PEDv in the U.S. over the period August 2014 to March 2018, obtained from the USDA Emergency Management Response System (EMRS).

The null predictors were the number of swine premises, the number of PEDv cases in the previous quarter, state, and year. The shipment-related predictors were: in-degree weighted by number of animals in the same quarter; in-degree weighted by number of animals in the previous quarter; betweenness centrality weighted by number of animals in the same quarter; the proportion of incoming animals in the same quarter that arrived from a county with at least one infected premises; and the proportion of incoming animals in the previous quarter that arrived from a county with at least one infected premises. All the shipment-related predictors were informed by the simulated USAMM shipment networks and were taken to be the mean for each county across all 250 full network replicates. The predictors state and year were treated as categorical variables. The remaining predictors were numerical and were standardized to have a mean of zero and a standard deviation of 1.0. The GLM analysis was performed using the Python module statsmodels (Seabold and Perktold, 2010), and selection between the models was performed using the Akaike information criterion (AIC).

Using the estimated coefficients of the model with the best fit according to AIC, a model prediction was made of the expected number of infected premises for the first quarter of 2018. In order to make a model prediction that covered all of the conterminous U.S., including states without observed cases (and thus no fitted coefficient for the state effect), the mean state effect across the states with observed cases was used for states with no observed cases. This prediction was used as a way to validate that the GLM was capable of reproducing the observed spatial distribution of PEDv.

Next, in order to illustrate how the shipment predictions can be used to inform infectious disease preparedness by quantifying risk, an additional GLM was constructed based on the predictors and coefficients from the GLM with the best fit. The goal of this risk model was to create a map, based purely on the contact structure and premises demography, showing how these underlying factors affect the relative vulnerability to infection of each county. Therefore, the risk model was equal to the GLM with the best fit but excluded the counties' infection status in the previous quarter, the proportions of shipments arriving from infected counties in the same quarter, and the proportions of shipments arriving from infected counties in the previous quarter. This risk model was then used to calculate the predicted number of infected premises in each county, and the counties were subsequently ranked by setting the one with the highest number of predicted cases to 1.0 and expressing the risk of all other counties in relation to this maximum-risk county.
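The final normalization step, expressing each county's risk relative to the maximum-risk county, can be sketched as follows. The function name, the county labels and the predicted case numbers are all hypothetical and serve only to illustrate the calculation.

```python
def relative_risk(predicted_cases):
    """Scale predicted case counts so that the highest county equals 1.0."""
    top = max(predicted_cases.values())
    if top <= 0:
        raise ValueError("at least one county must have a positive prediction")
    return {county: cases / top for county, cases in predicted_cases.items()}


# Made-up predicted numbers of infected premises for three counties.
predictions = {"county_a": 4.0, "county_b": 1.0, "county_c": 2.0}
risk = relative_risk(predictions)
ranking = sorted(risk, key=risk.get, reverse=True)
print(risk, ranking)
```

Since the scaling divides every county by the same constant, the ranking of counties is unchanged and only the units are removed.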

Posterior parameter distributions
Out of the parameters that affect the expected number of CVI and HPA shipments by scaling the baseline shipment rates (Fig. 1, left panel) up or down, very clear effects were seen for all industry covariates for the origin county (out) and two of the covariates for the destination county (in, Fig. 3), as well as for the state-level random effect for the destination state (Figure S1). Out of the industry covariates that had a clear effect on outgoing shipments, the effects of breeding inventory (BI) and number of operations with production contracts (PO) were positive, the effect of the number of operations with sales (SO) was negative, and the effect of the total number of swine (NS) was negative in the first quarter and positive in the last quarter. For incoming shipments, SO had a negative effect while NS had a positive effect. There were also some differences in the individual estimates of the origin state-level effects for HPA shipments (Figure S2). Apart from a few exceptions, little difference was evident between individual state-level estimates for the distance kernel parameters of CVI and HPA shipments and for the origin state-level random effects for CVI shipments. Interestingly, the estimates of the herd-size scaling parameters (out and in) were only slightly positive, meaning that there was very little effect of herd size on the expected number of shipments between premises (Fig. 1, right panel).
For most parameters, there was little variation between the quarters. The most temporal variation was seen for the covariate weights for origin counties (Fig. 3) and the state-level random effect on incoming shipments (Figure S1). Additionally, some temporal variation was seen among the distance kernel parameters for CVI shipments from Texas (Figure S3 and Figure S4).
For functionally similar parameters that were defined separately for CVI and HPA shipments, many indicated a clear difference between the two shipment types. The parameters controlling shipment size showed that both the shape parameter and the intercept of the mean shipment size function were higher for HPA shipments than for CVI shipments. This translates into larger and less variable sizes for HPA shipments compared to CVI shipments (Fig. 2, left panel). At the same time, the HPA shipment size was nearly independent of herd size, while for CVI shipments there was a clear positive relationship between shipment size and the herd sizes of both the origin and destination premises via the out and in herd-size scaling parameters (Fig. 2, right panel).
For the parameters related to the distance kernel, the scale parameter for HPA shipments was substantially higher than the equivalent for CVI shipments. This is most clearly seen in the difference between the CVI and HPA mean hyperparameters of the scale parameter (Fig. 4, left panel), which is more than an order of magnitude, indicating that HPA shipments are sent over longer distances. At the same time, there was little difference in the mean of the kernel shape parameters between CVI and HPA shipments (Fig. 4, left panel), and the variation in both the scale and shape parameters, as indicated by the coefficients of variation of the kernel priors, was small (Fig. 4, right panel). For detailed intervals of the kernel parameters related to each individual state, see Figure S3, Figure S4, Figure S5 and Figure S6 in the supplemental information.
There was also a difference between the shape hyperparameters of the gamma priors on the state-level random effects on the expected number of outgoing shipments (CVI,out and HPA,out). The shape hyperparameter for HPA shipments was estimated to be substantially lower than the CVI equivalent, leading to a higher variation in the estimates of the origin state-level effects for HPA shipments compared to those for CVI shipments (Fig. 5). For the individual state-level random effect parameters governing the outgoing shipment rate, see Figure S7 and Figure S2.
A very large difference between the baseline shipment rate parameters for CVI and HPA shipments can also be noted in Fig. 1. However, these parameters are not comparable, as the set of premises pairs available to send and receive CVI shipments consists of the entire population of premises that are not associated with an HPA herd identifier, while for HPA shipments the available destination premises consist only of those associated with an HPA herd identifier belonging to the same production system.

Validation of the posterior predictive shipment networks
Comparisons of the synthetic networks simulated by the fitted model with the 2010 CVI and HPA data sets were made separately and are summarized in Tables 3 and 4. The comparison revealed a tendency of the model to over-estimate the extent of the network in terms of the number of shipments of both types. Despite this, it accurately predicted the sizes of individual shipments, with average shipment sizes very close to those of the data. The over-estimation was more severe for the CVI subsets of the simulated networks, as is evident from the fact that, apart from mean shipment size, the 95% credible interval predicted by the model only encompassed the point metric from the data set for diameter and transitivity in Table 3. For the HPA subsets of the networks, all intervals included the corresponding measure based on the data set, although variation was high and the simulated median was often towards the higher end of the interval (Table 4).
The spatial distributions of both in- and out-degree of the 2010 CVI and HPA data sets were also well captured by the model (Figs. 6 and 7). For CVI shipments, there was a slight tendency for the model to spread the number of transported animals out among more counties in the simulated networks compared to the data. The model also captured betweenness centrality well for CVI shipments, although due to the nature of the swine shipment network, where nodes tend to be either sources or sinks, most counties showed a betweenness centrality score of zero (gray in the figures). This tendency was even more pronounced for the HPA shipment network, for which betweenness centrality was not found to be a meaningful measure due to the lack of nodes acting as both sources and sinks. The spatial distribution of HPA networks is presented at the level of USDA agricultural districts in order to preserve confidentiality of the premises and production systems involved.
The general trends of the shipment size distributions of the training data sets were also captured well by the model (Fig. 8, left and right panels). For CVI shipments, there was a slight overestimation of very small shipments, and for both CVI and HPA shipments there were specific peaks and valleys apparent in the data that the model fails to capture.
In order to determine how well USAMM trained on data from a specific year can predict patterns seen in a subsequent year, the predictive posterior simulations based on the 2010 CVI data were compared out-of-sample to the CVI data of 2011. Since the 2011 CVI data included Nebraska in addition to the states present in the 2010 CVI data, this state is also included in the down-sampled simulated networks that this out-of-sample validation is based on. A comparison of network metrics is shown in Table 5. The over-estimation of the network extent is present here as well but, interestingly, to a lesser degree. Once again the mean shipment size is captured well, and here the network diameter, density, assortativity, transitivity and reciprocity of the data are also within the 95% credible intervals predicted by the model. For the frequency distribution of shipment sizes, the model predicts the sizes in the 2011 data set as well as it does for the 2010 data set (Fig. 8, center panel).

Temporal trends in the U.S. swine trade network
Fig. 2. Premises-level parameters controlling the expected shipment size for HPA and CVI premises pairs respectively: the intercept of the mean shipment size function, the shape parameter of the shipment size gamma-Poisson distribution and the herd-size scaling parameters associated with the destination (in) and origin (out) premises. Whiskers show the 95% credible intervals around the median of the parameters' posterior distributions for each quarter.

Fig. 3. Parameters controlling the weight of individual swine industry covariates in the scaling of the expected number of shipments between premises on the side of the origin county (out) and destination county (in). Whiskers show the 95% credible intervals around the median of the parameters' posterior distributions for each quarter. The covariates are: inventory of breeding facilities (BI), inventory of premises with a production contract (PI), number of operations with production contracts (PO), number of operations with sales (SO) and total number of swine (NS).

The time-series analysis of NASS survey data (USDA, 2022) showed that for most states there were no, or only slight, changes in the total swine inventory and the number of incoming shipments (Figure S8 and Figure S9) during the time period of 2010-2014, into which the data used to inform the model falls. Likewise, in the data from the NASS census of agriculture (USDA, 2014b) for the number of swine operations per state, no clear general trend was discernible (Figure S10). However, the census is only performed every five years (1997, 2002, 2007, etc.), which makes the temporal resolution relatively coarse for this metric. For Iowa, the state with the largest swine population in the U.S., and consequently the most likely to influence model results, the total inventory increased by 3.7%, and the number of incoming shipments increased by 26.4% during 2010 to 2014. Census data for the number of operations are not available for those years specifically, but over the years 2007 to 2017 the average yearly relative change in the number of swine premises in Iowa was a reduction of 3.2% which, assuming a linear trend, translates to a reduction of 12.8% over the period of 2010 to 2014.
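The linear extrapolation for Iowa can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the compounded alternative is an assumption added for comparison rather than part of the analysis.

```python
# Average yearly relative reduction in Iowa swine premises, 2007-2017 (from the text).
yearly_reduction_pct = 3.2
n_years = 4  # yearly steps spanned by the 2010-2014 period

# Linear trend, as assumed in the text: the yearly change simply accumulates.
linear_reduction_pct = yearly_reduction_pct * n_years

# Compounded alternative (an assumption, shown for comparison only).
compound_reduction_pct = (1.0 - (1.0 - yearly_reduction_pct / 100.0) ** n_years) * 100.0

print(f"linear: {linear_reduction_pct:.1f}%")
print(f"compounded: {compound_reduction_pct:.1f}%")
```

The two figures differ by well under one percentage point, so the choice between a linear and a compounded trend matters little over a four-year window.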

PEDv model
Out of all the combinations of PEDv predictors used in the GLMs, the set that included all predictors had the lowest AIC score at 12,664.2. This was 683.7 lower than the null model, which had an AIC score of 13,347.9. The validation step of the PEDv model showed that the model could reproduce the spatial patterns of outbreaks seen in the data in the states of the Midwest in general and the high-density regions of Iowa, Minnesota and North Carolina in particular (Fig. 9). The smaller outbreak clusters seen in southern Kansas, Oklahoma and northern Texas, as well as in the south-west of Utah, were not captured by the model. The relative (i.e. normalized by setting the highest county risk to 1.0) county-level PEDv infection risks predicted by the disease model are shown in Fig. 10 (left), next to the relative spatial distribution of cases observed between August 2014 and March 2018 (right).
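The two quantities used in this section, the AIC difference and the normalization of county risks, are simple to compute. The following is a minimal sketch; the function name, county labels and risk values are hypothetical, and only the two AIC scores come from the text.

```python
def relative_risk(county_risk):
    """Normalize county-level risks so that the highest-risk county equals 1.0."""
    highest = max(county_risk.values())
    return {county: risk / highest for county, risk in county_risk.items()}

# AIC scores reported in the text.
aic_full = 12664.2
aic_null = 13347.9
delta_aic = aic_null - aic_full  # positive value: the full model is preferred

# Hypothetical absolute risks for three made-up counties.
example_risk = {"county_a": 0.02, "county_b": 0.08, "county_c": 0.04}
normalized = relative_risk(example_risk)

print(round(delta_aic, 1))
print(normalized)
```

Because the normalization divides by the maximum, the resulting map conveys only the ranking and relative magnitude of county risks, not absolute probabilities of infection.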

Discussion
A solid understanding of the contact structures of the underlying system is of great importance for successful epidemic modeling of livestock disease (Kao et al., 2007; Brooks-Pollock et al., 2015; Lindström et al., 2011). The movement of live animals between premises has the potential to have very large effects on the development of an outbreak (Fèvre et al., 2006). For this reason, many countries have implemented mandatory registration of livestock movements in national databases. Such databases can be used to map the shipment network, analyze it for risk structures, and develop outbreak mitigation strategies. Out of concerns for high economic costs and breaches in privacy (Anderson, 2010), no such system has been implemented in the U.S. Consequently, no complete data set of the different U.S. livestock shipment networks exists. The first version of the U.S. Animal Movement Model (USAMM) was conceived with the goal of estimating the full beef and dairy cattle shipment networks (Lindström et al., 2013). Since then, a number of improvements to the original model have been published (Brommesson et al., 2016, 2021) and USAMM predictions have been used to inform a variety of epidemiological studies (Beck-Johnson et al., 2019; Gorsich et al., 2018; Tsao et al., 2020; Kao et al., 2018). Here we have presented the first version of USAMM for modeling the U.S. swine shipment network.

Table 5
Comparison of network metrics of the CVI subset of the simulated shipment network based on the model trained with the 2010 CVI data to the same metrics of the additional 2011 CVI data. GSCC is the giant strongly connected component, GWCC is the giant weakly connected component.

County scale
For previous versions of USAMM, defined at the level of counties, it was not possible to include premises-level predictors. In this version, however, we take the same approach as in the latest iteration of USAMM for cattle (Sellman et al., 2021, preprint), and define the model at the resolution of individual premises. This allows parameters to be defined that tie premises-level characteristics, such as herd size, to model predictions about the number of shipments and shipment sizes. Although such an increase in resolution also requires a substantial increase in complexity, we are confident that it does not come at the cost of accuracy in other areas of the model estimates. A comparison of USAMM for cattle defined at the level of individual premises with a previous version of the model defined at the county level showed that the more complex model performs as well as the previous version with regard to the spatial distribution of shipments, with the added benefit that premises-level predictions bring (Sellman et al., 2021, preprint). Although no previous, less complex swine version of USAMM exists for comparison, we believe that the result from the comparison of cattle models can be extrapolated to this situation and indicates that the increased complexity does not detract from the predictive power of the new swine USAMM.
Just as in previous iterations of USAMM, this new version is based on scaling up a sub-sample of interstate shipments, informed by certificates of veterinary inspection (CVI), to shipment networks that cover the entire conterminous U.S. However, for the first time, USAMM also makes use of an additional shipment data set collected within the context of swine production health plan agreements (HPA). Shipments over HPAs differ from CVI shipments in that they occur strictly between premises that are part of the same production system, with recurring shipments from one HPA herd identifier to one or a small subset of others, and possibly no shipments elsewhere. This has necessitated a novel way of modeling HPA shipments compared to how CVI shipments have been modeled in this and previous USAMM versions. We have attempted to capture the strong coupling that exists between specific pairs of HPA herd identifiers by fitting a random effect on the shipment rate of each such pair, directly scaling the flow of shipments between the premises of each specific pair. In the analysis, this random effect was captured by modeling the shipment rate between each pair of premises involved in HPA shipments as a gamma distributed random variable with a shape parameter specific to the production system to which the pair belonged. This shape parameter then controlled the skewness of the distribution of premises-pair random effects, and consequently allowed the premises within different production systems to be more or less strongly connected with other specific premises. We believe that this strong coupling between certain premises is a very important characteristic of swine shipment networks, as it funnels shipments in certain directions which, in turn, can have a strong effect on disease outbreak dynamics. However, we also note that premises in the U.S. swine production industry are generally specialized in different stages of production (Reimer, 2006; Key and McBride, 2007), an aspect of the system that is not explicitly modeled by USAMM. A logical consequence of this feature of the industry is that many premises-to-premises contacts are unidirectional, in the sense that animals may move from one premises to another but not necessarily in the other direction. For instance, shipments may flow from a premises producing farrowed pigs to a finishing herd, but the reverse flow is unlikely to occur. With the premises-pair random effect defined for HPA identifiers, this one-way behavior is also captured, albeit phenomenologically, as the random effect is directional (i.e. the effect for shipments from one HPA herd identifier to another is distinct from the effect for the reverse direction). However, the types of premises themselves are not modeled explicitly, which precludes any inference about type-specific behavior that could be of interest. For premises not involved in HPA shipments, such behavior is not captured at all by the model. Treating the premises' types explicitly would be a valuable development of the model, but such efforts are hampered by the lack of type information in the shipment and demography data used to train the model.
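How the shape parameter of a gamma distributed random effect controls the concentration of shipments on a few premises pairs can be illustrated with a small simulation. This is a sketch under stated assumptions, not the USAMM implementation: the mean of 1.0, the shape values, the number of pairs and all function names are hypothetical choices made for illustration.

```python
import random

def sample_pair_effects(shape, n_pairs, rng):
    """Draw gamma random effects with mean 1.0 (scale = 1 / shape, an assumption)."""
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_pairs)]

def top_share(effects, fraction=0.1):
    """Share of the summed effects carried by the largest `fraction` of pairs."""
    ranked = sorted(effects, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)

rng = random.Random(1)
n_pairs = 1000  # hypothetical number of directed premises pairs in one production system

# A small shape gives a strongly skewed distribution; a large shape gives near-equal effects.
skewed = sample_pair_effects(shape=0.1, n_pairs=n_pairs, rng=rng)
even = sample_pair_effects(shape=10.0, n_pairs=n_pairs, rng=rng)

print(f"top 10% share, shape 0.1: {top_share(skewed):.2f}")
print(f"top 10% share, shape 10:  {top_share(even):.2f}")
```

With a small shape parameter, a handful of pairs carry most of the shipment rate, which mirrors the strong coupling between specific HPA herd identifiers described above. Directionality would follow from drawing a separate effect for each ordered pair (origin, destination).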
The within-sample validation procedure, in which the posterior predictions of the network model were compared to the training data, showed high accuracy in both the spatial and frequency distributions of animal-weighted in- and out-degree as well as betweenness centrality of the simulated networks (Figs. 6 and 7). The in- and out-degree weighted by the number of animals received and sent are among the most fundamental node-level network measures and play a central role in network analysis, as they reveal the key sources and sinks in the network. Betweenness centrality is one of many centrality measures shown to be of importance in disease dynamics and targeting of control measures (Büttner et al., 2013). Spatial distributions of centrality measures and network hot-spots play a large role in outbreak dynamics and are features that impact disease surveillance and transmission modeling (Martínez-López et al., 2009a,b; Dubé et al., 2011). Further, the model also predicts the mean shipment sizes (Tables 3 and 4) and the general shapes of the shipment size frequency distributions (Fig. 8) of the data very well. As far as the authors are aware, no other model that predicts the size of swine shipments in the U.S. is available, which makes this a significant result. Good estimates of animal flows are useful in disease modeling, not only because more animals may translate into a higher risk of transmission, but importantly because they enable quantifying the costs of control measures such as movement restrictions. Consequently, with information about shipment sizes, policies can be formed that efficiently minimize both the severity or risk of an outbreak and economic losses. The good match for all of these metrics applied to both CVI shipments and HPA shipments, which means that USAMM is capable of predicting large-scale network structures throughout the U.S. swine production industry that are critical from a disease control perspective.
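The animal-weighted degree measures discussed above can be computed directly from a list of shipments. The sketch below assumes a hypothetical input format of (origin, destination, number of animals) tuples; the node labels and counts are invented for illustration.

```python
from collections import defaultdict

def animal_weighted_degrees(shipments):
    """Return (out_degree, in_degree) dicts weighted by the number of animals shipped.

    `shipments` is an iterable of (origin, destination, n_animals) tuples.
    """
    out_degree = defaultdict(int)
    in_degree = defaultdict(int)
    for origin, destination, n_animals in shipments:
        out_degree[origin] += n_animals
        in_degree[destination] += n_animals
    return dict(out_degree), dict(in_degree)

# Hypothetical county-level shipments: (origin, destination, animals).
example_shipments = [
    ("A", "B", 200),
    ("A", "C", 50),
    ("B", "C", 120),
    ("A", "B", 80),
]
out_deg, in_deg = animal_weighted_degrees(example_shipments)
print(out_deg)  # nodes with a large weighted out-degree act as sources
print(in_deg)   # nodes with a large weighted in-degree act as sinks
```

Betweenness centrality requires shortest-path computations and is usually taken from a graph library rather than written by hand, so it is left out of this sketch.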
For the part of the simulated networks that consisted only of HPA shipments, both the number of nodes involved and the number of unique edges were also well predicted, with the 95% credible intervals of the simulated networks covering the respective measures of the HPA data (Table 4). However, the corresponding extent of the simulated CVI sub-networks was over-estimated by the model, particularly at the county level, with more nodes (i.e. counties or states) being involved in the interstate trade compared to the CVI training data set (Table 3). A plausible explanation for this over-estimation of nodes and edges in simulated CVI sub-networks is the potential existence of a loyalty component in the actual trade network. Realistically, due to the way that the U.S. swine production industry is organized, with operations specializing in different life stages of the animals (e.g. sow farms, nurseries, finishing operations, etc.; Reimer, 2006; Key and McBride, 2007), it is likely that there exist certain pairs of premises that have a particularly strong connection in terms of the number of interstate shipments sent between them. As described above, such behavior is captured by USAMM for HPA shipments, but not for CVI shipments, meaning that simulated CVI shipments will be more spread out among the entire population of premises than they would be if such a loyalty component existed and were captured accurately by the model. However, the information about premises identity that would be required by the model to quantify such an effect is not part of the CVI data set used to train the model, so a random effect on the CVI shipment rate similar to that implemented for HPA herd identifiers is not possible. Further, implementing such a random effect for premises involved in CVI shipments would be very challenging, as the number of potential pairs between which CVI shipments are allowed is vast. This was feasible to implement for the HPA shipment component as the number of HPA herd identifiers is relatively small (2163) and the set of premises pairs that are viable as origin and destination for a shipment is much reduced by the assumption that the premises must be part of the same production company. However, for CVI shipments the entire premises population, minus any premises associated with HPA herd identifiers, makes up the set of viable premises between which shipments are possible, and the number of random effects to be modeled would be close to 60,000², which makes such an approach infeasible.
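The scale of the problem can be made concrete with a rough count of directed premises pairs. The sketch below assumes, for illustration only, that exactly one premises is associated with each HPA herd identifier and ignores the unobserved premises added during the analysis; the variable names are hypothetical.

```python
n_premises_total = 62_974   # premises in the demography data (from the text)
n_hpa_identifiers = 2_163   # HPA herd identifiers (from the text)

# Assumption: one premises per HPA identifier, so the remainder can take part in CVI shipments.
n_cvi_premises = n_premises_total - n_hpa_identifiers

# Directed pairs: origin -> destination is distinct from destination -> origin.
n_directed_cvi_pairs = n_cvi_premises * (n_cvi_premises - 1)

# Upper bound for HPA pairs if every pair of identifiers were allowed; the
# same-production-company restriction described in the text reduces it further.
n_directed_hpa_pairs_max = n_hpa_identifiers * (n_hpa_identifiers - 1)

print(f"CVI directed pairs: {n_directed_cvi_pairs:.2e}")
print(f"HPA directed pairs (upper bound): {n_directed_hpa_pairs_max:.2e}")
```

Even against the unrestricted HPA upper bound, the CVI case involves several hundred times more pair-level random effects, which is consistent with the infeasibility argument above.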
An additional, related cause for the large number of nodes and edges in the simulated networks could be that, apart from herd size, all premises are treated equally and there is no distinction between different types of operations in USAMM. Premises that specialize in certain stages of pig production will likely be more prone to send or receive shipments that cross a state border than those that specialize in other stages, and thus be over- or under-represented in the data. If the types of the origin and destination premises were known from the shipment data, this could be incorporated in the model in such a way as to allow assortative mixing between the different combinations of types. Such a change to the model would have the potential to limit the set of available receivers or senders of a shipment within a county to premises of certain classes, with fewer nodes involved in the interstate network as a result. However, once again the data sources used in this study do not provide enough information to easily determine the types of premises involved in a shipment, or to assign types to the premises in the demography data, which precludes such a solution.
One assumption that was made in the design of the model was that each single premises from the demography data could be associated with at most one HPA herd identifier. This resulted in the issue that in a total of 12 counties there were initially not enough premises to assign one to each HPA herd identifier. In order to make this possible, a total of 212 unobserved premises were added to the appropriate counties, and their respective herd sizes were treated as model parameters and sampled during the analysis. The reason for the mismatch between the unique identifiers in the HPA data and the premises given by the NASS data (which is the underlying source for the demography data) can only be speculated about. One conceivable cause may be the fact that the HPA data is from 2014 and the NASS data originates from the 2012 NASS Census of Agriculture, and the demography may have changed during those two years. The census is performed every five years and the 2012 census lies closest in time to the HPA shipment data. Another possibility is that the mismatch is due to sampling errors that can arise naturally in the census process (USDA, 2014b). Regardless of the reason, we argue that an addition of 212 premises is a very modest increase of only 0.34% in relation to the entire population of 62,974 premises in the demography data and that the addition is unlikely to have a perceivable effect on the results, while at the same time allowing the model to utilize all available shipment data.
The central issue that is addressed by USAMM is the lack of comprehensive livestock shipment data in the U.S. Here we have used modeling techniques that extrapolate from the limited data that is available to make predictions for the areas where data is missing. It is important to keep in mind, though, that while the model is sophisticated, the extrapolations are naturally influenced by the quality of the data and any biases that may be inherent to it. Two data sources were used to fit the model, both of which exist as an effect of regulations in the U.S. that require livestock shipments crossing state borders to be registered in some manner. This means that ultimately the model results are based purely on patterns that can be determined from such interstate shipment data. Of course, for the model predictions of interstate shipments, this is not an issue, but a central assumption of USAMM is that the within-state shipment patterns can also be inferred from the between-state shipment data. This assumption becomes an issue if the driving factors behind within-state shipments and between-state shipments are different. For instance, it is entirely possible that there are subtle practical or psychological barriers from the perspective of the individual swine producer that make shipments out of state less likely compared to within-state shipments. Further, it is not implausible that the interaction between shipment rate and premises size is different for within- and between-state shipments (i.e. large premises may be more likely to ship out of state or vice versa). However, without a set of data that describes within-state shipments, the uncertainty regarding these issues is impossible to quantify. If such a data set were available, it could be used to validate the predictions of within-state shipments that USAMM produces, and in that way assess the amount of error resulting from this bias. Alternatively, it could be incorporated into USAMM as a third source of data that would help to inform within-state shipment behavior.
Another feature that sets within-state shipments apart from their between-state counterparts is that they, by their very nature, have a much narrower distance distribution. If between-state shipments occurred only over long distances, and within-state shipments only over relatively short distances, this would be a concern. However, even though between-state shipments have a wide distance distribution with a high upper limit, there is still an ample number of between-state shipments in the data that only travel a short way. These short-distance interstate shipments inform the distance dependence component of USAMM of how to treat the relatively short-distance within-state shipments, and we do not currently have any reason to believe that short-distance interstate shipments and intrastate shipments over similar distances would have dissimilar distance dependencies.
A further issue related to the shipment data to which USAMM was fitted is that the various data sets were not collected during the same time periods. The CVI data set used to train the model represents shipments that took place in 2010, the HPA data represents shipments from 2014, and the premises demography data and industry covariates were informed by the 2012 NASS agricultural census and survey, respectively. Ideally, all data sources would cover the same time period, and the data that informs the model would be more recent. The NASS agricultural census and survey are recurring events (the census is performed every five years, and the survey of hog inventory every quarter), and are available for more recent years. But collecting and compiling new shipment data for more recent years would be a very work-intensive endeavour, not least because the sources are scattered throughout agencies of the individual states, and no standard way of storing the data is in place. Therefore it cannot be reliably expected that new such data will become available in the future, and even if it does, it is not certain that it will coincide temporally with a NASS census. However, inspection of recent temporal trends in the swine inventory, number of incoming swine shipments and number of swine premises at the state level (available through the NASS census and survey) reveals that the U.S. swine industry is fairly stable. This indicates that predictions based on previous data can still hold relevance for the situation today. Of course, year-to-year variation exists, and for some states (e.g. Alabama, Mississippi, Utah) relatively large changes have occurred in recent years. Further, even though the NASS sources generally indicate small temporal changes at the large scale, more subtle and undetectable changes in shipment patterns (for instance changes in the flux of animals between certain state pairs, or in the distribution of shipments among counties within states) cannot be ruled out. Therefore it is encouraging that previous analyses have shown only modest differences between the two years of swine CVI shipment data that are available (Gorsich et al., 2019). This was corroborated by the out-of-sample validation performed here, which showed that the prediction of CVI shipments based on the 2010 shipment data matched the 2011 data to the same extent that it matched the training data, both with regard to the various network metrics (Table 5) and shipment sizes (Fig. 8). This supports the notion of using USAMM predictions based on one year in applications targeted at other years, even for scales below the state level. In summary, based on this we believe that the general patterns demonstrated in the predictions of USAMM are relevant, despite the age and temporal disparity of the data sources. Still, the materials available for comparison are not of high enough resolution to dispel all concerns, and with USAMM estimates based on only one year, it is difficult to judge how shipment patterns change over time. Therefore it is difficult to say with certainty for how long the USAMM predictions that we have presented here will remain relevant. Collection of a new data set to which USAMM could be fitted would certainly have the potential to shed light on temporal trends in shipment patterns. Further, comparison of the USAMM predictions based on the currently available data to predictions based on much more recent data would provide a much higher level of detail than the comparison to the NASS census and survey is capable of. Therefore we also note that a new set of swine shipment data would greatly benefit the reliability of the USAMM predictions and allow them to be used with more confidence in contingency planning.
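The validation logic used throughout, checking whether an observed network metric falls inside the 95% credible interval of the same metric across simulated networks, can be sketched as follows. The percentile method, the function names and the simulated values are assumptions made for illustration and are not taken from the analysis.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight

def interval_covers(simulated_values, observed, level=95.0):
    """True if `observed` lies within the central `level`% interval of the simulations."""
    tail = (100.0 - level) / 2.0
    low = percentile(simulated_values, tail)
    high = percentile(simulated_values, 100.0 - tail)
    return low <= observed <= high

# Hypothetical metric (e.g. number of unique edges) across 250 simulated networks.
simulated_edges = [1000 + 2 * i for i in range(250)]

print(interval_covers(simulated_edges, observed=1250))
print(interval_covers(simulated_edges, observed=1600))
```

Applying the same check to a held-out year, as in the comparison of the 2010-based predictions with the 2011 CVI data, only changes which observed value is passed in.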
The main impetus behind the development of USAMM for swine is to enable detailed modeling of infectious animal disease. To illustrate the usefulness of swine shipment networks predicted by USAMM in such contexts, we constructed a rudimentary spatiotemporal model for the prediction of porcine epidemic diarrhea virus (PEDv) cases. PEDv was chosen for this purpose mainly because of the well-known connection between animal movements and transmission of the disease (Lowe et al., 2014; Machado et al., 2019; VanderWaal et al., 2018), and the availability of relatively recent high-quality data on the prevalence and spread of PEDv in the U.S. The model used a GLM framework and included predictors unrelated to shipments and predictors that were related to shipments simulated by USAMM. The model selection procedure showed that the best possible fit was achieved when the full set of shipment-related predictors was included. Further, the validation of the disease model showed that it was capable of recreating the main spatial features of the outbreak pattern observed in the available data (Fig. 9). In combination, these two results further affirm the role of shipments in the transmission of PEDv, and show the importance of adequate insight into the shipment structure when constructing models for similar infections. The disease model was also used to estimate a map showing the relative geographical distribution of county-level risk of PEDv exposure based purely on contact structure, as mediated via USAMM-predicted shipments, and demography (Fig. 10). Constructing such visualizations can help with identifying areas where disease surveillance should be prioritized or where preparedness for quick intervention is of particular importance. With this risk map we demonstrate how a geographically comprehensive set of shipments can be used to extrapolate infection risk to include all counties in the conterminous U.S., and put disease transmission risk in a nation-wide perspective. However, we do note that although the disease model is not unsound from a technical point of view, it is cursory and lacks some essential transmission routes. It is mainly intended as a demonstration of what is possible when detailed shipment networks are available, and of how a good depiction of the shipment network can explain a significant part of the observed transmission patterns. Further nuance and complexity, such as including various strains of PEDv, other transmission routes, and various biosecurity measures, would be crucial additions to improve the scientific rigor of the model and for it to be used credibly in, for instance, policymaking. This is evident from Fig. 9, where areas with a high concentration of cases in Kansas, Oklahoma, northern Texas and Utah are not well reproduced by the disease model. A possible explanation for this may be that transmission in these areas was driven by a factor other than movement of live animals between premises. Potential factors that have previously been implicated as important for transmission of PEDv include contaminated feed, general local spread due to spatial proximity, or fomites introduced by vehicles other than those used for live animal transportation (VanderWaal et al., 2018). Also, it is likely that a higher resolution model is required to adequately capture all nuances of the outbreak dynamics. The goal of including the disease model in this work, however, was not to construct the ideal model for PEDv, but to show that the national-scale shipment networks predicted by USAMM have the potential to be very useful from a disease modeling perspective. In the absence of shipment data, assessments along the lines of what is illustrated by the PEDv risk map can be hard to produce for any disease where the movement of live animals plays an important role in the transmission, and we believe that USAMM for swine can play an essential role in bridging that knowledge gap in the U.S.

Conclusion
Here we have presented a version of USAMM capable of modeling the entire premises-to-premises swine shipment network of the conterminous U.S. The aim was to remedy the lack of real shipment data that is needed for mathematical modeling of infectious livestock disease. With USAMM we simulated swine shipment networks which were shown to capture the known spatial features of the real shipment network well. The simulated networks are of very high resolution, including features such as origin and destination premises and shipment size. The model is not perfect, however, and we note three main shortcomings: it tends to overestimate the number of active nodes in the network; the hierarchical structure of the swine production industry is not explicitly modeled; and the amount of error in the within-state shipments of the posterior predictive simulations is unknown. The modeling problem is not trivial, but it is our hope that these issues can be solved with further development of the model. However, despite the issues, the simulated networks show good explanatory power in our disease model example for PEDv prevalence and can be used for modeling on a very large spatial scale. Consequently, provided that these shortcomings are kept in mind, it is our firm belief that the 250 simulated anonymized networks created and analyzed in this study will prove useful in further research and contingency planning for the prevention and control of outbreaks of infectious livestock disease in the U.S. The networks are freely available at http://dx.doi.org/10.25675/10217/235130 and https://webblabb.github.io/usammusdos/shiny.html.

CRediT authorship contribution statement
... the out and in premises size scaling exponents to be normally distributed with 2.5th percentile = 0.0 and 97.5th percentile = 1.0. The baseline shipment rate parameters for CVI and HPA shipments, both defined on (0, ∞), are interpreted as baselines for the expected number of CVI and HPA shipments between two theoretical CVI or HPA premises over the course of one quarter. Each corresponds to the expected number of shipments in the hypothetical scenario in which the distance between the two premises is zero, and the premises are exactly average with respect to herd size, industry covariates and state-level random effects (i.e. all factors of Eqs. (3) and (4) apart from the baseline shipment rate equal 1.0). As it is difficult to have a good idea about the shipment rate in this hypothetical scenario, vague gamma distributed priors with parameters shape = rate = 0.001 were assigned to both baseline shipment rates.
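Priors specified through their 2.5th and 97.5th percentiles can be converted to standard distribution parameters, and the vagueness of the gamma prior can be seen from its variance. The sketch below is illustrative: the helper names are hypothetical, and the lognormal case is included only because the same percentile convention appears in Table 2.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # standard normal 97.5th percentile, approx. 1.96

def normal_from_percentiles(p025, p975):
    """Mean and standard deviation of a normal with the given 2.5th/97.5th percentiles."""
    mean = (p025 + p975) / 2.0
    sd = (p975 - p025) / (2.0 * Z_975)
    return mean, sd

def lognormal_from_percentiles(p025, p975):
    """Log-scale mu and sigma of a lognormal with the given 2.5th/97.5th percentiles."""
    return normal_from_percentiles(math.log(p025), math.log(p975))

def gamma_mean_variance(shape, rate):
    """Mean and variance of a gamma distribution in shape/rate form."""
    return shape / rate, shape / rate ** 2

print(normal_from_percentiles(0.0, 1.0))       # scaling exponent prior
print(lognormal_from_percentiles(0.1, 10.0))   # percentile-specified lognormal prior
print(gamma_mean_variance(0.001, 0.001))       # vague baseline shipment rate prior
```

A gamma prior with shape = rate = 0.001 has mean 1 but a variance of about 1000, which is what makes it vague with respect to the baseline shipment rate.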

Fig. 1. Premises-level parameters controlling the expected number of shipments for HPA and CVI premises pairs respectively: baseline shipment rate and herd-size scaling parameter associated with the origin (out) and destination (in) premises. Whiskers show the 95% credible intervals around the median of the parameters' posterior distributions for each quarter.

Fig. 4. Hyperparameters for priors for parameters related to distance kernel. The priors are defined with mean = 1.0 and shape = . Whiskers show the 95% credible intervals around the median of the hyperparameters' posterior distributions. The hyperparameters are the mean and coefficient of variation of the lognormal priors of the two kernel parameters.

Fig. 6. Distributions of network metrics out-degree (top), in-degree (center) and betweenness centrality (bottom) of CVI shipments aggregated to the level of county. The metrics are weighted by the number of animals shipped. Maps to the left show the spatial distribution of the three metrics of the 2010 CVI data used to fit the model, maps to the right show the median across 250 simulated networks, sub-sampled to match the data collection method. The inset histograms show the frequency distribution of the same metrics for the CVI data set (blue bars) and the median (x) and variation in the form of 95% credible interval of the simulated networks. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Distributions of network metrics out-degree (top), in-degree (center) and betweenness centrality (bottom) of HPA shipments aggregated to the level of agricultural districts. The metrics are weighted by the number of animals shipped. Maps to the left show the spatial distribution of the three metrics of the HPA data used to fit the model, maps to the right show the median across 250 simulated networks, sub-sampled to match the data collection method. The inset histograms show the frequency distribution of the same metrics for the HPA data set (blue bars) and the median (x) and variation in the form of 95% credible interval of the simulated networks. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 8. Frequency distribution of the size of the CVI shipments (left and center) and HPA shipments (right) of the simulated networks contrasted with the 2010 CVI (left), 2011 CVI (center) and HPA (right) data sets (blue bars). Whiskers and markers show the 95% credible interval and median across the 250 simulated networks, sub-sampled to match the data collection method of the respective data set of the comparison.

Fig. 9. Prediction of the GLM using the full set of predictors of the expected number of cases of PEDv in the first quarter of 2018 (left) compared to the actual observed number of cases in the same quarter (right).

Fig. 10. GLM prediction of relative county-level PEDv infection risk in an average year when not taking infection status into account (left), and the relative number of PEDv cases per county observed in the PEDv data over the period August 2014 to March 2018 (right).

Table 2
Model parameters and their respective prior distributions.
Premises size scaling exponents for shipment rate (out, in). Normal with 2.5th percentile = 0.0 and 97.5th percentile = 1.0.
Scale parameter of the distance kernel function of the state, for CVI and HPA shipments respectively. Lognormal with mean and coefficient of variation given by hyperparameters.
Shape parameter of the distance kernel function of the state, for CVI and HPA shipments respectively. Lognormal with mean and coefficient of variation given by hyperparameters.
State level random effect on shipment rate of origin state, for CVI and HPA shipments respectively. Gamma with mean = 1.0 and shape parameter (out).
Gamma with mean = 1.0 and shape parameter (in).
Lognormal with mean and coefficient of variation given by hyperparameters.
Lognormal with 2.5th percentile = 1.0 × 10^-1 and 97.5th percentile = 1.0 × 10^1.
Baseline shipment rate for CVI and HPA shipments respectively. Gamma with shape = rate = 1 × 10^-3.
Shipment size shape parameter for CVI and HPA shipments respectively. Lognormal with 2.5th percentile = 1.0 × 10^-1 and 97.5th percentile = 1.0 × 10^1.
Mean of the kernel scale parameter prior. Lognormal with 2.5th percentile = 1.0 × 10^4 m and 97.5th percentile = 4.0 × 10^6 m.
Coefficient of variation of the kernel scale parameter prior. Lognormal with 2.5th percentile = 0.365 and 97.5th percentile = 1.724.
Mean of the kernel shape parameter prior. Lognormal with 2.5th percentile = 2.0 and 97.5th percentile = 1000.
Coefficient of variation of the kernel shape parameter prior. Lognormal with 2.5th percentile = 0.365 and 97.5th percentile = 1.724.
Shape parameter (out) of the prior of the state-level random effect. Half-Cauchy with scale = 1.0.
Shape parameter (in) of the prior of the state-level random effect. Half-Cauchy with scale = 1.0.
Mean of the HPA parameter prior. Half-Cauchy with scale = 1.0.
Coefficient of variation of the HPA parameter prior. Half-Cauchy with scale = 1.0.

The first parameter is the distance at which the kernel function has fallen to 50% of its maximum value, i.e. the kernel value at that distance equals 0.5 times its value at distance 0.0. The second parameter is defined as the ratio between the first parameter and the distance at which the function has fallen to 5% of its maximum value, i.e. the distance at which the kernel value equals 0.05 times its value at distance 0.0. For the derivation relating the two parameterizations of the kernel, see Appendix S2. In the model we define one set of kernel parameters for each combination of quarter and origin state (i.e. the state of the origin premises). Separate sets of kernel parameters were used for CVI and HPA shipments.

Table 3
Comparison of network metrics of the CVI subset of the simulated shipment network to those of the 2010 CVI training data.GSCC is the giant strongly connected component, GWCC is the giant weakly connected component.

Table 4
Comparison of network metrics of the HPA subset of the simulated shipment network to those of the HPA training data.GSCC is the giant strongly connected component, GWCC is the giant weakly connected component.