Origins of the 2009 H1N1 influenza pandemic in swine in Mexico

Asia is considered an important source of influenza A virus (IAV) pandemics, owing to large, diverse viral reservoirs in poultry and swine. However, the zoonotic origins of the 2009 A/H1N1 influenza pandemic virus (pdmH1N1) remain unclear, due to conflicting evidence from swine and humans. There is strong evidence that the first human outbreak of pdmH1N1 occurred in Mexico in early 2009. However, no related swine viruses have been detected in Mexico or any part of the Americas, and to date the most closely related ancestor viruses were identified in Asian swine. Here, we use 58 new whole-genome sequences from IAVs collected in Mexican swine to establish that the swine virus responsible for the 2009 pandemic evolved in central Mexico. This finding highlights how the 2009 pandemic arose from a region not considered a pandemic risk, owing to an expansion of IAV diversity in swine resulting from long-distance live swine trade. DOI: http://dx.doi.org/10.7554/eLife.16777.001


Introduction
Our ability to predict outbreaks of zoonotic pathogens requires an understanding of their ecology and evolution in reservoir hosts. At the onset of the 2009 influenza pandemic, whole-genome sequencing revealed that pdmH1N1 was a novel reassortant virus comprised of segments from three major swIAV lineages (Garten et al., 2009): segments PB2, PB1, PA, NP and NS were derived from a triple reassortant H3N2 swine virus (TRsw) that originated in North American swine during the mid-1990s; the HA (H1) segment derived from the classical swine H1N1 lineage (Csw) that has circulated in North American swine since the 1918 pandemic; and the NA (N1) and MP segments were related to the avian-like Eurasian swine lineage (EAsw) that emerged in European pigs in the late 1970s. Genetic and epidemiological evidence indicate that the first outbreak of pdmH1N1 occurred in humans in Mexico (Brockmann and Helbing, 2013;Chowell et al., 2011;Lemey et al., 2009). However, it remained unclear whether Mexico also was the site of the zoonotic transmission event that gave rise to the 2009 outbreak, given that EAsw viruses had never been detected in swine in any part of the Americas, including Argentina, Brazil, Chile, Mexico, Peru, Canada, and the intensively sampled swine herds in the United States (Anderson et al., 2013;Nelson et al., 2015b;Tinoco et al., 2015). At the time, Asia was the only region where TRsw, Csw, and EAsw viruses were known to co-circulate in swine (Zhu et al., 2013).
Asia is a global source of novel human seasonal influenza viruses (Lemey et al., 2014;Russell et al., 2008), avian viruses associated with the pandemics of 1957 and 1968 (Kawaoka et al., 1989), and emerging avian viruses with pandemic potential Taubenberger Morens, 2009). Since 2009, several studies have implicated Asian swine as a possible source of the TRsw/Csw/EAsw reassortant swine virus that gave rise to pdmH1N1. Asia is the only region where TRsw, Csw, and EAsw are known to exchange segments through reassortment Poonsuk et al., 2013;Takemae et al., 2008;Vijaykrishna et al., 2011). At the onset of the pandemic, IAVs identified in swine (swIAVs) in Hong Kong, SAR, were more closely related to pdmH1N1 than any other swIAVs available globally . One Hong Kong swIAV was identified with a genotype similar to the pdmH1N1 virus except for the NA segment, and transmitted via respiratory droplet in ferrets (Yen et al., 2011). However, the time of the most recent common ancestor (tMRCA) of the Hong Kong swIAVs and pdmH1N1 was still approximately 10 years . Combined with the lack of any complete swIAV genomes from Mexico's large swine populations at the start of the pandemic, and the difficulty of understanding how a virus that evolved in Asian swine caused its first outbreak in humans in Mexico, the geographical location of the swine-to-human transmission event that gave rise to pdmH1N1 has remained uncertain. In the years following the pandemic, new surveillance in Mexican swine has identified Csw, TRsw, and pdmH1N1 and seasonal H3N2 viruses of human origin (Nelson et al., 2015a), but no EAsw viruses. However, the majority of swIAVs in previous analyses were collected in northern and eastern Mexico, and no studies to date have included viruses from major swine-producing states in central Mexico, including Jalisco, Guanajuato, and Puebla ( Figure 1). eLife digest In 2009 a new influenza virus jumped from pigs to humans and spread very rapidly, causing an initial outbreak in Mexico and becoming a global pandemic in just a few months. Although the most straightforward explanation is that the virus originated in swine in Mexico, several studies suggested that this was unlikely because key genetic components of the virus had never been detected in the Americas. Determining the source of the disease is critical for predicting and preparing for future influenza pandemics.
Mena, Nelson et al. sought to better characterize the genetic diversity of influenza viruses in Mexican swine by obtaining the entire genetic sequences of 58 viruses collected from swine in Mexico, including some from previously unsampled regions in central Mexico. The sequences revealed extensive diversity among the influenza viruses circulating in Mexican swine. Several viruses included genetic segments that originated from viruses from Eurasia (the landmass containing Europe and Asia) and had not previously been detected in the Americas. The new sequences contained key genetic components of the 2009 pandemic virus. Furthermore, the sequences suggest that viruses with a similar genetic composition to the 2009 pandemic virus have been circulating in pigs in central-west Mexico for more than a decade. Thus, this region is the most likely source of the virus that started the 2009 pandemic.
Mena, Nelson et al. also found that the movement of viruses from Eurasia and the United States into Mexico closely follows the direction of the global trade of live swine. This highlights the critical role that animal trading plays in bringing together diverse viruses from different continents, which can then combine and generate new pandemic viruses.
A potential next step is to perform experiments that investigate how well the swine viruses can replicate and pass between different animal models. Comparing the results of such experiments with the findings presented by Mena, Nelson et al. could identify factors that make the viruses more likely to spread to humans and produce a pandemic. DOI: 10.7554/eLife.16777.002 Resolving whether the viral precursors of pdmH1N1 evolved in swine in Mexico or Asia has important implications for understanding pandemic emergence and informing risk assessment. We therefore undertook expansive surveillance efforts in Mexico, isolating the virus from pigs with respiratory symptoms in farms from six Mexican states with high swine production, including Sonora in northern Mexico, Yucatan in eastern Mexico, and previously unsampled states in central-east Mexico (Puebla), and central-west Mexico (Jalisco, Guanajuato, and Aguascalientes, Figure 1). These efforts yielded 58 complete viral genomes (sequence accession numbers are available in the Dryad data repository under Doi:10.5061/dryad.m550m) that were used to identify swIAVs that are closely related to pdmH1N1, resolving the origins of pdmH1N1 in its reservoir host. Figure 1. Live swine production in Mexico. Each Mexican state is shaded according to the density of pigs (the number of pigs per square kilometer, light green = lower and dark blue = higher). The six Mexican states where influenza viruses were collected for this study are indicated, and the number of pigs in the state is provided below the state's name. Source data from Mexico's Secretariat of Agriculture, Livestock, Rural Development, Fisheries and Food (SAGARPA) is available in Figure 1-source data 1. DOI: 10.7554/eLife.16777.003 The following source data is available for figure 1:

Extensive IAV genetic diversity identified in Mexican swine
Among the 58 swIAVs newly sequenced from Mexico, genome segments were identified from all three major swIAV lineages (Csw, TRsw, and EAsw) and all three major human IAV lineages of the 21 st century (H3N2, pdmH1N1, and pre-pandemic H1N1), evidence of extensive gene flow between Mexican swine and (a) humans, (b) US swine (Csw and TRsw viruses), and (c) European swine (EAsw viruses) (Figure 2). Except for TRsw (genotype 5) and pdmH1N1 (genotype 6), no other IAV lineages were identified intact in Mexican swine, evidence of frequent reassortment between viral lineages. Of the 12 genotypes identified, 10 were reassortants. Four genotypes were triple reassortants: genotype 1 contains all three swine lineages (Csw/TRsw/EAsw); genotype 9 contains all three human-origin lineages (pre-pandemic H1N1/human H3N2/pdmH1N1); and genotypes 2 and 11 contain one human-origin and two swine lineages (pre-pandemic H1N1/TRsw/EAsw and pdmH1N1/ Csw/TRsw, respectively). Although human-origin viruses were observed in all regions of Mexico, the importance of human-to-swine transmission in the evolution of swIAV diversity in Mexico has been characterized recently in detail (Nelson et al., 2015a). Instead, this study will focus on evolutionary processes that are directly involved in the emergence of pdmH1N1 precursors in Mexican swine, including reassortment and viral migration driven by long-distance movements of live swine.

Independent evolution of swIAVs in regions of Mexico
A high degree of population structure was observed among swIAV populations in Mexico's northern, central, and eastern regions. Viruses collected in Sonora, Yucatan, Puebla, and Jalisco/Guanajuato/ Aguascalientes consistently were positioned in different sections of the phylogenetic trees inferred for individual genome segments (Figure 3), evidence of independent viral introductions into these regions from humans, US swine, and Eurasian swine (summarized in Figure 4A). Viral gene flow was not frequently observed between Mexico's northern, eastern, and central regions (Figures 3,4A), a contrast to the frequency of inter-regional IAV migration observed in US swine herds, driven by ongoing long-distance movement of animals (Nelson et al., 2011). The only evidence for a migration event between two Mexican regions that is observed for multiple segments involves a single Yucatan virus (AVX-57) that is closely related to Jalisco viruses, evidence of Jalisco-to-Yucatan migration ( Figure 4A

Multiple introductions of swIAVs from US and European swine into Mexico
Time-scaled phylogenies suggest that viral introductions from US and European swine into Mexico have occurred periodically for at least two decades, coinciding with a sharp increase in the number of reported imports of live swine into Mexico in the late 1980s ( Figure 5). Notably, EAsw viruses have been introduced from Eurasian swine into central Mexico on at least two occasions. Although the US and Canada are Mexico's primary trade partners for the import of live swine, live swine imports into Mexico also were reported from several European countries during the late 1990s, including the United Kingdom and Denmark ( Figure 6). We estimate that at least three TRsw viruses and two Csw viruses were introduced from US swine into Mexican swine herds in the northern and central regions. Estimates of the number and timing of independent swIAV introductions into Mexico are complicated by gaps in global background swIAV sequence data in the 1990s/early 2000s and topological differences in the phylogenies inferred for different segments arising from reassortment. Our conservative approach of defining introductions based on both high posterior probabilities (>90) on the phylogeny (Figure 3-source data 1) and Bayes factor (BF) support for significant rates of migration (BF > 6) (Figure 4-figure supplement 1) has likely underestimated the number of introductions. We also recognize the need to draw phylogeographic inferences with an awareness of missing data. Major gaps in surveillance in swine in Europe and Asia during the 1990s have resulted in long branch lengths between swIAVs collected in Mexico and Eurasia (Figure 3-source data 1).
Although it is difficult to infer from the tree whether the EAsw MP identified in Mexico was introduced from Asian or European swine (Figure 4-figure supplement 1), the lack of live swine imports reported between Mexico and any Asian country (Figure 6), and the low export of Asian swine globally (Nelson et al., 2015c) suggest that Europe may be a more likely source of the Mexican viruses, similar to the H1 and N1 segments.

Identification of pdmH1N1 precursor viruses in central Mexican swine
An important consequence of the long-distance dissemination of EAsw, Csw, and TRsw viruses into central Mexico was the identification of 18 swIAVs in central-west Mexico with the same combination of EAsw, Csw, and TRsw segments as pdmH1N1 (referred to as genotype 1, Figure 2). Genotype 1 viruses are distinct from the reverse zoonosis introductions of pdmH1N1 from humans to swine (genotype 6) that have been observed globally (Nelson and Vincent, 2015). Phylogenetically, genotype 1 viruses from Mexico form a sister lineage to pdmH1N1 on the PB2, PB1, PA, HA, NP, and NA trees ( Figure 3, Figure 3-source data 2). Genotype 2 viruses from Jalisco (genotype 2 viruses are similar to genotype 1, but contain a pre-pandemic human-origin H1 segment [ Figure 2]) are closely related to pdmH1N1 on the MP tree. No Mexican swIAVs were closely related to pdmH1N1 on the NS tree. However, NS is the segment with the highest tMRCA between pdmH1N1 and the most closely related swIAVs (Figure 7), an indication of larger gaps in sampling and missing data. It is therefore likely that the NS precursor in Mexico was not detected by surveillance or has been replaced in intervening years by reassortment. The tMRCA analysis therefore provides further indication that Mexico is the likely origin of the swine virus that gave rise to pdmH1N1. However, it is difficult to determine the precise location within central-west Mexico of zoonotic transmission, owing to high rates of gene flow and reassortment between swIAVs in Jalisco and Guanajuato and resulting differences in tree topologies (Figures 3,4A). Swine viruses from Guanajuato are most closely related to pdmH1N1 on the phylogenies inferred for the PB2, PB1, H1, and N1 segments. Jalisco viruses are most closely related to pdmH1N1 on the PA and MP trees. Both Guanajuato and Jalisco viruses are related to pdmH1N1 on the NP tree. Therefore, the swine-to-human transmission event could have occurred in Guanajuato, Jalisco, or possibly another state in central Mexico that has not been sampled.

Patterns of reassortment in central Mexican swine
Of the 10 reassortant genotypes identified in Mexican swine, four genotypes were identified in central Mexican swine that contain at least one EAsw segment, indicative of multiple reassortment events involving EAsw viruses. Genotype 1 viruses emerged multiple times in central Mexico and sustained transmission longer than any other genotype in central Mexican swine, as estimated using 'Markov rewards' (Figure 3-source data 3). No EAsw PB2, PB1, PA, NP, and NS segments were Trees include the 58 swIAVs collected in Mexico for this study, representative pdmH1N1 viruses, and other related swIAVs collected globally. The color of each branch indicates the most probable location state. For clarity, the pdmH1N1 clade is depicted as a triangle, the color of which represents the inferred location state of the node representing the inferred common ancestor of pdmH1N1 and the most closely related swIAVs (indicated by an open circle). Posterior probabilities are provided for key nodes. More detailed phylogenies including tip labels are provided in Figure 3-source data 1. along with trees inferred for lineages not shown here. Similar phylogenies inferred using maximum likelihood methods are provided in Figure 3source data 2. Similar phylogenies that use genotype instead of geographic location as a trait are provided in Figure 3-source data 3. Nine Mexican swIAVs were excluded from the phylogenetic analysis because they were outliers in root-to-tip divergence (    (Brockmann and Helbing, 2013). The use of genetic data provides more refined within-country spatial details. Location state posterior probabilities for the root of the tree were highest in Mexico City (0.81), consistent with an origin of the pandemic outbreak in Mexico's capital city (Figure 4-figure supplement 2). Texas shares a long and heavilytrafficked border with Mexico (Weinberg et al., 2003), and the southern United States was the largest recipient of pdmH1N1 introductions from Mexico during the early stages of the 2009 pandemic ( Figure 4B). However, the southern US was not a major source of pdmH1N1 viruses to the United States' Midwest, West, Northeast regions or globally. Rather, viruses identified in New York, Wisconsin, California, and other US states represent independent introductions of pdmH1N1 from Mexico. Notably, the Northeast region of the United States, including New York, had the largest role in the   Figure 3), for a posterior distribution of~2000 trees inferred for each segment. A high proportion of blue dots indicates a higher proportion of trees with Mexico as the inferred location state. The x-axis indicates the tMRCA of the same node, again for each tree. The 95% HPD is provided in brackets. Older tMRCAs are associated with longer phylogenetic branch lengths and gaps in sampling. DOI: 10.7554/eLife.16777.022 Figure 7 continued on next page onward global dissemination of the pandemic virus to Europe, Asia, and South America ( Figure 4B), perhaps driven by high volumes of air traffic from New York City and the size of the city's epidemic during the spring wave .

Discussion
Overall, our findings resolve a long-standing question of where the pandemic H1N1 virus originated in swine. Moreover, they underscore the importance of understanding the global evolution of IAVs in pigs, and the pandemic threat presented by the large number of independently evolving swIAV populations worldwide that periodically experience invasions of new viruses. Our findings demonstrate the central importance of inter-hemispheric swine movements in the long-range dissemination of swIAV diversity, which allowed divergent Eurasian and North American viruses to co-circulate in central Mexico and reassort into genotype 1 viruses (Figure 8). It remains unclear whether conditions in Mexico were uniquely suitable for the emergence of genotype 1 viruses, or whether it is possible that similar viruses also circulate undetected in Asia, the only other location where Csw, EAsw, and TRsw viruses and various reassortants are known to be present . The inability of researchers to produce genotype 1 viruses experimentally in cell cultures or pigs  implies strain-specific differences in reassortment capabilities. The likelihood of pandemic emergence is therefore enhanced by the seeding of diverse swIAV lineages in countries with varying swine production practices and fitness pressures. The independent evolution of multiple variants increases the probability that one will evolve the capacity to transmit to humans.
Although understanding the positive and negative interactions between fitness in human and swine hosts is central to understanding pandemic emergence, this area of research remains poorly understood. Strong patterns of reassortment between EAsw and North American viruses in Mexican swine are an indication of competitive interactions. The apparent fitness of the TRIG + EAsw MP internal gene constellation in central Mexican swine was central to the emergence of the genotype 1 viruses that gave rise to pdmH1N1, but is not well understood. The success of genotype 1 viruses for many years in central-west Mexico is notable, given the low levels of persistence of pdmH1N1 viruses as an intact genome in swine following reverse zoonosis in many countries (Liang et al., 2014;Nelson and Vincent, 2015;Nelson et al., 2015d;Watson et al., 2015), a pattern also observed in Sonora. The lower levels of onward transmission of pdmH1N1 surface genes in swine could relate to competition with existing H1s and N1s. The identification of EAsw NA and MP segments in central Mexican swine is notable, given that these segments were associated with enhanced respiratory droplet transmission in animal models (Lakdawala et al., 2011;Yen et al., 2011). The EAsw MP segment also is associated with zoonotic transmission of swIAVs at agricultural fairs in the US (Bowman et al., 2012). A number of studies have examined the transmission potential of naturally occurring or lab-generated avian and swine viruses in ferret or other animal models, as a measure of pandemic potential (Herfst et al., 2012;Sorrell et al., 2009;Yang et al., 2016). However, a fundamental understanding of fitness landscapes and trade-offs in multi-host systems has been constrained by the a lack of final resolution on the ethics of various experiments that involve novel viruses that present a potential threat to humans (Frank et al., 2016). The identification of viruses related to the 2009 pandemic precursors invites new experimental research into central questions of pandemic emergence at the human-swine interface.
The detection of EAsw in Mexico suggests that even relatively low levels of livestock movement can disseminate a highly transmissible virus such as IAV long distances, bringing attention to the importance of surveillance and mitigation strategies, including quarantine. Presently, imported live swine are not routinely tested for influenza. The emergence of the 2009 pandemic virus was closely linked to the increase in Mexico's imports of live swine during the 1990s and the influx of new influenza virus lineages from the United States and Europe. At this time we cannot rule out Asia-to-

Figure 7 continued
The following source data is available for figure 7: Source data 1. Times to the most recent common ancestor (tMRCA) and posterior probabilities (>0.01) for the location state of the node representing the ancestor of the pdmH1N1 clade and the most closely related swine viruses (indicated with open circles in Figure 3). DOI: 10.7554/eLife.16777.023 Mexico viral migration, owing to possible omissions in reported imports and exports of swine and gaps in sequencing of viruses in swine globally. However, all that is known about the nature of international swine production and trade suggests that this direction of viral movement is unlikely. Longdistance trade of pigs from Europe, the United States, and Canada to countries in Asia and South America occurs primarily to import high-efficiency European and North American breeding lines. Although illegal trade of poultry is important in the spread of AIVs, particularly within the East Asian region, there are no known economic incentives (and strong economic disincentives) for the illegal trade of pigs from Asia and Mexico. Globally, Asia is overwhelmingly a net importer of swine from Europe and North America, with extremely little export of Asian pigs or their viruses to any other continents (Nelson et al., 2015c). Another outstanding question is whether genotype 1 viruses have disseminated from Mexico to other countries in Latin America, where surveillance is low or non-existent. Mexico has reported export of live swine to several countries in Latin America (Belize, Colombia, Costa Rica, Cuba, El Salvador). However, the total volume of Mexico's live swine exports during 1996-2012 amounts to only one-tenth of Costa Rica's over the same time period, and one-thirtieth of Brazil's. Mexico is therefore considered an ecological sink for swine influenza viruses, with limited opportunity for viral export to other countries (Nelson et al., 2015c).
The unexpected detection of EAsw in Mexico highlights the need to revisit outdated assumptions about the spatial distribution of IAVs in swine populations globally, given the capacity of long- distance live swine movements to disseminate viral diversity between continents. Although surveillance of swIAVs has been initiated in many new regions since 2009, it remains highly imbalanced on a global scale and low compared to avian surveillance (Nelson and Vincent, 2015). Recent detections of novel swIAVs in regions that traditionally have not conducted surveillance in swine, including Latin America and South-east Asia, underscore the importance of expanding surveillance in the large number of under-sampled regions with high densities of swine Nelson et al., 2015a;Ngo et al., 2012;Perera et al., 2013). China remains an important source of zoonotic influenza viruses with pandemic potential, and the identification of strong regional variations in swIAV diversity in Mexico highlights the importance of capturing swIAV diversity across China's heterogeneous regions. Although the importance of regional live swine movements in the evolution of swIAVs in the United States has been documented (Nelson et al., 2011), many countries lack data on the intra-country movements of swine, limiting study of interactions between animal movements and regional patterns of swIAV diversity. Data on poultry movements within and between countries are similarly lacking, limiting our understanding of the role of trade in the spatial dynamics of IAVs in domestic birds. Our ability to predict future pandemics will require intensified viral surveillance and an understanding of how economic forces and international trade policies affect changes in animal movements and production practices that drive viral emergence.

Collection of influenza viruses in Mexican swine
Swine influenza isolates were obtained from samples submitted to Avimex laboratories (Mexico) for confirmation diagnostics of pigs with respiratory syndrome. The samples covered years 2010 (1 sample), 2011 (6 samples), 2012 (34 samples), 2013 (14 samples) and 2014 (3 samples). All samples came from large commercial facilities from 6 states: Sonora (8 different farms), Jalisco (6 farms), Guanajuato (3 farms), Puebla (2 farms), Yucatan (2 farms) and Aguas Calientes (1 farm). Virus was isolated in specific pathogen free embryonated eggs or in cell culture; HA, NA subtype was determined and samples from first or second passage were sent to Mount Sinai (New York) for RNA extraction and whole genome sequencing. When multiple isolates of the same subtype were obtained from the same farm and outbreak the sample with highest viral titer was submitted. GenBank accession numbers of the 58 genomes are available in the Dryad data repository under Doi:10.5061/dryad.m550m. Figure 2-source data 1 describes the date, location, subtype and lineage assignment of each viral segment.

Influenza A genome sequencing
Viral RNA was obtained from 140 ml of the original sample, or after amplification in MDCK cells, using the QIAamp Viral RNA minikit QIAGEN). Next, 5 ml of viral RNA were used as template in a 50 ml multisegment RT-PCR reaction (Superscript III high-fidelity RT-PCR kit) with influenza-specific universal primers complementary to the conserved 12-13 nucleotides at the end of all 8 genomic segments. Primer sequences and final concentrations in the reaction were as follows (influenza-complementary sequences are underlined): Opti1-F1 -5' GTTACGCGCCAGCAAAAGCAGG (0.1 mM); Opti1-F2 -5' G TTACGCGCCAGCGAAAGCAGG (0.1 mM); Opti1-R1 -5' GTTACGCGCCAGTAGAAACAAGG (0.2 mM). Amplicons were purified with 0.45x volume AMPure XP beads (Beckman Coulter) and 0.5-1 mg was sheared to an average fragment size of 150 bp on a Bioruptor Pico sonicator (Diagenode). Next, amplicon sequence libraries were prepared using the end repair, A-tailing, and adaptor ligation NEBNext DNA library prep modules for Illumina from New England Biolabs according to the manufacturer's protocol. Following final size-selection with 1x volume Ampure XP beads, and secondary PCR (8 cycles) to introduce barcoded primers, multiplexed libraries were sequenced on the Illumina HiSeq 2500 platform in a single-end 100 nt run format.

De novo genome assembly
Single-end 100 nt reads were first filtered with cutadapt (Martin, 2011) to remove low-quality sequences and adapters. Reads were then mapped against a non-redundant copy of IAV sequences in the Influenza Research Database (IRD) using STAR (Dobin et al., 2013) and chimeric reads with non-contiguous alignments to reference segments (typically originating from defective interfering particles containing segments with internal deletions) were removed. Assembly of IAV genomic segments was performed using a custom pipeline (available from the authors) in multiple stages. First, an initial assembly was done using the inchworm component of Trinity (Grabherr et al., 2011), and viral contigs bearing internal deletions were identified by BLAT (Kent, 2002) mapping against nonredundant IRD reference sequences. In the second stage the inchworm assembly was repeated but now removing breakpoint-spanning kmers from the assembly graph. Resulting IAV contigs were then oriented, and trimmed to remove low-coverage ends and any extraneous sequences beyond the conserved IAV termini. In the final stage the CAP3 (Huang and Madan, 1999) assembler was used to improve contiguity by merging contigs originating from the same segment type if their ends overlapped by at least 25 nt. Assembly quality and contiguity was assessed for all segments by mapping sequence reads back to the final assemblies using the Burrows-Wheeler Alignment (BWA) tool (Li and Durbin, 2009) and complete segments were annotated using the NCBI Influenza Virus Sequence Annotation Tool (Bao et al., 2007).

Phylogenetic analysis of swIAVs in Mexico
In addition to the 58 whole-genome sequences generated for this study from Mexico, background data sets were included that were recently used in previous studies of IAV evolution in swine (Nelson et al., , 2015a(Nelson et al., , 2015b(Nelson et al., , 2015c. The vast majority of swIAV sequences from the Americas are deposited in NCBI's GenBank, but to ensure that no additional sequences of importance were available from the Global Initiative on Sharing All Influenza Data's (GISAID) Epiflu database from Europe and Asia, sequences for the EAsw lineage were downloaded from GISAID (acknowledgement of the authors and research laboratories that contributed these data are provided in Supplementary files 1-2). A large number of whole-genome sequences from swIAVs in European swine were published by Watson et al. (2015) after we completed our analysis. To ensure that none were closely related to pdmH1N1 or any of the Mexican swIAVs, particularly on the trees inferred for the Eurasian N1 and MP segments, we inferred neighbor-joining trees including the new European swIAV sequences deposited into GenBank by these authors (data available from the authors). Sequence alignments were constructed for each of the six internal gene segments (PB2, PB1, PA, NP, MP, and NS) and for the H1, H3, N1, and N2 antigenic segments separately using MUSCLE v3.8.3 (Edgar, 2004), with manual correction in Se-Al v2.0 (available at http://tree.bio.ed.ac.uk/software/seal/). The program Path-O-Gen v1.4.0 (available at http://tree.bio.ed.ac.uk/software/pathogen/) was used to identify potential sequencing errors that substantially deviated from the linear regression of root-to-tip genetic distance against time, which were subsequently removed from the study. We excluded nine triple reassortant Mexican swIAVs (e.g., A/swine/Mexico/Mex19/2010 (H1N1), Figure 3-figure supplement 1) that had shorter branch lengths than expected given their year of sampling, an indication of possible errors in sequencing or assembly.
Phylogenetic relationships were inferred for each of the data sets separately using the time-scaled Bayesian approach using MCMC available via the BEAST v1.8.2 package (Drummond et al., 2012) and the high-performance computational capabilities of the NIH HPC Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://hpc.nih.gov/). A relaxed uncorrelated lognormal (UCLN) molecular clock was used, with a constant population size, and a general-time reversible (GTR) model of nucleotide substitution with gamma-distributed rate variation among sites. For viruses for which only the year of viral collection was available, the lack of tip date precision was accommodated by sampling uniformly across a one-year window from January 1 st to December 31 st . The MCMC chain was run separately at least three times for each of the data sets and for at least 100 million iterations with sub-sampling every 10,000 iterations, using the BEAGLE library to improve computational performance ). All parameters reached convergence, as assessed visually using Tracer v.1.6, with statistical uncertainty reflected in values of the 95% highest posterior density (HPD). At least 10% of the chain was removed as burn-in, and runs for the same lineage and segment were combined using LogCombiner v1.8.0 and downsampled to generate a final posterior distribution of 1000 trees that was used in subsequent analyses.
In order to visualize branch lengths between swIAVs and the pdmH1N1 clade in terms of genetic distance, rather than time, the phylogenetic relationships of each of the eight data sets containing the pdmH1N1 clade were inferred using the maximum likelihood (ML) method available in the program RAxML v7.2.6 (Stamatakis, 2006), incorporating a general time-reversible (GTR) model of nucleotide substitution with a gamma-distributed (G) rate variation among sites. To assess the robustness of each node, a bootstrap resampling process was performed (500 replicates), again using the ML method available in RAxML v7.2.6. Although not the focus of this study, phylogenetic trees also were inferred for the pdmH1N1 lineage to identify recent human-to-swine transmission events in Mexico (inferred for PB2 and H1, as representative segments, Figure 3-source data 4), using methods described above and background data sets used in previous studies (Nelson et al., 2015b;2015d).

Spatial analysis of swIAVs in Mexico
The phylogeographic analysis considered 11 locations: Asia (including China, Japan, South Korea, Thailand, Vietnam), USA/Canada, Europe (including Belgium, Czech Republic, Denmark, France, Germany, Italy, Netherlands, Poland, Spain, and the United Kingdom), South America (including Argentina and Brazil), Mexico (Aguascalientes), Mexico (Guanajuato), Mexico (Jalisco), Mexico (Puebla), Mexico (Sonora), and Mexico (Yucatan), as well as humans (globally). The location state was specified for each viral sequence, allowing the expected number of location state transitions in the ancestral history conditional on the data observed at the tree tips to be estimated using 'Markov jump' counts (Minin and Suchard, 2008a), which provided a quantitative measure of asymmetry in gene flow between regions. The location of viruses in the pdmH1N1 clade was left uninformed, allowing the reconstruction of the location state of the common ancestor of pdmH1N1 to be unbiased by human pdmH1N1 data. For computational efficiency the phylogeographic analysis was run using an empirical distribution of 1000 trees (Lemey et al., 2014), allowing the MCMC chain to be run for 25 million iterations, sampling every 1000. A Bayesian stochastic search variable selection (BSSVS) was employed to improve statistical efficiency for all data sets containing more than four location states. Maximum clade credibility (MCC) trees were summarized using TreeAnnotator v1.8.0 and the trees were visualized in FigTree v1.4.2. Heat-maps were constructed using the R package to summarize Markov jump counts inferred over the totality of phylogenies (all segments, all swIAV lineages). Waiting times between location state transitions ('Markov rewards') (Minin and Suchard, 2008b) also were estimated, allowing us to estimate 'Markov jump' counts normalized for the percent time of locations in the tree.

Spatial analysis of human pdmH1N1
All available whole-genome sequence data for pdmH1N1 viruses collected globally in humans during March 1-May 31, 2009 were downloaded from the National Center for Biotechnology Information's Influenza Virus Resource (Bao et al., 2008) (http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html) on December 12, 2015. The over-sampled locations of New York, USA and Wisconsin, USA were randomly subsampled to 50 genomes each, resulting in a final data set of 422 complete concatenated whole-genome sequences (Supplementary file 3). The sequence data were aligned using MUSCLE v3.8.3 (Edgar, 2004). Phylogenetic relationships were inferred using the time-scaled Bayesian approach using MCMC available via the BEAST v1.8.2 package, in this case assuming a model of exponential growth in the number of infections, as this model is more realistic during the early growth stage of new pandemic virus . The alignment and phylogeographic analysis, using the methods described above, considered 12 locations: Asia (China, Japan, South Korea, Malaysia, Singapore, Thailand), Canada, Europe (United Kingdom, Finland, Italy, Russia, Netherlands, France, Germany), South America (Argentina, Chile, as well as the Dominican Republic), Mexico (Mexico City), Mexico (San Luis Potosi), Mexico (Veracruz), Mexico (state unknown), United States -Northeast (Maryland, Massachusetts, New Jersey, New York, Rhode Island), United States -Midwest (Indiana, Iowa, Kansas, Michigan, Nebraska, Ohio, Wisconsin), United States -South (Alabama, Texas, Florida, Mississippi, South Carolina, Tennessee), and United States -West (Arizona, California, Colorado, New Mexico). Heat-maps were constructed using the R package to summarize Markov jump counts inferred over the phylogeny. Waiting times between location state transitions ('Markov rewards') (Minin and Suchard, 2008b) also were estimated, allowing us to estimate 'Markov jump' counts normalized for the percent time of locations in the tree.

Genotype 1
To estimate the extent of reassortment and onward transmission of different genotypes in swine in Mexico, we used a similar phylogeographic approach, but in this case specified a location state for each Mexican virus based on genotype. For viruses collected for this study, we used the 12 genotypes defined in Figure 2. For the six additional whole-genome sequences available from Mexico that were published previously (Nelson et al., 2015c) we defined two new genotypes: genotypes 13 and 14. In addition to estimating transitions between locations (Markov jumps), we also estimated the waiting times between transitions ('Markov rewards') (Minin and Suchard, 2008b) as an indication of the period of time Mexican lineages circulate as different genotypes.