Insights into the domestication of avocado and potential genetic contributors to heterodichogamy

Abstract The domestication history of the avocado (Persea americana) remains unclear. We created a reference genome from the Gwen varietal, which is closely related to the economically dominant Hass varietal. Our genome assembly had an N50 of 3.37 megabases, a BUSCO score of 91%, and was scaffolded with a genetic map, producing 12 pseudo-chromosomes with 49,450 genes. We used the Gwen genome as a reference to investigate population genomics, based on a sample of 34 resequenced accessions that represented the 3 botanical groups of P. americana. Our analyses were consistent with 3 separate domestication events; we estimated that the Mexican group diverged from the Lowland (formerly known as “West Indian”) and Guatemalan groups >1 million years ago. We also identified putative targets of selective sweeps in domestication events; within the Guatemalan group, putative candidate genes were enriched for fruit development and ripening. We also investigated divergence between heterodichogamous flowering types, providing preliminary evidence for potential candidate genes involved in pollination and floral development.


Fig. S1 Additional admixture plots based on a subset of P. americana accessions.
Fig. S2 Additional PCAs with outgroups.
Fig. S3 The outcome of GO enrichment analysis for the pure Guatemalan sample (n = 10), based on the set of 92 candidate genes (Table S4) detected by selective sweep mapping.
Fig. S4 The outcome of GO enrichment analysis for the pure Lowland sample (n = 5), based on the set of 436 candidate genes (Table S4) detected by selective sweep mapping.
Fig. S5 The outcome of GO enrichment analysis for the pure Mexican sample (n = 3), based on the set of 683 candidate genes (Table S4) detected by selective sweep mapping.
Fig. S6 Plots illustrating Fst across the 12 scaffolded pseudo-chromosomes.
Fig. S7 The outcome of GO enrichment analysis based on the set of 401 candidate genes (Table S7) detected by Fst divergence analysis between the Mexican and Lowland pure samples.
Fig. S8 The outcome of GO enrichment analysis based on the set of 394 candidate genes (Table S7) detected by Fst divergence analysis between the Mexican and Guatemalan pure samples.
Fig. S9 The outcome of GO enrichment analysis based on the set of 385 candidate genes (Table S7) detected by Fst divergence analysis between the Lowland and Guatemalan pure samples.
Fig. S10 The outcome of GO enrichment analysis based on the set of 466 genes (Table S9) detected by Fst divergence analysis between the A and B flowering types.
Methods S1 A brief description of generating mapping and coverage masks for demographic analyses.

Fig. S1
Admixture plots based on removing two close relatives of Hass: Mendez (a purported somatic mutation of Hass) and Gwen (a grandchild of Hass). With these samples, the optimal grouping is K=3 (middle graph), corresponding to Guatemalan, Mexican, Lowland and Hass groups.

Fig. S3
The outcome of GO enrichment analysis for the pure Guatemalan sample (n = 10), based on the set of 92 candidate genes (Table S4) detected by selective sweep mapping with SweeD. Both graphs were generated by blast2GO, and both include only significant categories as measured by a p-value of p < 0.05 by a Fisher's Exact Test, as corrected for multiple tests by the blast2GO program. The graphs differ in specificity; the top graph was generated to include general terms, while the lower graph used the "reduce to most specific" option to report terms at the most specific level in the gene ontology enrichment directed acyclic graph (DAG) file.
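The Fisher's Exact Test named in these legends can be illustrated with a minimal sketch. This is not the blast2GO implementation, and it does not reproduce blast2GO's multiple-test correction; it only shows a one-sided (over-representation) test for a single GO term. The function name `fisher_enrichment_p` and the term counts in the usage line are hypothetical; only the 92 candidate genes and the 49,450 annotated genes come from the text.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    k: candidate genes annotated with the GO term
    n: total candidate genes
    K: background genes annotated with the GO term (candidates included)
    N: total background genes (candidates included)
    """
    # Probability of seeing k or more annotated genes among n genes drawn
    # without replacement from the background (hypergeometric upper tail).
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 6 of 92 sweep candidates carry a term that
# annotates 300 of the 49,450 genes in the assembly.
p_value = fisher_enrichment_p(k=6, n=92, K=300, N=49450)
print(f"one-sided p = {p_value:.3g}")
```

In practice each GO term yields its own p-value, so a correction for multiple tests is applied before the p < 0.05 threshold is used.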

Fig. S4
The outcome of GO enrichment analysis for the pure Lowland sample (n = 5), based on the set of 436 candidate genes (Table S4) detected by selective sweep mapping with SweeD. Both graphs were generated by blast2GO, and both include only significant categories as measured by a p-value of p < 0.05 by a Fisher's Exact Test, as corrected for multiple tests by the blast2GO program. The graphs differ in specificity; the top graph was generated to include general terms, while the lower graph used the "reduce to most specific" option to report terms at the most specific level in the gene ontology enrichment directed acyclic graph (DAG) file.

Fig. S5
The outcome of GO enrichment analysis for the pure Mexican sample (n = 3), based on the set of 638 candidate genes (Table S4) detected by selective sweep mapping with SweeD. Both graphs were generated by blast2GO, and both include only significant categories as measured by a p-value of p < 0.05 by a Fisher's Exact Test, as corrected for multiple tests by the blast2GO program. The graphs differ in specificity; the top graph was generated to include general terms, while the lower graph used the "reduce to most specific" option to report terms at the most specific level in the gene ontology enrichment directed acyclic graph (DAG) file.

Fig. S6
Plots illustrating Fst across the 12 scaffolded pseudo-chromosomes. Each plot represents a comparison between two racial samples, based on 20 kb windows along the chromosomes. In each graph, a dot represents Fst for one window, the red line represents a smoothed value along the chromosome, and the horizontal blue dotted line indicates the 1% cut-off. The three graphs are as labeled, i.e., the top graph contrasts the Mexican and Lowland samples, the middle graph contrasts the Mexican and Guatemalan samples, and the bottom graph contrasts the Lowland and Guatemalan samples.
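As an illustration of how a 1% cut-off on windowed Fst might be applied, the sketch below flags the windows whose Fst falls in the top 1% of all windows. It assumes the cut-off is the empirical top 1% of the genome-wide window distribution, which the legend does not state explicitly; the function names and the dictionary layout keyed by (chromosome, window start) are hypothetical.

```python
WINDOW_SIZE = 20_000  # 20 kb windows, as described in the legend


def window_start(position, size=WINDOW_SIZE):
    """Return the start coordinate of the window containing a 0-based position."""
    return (position // size) * size


def top_fraction_cutoff(values, fraction=0.01):
    """Return the smallest value that still belongs to the top `fraction` of values."""
    ranked = sorted(values, reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[n_keep - 1]


def outlier_windows(window_fst, fraction=0.01):
    """Return (cutoff, windows) for windows at or above the top-fraction cutoff.

    window_fst maps (chromosome, window_start) -> Fst for that window.
    """
    cutoff = top_fraction_cutoff(window_fst.values(), fraction)
    hits = sorted(w for w, fst in window_fst.items() if fst >= cutoff)
    return cutoff, hits


# Toy example: 200 windows on one chromosome with increasing Fst values.
toy = {("chr1", i * WINDOW_SIZE): i / 200 for i in range(200)}
cutoff, hits = outlier_windows(toy)
print(cutoff, hits)
```

Genes overlapping the flagged windows would then form a candidate list of the kind used for the GO enrichment analyses.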

Fig. S7
The outcome of GO enrichment analysis based on the set of 396 candidate genes (Table S7) detected by Fst divergence analysis between the Mexican and Lowland pure samples. Both graphs were generated by blast2GO, and both include only significant categories as measured by a p-value of p < 0.05 by a Fisher's Exact Test, as corrected for multiple tests by the blast2GO program. The graphs differ in specificity; the top graph was generated to include general terms, while the lower graph used the "reduce to most specific" option to report terms at the most specific level in the gene ontology enrichment directed acyclic graph (DAG) file.

Fig. S8
The outcome of GO enrichment analysis based on the set of 387 candidate genes (Table S7) detected by Fst divergence analysis between the Mexican and Guatemalan pure samples. Both graphs were generated by blast2GO, and both include only significant categories as measured by a p-value of p < 0.05 by a Fisher's Exact Test, as corrected for multiple tests by the blast2GO program. The graphs differ in specificity; the top graph was generated to include general terms, while the lower graph used the "reduce to most specific" option to report terms at the most specific level in the gene ontology enrichment directed acyclic graph (DAG) file.

Fig. S9
The outcome of GO enrichment analysis based on the set of 384 candidate genes (Table S7) detected by Fst divergence analysis between the Lowland and Guatemalan pure samples. Both graphs were generated by blast2GO, and both include only significant categories as measured by a p-value of p < 0.05 by a Fisher's Exact Test, as corrected for multiple tests by the blast2GO program. The graphs differ in specificity; the top graph was generated to include general terms, while the lower graph used the "reduce to most specific" option to report terms at the most specific level in the gene ontology enrichment directed acyclic graph (DAG) file.

Fig. S10
The outcome of GO enrichment analysis based on the set of 466 genes (Table S9) detected by Fst divergence analysis between the A and B flowering types (Table 1). Both graphs were generated by blast2GO, and both include only significant categories as measured by a p-value of p < 0.05 by a Fisher's Exact Test, as corrected for multiple tests by the blast2GO program. The graphs differ in specificity; the top graph was generated to include general terms, while the lower graph used the "reduce to most specific" option to report terms at the most specific level in the gene ontology enrichment directed acyclic graph (DAG) file.

Table S4
The outcome of enrichment analyses for comparing genes either inferred to be under selection between two races (as inferred by SweeD analyses) or identified to be under selection (SweeD) in one race and contributing to diversity (as measured by Fst) between races. Statistically significant results indicate that more genes are shared between candidate lists than expected at random.

types and that were the basis for GO enrichment analysis (Figure S10) (see supplementary Excel file Table S8).