A supergene in seaweed flies modulates male traits and female perception

Supergenes, tightly linked sets of alleles, offer some of the most spectacular examples of polymorphism persisting under long-term balancing selection. However, we still do not understand their evolution and persistence, especially in the face of accumulation of deleterious elements. Here, we show that an overdominant supergene in seaweed flies, Coelopa frigida, modulates male traits, potentially facilitating disassortative mating and promoting intraspecific polymorphism. Across two continents, the Cf-Inv(1) supergene strongly affected the composition of male cuticular hydrocarbons (CHCs) but only weakly affected CHC composition in females. Using gas chromatography–electroantennographic detection, we show that females can sense male CHCs and that there may be differential perception between genotypes. Combining our phenotypic results with RNA-seq data, we show that candidate genes for CHC biosynthesis primarily show differential expression for Cf-Inv(1) in males but not females. Conversely, candidate genes for odorant detection were differentially expressed in both sexes but showed high levels of divergence between supergene haplotypes. We suggest that the reduced recombination between supergene haplotypes may have led to rapid divergence in mate preferences as well as increasing linkage between male traits, and overdominant loci. Together this probably helped to maintain the polymorphism despite deleterious effects in homozygotes.


Introduction
Complex multi-trait polymorphisms like colour morphs and specialized ecotypes are a fascinating aspect of intraspecific diversity. Such polymorphisms are increasingly known to be associated with supergenes, i.e. genomic regions harbouring linked combinations of alleles protected from recombination [1-3]. The loci in supergenes segregate together, acting as Mendelian loci and producing multi-trait phenotypes such as the mimetic morphs in the butterfly Heliconius numata, which differ in both coloration and patterning [4], reproductive morphs in the ruff (Philomachus pugnax), which differ in plumage, size and behaviour [5,6], and flower distyly in multiple species of plants [7,8]. However, while the linked genetic architecture of supergenes, particularly when due to chromosomal inversions, helps to stabilize polymorphism, it can also lead to increased accumulation of deleterious load [9-12] by reducing the efficacy of purifying selection [3,10,13]. Thus, the persistence of supergene polymorphism over long time scales remains puzzling.
Understanding the maintenance of polymorphism is a key focus of supergene research and several different mechanisms of balancing selection have been identified in supergene systems [11,14]. One of these mechanisms, disassortative mating, appears to be quite prevalent [11]. Disassortative mating occurs when individuals preferentially mate with dissimilar phenotypes and is found in many classic supergene systems [13,15-18]. Disassortative mating is particularly adaptive in supergenes with deleterious mutations because those are generally private to one supergene arrangement, generating a heterozygote advantage [10,19-21]. Theory suggests that the evolution of such a mating system requires either (i) a self-referencing system where individuals use their own phenotype to choose a mate or (ii) tight linkage between mating signals and preferences [21,22]. By including many loci, supergenes are obviously good candidates for the latter situation. Yet, the underlying biological mechanisms and the genetic architecture of disassortative mating remain poorly known.
To better understand this form of mate choice and its role in supergene maintenance, we explored the traits and genetic basis of disassortative mating in relation to the Cf-Inv(1) supergene in the seaweed fly Coelopa frigida. Cf-Inv(1) is a large supergene spanning 10% of the genome [23] with two arrangements, termed α and β, resulting from three overlapping inversions [24]. The Cf-Inv(1) supergene affects multiple phenotypic traits such as development time, fitness on different substrates and adult size [25-27]. It is characterized by strong overdominance. In the wild, genotype frequencies deviate from Hardy-Weinberg expectations and heterozygotes are found in excess (1.1-1.6× their expected frequency [25,27,28]). In experimental populations, the egg-to-adult survival of homozygotes is reduced by 10-20% at low density and by 50-70% at high density when compared to heterozygote survival [29-31], suggesting that each supergene haplotype has accumulated a significant deleterious load. Non-random mating with respect to Cf-Inv(1) has been shown in several experimental and natural populations [25,29,32]. This may be explained in part by a higher success of large males, which are more likely to win competitions and successfully mount females without being dislodged [33]. While this process may underlie a proportion of disassortative mating (particularly between ββ females and large αα males), it does not seem to be the only process at play [25]. Several lines of evidence support pre- and post-copulatory female choice favouring reproduction with males exhibiting a different genotype at the supergene. First, disassortative mating was more frequent than assortative mating in seminatural conditions for the three genotypes [34]. The same mating pattern was observed between opposite lines of supergene homozygotes with about 20-60% more disassortative choice than assortative choice, even when controlling for size [25]. Second, upon successive matings, which frequently happen in C. frigida, paternity was dominated by the sperm of the male with the opposite genotype [35]. Disassortative mate choice may have been selected because of the observed overdominance, and both mechanisms may contribute to the persistence of polymorphism.
The mechanisms underlying disassortative mate choice in C. frigida remain unknown. Males perform no courtship displays in this system: they mount females indiscriminately and are dislodged by the female via kicking, wing flicking or other manoeuvres in 30-50% of mating attempts [25,32,36]. Females receive no visual cues but the males rub the female's antennae with their forelegs during mounting [36], and females without antennae are less likely to mate while males with painted legs are more often dislodged [36]. Based on this, we hypothesized that chemical cues may play a role in sexual communication in this system. Cuticular hydrocarbons (CHCs) are a common mode of communication in insects and often facilitate mate choice [37,38]. Our previous work has demonstrated that CHC composition varies between sexes in C. frigida, making CHCs good candidate cues for sexual selection [39].
In this study, we investigated the hypothesis that chemical communication is conditioned by the supergene genotype Cf-Inv(1), putatively explaining disassortative mating with respect to this overdominant supergene. To test this, we quantified differences in CHC profiles between Cf-Inv(1) genotypes in populations from two continents and predicted that male CHC profiles might show a stronger effect of supergene genotype than female profiles. We combined this with measurements of female perception of these compounds using gas chromatography-electroantennographic detection (GC-EAD). Finally, we sought to more thoroughly explore the genetic basis of male traits and female perception using differential expression analyses.

Methods
[...] where they were allowed to mature on their natural wrack until all emerging adults were collected. Wild C. frigida adults were collected in September 2018 at Kamouraska, QC, Canada (47.56294, −69.87375) and transported to Laval University (Québec, Canada). Adults were allowed to lay eggs on standardized laboratory wrack (approx. 50% fucoids and 50% kelp) and this subsequent generation was used for CHC analysis (i.e. the focal generation). Laboratory conditions were standardized between Sweden and Canada and all flies were raised in a temperature-controlled room at 25°C with a 12 h/12 h light-dark cycle.
For the focal generations, as larvae pupated, they were transferred to individual 2 ml tubes with a small amount of cotton soaked in a solution of 0.5% mannitol. Two days after eclosure, flies were frozen at −80°C. To genotype each adult at the inversion, we extracted genomic DNA from one leg or the whole fly and performed a diagnostic SNP assay involving a PCR step amplifying the Adh gene, and a digestion step with two restriction enzymes targeting SNPs fixed between arrangements (described in [27]).
The focal generation of Norwegian flies for the EAD analysis was obtained from larvae collected in September 2021 from the same site (Østhassel, Norway) and genotyped by extracting DNA from one leg. The focal generation of Canadian flies for the EAD analysis was obtained from culture lines homozygous for Cf-Inv(1) and descending from wild adults collected in July 2021 from the same site (Kamouraska, Québec). All flies were reared under the conditions described above. Adult flies were kept at 5°C in 2 ml tubes containing cotton soaked in a solution of 0.5% mannitol and a piece of seaweed for a maximum of two weeks before GC-EAD analysis.
Frozen flies were allowed to defrost and dry for 10 min. Each fly was then placed in a 1.5 ml high recovery vial containing 300 µl of n-hexane, vortexed at a low speed for 5 s and extracted for 5 min. Afterwards, flies were removed from the vial and allowed to air dry before they were weighed. Extracts were evaporated to dryness under a stream of nitrogen and stored at −20°C until GC-MS analysis. Before analysis, extracts were redissolved in 20 µl of n-hexane containing 1 µg ml⁻¹ n-nonane (Sigma-Aldrich) as an internal standard and vortexed at maximum speed for 10 s. Extracts of about 25 flies per population (Norway and Canada), sex and genotype were analysed on a GC-MS (see electronic supplementary material for details).
The total ion chromatograms were quality checked before peak integration in OpenChrom and peak alignment using the R package 'GCalignR' [40]. The peak list was manually revised and filtered on minimum peak area and number of missing peaks before statistical analysis (see electronic supplementary material for details).

(i) Statistical analyses of CHC profiles
For the analysis of CHC profiles, peak areas were normalized on the internal standard peak area and the weight of the fly. For the multivariate analysis, all data were additionally mean centred and unit variance scaled. Extreme outliers were identified by deviations in the orthogonal and score distance in a PCA, and removed prior to analysis. Balanced sample sets were visualized with PCAs for Canadian females (αα/αβ/ββ = 22), Canadian males (αα/αβ/ββ = 22), Norwegian females (αα/αβ/ββ = 10) and Norwegian males (αα/αβ/ββ = 5). Differences between αα and ββ genotypes were assessed by OPLS-DA using the 'ropls' package in R [41] and PERMANOVA on Euclidean distances using the 'vegan' package in R [42]. OPLS-DA model significance was estimated by permutation (number of iterations = 999).
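The normalization and scaling steps described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' R workflow; the function names, the toy peak areas and the weight units are assumptions made only for the example.

```python
from statistics import mean, stdev


def normalize_peaks(peak_areas, internal_standard_area, fly_weight):
    """Divide each peak area by the internal-standard area and the fly weight."""
    return [area / (internal_standard_area * fly_weight) for area in peak_areas]


def autoscale(matrix):
    """Mean-centre each column (compound) and scale it to unit variance."""
    scaled_columns = []
    for column in zip(*matrix):
        m, s = mean(column), stdev(column)
        # A constant column carries no information; map it to zeros.
        scaled_columns.append([(x - m) / s if s > 0 else 0.0 for x in column])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical data: three flies (rows) x two compounds (columns).
raw = [[120.0, 30.0], [200.0, 50.0], [90.0, 45.0]]
standards = [100.0, 110.0, 95.0]  # internal standard peak areas (made up)
weights = [8.0, 10.0, 7.5]        # fly weights in mg (made up)

normalized = [normalize_peaks(r, s, w) for r, s, w in zip(raw, standards, weights)]
scaled = autoscale(normalized)
```

After scaling, every compound has mean 0 and standard deviation 1 across flies, so abundant compounds do not dominate the PCA or OPLS-DA simply because of their magnitude.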

(c) Gas chromatography and electroantennographic detection
We tested whether females could sense male CHCs using GC-EAD with Canadian male CHC extract for Canadian females and Norwegian male CHC extract for Norwegian females. The effluent from the column was split into two equal parts to simultaneously record the compounds by a flame ionization detector and the electrophysiological response of a fly antenna mounted on an electroantennographic detector (EAD). We required five or more successful GC-EAD recordings to consider a compound electrophysiologically active. Full details of the GC-EAD procedure are found in the electronic supplementary material.

(i) Statistical analyses of perception
We analysed the GC-EAD data in three different ways. All approaches were done on Norwegian and Canadian flies separately as they were given extracts from their own populations. First, we searched for multivariate patterns in perception by performing a PCA on EAD signal intensities in µV. We supplemented this with two different univariate approaches. We used the DESeq2 1.26.0 framework to determine which compounds may have elicited different reactions from different genotypes [43]. We added 1 to every value in the matrix and used the likelihood ratio test to determine if a model with genotype performed better than the reduced model with only an intercept. We considered an adjusted p-value (FDR) < 10% to be significant. We also ran univariate tests using a GLM framework. As our data were overdispersed and contained many zeros, we opted for a zero-inflated negative binomial regression, which performed better (lower dispersion) than a zero-inflated Poisson regression. We implemented this regression in R using the pscl package [44]. As with DESeq2, we used a likelihood ratio test to determine if our model was better than the null model and considered an adjusted p-value < 10% to be significant. For our final list of significant compounds, we used the overlap between the DESeq2 and GLM approaches.
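The final step, keeping only compounds that pass the 10% FDR threshold in both approaches, can be illustrated with a short Python sketch. It assumes the Benjamini-Hochberg adjustment (the DESeq2 default); the adjustment used for the GLM p-values is not stated in the text, and the compound names and p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant(names, pvalues, alpha=0.10):
    """Names whose adjusted p-value falls below alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return {name for name, q in zip(names, adjusted) if q < alpha}


# Hypothetical likelihood-ratio-test p-values for four compounds.
names = ["Cp_a", "Cp_b", "Cp_c", "Cp_d"]
p_deseq2 = [0.01, 0.04, 0.03, 0.005]
p_glm = [0.5, 0.02, 0.6, 0.01]

# Final list: compounds significant under both approaches.
final_hits = significant(names, p_deseq2) & significant(names, p_glm)
```

Taking the intersection is conservative: a compound flagged by only one model is dropped, which reduces false positives at the cost of some power.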

(ii) Analysis of EAD active compounds
Retention indices were calculated from the GC-EAD/FID chromatograms and translated into expected retention times in the GC-MS chromatograms of the CHC analysis. By comparison of the expected retention times together with the GC-FID elution patterns, the EAD active peaks were identified in the GC-MS readings and peak areas manually extracted. The peak areas were normalized on the internal standard peak area and the weight of the fly prior to analysis. For each EAD active compound, differences between genotypes within sex and differences between males and females were analysed using a GLM approach implemented in R with sex, genotype and their interaction as potential factors. Contrast statements to specifically test for the difference between αα and ββ in males and females were implemented post hoc.
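One common way to transfer peaks between instruments is a linear retention index computed against an n-alkane ladder. The text does not state which index definition was used, so the sketch below is an assumption: it interpolates linearly between bracketing alkanes and then inverts the index on a second ladder. The ladders and retention times are hypothetical.

```python
from bisect import bisect_right


def retention_index(rt, ladder):
    """Linear retention index of a peak eluting at `rt`.

    `ladder` is a list of (carbon_number, retention_time) tuples for the
    n-alkane standards, sorted by retention time.
    """
    times = [t for _, t in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the alkane ladder")
    # Index of the alkane eluting at or just before the peak.
    k = min(bisect_right(times, rt) - 1, len(ladder) - 2)
    (n0, t0), (n1, t1) = ladder[k], ladder[k + 1]
    return 100.0 * (n0 + (n1 - n0) * (rt - t0) / (t1 - t0))


def expected_retention_time(ri, ladder):
    """Invert the index on a second instrument's alkane ladder."""
    for (n0, t0), (n1, t1) in zip(ladder, ladder[1:]):
        if 100 * n0 <= ri <= 100 * n1:
            return t0 + (ri / 100.0 - n0) * (t1 - t0) / (n1 - n0)
    raise ValueError("retention index lies outside the alkane ladder")


# Hypothetical ladders (minutes) for C20-C22 on the two instruments.
ladder_fid = [(20, 10.0), (21, 12.0), (22, 14.0)]
ladder_ms = [(20, 15.0), (21, 18.0), (22, 21.0)]

ri = retention_index(11.0, ladder_fid)          # 2050.0
rt_ms = expected_retention_time(ri, ladder_ms)  # 16.5
```

Because the index is anchored to the alkanes rather than to absolute time, a peak keeps approximately the same index on both instruments even when their temperature programmes differ.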

(d) Differential expression analysis
We used several methods to identify candidate genes for CHC biosynthesis (CHCB) and odorant detection (OD), which are detailed in the electronic supplementary material. We used previously reported bulk RNA-seq data (see [45] for details), which contain whole-animal RNA of C. frigida from several Swedish and Norwegian populations, to examine differential expression at candidate genes for CHCB as well as OD.
We subset this dataset to only retain data from adults (three αα males, three αα females, five ββ males and six ββ females).
While RNA was taken from non-virgin females, we expect the expression of OD to remain meaningful for mate choice because females mate several times in their lifetime [35]. We used previously generated count matrices and used DESeq2 [43] to determine differentially expressed genes between αα and ββ in males, females and a combined analysis with both sexes. After the DESeq2 analysis but before applying an FDR correction, the results were subset to only include our candidate genes. Conventional thresholds (log2 fold change > 2, adjusted p-value (FDR) < 5%) were used to identify differentially expressed transcripts within our candidate subset.

(i) Analyses of clusters of OD transcripts
For examination of genomic locations, we subset our list to only include transcripts that had a blast match containing one of the following terms 'odorant-binding protein', 'gustatory receptor' or 'odorant receptor' or a pfam annotation of GOBP (PF01395).

Results and discussion
(a) CHC composition differs between supergene genotypes in C. frigida males
To quantify the role of Cf-Inv(1) in CHC composition, we collected CHC profiles using gas chromatography and mass spectrometry (GC-MS) from a total of 279 C. frigida and analysed variation between sexes and genotypes. We included populations from both North America (Kamouraska, QC, Canada; 47.56294, −69.87375) and Europe (Østhassel, Norway; 58.07068, 6.64346). While the demographic and historic processes leading to the colonization of both coasts of the Atlantic are as yet unknown, the two geographical populations represent natural replicates of polymorphism persistence. The samples included 20-25 individuals of each sex, genotype (αα, αβ, ββ) and population combination, although due to contamination, sample sizes for different analyses varied (see electronic supplementary material, table S1 for a full summary of sample sizes).
The overall CHC composition of males, visualized by PCA, showed some differentiation between the αα and ββ genotypes on both continents, with the αβ genotype as an intermediate phenotype (electronic supplementary material, figure S1). The effect of genotype (αα versus ββ) on CHC composition was further investigated by OPLS-DA and PERMANOVA (figure 1). OPLS-DA separates variation in CHC composition into variation correlated with genotype (the predictive component = between group variation) and other systemic variation uncorrelated with genotype (the orthogonal components = within group variation). In males, the two homozygous genotypes showed parallel separation by the OPLS-DA models on both continents (Norway R2Y = 0.91, Canada R2Y = 0.92; figure 1a), which also showed reliable predictive performances (Norway Q2 = 0.57, Canada Q2 = 0.83; figure 1a). Furthermore, 18% (Canadian males) and 24% (Norwegian males) of the total variation in CHC composition contributed to the separation of the αα and ββ genotypes. The effect of the supergene on CHC composition was further supported by the PERMANOVA results (figure 1b) demonstrating significant differences (p < 0.05) between Cf-Inv(1) genotypes in both Norwegian and Canadian males.
By contrast, very little variation in female CHC composition could be explained by Cf-Inv(1) genotype. The overall CHC composition of females, visualized by PCA, showed no differentiation between genotypes in either population (electronic supplementary material, figure S1). The OPLS-DA approach revealed that only 7% of the total variation in CHCs was associated with the genotype in Canadian females. Despite reliable predictive performance (Q2 = 0.61), the separation of the genotypes was less pronounced than in males (R2Y = 0.73; figure 1a). In the Norwegian females, no reliable predictive OPLS-DA model was obtained (R2Y = 0.64, Q2 = −0.04), demonstrating that genotype is a poor predictor for the variation in CHC composition. A PERMANOVA approach also demonstrated a significant difference (p < 0.05) between genotypes in Canadian but not Norwegian females (figure 1b).
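For readers unfamiliar with PERMANOVA, the test applied to the αα versus ββ comparisons can be sketched as a one-way pseudo-F statistic on squared Euclidean distances with a label-permutation p-value. This is an illustrative Python re-implementation, not the 'vegan' code used in the study; the number of permutations, the random seed and the toy profiles are assumptions.

```python
import random
from itertools import combinations


def sq_euclid(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(samples, labels):
    """One-way PERMANOVA pseudo-F from pairwise squared distances."""
    n = len(samples)
    groups = sorted(set(labels))
    ss_total = sum(sq_euclid(samples[i], samples[j])
                   for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(sq_euclid(samples[i], samples[j])
                         for i, j in combinations(idx, 2)) / len(idx)
    ss_between = ss_total - ss_within
    a = len(groups)
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(samples, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(samples, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical scaled CHC profiles (two compounds) for five flies per genotype.
profiles = [(0.0, 0.2), (0.3, 0.1), (0.1, 0.4), (0.5, 0.0), (0.2, 0.3),
            (5.1, 5.0), (4.8, 5.3), (5.4, 4.9), (5.0, 5.2), (4.7, 4.8)]
genotypes = ["aa"] * 5 + ["bb"] * 5

f_obs, p_value = permanova(profiles, genotypes)
```

A large pseudo-F with a small permutation p-value indicates that between-genotype distances exceed within-genotype distances more than expected under random label assignment.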
Overall, these results support the hypothesis that CHC composition between supergene genotypes varies much more strongly in males than in females. CHC profile thus appears to be a good candidate trait for female mate choice in this system, and is possibly the signal underlying disassortative mating in relation to Cf-Inv(1). Intraspecific variation in pheromones is increasingly recognized in insects [50] and has been connected to supergenes or chromosomal rearrangements. For example, the Sb supergene in fire ants, which controls colony organization, also affects the CHC composition of queens [51-53] [...] reject queens without it [52,53]. Additionally, a segregating putative inversion in the European corn borer moth (Ostrinia nubilalis) contributes to intraspecific differentiation between insect pheromone strains [54]. By tightening linkage between multiple alleles, supergene architecture may be particularly favourable for maintaining intraspecific divergence in complex traits such as chemical signalling.
royalsocietypublishing.org/journal/rspb Proc. R. Soc. B 290: 20231494
(b) Male CHCs are perceived by C. frigida females, with slight differences between supergene genotypes
For CHCs to facilitate disassortative female choice, C. frigida females must be able to sense male CHCs, in particular the compounds that differ between genotypes. To investigate this, we measured female perception of male CHCs using GC-EAD. During mating, males place their forelegs over the female's head directly in contact with her antennae (figure 2a), so we focused on antennal perception. Reliable EAD readings could be obtained from 48 Norwegian females (13 αα, 20 αβ and 15 ββ genotypes) and 52 Canadian females (14 αα, 20 αβ and 18 ββ genotypes). In total, the females reacted to 39 compounds in the pooled male extracts (i.e. mixed extract from αα, αβ and ββ males), demonstrating that male CHCs are actively perceived by females and are relevant candidates for sexual communication. However, these data were highly variable, with individual females sensing between 1 and 15 compounds and most compounds being sensed by less than half of the females in each group (electronic supplementary material, figure S2).
To further understand sexual communication in relation to the supergene, we tested which cuticular compounds, among the ones perceived by females, showed differences in relative concentration between genotypes and sexes (table 1; electronic supplementary material, table S2). Of the 39 female EAD active compounds, 35 could be successfully identified in the recorded GC-MS chromatograms (figure 2b; electronic supplementary material, table S3). Three compounds had very low intensities in the fly extracts (Cp2 and Cp27) or uncertainties in the peak assignment (Cp10). Compounds 21 and 22 were further treated as one single compound as these peaks were not sufficiently resolved in the chromatograms. Eleven of the 35 identified candidate compounds showed a consistent, significant effect of sex between continents (table 1; electronic supplementary material, table S2). Moreover, several showed an effect of genotype or an interaction between genotype and sex (Norway: seven compounds; Canada: nine compounds; table 1). The effect of genotype was mostly restricted to males (table 1), concordant with our findings on the overall CHC composition. Restricting our analysis to males revealed that patterns were somewhat consistent between continents. Six compounds showed significant and concordant effects in both populations, five compounds showed significant but opposing effects and an additional 16 were significant in only one population. Eight compounds showed no difference between males of different genotypes in either population. Intriguingly, all compounds showing significant but opposing effects showed increased concentrations in αα males in Norway and ββ males in Canada. This matched the general patterns in both continents; 12/15 significant compounds in Norway were higher in αα males compared to 8/23 in Canada. The compounds displaying similar genotype/sex effects across continents may reflect either parallel evolution or that the initial coupling of female perception and male traits evolved before C. frigida spread to other continents, while compounds differing between continents suggest that sexual selection for male CHC composition is likely also ongoing independently in both North America and Europe.
Figure 2 (caption fragment). Compounds showing significant differences in perception between genotypes are highlighted in red. (c-j) Boxplots of reaction (in μV) of female antennae to compounds with statistically significant differences between genotypes using both the DESeq2 and GLM approaches. Below each boxplot is the geometric mean for that group (all zeros have been changed to 0.001).
If CHC composition facilitates disassortative mating with respect to Cf-Inv(1), we expect that chemical perception or signal processing in females may vary between genotypes. A PCA on female antenna responses (in µV) to all 39 compounds showed no group separation according to genotype in either continent (PCA, electronic supplementary material, figure S3). Although the number of compounds sensed by females did not vary between genotypes, the coefficient of variation (CV = σ/μ) did. In both continents, the variance around the mean decreased with copies of α (i.e. CVαα > CVαβ > CVββ; electronic supplementary material, figure S4). We supplemented these multivariate analyses with univariate analyses on all compounds that could be reliably identified and integrated in males and were sensed by the population being analysed (Canada: 24 compounds; Norway: 30 compounds). We also removed females that were outliers in the PCA (Canada: four females; Norway: three females). The DESeq2 analysis on female antennal response revealed four compounds that were differently perceived between genotypes in Norwegian females and 17 compounds in Canadian females. The GLM approach on the same data identified one compound in Norwegian females and nine compounds in Canadian females. The overlap between the approaches yielded one compound (Cp21) that showed different patterns of chemoreception between the genotypes in both Norwegian and Canadian females and six additional compounds in the Canadian females (figure 2c-j). Five of these seven compounds differentially perceived by Canadian females differed in relative concentrations between Canadian αα and ββ males (table 1). The one compound differentially perceived by Norwegian females showed increased concentrations in Norwegian αα males when assessed in the poorly resolved peak together with Cp22.
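The coefficient of variation used above is simply the standard deviation divided by the mean. The sketch below computes it per genotype in Python; it assumes the population standard deviation for σ (the text does not say which estimator was used), and the response values are hypothetical placeholders that do not reproduce the reported pattern.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = sigma / mu, using the population standard deviation."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / mu


# Hypothetical antennal responses (µV) grouped by female genotype.
responses = {
    "aa": [12.0, 30.0, 8.0, 25.0],
    "ab": [15.0, 22.0, 18.0, 20.0],
    "bb": [10.0, 14.0, 9.0, 28.0],
}

cv_by_genotype = {g: coefficient_of_variation(v) for g, v in responses.items()}
```

Because the CV is dimensionless, it allows variability to be compared between groups whose mean response amplitudes differ.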
To act as a mating signal, a compound has to be sensed, processed and ultimately result in a behavioural response [55]. Overall, our results show some potential differences in chemoreception between females but not overwhelming ones. This is partly due to the high inter-individual variation in the electric signal and the heterogeneous number of compounds perceived. Together these effects made it difficult to draw firm conclusions from our GC-EAD data, despite our large sample size. The observed variance may be both technical and biological. For example, if the distribution of olfactory sensilla is heterogeneous on the antenna, small differences in the placement of the antenna relative to the odour stream eluting from the GC could result in varying responses, as observed in the fly Dacus oleae [56]. Thus, it is clear that examining chemoreception using multiple methodologies in combination with behavioural experiments will be critical to further assess the functional role of female-perceived CHCs in disassortative mating for this species.
[...] sex interaction, a positive value indicates that ββ males had higher values than αα males and there was no significant genotype effect in females. A negative genotype × sex interaction indicates that αα males had higher values than ββ males and there was no significant genotype effect in females. The last column indicates compounds that are perceived differently between αα and ββ in at least one population, as assessed by a combination of methods.
We sought to more thoroughly explore the basis of male traits and female chemoreception using differential expression analyses. Although the pathway of cuticular hydrocarbon biosynthesis (CHCB) is well established [38], a recent study revealed that the effect of single genes on overall CHC composition is highly complex [57]. Thus, we took a more general approach; we identified 263 candidate transcripts for both CHCB and OD. This group included chemosensory receptors (gustatory and odorant), odorant-binding proteins (OBPs), sensory neuron membrane proteins, odorant degrading enzymes and ion channels, all noted as groups of interest in a recent review [58]. We tested the overall patterns of differential expression between genotypes in both sexes using previously published RNA-seq data from 17 European adults (3 αα ♀, 6 ββ ♀, 3 αα ♂, 5 ββ ♂) [45]. We performed three separate analyses: one with both sexes, one with males only and one with females only. Twenty-nine transcripts putatively acting on chemical communication were significantly differentially expressed between αα and ββ, with some overlap between analyses (electronic supplementary material, figure S5 and table S4). For the CHCB transcripts putatively involved in CHC synthesis, the signal was largely driven by males: differential expression between genotypes of 11/12 CHCB transcripts was restricted to males, and 7/12 CHCB transcripts were over-expressed in males compared to females (electronic supplementary material, table S4a). This is fully consistent with the observed larger effect on male CHC composition. Conversely, differential expression of 11/17 OD transcripts was significant in multiple analyses and not restricted to a single sex (electronic supplementary material, table S4b).

To ask whether the differentially expressed genes were located within Cf-Inv(1), we mapped our candidate transcripts to v.1.0 of the C. frigida genome [23]. The two groups of genes showed strongly different patterns of localization. CHCB differentially expressed transcripts were widespread in the genome, with only one transcript mapping to the supergene, while nine OD transcripts mapped to Cf-Inv(1). This is significantly more than expected (52.9% of differentially expressed OD transcripts compared to 17% of all tested transcripts; Fisher's exact test p = 0.0053). Therefore, on the signal side, although the effect of Cf-Inv(1) on CHC composition was, as predicted, limited to males for both the phenotype and gene expression, it appears to be trans in relation to Cf-Inv(1) itself. This is in line with strong overall trans effects of Cf-Inv(1), particularly in males [45]. However, we still found clustering of CHCB loci, similar to what has been found in Heliconius butterflies [59]. As males and females share a genome, sex-specific changes in expression via cascading effects are more likely [60]. By contrast, on the reception side, the excess of OD genes within Cf-Inv(1) indicates a disproportionate effect of Cf-Inv(1) on chemoreception.
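The enrichment test above compares a 2 × 2 table of transcripts (differentially expressed or not, inside or outside Cf-Inv(1)). A minimal pure-Python sketch of a two-sided Fisher's exact test follows. Only the first row (9 of 17 differentially expressed OD transcripts inside the supergene) comes from the text; the background row is a hypothetical placeholder, so the resulting p-value is illustrative and is not expected to reproduce the reported value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(px for px in map(prob, range(lo, hi + 1)) if px <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Row 1 from the text: DE OD transcripts inside / outside Cf-Inv(1).
de_inside, de_outside = 9, 8
# Row 2 is HYPOTHETICAL: remaining tested transcripts inside / outside.
bg_inside, bg_outside = 36, 210

p_example = fisher_exact_two_sided(de_inside, de_outside, bg_inside, bg_outside)
```

The exact test is preferred over a chi-squared approximation here because one cell count is small.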
Next, we examined the distribution of OD genes across the genome regardless of expression status. We further subset our putative OD genes to look at odorant receptors (ORs), gustatory receptors (GRs) and OBPs as these form the base of chemoreception [61,62] and were found to be prime candidates for mate choice in Heliconius [63]. We found 63 transcripts that mapped to the genome (electronic supplementary material, table S5).
The distribution of these transcripts was non-random, as several transcripts with similar or identical annotations were clustered in close proximity in blocks of 2-10 transcripts (figure 3a; electronic supplementary material, table S5). Such a clustering may be the result of tandem duplications and is frequently observed for OBPs and chemosensory receptors in insects [65-68]. Evolution by tandem duplication and subsequent divergence (e.g. the birth-death model of multi-gene families [69]) is proposed to have generated the large and diverse OD gene families identified in insects [65,67,70,71]. While there was not an excess of OD genes within Cf-Inv(1) compared to its size, it had an excess of paired transcripts with overlapping coordinates (labelled with A/B; figure 3a).
We found six pairs of transcripts with overlapping coordinates, three of which mapped within Cf-Inv(1). This pattern could be due to isoforms, exon duplications, errors in transcriptome assembly, or divergence. Two of these pairs (at 9.07 Mb and 30.69 Mb) were of particular interest as they showed opposite expression patterns (electronic supplementary material, table S4b). One transcript of each pair was assembled from an αα individual and the other from a ββ individual, indicating that they probably represent alternative alleles of the same ancestral genes. Reusing previously published whole-genome sequences [23], we observed that both genomic regions are characterized by high genetic differentiation between α and β (FST = 0.96-0.97 and dxy = 0.04-0.07, compared to FST = 0.87 and dxy = 0.02 in Cf-Inv(1) overall) and by heterogeneous coverage in ββ (figure 3b), which may reflect either poor mapping of those sequences to an αα reference genome or ββ-restricted deletions. Comparing the protein sequences of these pairs showed striking divergence in both amino acid identity and biochemical properties (figure 3c,d). The third pair within Cf-Inv(1) was also divergent, but less so (electronic supplementary material, figure S6). This indicates that the overlapping pairs within Cf-Inv(1) are due to divergence between supergene haplotypes. We note that after merging these pairs there was still a marginally significant excess of differentially expressed OD transcripts within Cf-Inv(1) (Fisher's exact test p = 0.0996). Thus, we found divergence in both coding sequences and expression. In most insects, diversification of tandem duplicates takes place between species [65,67]; in C. frigida, however, we propose that early divergence in chemical signalling may have taken place within species, between arrangements of Cf-Inv(1), although data from sister species are needed to better understand the evolutionary history of those genes. A critical next step will be to perform ligand binding assays on these two alleles to determine whether they differentially bind any of our focal CHCs. Overall, the duplication and divergence of OD genes between the two arrangements of the supergene is comparable to patterns observed between sister species of insects [72]. This is in line with the fast evolution of OBPs and chemosensory genes [67] and suggests that supergenes provide a genetic architecture favouring the rapid divergent evolution of mate choice, including at the intraspecific level.
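The enrichment test referred to above (an excess of differentially expressed OD transcripts within Cf-Inv(1)) can be illustrated with a minimal sketch of a one-sided Fisher's exact test. The counts below are hypothetical placeholders, not the values from this study, and the choice of a one-sided test is an assumption made for illustration; the function name `fisher_exact_greater` is likewise introduced here only for the example.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with all margins fixed,
    i.e. the probability of at least `a` counts in the top-left cell.
    """
    row1 = a + b        # e.g. OD transcripts inside the inversion
    col1 = a + c        # e.g. differentially expressed OD transcripts
    n = a + b + c + d   # all OD transcripts
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts (assumption, for illustration only):
# rows = inside / outside the inversion, columns = DE / not DE.
p_value = fisher_exact_greater(5, 10, 4, 30)
print(f"one-sided p = {p_value:.4f}")
```

With real data, the four cells would be the numbers of differentially expressed and non-differentially expressed OD transcripts inside and outside Cf-Inv(1), counted after merging the overlapping A/B pairs.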
(d) Chemical signalling is a putative modality underlying disassortative mating and the persistence of supergene polymorphism
Theory predicts that disassortative mating requires either a self-referencing system, where individuals use their own phenotype to choose a mate, or tight linkage between mating signals and preferences [21,22]. As CHC profiles differ between sexes but female CHCs vary little by genotype, it is unlikely that a self-referencing mechanism guides mate choice relative to Cf-Inv(1). Our results indicate that Cf-Inv(1) affects male CHC composition and probably plays a role in female chemoreception. However, more work remains to be done to understand differences in signal reception and processing in females. At this juncture, we cannot definitively link our results to female choice, but based on our data we hypothesize that disassortative mating in this system may function via tight linkage between male mating signals and female preferences. The evolution of such a system in C. frigida is consistent with theoretical predictions, as overdominance creates a selective advantage for disassortative mating. Disassortative mating, and a linked architecture between trait and preference, are thus under strong selection, as choosing a mate with an alternate genotype ensures higher fitness. Cf-Inv(1) is strongly overdominant. Heterozygotes enjoy about a 25-75% increase in fitness due to a life-history trade-off between homozygotes [29] as well as about a 10-70% increase in survival (depending on environmental conditions), likely caused by masking of deleterious recessive alleles [10,19,20,73]. A potential pleiotropic effect of Cf-Inv(1) on male traits, female chemoreception and the genes underlying overdominance would result in a feedback loop that should strengthen disassortative mating and stabilize the polymorphism, as suggested by mate choice theory [21,22] and speciation theory [74,75]. Although we cannot definitively tie differential CHC composition between Cf-Inv(1) genotypes to mate choice behaviour, our data point towards this tantalizing hypothesis.
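To make the stabilizing effect of overdominance concrete, the following is a minimal sketch of the textbook one-locus, two-allele selection recursion under heterozygote advantage. It assumes random mating and constant fitnesses, so it does not model disassortative mating or the environmental dependence described above; the selection coefficients `s` and `t` are hypothetical values chosen for illustration, not estimates for Cf-Inv(1).

```python
def next_freq(p, s, t):
    """Allele frequency after one generation of viability selection.

    Assumed relative fitnesses: first homozygote 1 - s, heterozygote 1,
    second homozygote 1 - t. Random mating is assumed.
    """
    q = 1.0 - p
    w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)  # mean fitness
    return (p * p * (1 - s) + p * q) / w_bar


def iterate(p0, s, t, generations=500):
    """Apply the recursion repeatedly from a starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, t)
    return p


s, t = 0.4, 0.2          # hypothetical selection against each homozygote
p_star = t / (s + t)     # analytical equilibrium under overdominance
for p0 in (0.05, 0.5, 0.95):
    print(p0, round(iterate(p0, s, t), 4), round(p_star, 4))
```

Whatever the starting frequency, the recursion approaches the internal equilibrium t / (s + t), which is the sense in which heterozygote advantage alone can hold both arrangements in a population. Disassortative mating would act on top of this by raising the proportion of heterozygous offspring.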

Conclusion
Overall, we show that CHCs are a strong candidate for facilitating disassortative mating in C. frigida: males show strong differences in chemical composition by genotype, females can sense these compounds, and there is phenotypic and genetic evidence indicating that female chemoreception may vary between genotypes. More broadly, our findings highlight the importance of genetic architecture for the evolution of intraspecific diversity and the persistence of supergene-associated polymorphism. At a coarse scale, the reduction in effective recombination that favours the supergene is also putatively responsible for the accumulation of deleterious mutations leading to heterozygote advantage. The resulting selective pressure then favours disassortative mating. Potential linkage between the genes underlying male signal and the genes underlying female perception would provide an ideal architecture for the persistence of disassortative mating itself. Moreover, at a finer scale, the duplication of OD genes within the supergene region possibly contributed to the rapid divergence of signal reception. In this system, disassortative mating, coupled with other mechanisms of balancing selection such as spatially varying selection, preserves the coexistence of supergene haplotypes over long time scales, further enhancing the accumulation of divergence.

Figure 1. CHC composition varies by genotype in males but not females. (a) OPLS-DA analysis of CHC composition in Norwegian and Canadian populations. Figures are divided by population and sex and coloured by genotype: orange, αα; purple, ββ. (b) PERMANOVA results for males. (c) PERMANOVA results for females. Models were run separately for each population × sex combination. Only αα and ββ individuals were used in all analyses.

Figure 2. Female detection of CHCs varies by genotype. (a) Behaviour of Coelopa frigida during mating; note the legs of the male in direct contact with the female's antenna. Photo credit: Swantje Enge and Per Larsson. (b) Example chromatogram from an αα male with the identified EAD active peaks (Cp1-Cp39). Compounds showing significant differences in perception between genotypes are highlighted in red. (c-j) Boxplots of reaction (in μV) of female antennae to compounds with statistically significant differences between genotypes using both the DESeq2 and GLM approaches. Below each boxplot is the geometric mean for that group (all zeros have been changed to 0.001).
Figure 3. Divergence of odorant detection genes within the supergene. (a) Clusters of odorant detection genes identified in the Coelopa frigida genome. Putative genes were labelled as OR, odorant receptor; GR, gustatory receptor; OBP, odorant-binding protein. Numbers correspond to their order in the genome starting from LG1 position 1. Overlapping pairs were labelled with the same number and title but with A and B afterwards. The position of Cf-Inv(1) is shown in purple. Visualization was done with karyoploteR [64]. (b) Coverage: average depth of sequencing across genotypes (yellow, αα; purple, ββ) at all positions along the focal genomic regions. Expected depth is around 1.1X. Bars indicate the position of the transcripts. (c,d) Protein alignment of OBP2A/B (c) and GR1A/B (d). Amino acids are coloured to show biochemical properties using the ClustalX colouring scheme. Protein structure predicted by Jpred is shown below, with red bars indicating helices and green arrows indicating sheets. Below that is the AMAS conservation score, with higher values indicating more conserved amino acids [48].

. Workers carrying the Sb supergene will accept queens or dummies with the correct CHC blend and

Table 1. Univariate analysis of 35 compounds quantified in adults and perceived by females. Values indicated under all are t-values from GLMs on log-transformed values of normalized peaks. Values under male and female are from post hoc contrasts done using the GLM results. Italics indicate a significant difference at p < 0.05. Corresponding adjusted p-values are found in electronic supplementary material, table S2. For sex, positive values indicate higher values in males compared to females and negative values indicate the opposite. For genotype, positive values indicate higher values in αα individuals compared to ββ individuals and negative values indicate the opposite. For the genotype ×