Frequent origins of traumatic insemination involve convergent shifts in sperm and genital morphology

Abstract Traumatic insemination is a mating behavior during which the (sperm) donor uses a traumatic intromittent organ to inject an ejaculate through the epidermis of the (sperm) recipient, thereby frequently circumventing the female genitalia. Traumatic insemination occurs widely across animals, but the frequency of its evolution, the intermediate stages via which it originates, and the morphological changes that such shifts involve remain poorly understood. Based on observations in 145 species of the free‐living flatworm genus Macrostomum, we identify at least nine independent evolutionary origins of traumatic insemination from reciprocal copulation, but no clear indication of reversals. These origins involve convergent shifts in multivariate morphospace of male and female reproductive traits, suggesting that traumatic insemination has a canalizing effect on morphology. We also observed sperm in both the sperm receiving organ and within the body tissue of two species. These species had intermediate trait values indicating that traumatic insemination evolves through initial internal wounding during copulation. Finally, signatures of male‐female coevolution of genitalia across the genus indicate that sexual selection and sexual conflict drive the evolution of traumatic insemination, because it allows donors to bypass postcopulatory control mechanisms of recipients.


Stylet
We measured stylet length by placing two segmented lines along its sides (Figure A1) and then taking the average of both. We also used these lines to measure the stylet's curviness, by summing all the angles between neighbouring line segments for each line and again taking the average. As a consequence, longer stylets may tend to be curvier simply because they have more line segments and thus more summed angles. However, we decided against standardising by length since this would lead to autocorrelation in our principal component analysis (and since such standardisation is easy to implement for future use). We also measured the width of the proximal and distal stylet openings (Figure A1). As a measure of distal asymmetry, we then took the absolute difference between the sizes of the two distal thickenings. Finally, we categorised the stylet according to how sharp its distal end is ("stylet sharpness") on a per species basis according to the criteria in Table A1. Measurements of stylets were performed using in situ images and videos of live worms (590 stylets) as well as images of smash preparations (462 stylets). We assessed whether the preparation method introduced a measurement bias by comparing stylet length in specimens for which both in situ and smash preparations were available (103 specimens across 59 species). We found that in situ estimates were slightly shorter (on average by 3.1 µm; median by 3.1%; paired t-test: t = 5.8, df = 102, p < 0.001). Even though we introduced a slight bias by measuring under both setups, this allowed us to greatly increase the sample size, since stylets can be more challenging to measure in situ in some orientations. In cases where both estimates were available, we used the mean of the two. In some specimens, it was not possible to measure the stylet's length, but we still measured the width of the proximal and distal openings (103 cases). Conversely, in a few cases we were able to measure stylet length but not the width of the distal opening (11 cases), the width of the proximal opening (one case) or both (two cases). Finally, we could not spatially calibrate the images of 7 specimens, but still collected curviness information on these stylets.
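
The length and curviness measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the measurement code used in the study: the function names are hypothetical, and it assumes that the "angle between neighbouring line segments" is the absolute turning angle in degrees, which the text does not specify.

```python
import math

def polyline_length(points):
    """Sum of Euclidean distances between consecutive (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def polyline_curviness(points):
    """Sum of absolute turning angles (degrees) at each interior vertex.

    Assumption: the angle between neighbouring segments is taken as the
    absolute change in heading; the source does not state the convention.
    """
    total = 0.0
    for a, b, c in zip(points, points[1:], points[2:]):
        heading_in = math.atan2(b[1] - a[1], b[0] - a[0])
        heading_out = math.atan2(c[1] - b[1], c[0] - b[0])
        turn = heading_out - heading_in
        # Wrap the difference into (-pi, pi] before taking its magnitude.
        turn = math.atan2(math.sin(turn), math.cos(turn))
        total += abs(math.degrees(turn))
    return total

def stylet_metrics(side_a, side_b):
    """Average length and curviness over the two lines traced along the stylet."""
    length = (polyline_length(side_a) + polyline_length(side_b)) / 2
    curviness = (polyline_curviness(side_a) + polyline_curviness(side_b)) / 2
    return length, curviness

# Hypothetical coordinates (already spatially calibrated, in µm).
side_a = [(0, 0), (10, 0), (10, 10)]
side_b = [(0, 2), (8, 2), (8, 10)]
length, curviness = stylet_metrics(side_a, side_b)
```

Because curviness is a sum over vertices, a line traced with more points can accumulate a larger value, which is the length dependence noted above.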

Figure A1 Details on stylet morphometrics, measured either from images taken in situ in live worms (A, B) or from smash preparations (C, D).
In each panel, the top shows the raw image and the bottom the measured lines. The segmented red and green stylet length lines are used to determine stylet length and curviness, and the blue line measures the width of the proximal opening. Each purple segmented line consists of four points (a, b, c, and d) with a to b and c to d measuring the size of the 'red' and 'green' distal thickenings, respectively, and with b to c measuring the width of the distal opening (note that points a and d can be the same as points b and c, respectively, when a side lacks a distal thickening). To determine points b and c, the stylet length line that terminates first (green in panel D) is selected and a line orthogonal to its last line segment drawn through its endpoint (here considered point c). Where this orthogonal line intersects the second stylet length line (red in panel D) is the termination point of the second line and designated point b. In cases where the orthogonal line did not intersect with the other side of the stylet, we instead simply defined the end of the second line as point b. The size of the distal thickenings was then measured as the distance from the endpoints of the orthogonal line (e.g. a relatively symmetrical distal thickening in A and asymmetric thickenings in B-D), which is akin to increasing the line width of the length lines (red and green lines) and placing the point on the structure that is obscured last.
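
The construction of points b and c in this caption can be illustrated with a short geometric sketch. It is a hedged reading of the caption rather than the authors' procedure: the function name is hypothetical, and choosing the intersection nearest to point c when the orthogonal line crosses the second line more than once is an assumption.

```python
import math

def find_point_b(first_line, second_line):
    """Locate point b on the second stylet length line.

    first_line: polyline that terminates first; its endpoint is point c.
    second_line: the other stylet length line.
    A line orthogonal to the last segment of first_line is drawn through c.
    Its intersection with second_line is point b. If there is no
    intersection, the end of second_line is used instead.
    """
    cx, cy = first_line[-1]
    px, py = first_line[-2]
    # Direction orthogonal to the last segment of the first line.
    nx, ny = -(cy - py), (cx - px)
    best = None
    for (x1, y1), (x2, y2) in zip(second_line, second_line[1:]):
        sx, sy = x2 - x1, y2 - y1
        denom = nx * sy - ny * sx
        if abs(denom) < 1e-12:
            continue  # segment is parallel to the orthogonal line
        wx, wy = x1 - cx, y1 - cy
        t = (wx * sy - wy * sx) / denom  # position along the orthogonal line
        u = (wx * ny - wy * nx) / denom  # position along the segment
        if 0.0 <= u <= 1.0 and (best is None or abs(t) < abs(best[0])):
            # Assumption: keep the intersection closest to point c.
            best = (t, (x1 + u * sx, y1 + u * sy))
    return best[1] if best is not None else second_line[-1]

# Hypothetical coordinates: the first line ends at c = (0, 10).
first_line = [(0, 0), (0, 10)]
second_line = [(4, 0), (4, 6), (4, 14)]
point_c = first_line[-1]
point_b = find_point_b(first_line, second_line)
distal_opening_width = math.dist(point_b, point_c)
```

The width of the distal opening is then the distance from b to c, and the second line is considered to terminate at b.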

Sperm
We measured the length of 1765 sperm across 504 specimens by placing a segmented line along the midline, starting from the most distal part of the feeler and terminating just anterior of the brush (if present, Figure A2). Note that this measure of total sperm length differs from that used in a previous study on M. lignano, where the sperm feeler was excluded (Janicke and Schärer 2010), but is equivalent to the measure used in Schärer et al. (2020). This was necessary since, compared to M. lignano, the boundary between the feeler and the body of the sperm is less evident in many species. The feeler was excluded in Janicke and Schärer (2010) since they determined that repeatability was lower when it was included. However, its inclusion in the current study potentially captures some interesting variation. We measured sperm bristle length by placing a line along its axis. If possible, we measured both bristles of a sperm, averaged them per sperm, then averaged them per specimen before finally taking the average across all specimens measured per species (1021 bristles measured across 294 specimens). We also categorised sperm on a per species basis according to their bristle type ("sperm bristle state") and whether they carried a velum ("sperm velum") or a brush ("sperm brush") (Table A1). Among the species in which the sperm bristles are visible in the light microscope, bristle length has a fairly continuous distribution. However, very small bristles likely do not serve a function, or at least are too small to anchor sperm in the antrum during the suck behaviour. In some species, the sperm bristles are so small that they do not even protrude outside of the sperm (e.g. M. sp. 50). We consider these bristles as reduced and categorised them as such to highlight our functional hypothesis. Since the trait is continuous, an exact cutoff point is hard to define, and we decided that M. poznaniense is the species with the smallest sperm bristles (3.2 µm) that we do not consider reduced (Figure A3).
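
The nested averaging of bristle lengths (per sperm, then per specimen, then per species) can be sketched as follows. This is an illustrative sketch with a hypothetical record layout and function names. Treating 3.2 µm as a strict numeric threshold is also an assumption, since the categorisation in the text was made per species and partly on functional grounds.

```python
from collections import defaultdict
from statistics import mean

def species_mean_bristle_length(records):
    """records: iterable of (species, specimen_id, sperm_id, bristle_length_um)."""
    per_sperm = defaultdict(list)
    for species, specimen, sperm, length in records:
        per_sperm[(species, specimen, sperm)].append(length)

    # Average the (one or two) bristles of each sperm, grouped by specimen.
    per_specimen = defaultdict(list)
    for (species, specimen, _sperm), lengths in per_sperm.items():
        per_specimen[(species, specimen)].append(mean(lengths))

    # Average sperm means within each specimen, grouped by species.
    per_species = defaultdict(list)
    for (species, _specimen), sperm_means in per_specimen.items():
        per_species[species].append(mean(sperm_means))

    # Average specimen means within each species.
    return {species: mean(values) for species, values in per_species.items()}

def is_reduced(mean_length_um, cutoff_um=3.2):
    """Assumed rule: bristles shorter than the M. poznaniense value count as reduced."""
    return mean_length_um < cutoff_um

# Hypothetical measurements for one species, in µm.
records = [
    ("sp_x", "specimen_1", "sperm_1", 4.0),
    ("sp_x", "specimen_1", "sperm_1", 6.0),
    ("sp_x", "specimen_1", "sperm_2", 7.0),
    ("sp_x", "specimen_2", "sperm_1", 2.0),
    ("sp_x", "specimen_2", "sperm_1", 2.0),
]
species_means = species_mean_bristle_length(records)
```

Averaging in stages gives each specimen equal weight in the species mean, regardless of how many sperm were measured in it.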

Female antrum
Female antrum morphology was categorised by examining all available material from a species and scoring the species as a whole. We examined the thickness of the overall antrum wall ("antrum thickness"), the presence of a visible anterior thickening of the female antrum ("cellular valve") (Ladurner et al. 2005), the complexity of the chambers in the antrum ("antrum chamber"), and we counted the number of female genital openings (Table A2). For the ancestral state reconstructions, the analysis of correlated evolution, and the PGLS on sperm length, we scored a combined measure of antrum thickness and cellular valve as binary ("antrum state"). Finally, we summed the antrum thickness, cellular valve, and antrum chamber scores to obtain a summary score ("antrum complexity") (Table A2). When the antrum wall is thickened throughout, it is more difficult to see if there is a cellular valve and potentially there is some conflict between the antrum thickness and cellular valve scorings. However, this would occur only in specimens that already achieve a high score on antrum complexity and might just introduce a slight downward bias for high values.
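
The two derived antrum measures can be written out explicitly. The sketch below uses the score ranges and the state 0 rule listed in Table A2. Coding every other combination as state 1 is an assumption based on the state being described as binary, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AntrumScores:
    thickness: int       # antrum thickness, 0 to 3
    cellular_valve: int  # cellular valve, 0 to 2
    chamber: int         # antrum chamber, 0 or 1

    def __post_init__(self):
        if not 0 <= self.thickness <= 3:
            raise ValueError("thickness must be between 0 and 3")
        if not 0 <= self.cellular_valve <= 2:
            raise ValueError("cellular_valve must be between 0 and 2")
        if self.chamber not in (0, 1):
            raise ValueError("chamber must be 0 or 1")

def antrum_complexity(scores):
    """Summary score: sum of thickness, cellular valve, and chamber scores."""
    return scores.thickness + scores.cellular_valve + scores.chamber

def antrum_state(scores):
    """Binary state: 0 if thickness < 2 and no cellular valve.

    Assumption: all other combinations are coded as 1.
    """
    return 0 if (scores.thickness < 2 and scores.cellular_valve == 0) else 1

# Hypothetical species with a visible but thin wall and a clear cellular valve.
example = AntrumScores(thickness=1, cellular_valve=1, chamber=0)
complexity = antrum_complexity(example)
state = antrum_state(example)
```

With this rule, a species with a thin but visible wall is still coded as state 1 once it has a clear cellular valve.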

Received sperm location
Whenever possible, we examined specimens for received sperm in the antrum and within the tissue across the entire body. Based on our observations, we then summarised our findings for each species as categories with a binary and a trinary coding. In the binary coding, we grouped all species where we observed sperm within the tissue, irrespective of whether we also observed sperm in the antrum. In the trinary coding, we added a category especially for species with received sperm in both the antrum and the tissue (Table A1).

Given are the state definitions and how they were translated into ordinal coding.

Antrum thickness
0: Antrum wall very thin, only visible when egg is present
1: Thin wall, but structure also visible without egg present
2: Clear thickening
3: Strong thickening

Cellular valve
0: No clear anterior thickening
1: Clear anterior thickening
2: Thickening extends far anterior

Antrum chamber
0: One simple chamber
1: Elaborated chamber and/or vagina

Genital openings
1: One opening
2: Two openings

Antrum state
0: Antrum thickness <2 AND cellular valve = 0

(Sluys, 1986) (Sluys 1986) (previously also Promacrostomum; see Brand et al. 2021), that currently lacks phylogenetic placement. Each of these species has one opening that is, and one that is not, associated with cement or shell glands, and we refer to these as the gonopore and bursa pore, respectively. The gonopore is assumed to be homologous to the single female genital opening in all other Macrostomum, since it is also surrounded by cement glands (Sluys 1986). The presumably novel bursa pore is always associated with a small chamber (bursa) that has strong circular musculature (An-der-Lan 1939). Interestingly, the bursa pore is located anterior to the gonopore in M. spiriger (Xin et al. 2019), in M. paradoxum (cement glands not drawn by An-der-Lan (1939), but confirmed by pers. obs., e.g. MTP LS 3476) and M. sp. 82 (pers. obs., e.g. MTP LS 2848), while it is posterior to it in M. palum (Sluys 1986) and M. gieysztori (Papi 1950; and in its three relatives; pers. obs., e.g. MTP LS 845), further supporting their independent origin. Moreover, in M. paradoxum the bursa is described as being connected to the gut via a genito-intestinal duct (An-der-Lan 1939), which could potentially allow sperm digestion. However, we were not able to locate this duct in our in vivo observations and may have to resort to histology to confirm its existence.

SI Pathways to hypodermic insemination
While 117 of the 145 studied species could be assigned to either the hypodermic or reciprocal mating syndrome, some species were not easily identified as performing either type of mating behaviour (Table 1), which may have been due either to a lack of detailed observations or, more interestingly, to potentially transitional patterns that deviated from these syndromes. This included two species that we categorised as intermediate because we observed sperm both in the antrum and embedded inside the recipient's tissues (light green triangles in Figure 4). Both species (M. sp. 3 and M. sp. 101) have a sharp stylet and sperm with reduced bristles, fitting with the hypodermic mating syndrome (Fig. S2). However, we have scored both species as having a thickened antrum because M. sp. 3 has a visible antrum wall and a clear cellular valve, while M. sp. 101 has a strongly thickened antrum wall with the cellular valve extending far anterior (Table S3). Observations of received sperm in M. sp. 3 show them to be embedded deeply in the anterior wall of the antrum and also more deeply in the tissue lateral to the body axis, extending up to the ovaries, as well as in the tail plate (Figure A4). And while in M. sp. 101 we did not observe sperm as deeply in the recipient's tissues, some were fully embedded in the cellular valve and just anterior to it, and thus close to the developing eggs (Figure A5). One explanation for these findings would be that during mating, the stylet of both species pierces the recipient's antrum wall and sperm is traumatically injected into the body internally. Unfortunately, no copulations were observed in mating observations of M. sp. 101, but this species has been seen performing the suck behaviour, as expected if ejaculate is, at least partially, deposited in the antrum (P. Singh, pers. comm.). We currently lack mating observations for M. sp. 3, and further investigations of the mating behaviour and the antrum histology of both species would be highly desirable.
The observations in M. sp. 3 and M. sp. 101 suggest a possible route to hypodermic insemination via the initial evolution of a traumatic stylet in reciprocally mating species. A sharp stylet could provide anchorage during copulation, as potentially occurs in M. spirale (pers. obs.) and M. hamatum (P. Singh, pers. comm.), but it may also serve to stimulate the partner during copulation, or aid in the destruction or removal of rival sperm already present in the partner's antrum. Finally, internal wounding by the stylet may help to embed sperm in the antrum wall and/or cellular valve to prevent their removal, either by rival mating partners or by the recipient during the suck behaviour. However, the fact that we observe many Macrostomum species with stylets with blunt distal thickenings suggests that such internal wounding may not always be advantageous for the donor and that these structures may have evolved to avoid harm to the recipient (Schärer et al. 2011). Irrespective of the initial selective advantage that internal wounding may confer, it could then evolve further to complete internal traumatic insemination, and eventually to complete avoidance of the female genitalia and hypodermic insemination via the epidermis. Accidental sperm transfer due to copulatory wounding has generally been suggested as a possible route from copulation to traumatic insemination (e.g. in traumatically mating bedbugs) (Lange et al. 2013; Reinhardt et al. 2015). In bedbugs, the attachment during mating is a two-step process, indicating that traumatic insemination was evolutionarily preceded by traumatic penetration for attachment (Lange et al. 2013). Similar transitions have also been proposed for Drosophila species in the melanogaster group, where extragenital wounding structures are typically used for anchorage during copulation. In traumatically inseminating Drosophila, these structures are modified and pierce the mating partner's integument to inject sperm into the genital tract (Kamimura 2007, 2010). In this light, our findings are remarkable, because previous examples only compared species with and without traumatic insemination, whereas we potentially observe species "in the act" of transitioning to traumatic insemination. These intermediate species should be excellent targets for future studies of the costs and benefits of traumatic insemination, especially because the two candidates represent independent evolutionary transitions.
Three additional species (M. sp. 14, M. sp. 51 and M. sp. 89) were also difficult to classify because, although their morphology indicates hypodermic insemination, we clearly observed received sperm in the antrum. M. sp. 51 and M. sp. 89 grouped with the hypodermically mating species in PC1 (red arrowheads in Figure 4), while we did not include M. sp. 14 in this analysis due to missing data for sperm bristle length. We found sperm within the antrum in only 1 of 4 specimens in M. sp. 51 and 1 of 12 specimens in M. sp. 89, and it is thus possible that sperm is hypodermically injected and later enters the antrum when an egg passes through the cellular valve into the antrum before egg laying. But these species could also represent an intermediate state between the mating syndromes. We observed sperm in the antrum in 3 of 5 specimens in M. sp. 14, and it therefore seems less likely that sperm entering during the transition of the egg into the antrum is the cause of its presence here as well. Instead, sperm is probably deposited in the antrum by the mating partner during copulation. However, based on its general morphology, we predict that closer investigations of this species will reveal hypodermic received sperm in a similar location as found in the other intermediate species (M. sp. 3 and M. sp. 101). Finally, we were not able to assign M. sp. 10 to a mating syndrome because, although we found received sperm in the antrum and its sperm carry long bristles, it also has a sharp stylet and a simple antrum. From our previous findings, we would expect this species to have a thickened antrum due to its interaction with sperm and the mating partner's stylet. This discrepancy could possibly be attributed to misclassification of the antrum morphology, since M. sp. 10 has very pronounced shell glands, making it difficult to see the anterior part of the antrum, possibly obscuring a thickening or cellular valve (see specimen IDs MTP LS 788 and MTP LS 801 for a possibly thin cellular valve).

SI Correlated evolution
We performed tests for correlated evolution on three trait combinations with three different phylogenetic trees and three different priors for each test. To evaluate sensitivity to the prior, we ran all analyses with a uniform prior (U 0 100), an exponential prior (exp 10) and a reversible-jump hyperprior with a gamma distribution between 0 and 1 for both the rate prior and the hyperprior (rjhp gamma 0 1 0 1). To assess the influence of the phylogeny, we conducted these tests on three different phylogenies (C-IQ-TREE, H-IQ-TREE, and H-ExaBayes). Here we first show the mean transition rates inferred (Figures A6-A8) and then give the posterior distributions for all transition rates and the likelihood for each combination of parameters (Figures A9-A17). The analyses were moderately sensitive to the chosen priors, with the largest Bayes factors for the uniform prior, slightly lower values for the exponential prior, and substantially lower values for the reversible-jump hyperprior in all tests. The reversible-jump hyperprior in some cases resulted in posterior distributions of the likelihood with multiple peaks, especially in runs using only the species with a transcriptome (H-IQ-TREE and H-ExaBayes). These models also showed two peaks in the distributions of the rates of transitions that were very rare (e.g. the transition away from hypodermic received sperm, which likely did not occur in these datasets) and were thus difficult to estimate and more sensitive to the prior. This phenomenon can also be seen in the posteriors of these rates under the uniform and the exponential priors, since the posteriors are very similar to the priors. Reversible-jump hyperpriors resulted in better-behaved posteriors when running on the C-IQ-TREE phylogeny, likely because more data was available to influence the posteriors and because of the presence of species with thickened antrum state in the finlandense clade, which allows all rates to be estimated more easily. In the main text, we present the results from C-IQ-TREE runs with the exponential prior, because these runs had the highest marginal likelihoods of the dependent model in all analyses.

Figure caption fragment (cf. Figure 4C): Species sample size was 94 for H-IQ-TREE and H-ExaBayes, and 123 for C-IQ-TREE.
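
To make the comparison of priors concrete, the sketch below computes a log Bayes factor from the log marginal likelihoods of a dependent and an independent model. It assumes the convention commonly used with BayesTraits, log BF = 2 × (log marginal likelihood of the dependent model − log marginal likelihood of the independent model). The text does not state the formula, and all numbers below are hypothetical placeholders rather than results from this study.

```python
def log_bayes_factor(log_ml_dependent, log_ml_independent):
    """Twice the difference in log marginal likelihood (assumed convention)."""
    return 2.0 * (log_ml_dependent - log_ml_independent)

# Hypothetical log marginal likelihoods for one trait pair under three priors:
# (dependent model, independent model).
runs = {
    "uniform": (-50.0, -60.0),
    "exponential": (-49.0, -58.0),
    "rj_hyperprior": (-52.0, -55.0),
}

log_bfs = {prior: log_bayes_factor(dep, indep) for prior, (dep, indep) in runs.items()}

# Prior whose run has the highest marginal likelihood for the dependent model.
best_prior = max(runs, key=lambda prior: runs[prior][0])
```

Note that the prior giving the largest Bayes factor need not be the one with the highest marginal likelihood for the dependent model, which is the criterion used above to choose the runs shown in the main text.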

Figure A9. Posterior distributions of the rate parameters (left) and likelihoods (right) for BayesTraits runs with the H-IQ-TREE phylogeny, and received sperm location and sperm bristle state.
Histograms show values from 2 × 10^6 samples from four converged chains and are scaled so that the largest bin has a height of unity. Rate panels are arranged so that rates that would be equal in the independent model are arranged vertically (e.g. 00 to 10 would be equal to 01 to 11).
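
The histogram scaling mentioned in this caption (largest bin set to a height of one) can be sketched as below. This is a generic illustration with a hypothetical function name and made-up sample values, not the plotting code behind the figures.

```python
def scaled_histogram(values, n_bins, lower, upper):
    """Bin values into n_bins equal-width bins and scale so the tallest bin is 1.0."""
    counts = [0] * n_bins
    width = (upper - lower) / n_bins
    for value in values:
        if lower <= value <= upper:
            # Values equal to the upper edge fall into the last bin.
            index = min(int((value - lower) / width), n_bins - 1)
            counts[index] += 1
    peak = max(counts)
    if peak == 0:
        return [0.0] * n_bins
    return [count / peak for count in counts]

# Hypothetical posterior samples of a transition rate.
samples = [0.1, 0.2, 0.6, 0.7, 0.8, 0.9]
heights = scaled_histogram(samples, n_bins=2, lower=0.0, upper=1.0)
```

Scaling by the peak rather than by the total count makes the shapes of posteriors comparable across panels, at the cost of losing the area-equals-one interpretation.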

SI Taxonomy
The analyses we present here have implications for the taxonomy and nomenclature of the Macrostomidae (Tyler et al. 2006), and they support taxonomic and nomenclatural revisions. We outline these changes in the following, using the "Genus species Author, Year" format to refer to genera and binomials, and additionally including all the citations for the relevant works.
Archimacrostomum and Inframacrostomum should be dropped
As part of his monograph, Ferguson (1954) established two new genera based on the absence of parts of the sexual system. On the one hand, the genus Archimacrostomum Ferguson, 1954 is characterised by an incomplete male sexual system, specifically a lack of the vesicula granulorum, a prominent structure that usually connects the muscular true seminal vesicle to the proximal stylet opening, and which contains complex muscles and serves as the entry point of the necks of the prostate gland cells into the stylet lumen. On the other hand, the genus Inframacrostomum Ferguson, 1954 is characterised by an incomplete female sexual system, specifically the lack of a discrete female antrum (which Ferguson refers to as the "female genital atrium"). Contemporary workers, namely Ax and Papi, strongly opposed the erection of these two genera (Ax 1959; Papi 1959), primarily because "The absence of a specific character as a single trait is [in my view] completely unsuitable as a basis for the erection of independent genera." (our translation from Ax 1959). These critical assessments were later also supported by Schärer et al. (2011) and Schockaert (2014). Both Ax and Papi clarified that species assigned to Archimacrostomum do not lack the vesicula granulorum, but that it can sometimes simply be accommodated inside a wide proximal stylet base (Ax 1959; Papi 1959). Concerning the absence of a discrete female antrum in Inframacrostomum, both authors point out that this structure can be more or less prominent and thus cannot serve as grounds for a generic placement. These early considerations are underlined by the striking levels of convergent evolution revealed by Schärer et al. (2011) and the current study, which clearly show that the antrum has been simplified multiple times independently and hence cannot serve as a synapomorphy for these genera.
The clear recommendations by Ax and Papi have not been followed uniformly, however, and both Archimacrostomum and Inframacrostomum still appear in the literature (e.g. Faubel and Warwick 2005) and in online databases (e.g. http://www.marinespecies.org/aphia.php?p=taxdetails&id=142052), although the former genus more often than the latter. We here formally place all the species that Ferguson moved to these genera back into their original combinations in the genus Macrostomum. From Archimacrostomum we thus reinstate Macrostomum beaufortense Ferguson, 1937 (originally named Macrostomum beaufortensis Ferguson, 1937) (Ferguson 1937), Macrostomum hustedi Jones, 1944 (Jones 1944), and Macrostomum brasiliense Marcus, 1952 (originally named Macrostomum appendiculatum forma brasiliensis Marcus, 1952) (Marcus 1952). From Inframacrostomum we reinstate Macrostomum rubrocinctum Ax, 1951 (Ax 1951). Moreover, we also reassign a species that was subsequently named in Archimacrostomum, establishing Macrostomum sublitorale (Faubel & Warwick, 2005) (Faubel and Warwick 2005), and reinstate two additional species that were also transferred to Archimacrostomum by these authors, namely M. pusillum Ax, 1951 (Ax 1951) and M. peteraxi Mack-Fira, 1971 (Mack-Fira 1971).