The macroevolution of size and complexity in insect male genitalia

The evolution of insect male genitalia has received much attention, but there is still a lack of data on the macroevolutionary origin of its extraordinary variation. We used a calibrated molecular phylogeny of 71 of the 150 known species of the beetle genus Limnebius to study the evolution of the size and complexity of the male genitalia in its two subgenera, Bilimneus, with small species with simple genitalia, and Limnebius s.str., with a much larger variation in size and complexity. We reconstructed ancestral values of complexity (perimeter and fractal dimension of the aedeagus) and genital and body size with Bayesian methods. Complexity evolved more in agreement with a Brownian model, although with evidence of weak directional selection to a decrease or increase in complexity in the two subgenera respectively, as measured with an excess of branches with negative or positive change. On the contrary, aedeagus size, the variable with the highest rates of evolution, had a lower phylogenetic signal, without significant differences between the two subgenera in the average change of the individual branches of the tree. Aedeagus size also had a lower correlation with time and no evidence of directional selection. Rather than to directional selection, it thus seems that the higher diversity of the male genitalia in Limnebius s.str. is mostly due to the larger variance of the phenotypic change in the individual branches of the tree for all measured variables.


INTRODUCTION
Insect genitalia have been the focus of much attention since the middle of the 19th century, when their taxonomic value to diagnose species with otherwise very similar external morphologies was recognised. It soon became apparent that this variability should play a prominent role in reproductive isolation and speciation (e.g., Dufour, 1848), generating much debate on the causes of its origin and evolution (Eberhard, 1985;Eberhard et al., 1998;Simmons, 2014). Sexual selection is now widely acknowledged as being the major force driving the evolution of animal genitalia (Hosken & Stockley, 2004;Simmons, 2014), but there is still little consensus on what are the dominant factors (such as male competition, male-female sexual conflict or pure female choice) or the main macroevolutionary trends generating their diversity (Simmons, 2014).
One of the few widely accepted general trends is the negative allometric scaling of most genital traits (Eberhard et al., 1998;Hosken, Minder & Ward, 2005;Eberhard, 2009), which suggests that genital morphology is often subject to stabilising selection. However, genitalia are also considered to be among the fastest evolving traits in arthropods, clearly differing in Table 1 Average values of the measured variables in the outgroup (Laeliaena), the reconstructed ancestor of Limnebius, and the reconstructed ancestor and the extant species (maximum and minimum values) of the two subgenera. per, perimeter; fd, fractal dimension; lm, male body length; lf, female body length; lg, length of the male genitalia.  (Eberhard, 1985;Rowe & Arnqvist, 2011). This contrast-generalised stabilising selection but fast macroevolutionary diversification-raises the intriguing question of how divergence in genital characters is initiated and amplified to reach the extraordinary variability observed in many groups (Simmons et al., 2009;Hosken & Stockley, 2004).

Laeliaena
Most of the work on the evolution of genitalia has been conducted on reduced groups of species at limited temporal scales. There is very little quantitative data on the rate and mode of evolution of genital traits in diverse, old lineages, which could help to understand the origin of the large-scale variation in genital morphology (Hosken & Stockley, 2004;Simmons, 2014). In this work, we study the macroevolution of the size and complexity of male genitalia in a diverse insect lineage with a very uniform external morphology but extraordinary male genital variability, the aquatic beetle genus Limnebius (family Hydraenidae). Limnebius is divided in two sister lineages, one (subgenus Bilimneus) with uniformly small species (0.8-1.1 mm) with very simple genitalia, while the species of the other (subgenus Limnebius sensu stricto, ''s.str.'' onwards) are much more variable in size (0.9-2.5 mm) and structure of the male genitalia ( Fig. 1; Table 1 ;Perkins, 1980;Jäch, 1993;Rudoy, Beutel & Ribera, 2016). We obtained different measures of the complexity and size of the male genitalia and the body size of males and females of a comprehensive representation of species of both subgenera to (i) determine the phylogenetic signal and possible correlated evolution of the studied traits; (ii) compare their evolutionary rates; and (iii) estimate the distribution of the reconstructed phenotypic change through the phylogeny. Our expectation is that the comparison of the rate and mode of evolution of the size and complexity of the male genitalia between two lineages with strikingly contrasting macroevolutionary trends will contribute to understand the origin of their extraordinary diversity.

Morphological measurements
Limnebius includes ca. 150 species with an almost cosmopolitan distribution, all of them aquatic, living in all types of continental waters with the only exception of saline habitats Dashed lines, terminal branches of species with no quantitative data for the aedeagus, deleted in the analyses (see text). Numbers in nodes, posterior probability as obtained in BEAST. Habitus photographs reflect maximum size variation within each subgenera; upper row: L. cordobanus and L. fretalis, lower row: L. extraneus and L. oblongus (marked in bold in the tree). Photos of the aedeagus in the right column standardised to the same size; scales refer to the size of the aedeagus of L. fretalis (the largest, 1.21 mm). (Perkins, 1980;Jäch, 1993;Hansen, 1998;Rudoy, Beutel & Ribera, 2016). There is very few information on the ecology and behaviour of the species of Limnebius, including their sexual habits. In a recent work, Limnebius has been shown to be divided in two sister lineages with an estimated Oligocene origin, the subgenera Bilimneus and Limnebius s.str., with ca. 60 and 90 described species respectively (Rudoy, Beutel & Ribera, 2016).
We characterised the size and complexity of the male genitalia of all species included in the molecular phylogeny (see below and Table S1). The structure of the female genitalia of Limnebius is poorly known, mostly due to the lack of sclerotized parts (Rudoy, Beutel & Ribera, 2016), and was not studied here. We measured body length of adults (males, lm and females, lf ) as the sum of the individual maximum lengths of pronotum and elytra, as the different position of the articulation between the two could alter the total length when measured together. Similarly, the head was not measured, as in many specimens it was partly concealed below the pronotum. Measures were obtained with stereoscope microscopes equipped with an ocular micrometer.
Male genitalia (aedeagi) were dissected and mounted on transparent labels with dimethyl hydantoin formaldehyde (DMHF). For size measurements we used as a single value the average of each measure in all studied specimens of the same species (Table S1). For the measures of complexity a single specimen was used, as species show in general a very constant shape of the aedeagus (Jäch, 1993;Rudoy, Beutel & Ribera, 2016), and there is no evidence for the widespread existence of cryptic species within the genus (unpublished data). Male genitalia were always photographed in the same standard positions, orientated according to the foramen in ventral and lateral views. We measured the maximum length of the male genitalia (lg ) on digital images obtained in the same standard position, orientated in ventral view as determined by the foramen (Rudoy, Beutel & Ribera, 2016). We did not include setae or apical membranous structures, but included appendages when they were longer than the median lobe (as in e.g., some species of the L. nitidus group, Rudoy, Beutel & Ribera, 2016). Measurements were directly obtained from the digital images using ImageJ v.1.49 (US National Institutes of Health: http://imagej.nih.gov/ij/) (Fig. S1). We estimated experimental error by measuring the same specimen of three species on three different sessions, using two sets of images.
We used two different measures to characterise the complexity of the aedeagus: (i) Perimeter (per) of the aedeagus in ventral view, including the median lobe and the main appendages. We obtained an outline of the genitalia from digital images using ImageJ, with the ''free hand outline'' option. The total perimeter was the sum of the values of the different parts of the genitalia (median lobe and left parameter, plus main appendages if present, see Rudoy, Beutel & Ribera, 2016). We standardised the values by dividing the perimeter by the length of the aedeagus, to obtain a measure of complexity by unit of length ( Fig. S1 and Table S1).
(ii) Fractal dimension (fd). We estimated the fractal dimension of the outline of the aedeagus in ventral view on images of standard size (2100x2100 pixels, 2000 pixels from base to apex of the aedeagus) with the software Fractal Dimension Estimator (http://www.fractal-lab.org/index.html). This software estimates the Minkowski fractal dimension of bidimensional images using the box-counting method (Falconer, 1990). The software converts the image to binary data, selects the scaling window of the box, and counts how many boxes are necessary to cover the image. The absolute value of the slope of a log-log graph of the scale with the number of boxes is the fractal dimension of the image (Fig. S1, Table S1).

Phylogenetic analyses
We obtained a phylogeny of Limnebius using molecular data obtained from Rudoy, Beutel & Ribera (2016), which included 71 of the ca. 150 known species of the genus. The taxon sampling was denser for the Palaearctic lineages in subgenus Limnebius s.str., including the full range of body sizes and aedeagus structural variation (Rudoy, Beutel & Ribera, 2016). Groups with the lowest number of sampled species show in general a more homogeneous and simpler aedeagus (L. mundus and L. piceus groups and subgenus Bilimneus). We used as outgroup and to root the tree the genus Laeliaena, considered to be sister to Limnebius based on multiple morphological synapomorphies (Hansen, 1991;Jäch, 1995;Perkins, 1997;Beutel, Anton & Jäch, 2003).
We reconstructed a phylogeny of the genus with Bayesian analyses in BEAST 1.8 (Drummond et al., 2012) using a combined data matrix with three partitions, (i) mitochondrial protein coding genes (two cox1 fragments plus nad1); (ii) mitochondrial ribosomal genes (rrnL plus trnL) and (iii) nuclear ribosomal genes (SSU plus LSU ) (Table  S1). We used a Yule speciation process as the tree prior and an uncorrelated relaxed clock. Analyses were run for 100 MY generations, ensuring that the number of generations after convergence were sufficient as assessed with Tracer v1.6 (Drummond et al., 2012) and after removal of a conservative 10% burn-in fraction.
To calibrate the phylogenetic tree we used the rates estimated in Cieslak, Fresneda & Ribera (2014) for a related group (familiy Leiodidae, within the same superfamily Staphylinoidea, Beutel & Leschen, 2005) and the same gene combination based on the tectonic separation of the Sardinian plate 33 Ma. We set as prior average mean rate a normal distribution with average 0.015 substitutions/site/Myr for the mitochondrial protein genes, 0.006 for the mitochondrial ribosomal genes, and 0.004 for the nuclear ribosomal genes, all with a standard deviation of 0.001. It must be noted that for our objectives only relative rates are needed. An absolute calibration would only be necessary to obtain absolute estimates of character change, which is not our main objective and does not affect our conclusions.
We reconstructed the ancestral values of the morphological variables using the values of the terminals (extant species) in BEAST 1.8. The reconstruction of ancestral characters should ideally include a combination of extant and fossil data (e.g., Webster & Purvis, 2002;Slater, Harmon & Alfaro, 2012;Bokma et al., 2015). However, as there are no known fossils of species of Limnebius we can only use extant species. The use of extant species without fossil data has resulted in valuable contributions to the study of character evolution (e.g., Cooper & Purvis, 2010;Baker et al., 2015), and in some cases it has been reported that the introduction of fossil data did not alter substantially the conclusions obtained with extant species only (e.g., Puttick & Thomas, 2015).
We implemented a Brownian movement model of evolution (BM), a null model of homogeneous evolution in which variation accumulates proportionally with time, with incremental changes drawn from a random distribution with zero mean and finite constant variance (Hunt & Rabosky, 2014;Adams, 2014). The reconstruction of ancestral values using a BM model of evolution can be biased toward average or intermediate values (Pagel, 1999;Finarelli & Goswami, 2013), which together with the lack of fossil data can result in an underestimation of the rates of evolution of some characters. Due to these limitations our reconstruction needs to be understood as the simplest null model explaining the evolutionary change in the studied characters. It must be noted though that some of our conclusions do not depend on these reconstructions, and for others we compare two groups (Bilimneus and Limnebius) likely affected by the same biases. In any case, all analyses were repeated using only the data of the terminal branches, with a decreased power due to their lower number but less prone to reconstruction biases. Terminal branches are truncated with respect to internal branches, as they do not reflect a full interval between speciation events but the time elapsed since the last cladogenetic split. This does not introduce any bias in our measures when change is anagenetic (i.e., proportional to time), but will overestimate evolutionary rates if change occurs mostly at speciation (i.e., according to punctuated models of evolution; Gingerich, 2009;Hunt & Rabosky, 2014).

Statistical analyses
We studied the evolution of the morphological characters trough the full evolutionary path of species (i.e., from root to tips) and in the individual branches, using phylogenetic ancestor-descendant comparisons (PAD; Baker et al., 2015). We first deleted the terminals with species with unknown males, as well as the duplicated specimen of L. cordobanus and the outgroup (Laeliaena) ( Table S1). We then estimated the phylogenetic signal of the morphological variables in the whole tree using the K metric proposed by Blomberg, Garland & Ives (2003), which tests whether the topology and branch lengths of a given tree better fits a set of tip data compared with the fit obtained when the data have been randomly permuted. The higher the K statistic, the more phylogenetic signal in a trait. K values of 1 correspond to a BM model, which implies some degree of phylogenetic signal. K values closer to zero correspond to a random or convergent pattern of evolution, while K values greater than 1 indicate strong phylogenetic signal. We used the R package 'Picante' (Kembel et al., 2010) to compute K and the significance test. To test the possible decrease of power or the randomization test due to the low number of species in Bilimneus (15, vs. 50 in Limnebius s.str.) we randomly pruned the tree of Limnebius s.str. to 15 species in 1,000 replicas and computed the K values for all of them. We also measured the correlation between the variables across the whole tree with a regression of phylogenetic independent contrasts with the PDAP package in MESQUITE (Midford, Garland & Maddison, 2011).
For the analyses of the individual branches of the tree (PADs) we compiled the reconstructed values in BEAST of all variables for the initial and final node of each individual branch, together with their length. We measured three values for each of the individual branches (including terminals): (i) amount of phenotypic change, equal to the arithmetic difference between the final and initial values of the branch; (ii) absolute amount of phenotypic change, equal to the absolute value of the amount of phenotypic change; (iii) phenotypic change measured in darwins (Haldane, 1949), computed as the absolute value of the natural logarithm of the ratio between the final and initial values divided by the length of the branch in million years (Myr) ( Table S1). The use of the natural logarithm standardises the change so it is proportional and directly comparable among species with different sizes (Haldane, 1949;Gingerich, 2009). Darwins are known to be strongly dependent on the length over which are measured (Gingerich, 2009;Hunt, 2012;Hunt & Rabosky, 2014), so to obtain the correlation with time of the different variables we used instead the absolute amount of phenotypic change, which showed a more linear relationship with branch length for most of the variables (see Results). As individual branches are in principle independent from each other we analysed these variables with standard statistical procedures (see e.g., Baker et al., 2015). We used a Bonferroni correction for multiple tests and considered as significant a p < 0.01 level (corresponding to five morphological variables), although we also report some results at the standard p < 0.05 level in some of the analyses. In the comparisons between the two subgenera of Limnebius (Bilimneus and Limnebius s.str.) we did not include the respective stem branches.

Phylogenetic signal and correlation between complexity and size
The reconstructed ancestral values of Bilimneus were in the upper range of the values of the extant species, especially for measures of complexity (fd and per). The reconstructed ancestral values of the whole genus were, on the contrary, larger than any extant species of Bilimneus, with the only exception of the length of the genitalia (lg ) ( Table 1). For the subgenus Limnebius s.str. all reconstructed values, both of the genus and the subgenus, were within the range of the extant species, and close to their centre of distribution (Table 1;  Tables S1 and S2).
The phylogenetic signal (K ) was very significant for all variables both in the genus Limnebius as a whole and the subgenus Limnebius s.str., with values close to 1 (i.e., a BM model) for the measures of complexity (per and fd), clearly over 1 (i.e., with a stronger phylogenetic signal) for the size of males and females (lf and lm), but with values lower than 1 (i.e., closer to a global random distribution, although still significant) for the length of the genitalia (lg ) ( Table 2).In contrast, none of the variables had a significant K in subgenus Bilimneus (Table 2). More than 99% of the 1,000 pruned replicas of Limnebius s.str. maintained significant values of K for all variables at a p < 0.05 level, and more than 90% at p < 0.01, with the only exception of lg, which become not significant (Table 2).
All pairwise correlations between the morphological variables were significant at a p < 0.01 level as measured with PDAP both in Limnebius as a whole and in subgenus Limnebius s.str., with the exception of the correlation between lg and the measures of complexity (per and fd) ( Table 3). For Bilimneus, with a lower number of contrasts (14 vs 49 in Limnebius s.str.), results were similar except that the length of the female (lf ) was not significantly correlated with any variable, and the length of the aedeagus (lg ) was significantly correlated with its fractal dimension (Table 3).

Rates of evolution
There were no significant differences between terminal and all branches of the tree for any of the measured variables (including branch length) at a p < 0.01 level, both in average value (2 tailed t-Student with unequal variances when differences were significant) or variance (Fisher's F-test) (Table 4). At a p < 0.05 only for the measure of darwins the variances of per and lf were larger in the terminals than in all branches, and the variance of lg smaller ( Table 4). Some of the largest increases or decreases of absolute change were in terminal branches (e.g., L. nitiduloides and L. cordobanus respectively, branches 94 and 72 in Fig. S2).
The variable with the highest values of phenotypic change and the highest variance was lg, both measured in absolute phenotypic change or in darwins, and both in Bilimneus and Limnebius s.str. (Table 4). The lowest values of change were in fd, also in both subgenera. Males were in general more labile than females, with larger relative (but not absolute) differences in Bilimneus than in Limnebius s.str. (Table 4). Table 1 for the codes of the measured variables, and Fig. S3 for a graphic estimation of the density functions of the different comparisons.

Terminals
All All comparisons of absolute change and darwins between Bilimneus and Limnebius s.str. were significantly different (always lower in Bilimneus) both for the variance and the average, with the only exception of the average lg, never significantly different between them regardless of how it was measured (Table 4; Fig. S3). When the sign of the change was considered variances were still significantly different, but only the average fd showed differences (positive in Limnebius s.str. and negative in Bilimneus), although only at a p < 0.05 level (Table 4; Fig. S3).
Branches of Limnebius s.str. were on average significantly shorter and with a lower variance than that of Bilimneus (Table 4; Fig. S3). In the whole Limnebius and the subgenus Limnebius s.str. all the measures of absolute phenotypic change were significantly correlated with the length of the branch, with lg having the lowest correlation ( Fig. 2A; Table 5). On the contrary, for Bilimneus only lg and per had significant correlations. When only the terminal branches were used the correlations between the measures of absolute phenotypic change and branch lengths were still significant at a p < 0.01 level for Limnebius s.str. with the only exception of lg, but none of them was significant for Bilimneus ( Fig. 2B and Table 5).  Table 5 for the parameters of the linear regressions.

Distribution of the reconstructed phenotypic change
The amount of change in the individual branches for all variables was in general not significantly different from a normal distribution, as measured with a non-parametric Kolmogorov-Smirnov test. Only the variables of complexity when all the branches in the whole genus were considered could be said not to follow a normal distribution at a p < 0.01 level (Table 6). When measured in darwins (and taking into account the sign of the branch), the amount of change of the individual variables was in general less normally distributed than when the change was measured with independence of the time, although at a p < 0.01 level only per and lf in the whole genus, and per in Limnebius s.str., were significantly non-normal (Table 6). Despite the general normal distribution of the amount of change, in Bilimneus there was an excess of negative branches (i.e., a change to lower values) for all variables except for lg. On the contrary, in Limnebius s.str. there were significantly more positive than negative changes for the variables of complexity (Table 7). In both cases differences were mostly significant only at a p < 0.05 level, and in Limnebius s.str. only when all branches of the tree were considered. The average amount of phenotypic change per branch was negative for all measures in Bilimneus and positive in Limnebius s.str., but in all cases the 95% confidence intervals included the zero (data not shown).
For all measurements the general tendency was for a branch to have the same type of change than that of the preceding. In Bilimneus this meant that the most frequent combination was two branches with negative change, while in Limnebius s.str. two with positive change, although lg had the lowest differences between the frequency of two consecutive positive or negative branches in both subgenera (Table 7). The terminal branches showed the same tendency except for the variables measuring length in Limnebius s.str., for which the most frequent combination was two negative branches (Table 7).

DISCUSSION
We found clear differences in the macroevolution of the size and complexity of the aedeagus between the two subgenera of Limnebius. In the subgenus with the highest diversity of body size and aedeagus structure (Limnebius s.str.), both measures of complexity and the body size of males and females had a strong phylogenetic signal. This signal was much lower for the size of the genitalia, which evolved independently of complexity. Both measures of complexity had values of the K statistic closer to 1, suggesting a BM model of evolution (Blomberg, Garland & Ives, 2003). This was supported by the strong correlation of both variables with the length of the branch (a surrogate of time), as expected of characters evolving under a random walk (Gingerich, 2009;Hunt, 2012;Hunt & Rabosky, 2014).
In the subgenus with the more homogeneous and simple genitalia, Bilimneus, the tendency was for all measured variables to evolve more randomly, despite the significance of some correlations. The correlation of the aedeagus size with its fractal dimension may be due allometry resulting from its simplicity and the higher dimensionality of the variable fd. In Limnebius s.str this allometric effect may be hidden by the larger variation in complexity.
In Bilimneus, length and perimeter of the genitalia were the only variables significantly correlated with time, but in both cases the effect size was very low, and very similar in both subgenera for lg (Fig. 2). The amount of phenotypic change in Bilimneus for all variables was also smaller and with a much smaller variance than in Limnebius s.str., except for lg. Despite the large difference in size between some species between the two subgenera, there were no significant differences in the average amount of change of the length of the genitalia in the individual branches, both with the measure of absolute phenotypic change or in darwins.
There was some evidence for directional evolution in the complexity of the genitalia in both subgenera, as revealed by the excess of branches with a positive change in Limnebius s.str. and negative in Bilimneus, both individually and in combination to other branches of the same sign. But this evidence for directional evolution was weak, and the fact that in all cases the 95% interval of the average of the measure of change included zero suggests that the dominant factor was a random walk (Hunt, 2012). The directionality could also be partly due to the effect of the hard boundary imposed by the intermediate condition of the reconstructed ancestor or the artefacts introduced by the reconstruction method (Pagel, 1999;Hunt & Rabosky, 2014). These biases for negative or positive change were, however, not present for the length of the genitalia, which was more normally distributed and with a more balanced distribution of positive and negative changes. Directional phenotypic change is rare in the fossil record, accounting for a small fraction of the studied systems, likely due to the short time scale in which directional selection needs to act to reach adaptive peaks (Hunt, 2007;Hunt & Rabosky, 2014). The evolution of most lineages seem to be driven by a complex interaction of random walk, stasis and directional evolution, suggesting bouts of different types of evolutionary constraints in an idiosyncratic sequence (Hunt, 2012;Hunt, Hopkins & Lidgard, 2015). Experimental results on the effect of sexual selection agree with this idiosyncratic view of the macroevolution of the genitalia. Selection can act differently on different genital traits, but there is no evidence of the existence of directional trends acting through long evolutionary scales which may accelerate macroevolutionary divergence (e.g., House & Simmons, 2003;Simmons et al., 2009;Macagno et al., 2011;Rowe & Arnqvist, 2011;Simmons & García-González, 2011;Hotzy et al., 2012;Sakurai, Himuro & Kasuya, 2012;Okuzaki & Sota, 2014;Richmond, 2014;Dougherty et al., 2015).
Contrary to what happened with the evolution of complexity, the weaker phylogenetic signal in the length of the male genitalia, weaker correlation with time and more normal distribution of change would be compatible with an evolution dominated either by random factors or by stasis (Hunt, 2012;Hunt & Rabosky, 2014). This would seem paradoxical, as genital length is the variable with the highest evolutionary rates in both subgenera, but it should be considered that stasis is defined by the bias of the phenotypic variation of a trait, not by its amplitude-that is, for the tendency of the trait to return to an overall optimal value (Hunt, 2012), but not for the amplitude of variation around this optimum. It must also be considered that in our PAD analyses we pooled branches from multiple independent lineages, which means that even when the length of the genitalia may show a macroevolutionary pattern compatible with stasis and stabilizing selection, it may still be subjected to directional selection in different lineages.