Major histocompatibility complex (MHC) fragment numbers alone – in Atlantic cod and in general - do not represent functional variability

This correspondence concerns a publication by Malmstrøm et al. in Nature Genetics in October 2016. Malmstrøm et al. made an important contribution to fish phylogeny research by using low-coverage genome sequencing for comparison of 66 teleost (modern bony) fish species, with 64 of those 66 belonging to the species-rich clade Neoteleostei, and with 27 of those 64 belonging to the order Gadiformes. For these 66 species, Malmstrøm et al. estimated numbers of genes belonging to the major histocompatibility complex (MHC) class I lineages U and Z and concluded that in teleost fish these combined numbers are positively associated with, and a driving factor of, the rates of establishment of new fish species (speciation rates). They also claimed that functional genes for the MHC class II system molecules MHC IIA, MHC IIB, CD4 and CD74 were lost in early Gadiformes. Our main criticisms are (1) that the authors did not provide sufficient evidence for presence or absence of intact functional MHC class I or MHC class II system genes, (2) that they did not discuss that an MHC subpopulation gene number alone is a very incomplete measure of MHC variance, and (3) that the MHC system is more likely to reduce speciation rates than to enhance them. Furthermore, their use of the Ornstein-Uhlenbeck model is a typical example of overly naïve use of that model system. In short, we conclude that their new model of MHC class I evolution, reflected in their title “Evolution of the immune system influences speciation rates in teleost fish”, is unsubstantiated, and that their “pinpointing” of the functional loss of the MHC class II system and all the important MHC class II system genes to the onset of Gadiformes is preliminary, because they did not sufficiently investigate the species at the clade border.


Correspondence
In the below, we explain our criticisms of the Malmstrøm et al. 1 article as they are summarized in our abstract.
When was the MHC class II system lost in Gadiformes? The data as presented by Malmstrøm et al. 1 suggest a simultaneous loss of major histocompatibility complex (MHC) IIA, MHC IIB, CD4 and CD74 functions at the evolutionary onset of Gadiformes (see their Figure 2). However, within their datasets for gadiform fishes, sequence reads that represent these genes can readily be detected (Table S1 and Supplementary File 1). These sequence read numbers are much lower than found for the non-gadiform fish, and they may be contaminations, but that should be appropriately tested. Meanwhile, for several non-gadiform fishes, including for S. chordatus which among the investigated fishes is the species closest related to Gadiformes, there are no full-length MHC IIA, MHC IIB, CD4 or CD74 gene sequences in the unitig and scaffold datasets presented by Malmstrøm et al. 1 (Supplementary File 2 and Table S2). We agree with the conclusion by Malmstrøm et al. 1 that their data suggest that throughout Gadiformes there is no canonical MHC class II system. However, as for the evolutionary timings of the loss of an intact MHC class II system and of the losses of the individual MHC class II system genes, we find their study technically wanting and preliminary. The combination of (i) not finding intact full-length sequences for all important MHC class II system genes in species closely related to Gadiformes, while (ii) finding reads of these genes in gadiform fishes, prohibits what the authors call "pinpointing the loss of MHC II pathway genes to the common ancestor of Gadiformes". At least for a few species at either side of the Gadiformes clade border, Malmstrøm et al. 1 should have substantiated their claims by addition of specific PCR plus sequencing analyses, which should confirm presence of full-length intact MHC class II genes in the non-gadiform fishes, and their absence in the gadiform fishes. Malmstrøm et al. 1 Whereas our criticisms of the MHC class II system analysis by Malmstrøm et al. 1 are about technical issues and the preliminary character of their conclusions, we more fundamentally disagree with their analyses and discussions of MHC class I. The authors assumed 1 , as postulated by other researchers before them, that there can be a "copy number optimum" of MHC genes affected by a tradeoff between a higher number allowing the presentation of more pathogen antigens while also having a depletion effect on the T cell population. Regardless of the extent to which this mostly theoretical concept is true 2 , the MHC counting strategy by Malmstrøm et al. 1 should be deemed incomplete and far too simplistic. For their number determination Malmstrøm et al. 1 solely relied on estimation of U plus Z lineage genomic α3 exon fragment numbers, despite that the typical "birth and death" mode of MHC evolution can produce many pseudogenes 3 . The decision of the authors to only count U plus Z lineage gene fragments was based on their unsubstantiated perception that (neo-)teleost U and Z molecules "predominantly" bind peptide ligands 1 . However, not all teleost U and Z molecules are expected to present peptides 4,5 , for example this is not expected for the majority of U lineage molecules in the neoteleost fish medaka 6 and the non-neoteleost fish rainbow trout 7,8 ; how this is in the majority of the species investigated by Malmstrøm et al. 1 remains to be determined. Furthermore, it should be realized that MHC class II and non-peptide-binding MHC class I molecules (like maybe teleost molecules of the MHC class I lineages L, P and S 4 ) also can contribute to T cell depletion e.g.9 . Peculiarly, while from their referencing it follows that Malmstrøm et al. 1 were aware of an MHC class II impact on T cell depletion, the authors did not look at MHC class II when investigating their optimum MHC number model. A more general shortcoming of the article 1 is the lack of awareness that the direct determiner of the levels of "antigen coverage" and T cell depletion is the variation between the relevant MHC molecules 2 , rather than merely the MHC gene copy number. Table 1 (with detailed explanations in Supplementary File 3) summarizes that different teleost fish species can have very different levels of variation between MHC molecules 4 , and that despite its many U lineage gene copies the extent of MHC variation in Atlantic cod can be considered as relatively limited. Previously, we showed that salmon, zebrafish and eel share variation in U lineage sequences, dating from probably more than 300 million years ago (MYA), whereas all U lineage variation found within the neoteleost fishes stickleback and Atlantic cod probably was established only after these two species separated around 150 MYA 4 . Without experimental evidence, it cannot simply be assumed that "antigen coverage" and/or T cell depletion are highest in fishes with the highest counts of U plus Z α3 fragments, while neglecting levels of variance among the intact U and Z molecules and possible presences of other categories of MHC molecules. As a last critical comment we point out that, in stark contrast to the evolution of any other known MHC lineage, most deduced Z lineage molecules are characterized by a putative peptide binding groove which was almost perfectly conserved since >400 MYA 4 ; this questions the model by Malmstrøm et al. 1 that Z lineage evolution was driven by pathogen antigen variation, and is a further argument against the use of combined U+Z numbers for analysis of MHC evolution.

Discussion of the MHC class I counting strategy by
Overall discussion of the model by Malmstrøm et al. 1 Table S2. The other supplementary files were also exchanged, because something had gone wrong in their placement during the discussion with the editors during the first submission so that one file was shown twice.

REVISED
years. Malmstrøm et al. 1 postulated their multiple-regime Ornstein-Uhlenbeck model with very slow progress towards optimum MHC numbers because it was the best fitting model among the few models that they tested. However, an even better fitting model would be that in each species the respective optimal U and Z gene organizations were achieved. Further criticism is that their calculation methods for optimum U+Z numbers and half-life periods incorporated calculations of U+Z gene multiplication speeds, which suffered from the fact that (like for their other considerations) Malmstrøm et al. considered all U and Z genes as identical mathematical units 1 . For such speed calculations U and Z genes should have been studied separately, and it also should have been realized that whereas from some U or Z genes multiple new copies were generated, others were lost in accordance with the "MHC gene birth and death" model 3 . Lastly, even if, regardless of the discussable calculations for speeds and optimum numbers, there is a positive association in neoteleost fish between speciation rates and U+Z α3 fragment numbers (see their Figure 3), then still their model which considers MHC genes as "speciation genes that promote rapid diversification" 1 would be implausible in regard to cause and effect. Namely, in most species, there is a strong evolutionary pressure to maintain old allelic variation within MHC genes (trans-species polymorphism 3,4,10 ), which, if anything, is likely to slow down speciation rates because it increases the required size of the founder population 10 . If old allelic or haplotype variation can't be maintained because of rapid speciation through small founder populations, it can be speculated that a species might benefit from an enhanced capacity for the creation of new MHC allelic and/or haplotype variation by duplications/deletions and recombination 11 between a high number of linked MHC gene copies. However, in that scenario it wouldn't be the MHC organization which drives the speciation rate, as suggested by Malmstrøm et al. 1 , but the other way around.  This table shows the  lowest percentages of amino acid sequence identities between membrane-distal domains (α1+α2 for MHC I, α1 for MHC IIA, b1 for MHC IIB) of same category MHC molecules found between reported sequences of the same species. In some species no genes for particular categories were found (black boxes), while in other instances only one seemingly intact gene sequence was found (1 sequence) or only pseudogenes were found (pseudogene). A more detailed explanation of this 3. Cooper et al. 12 concluded that OU calculations are sensitive to intraspecific variation. MHC class I gene copy numbers can differ widely between individuals (haplotype variation), for example, it was estimated that the U lineage gene copy number in the neoteleost fish tilapia can differ between 11 and 17 per haplotype 15 . Also, concerning Atlantic cod haplotype variation, the presence/absence of U lineage genes was found 16 . The availability of copy number variation within species makes it highly unlikely that the half-life for reaching optimum numbers of U+Z α3 fragments is 23 million years in teleost fish, since there is already an existing copy number variation from which can be selected. A period of 23 million years typically includes multiple speciation events (see Table S2). et al. 12 recommended that OU-based conclusions should be tested by comparison with other analyses. Table S2 shows that the Malmstrøm et al. 1 conclusion that species richness is associated with elevated U+Z α3 numbers may agree with their division into clades ("regimes") used to fit their model, but that such a conclusion does not hold if other branchings in the neoteleost phylogenetic tree are compared (Table S2). For example, Percomorphaceae excluding Ophidiiformes have more species and a higher average number of U+Z α3 numbers than Ophidiiformes, which agrees with the model, but at the same time Carangiformes appear to have many more species but considerably lower U+Z α3 numbers than Abantiformes (Table S2), which does not agree with the model. Table S2 shows that there is not a general tendency for the branches with more species to have higher U+Z α3 numbers. The calculated half-life of 23 million years for reaching optimum copy number also should be considered as a reason to abandon the model; namely, the long period would imply that hardly any of the investigated species have the optimum MHC number, whereas the previously discussed haplotype variations between individuals of the same species reveal that the system is quite plastic. After all, the type of gene copy number variation investigated here is generally not about gene duplicates that acquired entirely new functions, but more commonly about duplicated genes that acquired only slightly modified functions (for example presenting a slightly different set of peptides), and genes and pseudogenes which may predominantly serve as a reservoir for increased genomic recombination, or which may have no function at all. For example, the turnover of closely related MHC class I genes during primate evolution has produced a human MHC genomic region with only 3 classical class I genes HLA-A, -B, and -C, 3 nonclassical genes HLA-E, -F and -G that also encode molecules with peptide binding ability, and numerous pseudogenes including HLA-H, -J, -K, -L, and -V, and additional orphan sequence fragments that together account for 6 MHC class I coding genes and 13 pseudogenes 17,18 . In evolution, by recombination events, increase or decrease in numbers of tandemly organized similar genes can go very rapidly (e.g. 19). A half-life of 23 million years also would have trouble explaining how the Atlantic cod has 80 U lineage α3 fragments, whereas the fish Theragra chalcogramma, which separated from Atlantic cod only ~3 million years ago, only has 31; this observed difference implies very rapid large changes in copy numbers. (C) In teleosts, although probably the least pronounced in neoteleosts, selection on U lineage variation happens at two levels 4,20 . In fish like cyprinids and salmonids, extreme evolutionary pressure appears to maintain ancient allelic sequence variation from hundreds of millions of years ago, which extends far beyond the peptide binding groove. Simultaneously, there is also the "normal" balancing/diversifying selection on peptide binding groove variation that has been well described for mammals. It is hard to take any model on teleost MHC class I

Supplementary material
Supplementary Click here to access the data. evolution seriously that does not recognize these two different levels of selection.
(D) Questions also can be raised concerning the reliability of the BAMM method used by Malmstrøm et al. 1 for analysis of speciation rates 21 . By using this method, Malmstrøm et al. 1 found higher speciation rates in species-rich clades than in species-poor clades, which seems to be a logical conclusion. However, when Rabosky, the author of the BAMM method 22 , applied the method to fish, a "negative relationship between species richness and mean speciation rate" was found, which is counterintuitive and curiously not explained in the respective article 23 .
(E) Malmstrøm et al. 1 used a bivalent BiSSE model for determining statistical significance of a positive association between elevated speciation rates and high U+Z α3 numbers. It seems that they speculated that the concluded significance of the finding derived from accelerated speciation in fish lineages where U+Z α3 numbers were higher than a threshold of about 20-25 copies, although that would not explain the overall distribution seen in their Figure 3b. As a potential biological exlanation for their proposed model, they 1 suggested that the effect of T cell depletion on hybrid fitness becomes more pronounced in the 20-25 copy range and that this might affect mate choice in species with copy numbers above this threshold, promoting inbreeding and reinforcement. However, their speculation can not explain how some species can have many more than 20-25 copies, for example the >100 copies observed in Muraenolepis marmoratus, and why such high copy numbers are not associated with even higher speciation rates (see their Figure 3b).
(F) Malmstrøm et al. 1 concluded "an optimum of 6.8 MHC I copies for basal branches of the phylogeny, which is in concordance with the hypothesized MHC I repertoire of early gnathostomes". However, previously estimated U+Z numbers in zebrafish, Mexican tetra (cavefish) and Atlantic salmon were 14, 31 and 14, respectively 4 , and U+Z α3 copies that Malmstrøm et al. 1 estimated for Borostomias antarcticus were 20, suggesting that the U+Z number was considerably higher than 6.8 at the start of the neoteleost lineage. This should have been acknowledged by the authors.
In regard to the modelling methods used by Malmstrøm et al. 1 , we also recommend readers to read the extensive critical comments by reviewer Dr. Jerzy Kulski, which he placed under the first version of our manuscript. In those comments he discusses the models used by Malmstrøm et al. 1 .

Conclusion
Malmstrøm et al. 1 used low-coverage genome sequencing for comparison of 66 mostly neoteleost fish, and so helped with elucidating their phylogeny. They found that intact MHC class II system genes may be completely absent in Gadiformes, and believed that related non-gadiform fishes have intact MHC class II system genes. However, their genomic databases were incomplete and in the case of many Gadiformes spiked with reads from MHC class II system genes that may or may not be contaminations, so that final conclusions require some additional analysis of at least a few species at the gadiform/non-gadiform clades border. We suggest that they need to perform a number of PCR and sequencing experiments to clarify this matter. When comparing class I and class II situations in their investigated neoteleosts, Malmstrøm et al. 1 also found that their earlier theory, which was that the absence of an MHC class II system might explain the high number of MHC class I genes in Atlantic cod 13 , could not be corroborated. Instead, solely based on estimations of U+Z α3 fragment numbers, they 1 proposed a new theory on MHC class I evolution which they referred to in their manuscript title. We hope to have shown sufficiently that their conclusions on MHC class I evolution were unsubstantiated, that estimation of U+Z α3 fragment numbers is not a proper way to analyze MHC functions or MHC evolution, and that, apart from not investigating logical units that are better suited for their methods of modeling, also the number estimations and modelling systems used by Malmstrøm et al. 1 were flawed and/or non-trustworthy. Before any meaningful discussion can be started about the evolution of MHC class I genes in neoteleosts, a much higher level of information about sequences and genomic positions is necessary.

Data availability
The datasets analyzed in this study originate from Malmstrøm et al.

Grant information
The author(s) declared that no grants were involved in supporting this work. reflected in their title "Evolution of the immune system influences speciation rates in teleost fish", is unsubstantiated. I concur with their three main criticisms "(1) that the authors did not provide sufficient evidence for presence or absence of intact functional MHC class I or MHC class II system genes, (2) that they did not discuss that an MHC subpopulation gene number alone is a very incomplete measure of MHC variance, and (3) that the MHC system is more likely to reduce speciation rates than to enhance them."

Jerzy K. Kulski
All three critical points are well founded and stand-alone without too much need for further support.  (2016) paper, one that also omits the other categories of the MHC I such as the L, S and P lineages that might contribute to a much large number of MHC I-like genes. In this regard, it seems that Malmstrøm et al. have taken only the genomic exon fragment numbers of the MHC I U and Z lineages to represent the entire immune system of their title.
2. Dijkstra & Grimholt are also right to point out that low coverage sequencing by next generation sequencing (NGS) can result in the artifactual loss of genes that in turn can lead to misleading or incorrect conclusions when counting for gene copy numbers or looking for gene losses. Malmstrøm et al (2016) sequencing coverage was 9 to 39x and they recovered only about 75% of the conserved eukaryotic genes. Therefore, in this situation, the low coverage sequencing data should not have been used de novo as evidence for the absence of genes from the genome without providing a properly organized high coverage map of genomic assemblies to show where the sequences are missing in the genomes. The reviewers and editors of the Malmstrøm et al (2016) paper should have been aware of this basic problem of using low coverage NGS, particularly with respect to looking for a few needles in a haystack. For a better understanding of the advantages and disadvantages of MHC genotyping and haplotyping by NGS see the review by Shiina et al (2016) . (2016) reported that there was no overall correlation between the combined MHC copy numbers and the HOX gene copy numbers that were used as a control. As already pointed out by Dijkstra & Grimholt , the U and Z genes should have been analysed separately. Nevertheless, it would have been interesting to see how these duplicated HOX development genes, which also have been implicated in driving speciation, compared with the properly classified duplicated MHC class I adaptive immune genes (separate Z, X, L, S and P lineages) at the classical and non-classical level in  (2016) provided no properly organized genomic assemblies or genomic gene maps and no information about genomic distribution of the MHC I or II sequenced fragments or the duplication mechanisms involved. If they had done so, they might have added important information to better assess and place the threshold MHC I copy numbers and gene distributions into some sort of genomic perspective . More reliable models for the evolution of MHC class I genomic duplications might be achieved by providing duplication gene maps and the phylogenic relationships of the duplicated gene sequences showing likely duplication mechanisms, where and how these genes are located relative to each other, and how the genomic structures have changed in a comparative analysis between species. See Kulski et al. (2004) for an example of one such duplication model. Mapping with phylogeny is a more informative approach than just constructing phylogenetic trees using one or more single exonic sequences from a limited number of each species and then claiming that the changes in copy numbers influence the speciation rates of almost the entire number of extant fish species. Perhaps, the Malmstrøm et al (2016) low coverage sequence libraries could still be used effectively to reconstruct full-length gene structures and targeted genomic regions that harbour multiple copies of the MH(C) genes in a comparative analysis.

Malmstrøm et al
5. The multiple-optima Ornstein Uhlenbeck (OU) model .  (2016) the multiple-optima OU model vastly outperformed alternatives such as Brownian motion, white noise, single-peaked OU and early-burst models. This find-ing corroborated their hypothesis that MHC I copy number evolution is characterized by selection toward intermediate optima, resulting from a tradeoff between detection and elimination of pathogens. Presumably, the authors preferred the OU multiple-optima model to the other models because it supported rather than falsified their hypothesis. Of course, this highly artificial computing model did not detect a tradeoff between detection and elimination of pathogens, this would be the authors' own biological hypothesis and bias. The OU model intrinsically sets optima (biases) according to its built-in algorithm , and this is one of the main objections to the use of this prediction model. The OU model artificially generates bias because its purpose is to find the trait optimum that stabilizes selection . The misguided conclusion by Malmstrøm et al (2016) in using the OU model is that the trait optimum influences speciation.
(b) Interestingly, Malmstrøm et al (2016) (2016) carried out binary state speciation and extinction (BiSSE) analysis to estimate differences in diversification rate between lineages with high and low MHC I copy numbers. They found that diver-sification rates based on correlation estimates differed most when the threshold was placed between 20 and 25 copies (Fig. 3c). With a threshold in this range, the model with two separate speciation rates for lineages with high and low copy numbers was statistically better supported than a model with a single speciation rate parameter. On this basis, they concluded that, 'These results suggest that the influence of MHC I genes on speciation rates is stronger in species that have already evolved at least 20 copies.' In comparison, the number of MHC I gene copy numbers in humans (excluding haplotype differences) is approximately 18 genes; 6 classical and non-classical MHC I genes, 5 CD1 genes, and 5 PHFZ genes (MICA, MICB, MR1, HFE, Zn-A2-GP, ). Thus, in comparison to some fish etc species, humans are diversifying along very nicely as a 'diversified' species approaching the 'magical' threshold of between 20 and 25 MHC I copies. It is noteworthy that Maddison et al (2007) highlighted the following assumptions that need to be taken into account when using their BiSSE package. For the BiSSE model analysis none of the characters associated with speciation rates can be said to be causing or influencing evolution, even if Maddison et al (2007) write, 'the correct conclusion given a significant result using our method is that the character examined or a codistributed character ' At best, the binary appear to be controlling diversification rates. character state is an association, at worse a misleading one. Maddison et al (2007) provided the following cautions and assumptions about the likelihood of the BiSSE (binary-state speciation and extinction) model: (a) the transitions happen instantaneously over the time scales considered (i.e., ignore periods of time during which a species is polymorphic). (b) these events are independent of one another; in particular, the character state change does not, in and of itself, cause speciation (or vice versa). (c) an accurate rooted phylogenetic tree with branch lengths is known (the "inferred tree") and the character state is known for each of the terminal taxa. (d) the tree is assumed complete: all extant species in the group have been found and included. (e) all terminal taxa are contemporaneous, and the tree is ultrametric (i.e., the total root-to-tip distance is the same for all tips).
I'm not convinced that Malmstrøm et al (2016) considered or accepted these constraining assumptions when using BiSSE modeling. (2009) , the speciation rates within the Percomorpha clade were calculated to be at least ten times greater than in the Gadiformes order. Yet, according to Malmstrøm et al (2016) , there were fewer than 20 copies of the U genes for each of the 5 species in the Perciformes clade, compared to more than 20 copies and up to 100 copies of the U genes in 16 of 30 species in the Gadiformes order (Fig 2b, Malmstrøm et al 2016 ). The use of only 5 species in the Perciformes order is a gross underestimate of the ten thousand or more species found in that order . Moreover, in the Gadiformes order, there were closely related species (n=16) with > 20 U genes and different groups of closely related species (n=14) with < 20 U genes. Again, the number of species that were sequenced in the Gadiformes order is grossly under-represented. The Gadiformes comprises 10 families and more than 600 species , whereas Malmstrøm et al (2016) sequenced only 27, i.e., 27 x 100/600 = 2,700/600 = 4.7% analysis of extant sequence, a percentage that is simply not good enough to support their extravagant conclusions. Is < 5% of the 600 species really representative of the Gadiformes. Malmstrøm et al (2016) have to be more temperate with their conclusions using such a small representative sample.There are clear inconsistencies with MHC I a3 fragment copy numbers in the Gadiformes order. The MHC I a3 frag copy numbers are low (<20) for Moridae and M. occidentalis in Macrourinae, and for Phycinae, Lotinae, and three species in Gadinae. Five species of Gadinae have between 20 and 40 copies. On the other hand, Bregmacerotidae, Merlucciidae, Melanonidae, Muraenolepidodae, and Trachyrincinae have MHC I a3 fragment copy numbers between 50 and 100. The threshold levels (20 to 25) are all over the place. Moreover, the lineage, genomic block duplication and hitchhiking (linkage) effects on MHC gene duplications (8 to >100 copies) in the Gadiformes have not been taken into account in the analysis of speciation rates (Fig 3, Malmstrøm et al 2016 ), and, therefore, make the entire analysis unreliable.

Speciation and diversification rates. (a) In the study by Santini et al
(b) "Diversification rate analyses were calculated on the basis of the time-calibrated phylogeny and counts  (2016) appears contradictory for the Perciformes (10,033 species) that have speciation rates 18 times greater than Gadiformes (555 species) and yet the MHC I a3 fragment copy numbers are at least two times lower in Perciformes than Gadiformes (Fig 2b, Malmstrøm et al 2016 ). Also, the Anabantiformes have 252 species -a species rate 40 times lower than Gadiformes and yet their MHC I a3 fragment copy numbers are at least two times higher than in Perciformes.
(c) Considering that there are more than 29,000 species of teleost fishes , a highly limited analysis by Malmstrøm et al (2016) of using a sample group of less than 0.2% of the available extant species cannot be considered to be statistically, taxonomically or biologically significant or sufficiently reliable to conclude that, "Evolution of the immune system influences speciation rates in teleost fish" . What does a species half-life of 25 million years mean in the context of 29,000 species of teleost fishes? If the multiple-regime OU model is wrong, highly biased or misinterpreted then does it validate or support the overall hypothesis of Malmstrøm et al (2016) ? Also, what does an optimal trait actually mean in the context of 29,000 species? If, a suboptimal number of MHC I copies are detrimental to a species, then how have divergent species managed to survive for so long with a half-life of 25 million years of adaptation? Also, if, as Malmstrøm et al (2016) say, 'Such gene family expansions may promote biological diversification by introducing new raw genetic material, potentially resulting in sub-or neofunctionalization and thus novel immunological pathways.', then which of the non-optimal (greater than or less than the threshold of 20 -25 copies) MHC I genes are detrimental to the species? In this regard, there must be a gradation of functionally good and bad MHC I genes as their copy numbers approach the threshold (optimally good) and then deteriorate beyond it. Is this assumption of a MHC I copy number functional trait value as a quantitative marker of speciation at all testable?
8. In their discussion, Malmstrøm et al (2016) referenced the hypothesis of T cell depletion and hybrid fitness by Eizaguirre et al (2009) and concluded that,"Our analyses identify this threshold at 20-25 MHC I copies, suggesting that the effect of T cell depletion on hybrid fitness becomes more pronounced in this range and that this might affect mate choice in species with copy numbers above this threshold, promoting inbreeding and reinforcement." Eizaguirre et al (2009) suggested that, "Super-optimal individual MHC diversity should be a common disadvantage for species hybrids in vertebrates, resulting in elevated parasite loads." In this regard, if high copy numbers of the MHC class I genes leads to hybridization and loss of the immune system as inferred by Eizaguirre et al (2009) , then this more than likely would lead to extinction of populations and species. Extinction would be the most extreme and bizarre form of immune system influence on speciation rates. Furthermore, it is extremely speculative for Malmstrøm et al (2016) to say that high copy numbers of the MHC class I genes with copy numbers above the threshold of 20 to 25 copies, promotes inbreeding and reinforcement, because, in fact, there is no such evidence for it. A more reasonable hypothesis is that high copy numbers of linked MHC class I genes, such as in rhesus macaque, or the mouse , or the cod , might benefit the species to better adapt to microbial inhabitants in a greater variety of geographical environments, although the evidence for this is tenuous as well. Despite ongoing debates, the selective advantage of MHC diversity in host-pathogen coevolution might not be easily resolved (at the macroevolution level) because of the constant number of insults by large numbers of pathogens in the life-time of an individual organism, population or a species and the arms race or Red Queen effect. Studies on extant species will always discover an example of a pathogen associated with a polymorphic MHC gene that might favour selective advantage for host-pathogen coevolution, whereas the pathogen that caused the extinction of a species is rarely or never found. To conclude that the immune system (that is, different copy numbers of the class is rarely or never found. To conclude that the immune system (that is, different copy numbers of the class I MHC genes ) influences speciation rates, it would have to be shown that the immunity gene products can commonly create reproductive barriers or genetic incompatibilities among populations that permit the maintenance of the genetic and phenotypic distinctive-ness of these populations in geographical proximity ; and this was not done .
9. Malmstrøm et al (2016) did not provide any reliable evidence to support their speculation that evolution of the immune system influences speciation rates in teleost fish or that increas-ing MHC I diversity facilitates speciation . Instead, Malmstrøm et al (2016) used their limited data and analyses using speculative models to jump to highly unsupported conclusions and quickly position the cart before the horse. Dijkstra & Grimholt pointed out that the Malmstrøm et al (2016) title "Evolution of the immune system influences speciation rates in teleost fish", is unsubstantiated, and that their hypothesis seems to be "the wrong way around". It should have been, "Speciation (rate) influences the evolution of the immune system in teleost fish." Or, "Speciation rates are associated with diversity of MHC class I genes in teleost fishes", which perhaps is too obvious and underwhelming. This is not simply the chicken or the egg causality dilemma; in fact, the change in title is better supported by the literature and the established theories of MHC genomic evolution in vertebrates . However, because it is less "sexy" and controversial than the original title, it might not have been so readily published.
10. A large number and variety of genome-wide gene duplications have been associated with speciation , that is, genomic gene duplications are not limited to only class I MHC genes. If MHC I gene duplications effect or affect speciation, how do the other hundreds of gene duplications contribute to speciation rates? Also, do sequence variants or mutations in non-duplicated genes have any influence on speciation rates? It seems absurd to pick on only one group of gene duplications (e.g., MHC class I genes ) as those that are responsible for speciation and ignore all the others as an inconvenience. For example, a relatively recent comparative genomic study revealed how genomes change with speciation in an examination of genomes from five cichlid fish species, an ancestral lineage from the Nile, and four species from the East Africa lakes, Tanganyika, Malawi, and Victoria . Compared to the ancestral Nile lineage, the East African cichlid genomes had many alterations in regulatory elements, accelerated evolution of protein-coding elements in genes for pigmentation, an excess of gene duplications, and other distinct features that affect gene expression associated with transposable element insertions and novel microRNA. Each species also contains a reservoir of mutations different from the other species . Much of the diversity between the cichlid fish species evolved in a nonparallel manner often rapidly due to sexual selection and genetic conflicts between males and females or between different regions of the genome at a regulatory level rather than by the slower and weaker forces of classical natural selection . If sexual selection and genetic conflict at the genomic regulatory level are the prime movers of speciation rate, it is difficult to conclude that the variable diversity of a few MHC gene copies are responsible for speciation as well as for the many other associated genomic changes associated with speciation.
11. Malmstrøm et al (2016) informed us in the introduction section of their publication that "Our results highlight the plasticity of the vertebrate adaptive immune system and support the role of MHC genes as 'speciation genes', promoting rapid diversification in teleost fishes." MHC class I gene copy number variability occurs across many different species, families, orders and domains. Because there is such enormous variability in MHC class I gene copy number for hundreds or possibly even thousands of different chordate species, it is not possible to conclude meaningfully that the expansion of MHC class I genes provides an undefined advantage of one species over another. For example, the great apes (humans, chimpanzees, gorillas and orangutans) have about six functional MHC class I genes, whereas the old and new world monkeys often have up to 15 or more . Is this evidence that the MHC class I . Is this evidence that the MHC class I genes influence the rate of speciation in primates? And if so, what does that really mean in the whole scheme of things? How do the species with low copy numbers of MHC class I genes survive so well over millions of years without the presence of another 90 to 100 copies of MHC class I genes? This question is often neglected, and yet it is important for a better understanding of the function and evolution of MHC genes between and within the vertebrate species.
12. Taxonomic and lineage markers. Mutations, indels and duplications drive diversity and evolution. However, most mutated genes within species and their families do not create or influence speciation rates in the sense that Malmstrøm et al (2016) use the term, 'speciation genes'. In comparative genomics and their sequence relationships between different species, most genomic sequences range between newly derived genes and the ultraconserved or the essential core coding and noncoding genes with varying amounts of sequence differences. Some genic and nongenic sequences such as the MHC genes and retrotransposons are highly polymorphic and therefore are useful taxonomic markers at the individual, population, species and broader lineage levels. The MHC gene sequences clearly are one of these useful taxonomic or lineage markers along with olfactory receptors, immunoglobulins, globins, HOX, TOLLs, KIRs, mitochondrial DNA, ribosomal RNA sequences and thousands of others that can be used comparatively in the phylogeny to undertake an examination of the accuracy and reliability of current taxonomical rankings and sister lineages. However, because many thousands of coding and noncoding genes (or sequences) are variants (polymorphic) or vary in copy numbers, we do not immediately or easily imply that all or some of them are responsible for speciation without providing further concrete evidence. This kind of extrapolation without the burden of proof is absurd and wrong. Similarly, to say that the polymorphisms demonstrate natural selection as if natural selection was a biological or molecular mechanism is meaningless without showing experimentally how these polymorphisms benefit or disadvantage the organism over all the other different polymorphisms.
13. On the basis of either or reasoning, the immune system obviously affects the a priori a posteriori wellbeing of individuals and populations, but whether it can be extrapolated to speciation events and speciation rates remains highly dubious and most probably unlikely. It seems too farfetched to blame MHC class I genes with high copy numbers over threshold levels of promoting inbreeding and reinforcement because this in turn could create hybrid inviability or sterility resulting in postzygotic isolation. Although the population conditions in many models of rapid speciation do favour inbreeding and/or hybridization , none of the teleost species tested by Malmstrøm et al (2016) were shown to be either inbreeding or in postzygotic isolation. The factors responsible for either prezygotic or postzygotic isolation are likely to be independent of the adaptive immune system, although zealots might argue otherwise. Hybridization between diverging lineages in post-zygotic reproductive isolation can trigger genome instability. For most animals without an adaptive immune system and for plants without a MHC, speciation depends on the shrinkage, expansion and equilibrium (e.g., aneuploidization and dysploidy) of the genome and the containment and functionality of all the essential genomic information to develop an optimal balance between stability and plasticity within the organism in order to first survive and then propagate and expand itself as a new species . In those rare and 'traumatic' transitional situations, there is no need for particular 'speciation' genes such as variable copies of the class I MHC genes to influence speciation. The rarely observed transition from population 'trauma' to a new speciation event depends on an array of totally different factors for creating postzygotic isolation events including interbreeding between semi-isolated populations and an elaboration on the existence of stress-induced changes in chromosomal and ploidy integrity both in hostile and non-hostile environments.
14. Finally, Malmstrom et al (2016) (2016) admirably sequenced 66 teleost species by a next generation sequencing method and identified an array of MHC I and MHC II exonic fragments for phylogenetic and speciation analysis using the multiple-regime OU model to predict the optimal MHC I copy number as an evolutionary trait optimum affecting speciation. However, the conclusions of the paper by Malmstrom et al (2016) especially for the MHC I gene copy numbers are unreliable because they are based on far too many assumptions, speculations, contradictions, incomplete or missing data and unproven predictive models with little or no empirical evidence in support. Nevertheless, their simple, but controversial hypothesis is published, and now it is up to them and others to test its validity and "consider plausible alternative hypotheses in a firm hypothesis-testing framework in which alternative hypotheses make clear [and sensible] predictions of emerging patterns that can be unambiguously associated with particular models." Thank you for your review and support of our article. We appreciate that an expert such as you is willing to join the public debate so that erroneous/unsubstantiated messages like the ones presented by Malmstrøm can not take hold in our field. et al.
Your comments are very extensive and valuable, and we now refer the readers to them.
Our special fields of scientific expertise are MHC genes and molecules, and in our first manuscript version we only concentrated on those. If there is no sufficient unity among the units used for mathematical modeling, that modeling, or the functional explanation of the resulting model, can never make sense. However, we now realized that for a large audience these MHC-specific issues never make sense. However, we now realized that for a large audience these MHC-specific issues might not be so clear, and that it is better to also address the questionable modeling methods used by Malmstrøm Therefore, we now have added two paragraphs dedicated to this questionable et al. modeling, titled "Detailed discussion of the use of the Ornstein-Uhlenbeck model by by Malmstrøm " and " ". We realize that et al.
Additional criticisms in regard to the modelling by Malmstrøm et al. we use different language in regard to these topics than a theoretical biologist would use, but we hope that nevertheless we address the issues clearly and correctly. It has long been true in fish MHC research that the fact that a gene has not been reported to be present in a particular species does not mean that is not present. Modern genomics techniques has presented better proof for this assertion, with the lack of an MHCII/CD4 pathway in gadids being the most prominent example, but even modern genomics techniques are not iron clad 100% proof and should be checked very carefully before definitive statements are made. Thus the comments about verifying the presence or absence of specific genes in the numerous species by other means is valid.
Additionally, the treatment of all U and Z genes as identical units while ignoring the allelic diversity of each gene within those classes is indeed a serious flaw in the reasoning of Malstrøm et al. There is significant variability in diversity U gene families which will have differing effects on T cell selection that simply counting gene numbers will not address.
Dijkstra and Grimholt's critique should be carefully read and addressed.

Is the rationale for commenting on the previous publication clearly described? Yes
Are any opinions stated well-argued, clear and cogent? Yes

Is the conclusion balanced and justified on the basis of the presented arguments? Yes
No competing interests were disclosed. Competing Interests: Referee Expertise: Fish Immunology, MHC genes, antigen presentation I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. Thank you for your review and support of our article. We appreciate that an expert such as you is willing to join the public debate so that erroneous/unsubstantiated messages like the ones presented by Malmstrøm can not take hold in our field. et al.

Are any opinions stated well-argued, clear and cogent? Yes
You suggest us to provide lay readers with more background information. Following your suggestion, we have tried to do so, by writing a tentative introduction section, but decided not to use it as we found that the necessary size and discussions would distort the article too much. This article is a correspondence, a discussion about another paper, and we feel that this discussion character should be clear throughout the article. Furthermore, F1000Research proscribes that "Correspondence articles are short, peer reviewed comments", and, as it is, the article is already quite lengthy.
You ask us to be more constructive in our criticism towards Malmstrøm , and to explain to et al. them what they should have been doing instead. First of all, we would have liked them to make their claims on MHC class II system genes solid, because that seems within close experimental reach, and we believe that they should have concentrated their article on that topic. In regard to MHC class I, we would have liked them to either study those genes intensively, or to have refrained from any modeling, and especially to have refrained from highlighting a resulting model in the title. In the supplementary files 1 and 2 we already made some detailed comments about which experiments would be necessary to make the MHC class II system claims more solid and within acceptable standards. Now we also added a conclusion section to the main text which explains, in general terms, what we would like Malmstrøm to do or not to do as follows: et al.

"Conclusion "Conclusion
"Malmstrøm used low-coverage genome sequencing for comparison of 66 mostly neoteleost et al. fish, and so helped with elucidating their phylogeny. They found that intact MHC class II system genes may be completely absent in Gadiformes, and believed that related non-gadiform fishes have intact MHC class II system genes. However, their genomic databases were incomplete and in the case of many Gadiformes spiked with reads from MHC class II system genes that may or may not be contaminations, so that final conclusions require some additional analysis of at least a few species at the gadiform/non-gadiform clades border. We suggest that they need to perform a number of PCR and sequencing experiments to clarify this matter. When comparing class I and class II situations in their investigated neoteleosts, Malmstrøm also found that their earlier et al. theory, which was that the absence of an MHC class II system might explain the high number of MHC class I genes in Atlantic cod , could not be corroborated. Instead, solely based on estimations of U+Z a3 fragment numbers, they proposed a new theory on MHC class I evolution which they referred to in their manuscript title. We hope to have shown sufficiently that their conclusions on MHC class I evolution were unsubstantiated, that estimation of U+Z a3 fragment numbers is not a proper way to analyze MHC functions or MHC evolution, and that, apart from not investigating logical units that are better suited for their methods of modeling, also the number estimations and modelling systems used by Malmstrøm were flawed and/or non-trustworthy. et al. Before any meaningful discussion can be started about the evolution of MHC class I genes in neoteleosts, a much higher level of information about sequences and genomic positions is necessary." As to your question how we would have addressed the issue of MHC numbers/variation and thymic T cell depletion. The answer is that we probably wouldn't try. Meaningful modeling would require a better understanding of how positive and negative T cell selections in the thymus, involving multiple different MHC molecules, contribute both quantitatively and qualitatively to the T cell pool. Regardless of the thymic selection model, we might be interested in MHC class I gene numbers, but then we would first want to separate the question into functional MHC subclasses. For example, it may be interesting to see whether there is some evolutionary pattern in the number of classical type polymorphic MHC class I genes (a question which Malmstrøm erroneously et al. seem to think that they were addressing), in the number of genes of nonclassical MHC class I families, or the number of MHC class I pseudogenes. Possibly, selection for increased diversification speeds may select for unstable haplotypes with many tandemly organized genes and pseudogenes that can function as a recombination reservoir. It would be interesting to see whether there are differences in numbers of MHC class I pseudogenes between those fish species that more stably maintained ancient variation and those that more rapidly acquired new variation. Basically, we would try to get good information on all investigated genes (and not just count a3 fragments), and then would try to answer questions one at the time and directly connected to the data (and not try to make an unsubstantiated overarching model). Without good data, we simply wouldn't start modeling. I hope our above answer, and our extended criticism in the article main text of the models used by Malmstrøm , is also sufficient as a response to the issues that you raised in paragraph III of et al. No competing interests were disclosed.

Competing Interests:
The benefits of publishing with F1000Research: Your article is published within days, with no editorial bias You can publish traditional articles, null/negative results, case reports, data notes and more The peer review process is transparent and collaborative Your article is indexed in PubMed after passing peer review Dedicated customer support at every stage For pre-submission enquiries, contact research@f1000.com