Lack of support for Deuterostomia prompts reinterpretation of the first Bilateria

The deuterostome animal groups (Chordata and Xenambulacraria) may not be each other’s closest relatives.

Linkage of nkx2-1/2 to msxlx and foxA in the octocoral Xenia sp.
The octocoral Xenia sp. encodes likely orthologs of several pharyngeal cluster genes on the same scaffold, with the core transcription factors located within around 100 genes (0.7 Mb) of each other:

Linkage of msxlx and pax1/9 in Phoronis australis:
Phylogenetic analysis identified g118.t1 as msxlx and g100.t1 as pax1/9, both encoded on 'scaffold1' (annotation file: 51_pau_v2.gff). There are 17 intervening genes, 3 of which have no hits in the NCBI NR protein database. We find no evidence for pax1/9 orthologs outside of the Bilateria (in the context of our analysis, in cnidarians or Trichoplax).
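Counting the genes that separate two loci on a scaffold is a simple walk along the annotation. The following is a minimal Python sketch, assuming a tab-delimited GFF3 file in which every gene has a `gene` feature carrying an `ID=` attribute (e.g. `ID=g100`). Whether 51_pau_v2.gff uses exactly this layout is an assumption, the function names are hypothetical, and the parser ignores GFF3 escaping.

```python
from typing import Iterable, List


def genes_on_scaffold(gff_lines: Iterable[str], scaffold: str) -> List[str]:
    """Return gene IDs on `scaffold`, ordered by start coordinate.

    Assumes the nine GFF3 columns (seqid, source, type, start, end, score,
    strand, phase, attributes) and an ID= attribute on each 'gene' feature.
    """
    genes = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[0] != scaffold or cols[2] != "gene":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        genes.append((int(cols[3]), attrs.get("ID", "")))
    genes.sort()  # order by start coordinate
    return [gene_id for _, gene_id in genes]


def intervening_genes(gff_lines: Iterable[str], scaffold: str,
                      gene_a: str, gene_b: str) -> List[str]:
    """Genes lying strictly between gene_a and gene_b (in either order).

    Raises ValueError if either gene is not annotated on the scaffold.
    """
    order = genes_on_scaffold(gff_lines, scaffold)
    i, j = sorted((order.index(gene_a), order.index(gene_b)))
    return order[i + 1:j]
```

Under these assumptions, `with open("51_pau_v2.gff") as fh: intervening_genes(fh, "scaffold1", "g100", "g118")` would list the genes between the pax1/9 and msxlx loci, and the same approach applies to the ~100-gene window in Xenia sp.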

Gene novelties
We examined the 31 genes / gene families highlighted as deuterostome novelties in the supplementary information of Simakov et al. to test their status in light of new sequence resources. Our primary consideration has been the identification of candidate orthologs within protostome and non-bilaterian animal sequence sets. Identification of such sequences leads to the inference that the gene was present in the common ancestor of protostomes and deuterostomes (i.e. Urbilateria). Simakov et al. classify some genes as deuterostome specific, despite having non-bilaterian animal representatives, as a consequence of inferred loss within protostomes. Such genes are plesiomorphic with respect to Bilateria and therefore unsuitable as markers of deuterostome monophyly. We recognise that demonstrating deuterostome monophyly was not the purpose of Simakov et al. in gathering these data.
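The decision rule described above can be written as a short function. This is a minimal sketch of the reasoning only, not the procedure used for the analyses. The clade labels are hypothetical simplifications, and the sketch ignores paralogy and horizontal gene transfer, which the sections below treat case by case.

```python
# Hypothetical labels for the groups in which candidate orthologs were found.
DEUT, PROT, NONBIL = "deuterostomes", "protostomes", "non-bilaterians"


def assess(found_in: set) -> str:
    """Infer ancestral presence from the groups carrying candidate orthologs."""
    found_in = set(found_in)
    if DEUT not in found_in:
        return "no deuterostome ortholog"
    if PROT in found_in:
        # Shared by protostomes and deuterostomes: present in Urbilateria.
        return "present in Urbilateria"
    if NONBIL in found_in:
        # Shared with non-bilaterians only: plesiomorphic for Bilateria,
        # so absence from protostomes is inferred to be a loss.
        return "present in Urbilateria (inferred protostome loss)"
    return "candidate deuterostome novelty"
```

The second branch captures the point made above: a gene found in deuterostomes and non-bilaterian animals is not a marker of deuterostome monophyly, even if protostomes lack it.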
In the "Novel aspects of TGFbeta signaling genes in deuterostomes" category of Simakov et al., there is good evidence of urbilaterian ancestry for 3 of their 5 examples, with a potential fourth dependent on the precise timing of a gene duplication event (for which no evidence is available). In their "Sialo-glycoproteins and the evolution of muco-ciliary filter feeding" category, we consider 6 of the 7 examples likely present in Urbilateria. In the category "Deuterostome novelties for physiological-metabolic specializations", at least 10 of the 14 examples were likely present in Urbilateria, and a further example is likely hemichordate specific rather than deuterostome specific. Finally, of the 5 examples in "Deuterostome novelties without eukaryotic or prokaryotic homologs", we identify one that was likely present in Urbilateria.

We review our reasoning below. Homoscleromorph (Plakina, Corticium) sponge transcriptomes were assembled from reads using the methods described, and we provide example sequences from these assemblies where relevant. We report bit scores because, unlike E-values, they do not vary with database size (scores from blastp, see methods). All hits were statistically significant (P < 0.001) unless otherwise noted. Section headings refer to human gene names, with multiple paralogs indicated in parentheses. Genes are presented with the same section numbering as Simakov et al.
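Why bit scores, rather than E-values, are comparable across searches of databases of different size follows from the Karlin-Altschul statistics underlying blastp (shown here ignoring the effective-length corrections that BLAST applies):

```latex
E = m\,n\,2^{-S'}, \qquad S' = \frac{\lambda S - \ln K}{\ln 2}
```

Here $S$ is the raw alignment score, $\lambda$ and $K$ are constants of the scoring system, $m$ is the query length and $n$ is the total length of the database. $S'$ depends only on the alignment and the scoring system, whereas $E$ scales linearly with $n$: the same alignment found in a database twice as large receives twice the E-value but the same bit score.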

LEFTY (1,2)
The LEFTY protein is composed of a pro-peptide and a TGF-β signalling domain. Simakov et al. reported a lophotrochozoan ortholog of the LEFTY pro-peptide that lacks the signalling domain, from which we infer the presence of the pro-peptide in Urbilateria. In agreement with them, we were unable to identify protostome or non-bilaterian orthologs of the LEFTY signalling domain.

GDF1/Univin
Blast searches and phylogenetic analysis revealed an ortholog in the protostome brachiopod Lingula anatina, in a tandem duplication arrangement with BMP2/4 similar to that discussed by Simakov et al. Luo and co-workers have also noted its presence in Lingula (43).
Our assessment: present in Urbilateria, because present in protostomes and deuterostomes.

TGFB (1,2,3)
The discussion of Simakov et al. notes the presence of sponge and anthozoan sequences and that deuterostome TGFb2 genes were "likely derived from a bilaterian ancestral sequence". In agreement with them, we identified orthologs of TGFB1/2/3 in anthozoans and sponges.
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.

THBS (1,2)
Simakov et al. note the presence of a sponge sequence with similarity to the N-terminal half of THBS1/2 and suggest that "it may reflect a metazoan sequence that persisted through the bilaterian ancestor to the deuterostome stem". We agree, and identified further sponge genes encoding VWC domains, TSP1 and TSP3 repeats, and other relevant domains.
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.

TGFBR2
TGFBR2 orthologs are present in Ambulacraria and Chordata. These are paralogous to Activin receptors, and the TGFBR2 ectodomain sequence is similar to the extracellular domain of Activin receptors. Although this similarity is not statistically significant, neither is the similarity between, for instance, the human TGFBR2 ectodomain and that of its Saccoglossus or Strongylocentrotus orthologs: a blast search with the human TGFBR2 ectodomain returns no significant non-chordate hits.
Our assessment: similar sequences were clearly present in Urbilateria. Deuterostome specificity depends on the timing of the gene duplication that separated TGFBR2 from the Activin receptors: if it occurred on the deuterostome stem, TGFBR2 is a deuterostome novelty; if it occurred on the bilaterian stem, its absence from protostomes reflects loss.

Sialo-glycoproteins and the evolution of muco-ciliary filter feeding

GNE
We identified homoscleromorph sponge orthologs of GNE, with the best hit of human GNE to our sponge sequence database scoring 863 bits, and the best hit to bacteria scoring 318 bits.
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.

CMAH
We identified homoscleromorph sponge orthologs of CMAH, with the best hit of mouse CMAH to our sponge sequence database scoring 596 bits, and the best hit to bacteria scoring 219 bits. (Note that this gene is absent from humans.)
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.

ST6GALNAC (3,4,5,6):
We identified homoscleromorph sponge orthologs of ST6GALNACs, with the best hit of human ST6GALNAC4 to our sponge sequence database scoring 214 bits, and the best hit to bacteria scoring 59.3 bits.
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.

ST3GAL (1,2):
We identified homoscleromorph sponge orthologs of ST3GALs (as did Petit et al., 2015 (44) and Simakov et al., 2017 (16)), here with representatives from multiple independently sequenced species. The best hit of human ST3GAL1 to our sponge sequence database scored 236 bits, and the best hit to bacteria scored 71.6 bits. Sponge sequences searched against the NCBI NR database retrieve vertebrate sequences as best hits, suggesting that all animal sequences form a clade.
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.
Our assessment: subfamily specific to deuterostomes.

B4GALNT (1,2)
We identified likely orthologs of B4GALNTs in homoscleromorph sponges, with the best hit of human B4GALNT1 to our sponge sequence database scoring 213 bits, and the best hit to bacteria scoring 118 bits.
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.

NEU (1,2,3,4)
As stated by Simakov et al., cnidarian sequences indicate that NEU1 was present in the bilaterian ancestor. These sequences, and other sponge matches, are distinct from the deuterostome NEU2/3/4 sequences. There is no way to date the split between NEU1 and NEU2/3/4, which could have occurred on the deuterostome or the bilaterian stem. Using more sensitive hidden Markov model database searches initiated with the Pfam BNR_2 model, we identified a couple of NEU-related protein sequences in protostomes (see below), but with no clear relationship to deuterostome sequences beyond general family membership. Taken together with the many sponge and choanoflagellate family members, this suggests a very dynamic history for the family in the animal lineage.
Our assessment: NEU1-like sequences were present in Urbilateria, because present in deuterostomes and non-bilaterian animals, and NEU1 is a clear protostome loss. Whether the NEU2/3/4 subfamily is deuterostome specific depends on the timing of the duplication that separated it from NEU1: if this occurred on the deuterostome stem, NEU2/3/4 is a deuterostome novelty; if it occurred on the bilaterian stem, its absence from protostomes reflects loss.

PCSK9
Blast and other database search methods reveal strong hits to sponge sequences, although there are also matches of comparable strength to bacteria. Phylogenetic analysis suggests that all animal proteins similar to PCSK9 form a monophyletic group to the exclusion of bacteria. Simakov et al. also report this relationship with Amphimedon sequences, but discount it because of the lack of conserved exon boundaries. While this and the presence of an additional 'CUB' domain suggest a degree of independent evolution, we consider vertical descent a more conservative hypothesis than horizontal gene transfer.
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.
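Claims such as "all animal proteins similar to PCSK9 form a monophyletic group to the exclusion of bacteria" reduce to asking whether a set of leaves is exactly the leaf set of one node of a rooted tree. A minimal sketch, assuming a rooted tree encoded as nested tuples of leaf names (a hypothetical toy encoding rather than a standard tree format, with made-up labels):

```python
from typing import FrozenSet, Iterable, Iterator, Tuple, Union

# A rooted tree as nested tuples of leaf names (hypothetical, minimal encoding).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All leaf names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> Iterator[FrozenSet[str]]:
    """Yield the leaf set of every node, i.e. every clade of the rooted tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree: Tree, taxa: Iterable[str]) -> bool:
    """True if exactly these taxa form one clade, excluding all other leaves."""
    target = frozenset(taxa)
    return any(clade == target for clade in clades(tree))
```

With toy labels, `is_monophyletic((("bactA", "bactB"), (("human", "urchin"), "sponge")), {"human", "urchin", "sponge"})` is true, whereas a pair of animal sequences nested among bacterial ones (the pattern reported for AAADC below) forms a clade of its own but not one with the remaining animal sequences. The result depends on the rooting, which the sketch takes as given.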

PADI
Blast searches initiated with human PADI1 and phylogenetic analysis revealed an ortholog in the protostome Priapulus caudatus. Searches against sponge sequences revealed hits to homoscleromorph sponges with comparable bit scores to bilaterians, and searches initiated with these sequences retrieved metazoan best hits. Other investigators have also recently reported PADI candidates in Priapulus (45).
Our assessment: present in Urbilateria, because present in deuterostomes, protostomes and non-bilaterian animals.

FTO
We identified likely orthologs of FTO in homoscleromorph sponges, with the best hit of human FTO to our sponge sequence database scoring 214 bits. There were no significant hits to bacteria.
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.

ARSK
We identified likely orthologs of ARSK in homoscleromorph sponges, with the best hit of human ARSK to our sponge sequence database scoring 554 bits, and the best hit to bacteria scoring 371 bits.
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.

NHLRC3 (NHL-containing protein ENSG00000188811 of Simakov et al.)
We identified likely orthologs of NHLRC3 in homoscleromorph sponges, with the best hit of human NHLRC3 to our sponge sequence database scoring 243 bits, and the best hit to bacteria scoring 151 bits.
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.

Choline monooxygenase-like (also called CMO)
Sequence searches initiated with the Saccoglossus protein (XP_002738379.1, see below) retrieved strong matches to deuterostomes and the protostome Phoronis australis (349 bits). The best hit to bacteria scores 351 bits, but the Phoronis sequence searched against the entire NR database of the NCBI retrieves animal sequences as best matches.
Our assessment: present in Urbilateria, because present in protostomes and deuterostomes.

Ectoine synthase
Sequence searches initiated with the Saccoglossus protein NP_001171779.1 (L-ectoine synthase-like) retrieved strong matches to anthozoan proteins, with a best score of 142 bits, while the best hit in bacteria scored 124 bits. Searching the NCBI NR database with the Exaiptasia hit (below) retrieved Saccoglossus and other deuterostome proteins as best matches.
Our assessment: present in Urbilateria, because present in deuterostomes and non-bilaterian animals.
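The best-hit bit scores quoted in the preceding sections can be set side by side. The sketch below only tabulates values given in the text (best non-deuterostome animal hit versus best bacterial hit) and computes the margin in bits. It is illustrative: the query differs between entries (e.g. mouse CMAH, Saccoglossus queries for CMO and ectoine synthase), and a score margin is no substitute for the phylogenetic and reciprocal searches described. The negative margin for CMO is why the reciprocal search with the Phoronis sequence matters there.

```python
# Best-hit bit scores as reported in the text: (best animal hit, best bacterial hit).
# Animal hits are homoscleromorph sponge unless noted.
BIT_SCORES = {
    "GNE": (863, 318),
    "CMAH": (596, 219),               # query: mouse CMAH
    "ST6GALNAC4": (214, 59.3),
    "ST3GAL1": (236, 71.6),
    "B4GALNT1": (213, 118),
    "ARSK": (554, 371),
    "NHLRC3": (243, 151),
    "CMO": (349, 351),                # animal hit: Phoronis australis
    "Ectoine synthase": (142, 124),   # animal hit: anthozoan
}


def margins(scores):
    """Bit-score margin of the best animal hit over the best bacterial hit."""
    return {gene: round(animal - bact, 1) for gene, (animal, bact) in scores.items()}


# Print genes from the narrowest to the widest margin.
for gene, margin in sorted(margins(BIT_SCORES).items(), key=lambda kv: kv[1]):
    print(f"{gene:18s} {margin:7.1f} bits")
```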

Ectoine Hydroxylase
Searching the NCBI NR database with Saccoglossus ectoine hydroxylase yields hits to 3 Saccoglossus sequences and 1 Branchiostoma sequence. Phylogenetic analysis of the top 500 hits suggests that these animal sequences form a single clade.

Histidine methyltransferase (HMT), bacterial-like, also called methyltransferase
Searches initiated with the Pfam Methyltransf_33 HMM, scoring above the gathering threshold cutoffs ('--cut_ga'), retrieved hits to deuterostome species and sequences from the protostome Crassostrea gigas. Simakov et al. also report the presence of C. gigas hits, and their argument for this gene being deuterostome specific rests on the oyster sequences being contamination. Using current sequence databases, similar sequences are also present in other bivalve molluscs.
Our assessment: present in Urbilateria, because present in protostomes and deuterostomes.
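An HMM search of this kind could, for example, be run as `hmmsearch --cut_ga --tblout hits.tbl Methyltransf_33.hmm proteins.fasta` (file names hypothetical). As we understand HMMER 3, `--cut_ga` applies the gathering thresholds stored in the Pfam model, so the table should already be restricted to hits at or above them. The minimal sketch below tallies the tabular output per species; it assumes a hypothetical naming scheme in which target names begin with a species code followed by `|`.

```python
from collections import Counter
from typing import Iterator, Tuple


def parse_tblout(text: str) -> Iterator[Tuple[str, float, float]]:
    """Yield (target name, full-sequence E-value, full-sequence score) per row.

    hmmsearch --tblout rows are whitespace-delimited: the first 18 fields are
    fixed, and the free-text target description (field 19) may contain spaces.
    """
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # header and comment lines start with '#'
        fields = line.split(maxsplit=18)
        yield fields[0], float(fields[4]), float(fields[5])


def hits_per_species(text: str, sep: str = "|") -> Counter:
    """Count hits per species.

    ASSUMPTION (hypothetical naming scheme): target names look like
    'Species|sequence_id', e.g. 'Cgig|seq1'.
    """
    return Counter(target.split(sep, 1)[0] for target, _, _ in parse_tblout(text))
```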

Aromatic amino acid decarboxylase family, microbial-like; AAADC
Searches initiated with Saccoglossus tyrosine decarboxylase (XP_002731852.2) revealed hits to other metazoan and bacterial proteins. A tree reconstructed from these hits, focussing on the Pyridoxal_deC Pfam region, revealed a small clade of Saccoglossus and Branchiostoma sequences nested within a bacterial clade, to the exclusion of other metazoan genes.
Our assessment: deuterostome specific, because the clade shared by Saccoglossus and Branchiostoma is nested within a bacterial clade.

5-methyltetrahydropteroyltriglutamate--homocysteine methyltransferase-like
Blast searches initiated with the Saccoglossus sequence (XP_006818155.1) reveal likely orthologs in the protostome Lingula anatina and several cnidarian species. With current sequence databases, the gene no longer appears to satisfy criteria for deuterostome specificity.
Our assessment: present in Urbilateria, because present in deuterostomes, protostomes and non-bilaterian animals.

Major Facilitator Transporter algal-like, MFS algal-like
When searching with the major facilitator transporter algal-like protein from Saccoglossus kowalevskii (NCBI accession: ALR88600.1, see below), we were not able to detect non-hemichordate hits within the animals, and indeed Simakov et al. record only hits to hemichordate sequences. On their own terms, therefore, this sequence does not qualify as 'deuterostome specific' but is rather hemichordate specific, and so is uninformative for deuterostome phylogeny.

Multicopper oxidase (MCO); also called Bilirubin oxidase-like
We find sequences similar to Saccoglossus multicopper oxidase in tunicates and amphioxus, but not in protostomes. We also find proteins with the same domain structure in Monosiga brevicollis and other choanoflagellates. As choanoflagellates are the sister group of animals, this raises the possibility that the gene was present in non-bilaterian animals, but we are unable to show that the choanoflagellate sequences form a monophyletic clade with the animal sequences.

FAM198A
We only detected deuterostome hits for searches initiated with human FAM198A.

C9orf9 (Rsb66)
We only detected deuterostome hits for searches initiated with human C9orf9.

MREG
We only detected deuterostome hits for searches initiated with human MREG.

SMIM19
We only detected deuterostome hits for searches initiated with human SMIM19.

EFCC1
Database searches initiated with human EFCC1 detected a likely ortholog in the protostome Priapulus caudatus, with no significant hits to non-animals.
Our assessment: present in Urbilateria, because present in protostomes and deuterostomes.

Table S4. Percentage of simulation replicates supporting alternative topologies for the deuterostome clades under three different true-tree hypotheses, for different datasets and models. The datasets are based on a reduced version of the Laumer dataset consisting of 36 taxa covering all major branches of the phylogeny. In the first six rows the results are based on a subselection of 50,000 sites, and in the final four rows on a subselection of 10,000 sites. For both the 10,000- and 50,000-site samples we repeated the experiments after removing the 13 longest protostome and outgroup branches ("no-longs") and the 13 shortest protostome and outgroup branches ("no-shorts"). For the 50,000-site dataset we also repeated the analyses after removing Xenoturbella ("noXeno"). The two models used for tree inference are LG+F+G and C60+LG+F+G. The three hypotheses assumed for the simulations differ only in the relationships of the deuterostome clades (DM: monophyletic deuterostomes; D1: Xenambulacraria sister to Protostomia; D2: Chordata sister to Protostomia).
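Each cell of Table S4 is a proportion over simulation replicates. A minimal sketch of that calculation, assuming that each replicate's inferred tree has already been reduced to one of the hypothesis labels (a classification step not shown here; labels such as "other" are hypothetical):

```python
from collections import Counter
from typing import Dict, Sequence

# The three simulated "true tree" hypotheses named in the Table S4 caption.
HYPOTHESES = ("DM", "D1", "D2")


def percent_support(inferred: Sequence[str]) -> Dict[str, float]:
    """Percentage of replicates whose inferred topology matches each hypothesis.

    `inferred` holds one label per simulation replicate. Labels outside
    HYPOTHESES (e.g. "other") count towards the total but get no column,
    so the percentages need not sum to 100.
    """
    if not inferred:
        raise ValueError("need at least one replicate")
    counts = Counter(inferred)
    return {h: 100.0 * counts[h] / len(inferred) for h in HYPOTHESES}
```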