A new dataset of pheromone and pheromone-gene structures from the ciliate, Euplotes crassus

Like many other organisms, ciliates communicate and interact socially via diffusible chemical signals, named pheromones, that are functionally associated with a genetic mating-type mechanism of cell self/not-self recognition. In Euplotes species, pheromones form species-specific families of small, globular, and disulfide-rich proteins folding into exclusively helical secondary structures. Each is specified by one of a series of high-multiple alleles that are inherited in Mendelian fashion with relationships of co-dominance at the so-called mat genetic locus of the cell's transcriptionally inert micronuclear genome, and expressed in the transcriptionally active macronuclear genome as individual DNA molecules in which the central coding region is flanked by 5’-leader and 3’-trailer noncoding regions ending with C4A4/T4G4 telomeric repeats. In E. crassus, a cosmopolitan marine species with a long tradition in the study of ciliate mating systems and breeding patterns, oligonucleotides specific to amino acid sequences of pheromones Ec-1 and Ec-α were previously used to clone and sequence a first set of four structurally distinct macronuclear (mac) pheromone coding genes, mac-ec-α, mac-ec-1, mac-ec-2 and mac-ec-3, from two interbreeding strains, L-2D and POR-73. The use of these oligonucleotides in PCR amplifications of macronuclear DNA preparations from three other E. crassus interbreeding strains, ES10, Fava4 and MN4, has now resulted in the characterization of a second set of eight new pheromone coding genes, mac-ec-β, mac-ec-γ, mac-ec-δ, mac-ec-ε, mac-ec-µ, mac-ec-4, mac-ec-5 and mac-ec-6. Multiple alignment between previously and newly determined pheromone-gene sequences reinforces the concept that the E. crassus pheromone-gene family includes two sub-families, which likely reflect a duplication of the micronuclear mat gene locus and represent an apomorphic trait of the E. crassus clade.
Members of one sub-family (each identified with a Greek letter) show a 500-bp 5’-leader noncoding region rich in AGGA/AGGGA repetitions, and encode 56-amino acid pheromones with eight conserved Cys residues. Members of the other sub-family (each identified with an Arabic numeral) show an 800-bp 5’-leader noncoding region without AGGA/AGGGA repetitions, and encode 45-amino acid pheromones with ten conserved Cys residues.


© 2023 The Author(s)

Euplotes strains used in this study were taxonomically identified as E. crassus on the basis of morphological and genetic criteria [1]. Macronuclear DNA preparations were amplified using primers designed on conserved sequences adjacent to the telomeric ends, previously determined from other E. crassus pheromone coding genes [2]. Amplicons were cloned, plasmids from five distinct clones were Sanger-sequenced in both directions, and the partial sequences were overlapped to produce the complete gene sequences.
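The step of overlapping partial reads sequenced in both directions can be illustrated with a minimal Python sketch. It is not the procedure used to build the dataset: the function names, the exact-match overlap rule and the `min_overlap` default are assumptions made for illustration, and real Sanger reads would normally be quality-trimmed and aligned with mismatch tolerance.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def merge_overlapping(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two reads where a suffix of `left` exactly matches a prefix of `right`.

    The longest candidate overlap is tried first. A ValueError is raised if
    no exact overlap of at least `min_overlap` bases exists.
    """
    left, right = left.upper(), right.upper()
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("no exact overlap of the required length")


def assemble_pair(forward_read: str, reverse_read: str, min_overlap: int = 20) -> str:
    """Merge a forward read with a read obtained from the opposite strand.

    The reverse read is first converted to the forward-strand orientation
    (hypothetical helper, illustration only).
    """
    return merge_overlapping(forward_read, reverse_complement(reverse_read), min_overlap)
```

Calling `assemble_pair(forward_read, reverse_read)` on two reads of the same clone returns a single forward-strand sequence when the reads share an exact overlap, and raises an error otherwise, which flags clones that need a closer look.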

Value of the Data
• The mating-type mechanism that controls ciliate conjugation is basically a phenomenon of cell self/not-self recognition driven by cell interactions with diffusible protein pheromones. The determination of new pheromone and pheromone-gene sequences from E. crassus improves the knowledge of the molecular basis of this mechanism.
• In E. crassus, pheromones are secreted in tiny amounts, making it hard to determine the complete amino acid sequences of a significant number of them via direct chemical analysis of native protein preparations. The PCR primers reported in the dataset are designed on conserved stretches of the pheromone-gene sequences, and help to circumvent direct chemical analysis by cloning and sequencing other pheromone coding genes from new E. crassus strains.
• An upgraded dataset of pheromone and pheromone-gene sequences may be used for a more reliable identification of which pheromone structural specificities are functionally more relevant in diversifying conspecific cells in their spectrum of mating interactions, and which nucleotide sequence stretches are useful in designing new primer combinations for qPCR analyses of pheromone-gene amplification and expression.
• The knowledge of more E. crassus pheromone amino acid sequences supports the application of the AlphaFold AI system in scouting their three-dimensional folding, and the knowledge of this folding pattern may add a relevant brick to the general picture of how the Euplotes pheromone molecular structures (so far determined only from E. nobilii, E. petzi and E. raikovi [3,4]) evolve in concert with speciation and habitat colonization.

Objective
Results from traditional Mendelian analyses of the E. crassus, E. minuta and E. vannus mating systems have long supported an eccentric model of serial dominance relationships between single-locus multiple mat-alleles [5–7], and of the synthesis of insoluble (membrane-bound) protein pheromones [8]. Results from two original research articles, based on molecular approaches to the E. crassus mating system, conflicted with this model [2,9]. They provided evidence that (i) E. crassus cells conform with the more common (basic) 'E. patella model' in relation to mat-allele relationships of co-dominance and the synthesis of diffusible pheromones [10], and (ii) the difference resides in the evolution of two structurally distinct pheromone-coding gene sub-families likely reflecting a phenomenon of mat-gene locus duplication [2]. In addition to reinforcing evidence that a mat-gene locus duplication is a distinctive (apomorphic) feature of the E. crassus clade, the knowledge of new E. crassus pheromone and pheromone-gene sequences may help (i) planning in silico analyses of whether and how pheromones from the two sub-families interact in binding target cells, and (ii) exploring whether and how these interactions can account for the serial-dominance relationships that were originally presumed to govern mat-allele inheritance and expression in the species of the E. crassus clade.

Table 1 reports the names of all the pheromone-coding genes that have been cloned from E. crassus and analyzed for their sequences, while Table 2 lists the respective GenBank accession numbers. Figs. 1 and 2 illustrate the multiple alignments of the nucleotide sequences known overall from E. crassus for each of the two pheromone-gene sub-families: one (Fig. 2) including genes (labelled with progressive Arabic numerals) that show closer orthologous relationships with pheromone genes of other Euplotes species; the other (Fig. 1) including genes (labelled with Greek letters) with closer E. crassus specificity. Fig. 3 illustrates the multiple amino acid sequence alignments of the cytoplasmic pheromone precursor forms (pre-pro-pheromones) that are specified by the structurally determined sequences of genes from the two sub-families.

Fig. 1.
Multiple nucleotide sequence alignment of E. crassus pheromone genes of the subfamily (distinguished by Greek letters) encoding pheromone sequences of 56 amino acids with eight Cys residues. Gaps were inserted to optimize the alignment, and dots indicate identical nucleotides. A previously determined sequence is in light letters, and the newly determined sequences are in bold letters. The 5'-leader and 3'-trailer non-coding regions are in lower case letters, the coding region is in capital letters, and the telomeric repetitions are in italics. The ATG start codon and the TAA stop codon are shadowed. Filled and light arrowheads delimit the sequence segments specifying the signal peptide and the pro segment, respectively, of the cytoplasmic (immature) pheromone precursor form (pre-pro-pheromone). In the 5'-leader region, a box delimits the domain characterized by AGGA/AGGGA repetitions.

Fig. 2.
Multiple nucleotide sequence alignment of E. crassus pheromone genes of the subfamily (distinguished by Arabic numerals) encoding pheromone sequences of 45 amino acids with ten Cys residues. Gaps were inserted to optimize the alignment, and dots indicate identical nucleotides. Three previously determined sequences are in light letters, and the newly determined sequences are in bold letters. The 5'-leader and 3'-trailer non-coding regions are in lower case letters, the coding region is in capital letters, and the telomeric repetitions are in italics. The ATG start codon and the TAA stop codon are shadowed. Filled and light arrowheads delimit the sequence segments specifying the signal peptide and the pro segment, respectively, of the cytoplasmic (immature) pheromone precursor form (pre-pro-pheromone).
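The features that distinguish the two sub-families (mature pheromone length, number of Cys residues, and AGGA/AGGGA repetitions in the 5'-leader region) lend themselves to a simple programmatic check. The Python sketch below is only an illustration: the function names and the exact-match thresholds are assumptions, and it treats the conserved Cys count as the total Cys count, which may not hold for every sequence in the dataset.

```python
import re

# Lookahead pattern, so that adjacent or overlapping AGGA / AGGGA units
# (for example "AGGAGGA") are each counted at their start position.
REPEAT_PATTERN = re.compile(r"(?=AGGG?A)")


def count_agga_repeats(leader: str) -> int:
    """Count AGGA and AGGGA occurrences in a 5'-leader sequence."""
    return len(REPEAT_PATTERN.findall(leader.upper()))


def classify_pheromone(mature_peptide: str) -> str:
    """Assign a mature pheromone sequence to a sub-family.

    Assumed rule (simplification): 56 residues with eight Cys for the
    Greek-letter sub-family, 45 residues with ten Cys for the
    Arabic-numeral sub-family.
    """
    peptide = mature_peptide.upper()
    length = len(peptide)
    cys_count = peptide.count("C")
    if length == 56 and cys_count == 8:
        return "Greek-letter sub-family"
    if length == 45 and cys_count == 10:
        return "Arabic-numeral sub-family"
    return "unclassified"
```

Sequences that come back as "unclassified" are not necessarily outside the family; they simply do not meet the strict length and Cys criteria assumed here and would need inspection in the alignments.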

Cell Cultures
The E. crassus strains used for the pheromone-gene sequence determination were each expanded from a single specimen isolated from samples of shallow water and sediment. They were cultivated in sterilized natural or artificial seawater at 20-22 °C under a light-dark cycle, following routine procedures [2] and using the green alga Dunaliella tertiolecta as standard food source.

Ethics Statements
This work did not involve experimental analyses of animals or humans.

Data Availability
Pheromone and pheromone-gene sequences from the marine ciliate, Euplotes crassus

Declaration of Competing Interest
The authors declare no competing financial interest.