Genome comparison reveals that Halobacterium salinarum 63‐R2 is the origin of the twin laboratory strains NRC‐1 and R1

Abstract The genome of Halobacterium strain 63‐R2 was recently reported and provides the opportunity to resolve long‐standing issues regarding the source of two widely used model strains of Halobacterium salinarum, NRC‐1 and R1. Strain 63‐R2 was isolated in 1934 from a salted buffalo hide (epithet “cutirubra”), along with another strain from a salted cow hide (91‐R6T, epithet “salinaria,” the type strain of Hbt. salinarum). Both strains belong to the same species according to genome‐based taxonomy analysis (TYGS), with chromosome sequences showing 99.64% identity over 1.85 Mb. The chromosome of strain 63‐R2 is 99.99% identical to the two laboratory strains NRC‐1 and R1, with only five indels, excluding the mobilome. The two reported plasmids of strain 63‐R2 share their architecture with plasmids of strain R1 (pHcu43/pHS4, 99.89% identity; pHcu235/pHS3, 100.0% identity). We detected and assembled additional plasmids using PacBio reads deposited at the SRA database, further corroborating that strain differences are minimal. One plasmid, pHcu190 (190,816 bp) corresponds to pHS1 (strain R1) but is even more similar in architecture to pNRC100 (strain NRC‐1). Another plasmid, pHcu229, assembled partially and completed in silico (229,124 bp), shares most of its architecture with pHS2 (strain R1). In deviating regions, it corresponds to pNRC200 (strain NRC‐1). Further architectural differences between the laboratory strain plasmids are not unique, but are present in strain 63‐R2, which contains characteristics from both of them. Based on these observations, it is proposed that the early twentieth‐century isolate 63‐R2 is the immediate ancestor of the twin laboratory strains NRC‐1 and R1.

Detailed interstrain comparisons revealed that the chromosomes of R1 and NRC-1 are completely colinear and virtually identical (Pfeiffer et al., 2008). They are also highly similar (in silico DDH, 95%) to the type strain (91-R6T) (Pfeiffer et al., 2020), confirming the taxonomic assignment of strain NRC-1 to the species Hbt. salinarum (Gruber et al., 2004). The availability of the high-quality genome sequence for strain 63-R2 now allows interstrain genome comparisons of all four strains.
A distinctive feature of Hbt. salinarum is a high rate of spontaneous mutation due to the movement of, and recombination between, mobile genetic elements (MGEs) (ISH elements, transposons, "the mobilome"), and this has been a focus of study from the 1980s onwards (DasSarma et al., 1983; Ng et al., 2000; Pfeifer & Blaseio, 1990; Pfeiffer et al., 2008, 2020). ISH elements are associated not only with insertional inactivation of genes but also with genome inversions and other genome rearrangements (Pfeiffer et al., 2020). Most of the differences between the twin laboratory strains NRC-1 and R1 could be attributed to this highly active mobilome (Pfeiffer et al., 2008).
In this study, the core genomes of all four strains were compared to assess the relationship between the laboratory strains NRC-1 and R1 and the original Lochhead strains 91-R6 and 63-R2. In these comparisons, strain-specific copies of MGEs were removed to reduce the background noise and enhance any evolutionary signals. The genome of strain 63-R2 (NRC 34001) was found to be exceedingly similar to the laboratory twins NRC-1 and R1, and the types of changes seen are consistent with strain 63-R2 being the ancestral strain from which the two laboratory strains were derived. We believe that the origin of these laboratory strains has now been resolved.
For convenience, summaries of these methods are given below.
2.1 | Formatting the chromosomal sequences of strains 63-R2, 91-R6, NRC-1, and R1 for comparative analysis
In-house tagged versions of the genome sequences of strains 63-R2, 91-R6, R1, and NRC-1 were generated in which all unique sequences between MGEs were identified, as well as each MGE and its associated target sequence duplication (TSD).
After the removal of comments, a "total" sequence was available for each strain. The concatenation of these sequences resulted in a "total" database for subsequent analyses, especially the determination of positions in the original genome sequences. In this "total" database file, line breaks around MGEs are preserved so that their visual identification is simple, especially when the MGE is enclosed by a TSD. A copy of that file served as the initial version of the "core" database, open for subsequent manual modification, most importantly the removal of strain-specific copies of MGEs.
2.2 | Chromosome comparison strategy and generation of core chromosomes devoid of strain-specific MGEs for strains 63-R2, NRC-1 and R1
Preliminary genomic comparisons (BLASTn, MUMmer) had indicated that the genome sequence of strain 63-R2 was much more closely related to those of the twin laboratory strains NRC-1 and R1 than to that of strain 91-R6T, and because of this, the initial analyses were restricted to these three strains. Applying an iterative comparison procedure, "core" chromosome sequences devoid of strain-specific MGEs were generated. The build-up of this "core" database is described in Supplementary Methods, deposited at Zenodo: https://doi.org/10.5281/zenodo.7780801. All eliminated MGEs, including their position in the original genome sequence, are documented in Table A1.
BLASTn analyses of the "core" sequences resulted in a complete set of HSPs (high-scoring pairs) that correlated the complete "core" sequence of the chromosome from strain 63-R2 against the core chromosomes from the twin laboratory strains NRC-1 and R1.
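The bookkeeping behind the "core" and "total" databases can be illustrated with a minimal sketch. It is not the authors' in-house procedure (which is described in the Supplementary Methods); the function names `build_core` and `core_to_total`, the toy sequence, and the interval format (1-based, inclusive coordinates of a removed MGE plus one TSD copy) are assumptions made purely for illustration.

```python
from bisect import bisect_right


def build_core(total_seq, removed):
    """Cut 1-based inclusive intervals out of total_seq.

    Returns the core sequence and a list of (core_start, total_start, length)
    blocks that record where each retained stretch came from.
    """
    core_parts = []
    blocks = []
    core_pos = 1   # next free 1-based position in the core sequence
    total_pos = 1  # next 1-based position of the total sequence still to handle
    for start, end in sorted(removed):
        if start > total_pos:
            length = start - total_pos
            core_parts.append(total_seq[total_pos - 1:start - 1])
            blocks.append((core_pos, total_pos, length))
            core_pos += length
        total_pos = max(total_pos, end + 1)
    if total_pos <= len(total_seq):
        length = len(total_seq) - total_pos + 1
        core_parts.append(total_seq[total_pos - 1:])
        blocks.append((core_pos, total_pos, length))
    return "".join(core_parts), blocks


def core_to_total(core_position, blocks):
    """Translate a 1-based core coordinate back to the total sequence."""
    starts = [block[0] for block in blocks]
    i = bisect_right(starts, core_position) - 1
    if i < 0:
        raise ValueError("position lies before the core sequence")
    core_start, total_start, length = blocks[i]
    if core_position >= core_start + length:
        raise ValueError("position lies beyond the core sequence")
    return total_start + (core_position - core_start)


# Hypothetical toy locus: unique + TSD + MGE + TSD + unique.
toy_total = "ACGT" + "TTAA" + "GGGGGG" + "TTAA" + "CCAT"
# Remove the MGE (positions 9-14) together with one TSD copy (15-18).
toy_core, toy_blocks = build_core(toy_total, [(9, 18)])
```

In this toy case, the first base after the removed element sits at core position 9 and maps back to position 19 of the unmodified sequence, which is the kind of correlation needed to relate HSP coordinates to biological features.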
HSP positions of the interstrain comparison are reported for the "core" database, but to allow easy correlation with biological features, all "core" database positions have been correlated with the corresponding positions in the original sequences of the "total" database (Supplementary Table S4 Table A2), or are present in all three of the other strains (63-R2, NRC-1, and R1; documented in Table A1).
The core chromosomes of strains 91-R6 and 63-R2 were compared by BLASTn, leading to long HSPs, interrupted by unique sequences, which typically were short. Two long regions were encountered that are considered unique despite having a small number of short HSPs (see the Supplementary Methods for details).
2.4 | Comparison of the reported plasmids pHcu235 and pHcu43 from strain 63-R2 to plasmids pHS3 and pHS4, respectively, from strain R1
Preliminary comparisons (BLASTn) indicated that the sequence of plasmid pHcu235 from strain 63-R2 is most closely related to plasmid pHS3 from strain R1, so these plasmids were compared in detail using the same procedure as described for chromosomal comparison (see above, Section 2.2). Plasmid pNRC200 from strain NRC-1 showed a more patchy relationship and was not included in this analysis.
Preliminary comparisons (BLASTn) indicated that the unique, 2.3 kb sequence of plasmid pHS4 from strain R1 is closely related to a region on plasmid pHcu43 from strain 63-R2. Thus, the sequences of these two plasmids were compared. A plasmid corresponding to pHS4 has not been reported for strain NRC-1, and thus a plasmid from this strain was not included in the analysis.
The positions of the strain-specific MGEs and their associated TSDs, which were removed upon generation of the core plasmid sequences, are listed in Table A3 (pHcu235/pHS3) and Table A4 (pHcu43/pHS4).
The final results of this analysis are the HSPs obtained with the "core" sequence of pHcu235 against the "core" sequence of plasmid pHS3 from strain R1 and the HSP obtained for the "core" sequences of pHcu43 against pHS4.
2.5 | Validation that a plasmid corresponding to pHS4 from strain R1 is absent from strain NRC-1
This is based on an analysis of Illumina sequence reads obtained upon resequencing of strain NRC-1 (Kunka et al., 2020)
When sequence duplications between contigs exceeded the length of even the longest PacBio reads, related plasmids were used to guide assembly at the junctions of these duplications. For plasmid pHcu190, plasmids pNRC100 and pHS1 were used as guide sequences, and a complete plasmid could be assembled. For plasmid pHcu229, plasmids pNRC200 and pHS2 were used as a guide. The assembly remained incomplete at both ends, due to a very long duplication between pHcu229 and pHcu190. No heterogeneities could be detected within this duplication, and thus the sequence of pHcu229 could be completed in silico by transferring the corresponding sequence from pHcu190. The sequences of pHcu190 and both versions of pHcu229 are deposited at Zenodo: https://doi.org/10.5281/zenodo.7288901.

PFEIFFER and DYALL-SMITH | 3 of 25
2.7 | Subassembly walking
Sequence duplications that exceed the length of PacBio reads cannot be resolved by regular assembly procedures. In this case, we applied a method that we refer to as "subassembly walking", which is described in Supplementary Methods: https://doi.org/10.5281/zenodo.7780801.
For subassembly walking attempts, we selected PacBio reads based on the following sequence features: (a) unique sequences from other strains which were not covered in the set of contigs from strain 63-R2, (b) sets of PacBio reads selected according to an as yet unexplored junction between a unique sequence and a duplication; this enabled the minimum length of the duplicated sequence which is connected to that junction to be determined, and (c) optional MGEs, where some reads contained the MGE-free sequence version, while others exemplified the junction between the MGE and the adjacent unique sequence.
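Criterion (b) can be pictured with a small sketch: collect the reads that contain a given junction and record how far each extends beyond it, which gives a lower bound on the length of the duplicated sequence attached to that junction. This is only an illustration under simplifying assumptions and not the procedure from the Supplementary Methods: it uses exact string matching (real PacBio reads contain errors and were handled by mapping in Geneious), and the names `revcomp` and `reads_spanning_junction` as well as the toy reads are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def reads_spanning_junction(reads, junction, min_extension=0):
    """Find reads that contain `junction` on either strand.

    `reads` maps read names to sequences. `junction` is a short sequence
    that starts in the unique region and ends in the duplication.
    Returns (name, strand, extension) tuples, where `extension` is the
    number of read bases that follow the junction, longest first.
    """
    hits = []
    for name, seq in reads.items():
        for strand, oriented in (("+", seq), ("-", revcomp(seq))):
            index = oriented.find(junction)
            if index >= 0:
                extension = len(oriented) - (index + len(junction))
                if extension >= min_extension:
                    hits.append((name, strand, extension))
                break  # report each read once
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical toy reads; the junction is "AAACCC".
toy_reads = {
    "read1": "AAACCCGGGTTTT",
    "read2": revcomp("TTAAACCCGG"),
    "read3": "GGGGGG",
}
toy_hits = reads_spanning_junction(toy_reads, "AAACCC")
```

The longest extension among the hits (7 bases for `read1` in the toy data) is the minimum length of sequence that must be connected to the junction.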
2.8 | Assembly of strain 63-R2 contigs contigDRAFT1 and contigDRAFT2 which represent the residuals of a plasmid that has integrated into the chromosome
Some sequences in strains R1 and NRC-1 are strain-specific and are not represented in the other strain (R1: 210 kb; NRC-1: 15 kb) (Pfeiffer et al., 2008). Large parts of these strain-specific sequences occur in strain 63-R2. Nevertheless, some of the R1-specific sequences were seemingly absent from this strain, and we attempted to validate their absence. Surprisingly, PacBio reads were identified which contain some of the R1-specific sequences even though these occur neither in the chromosome nor in any of the assembled plasmids from strain 63-R2 (case [a] in Section 2.7). Read sets were selected and assembled within Geneious (de novo assembly tool). Reads were also mapped to available contigs, including minor ones (e.g., short; low coverage; atypical connectivities of duplicated sequences). Emerging contigs were validated and/or extended by subassembly walking, resulting in contigDRAFT1 and contigDRAFT2.

2.9 | Additional bioinformatics tools
As general tools, MUMmer v4 (Delcher et al., 2003) and the BLAST suite of programs v2.2 (Altschul et al., 1997; Johnson et al., 2008) were used for genome comparisons. All of the reported HSPs were obtained by BLASTn with default parameters except for three (−e 0.001; −F F; −C 0). Thus, low-complexity filtering and composition-based statistics were switched off. The slightly more stringent E-value cutoff was chosen to reduce spurious hits. The TYGS server (Meier-Kolthoff & Göker, 2019) was used to query by whole-genome comparison whether strains represent novel species or belong to known species. Geneious Prime (version 2022.0.2) was used for read mapping and read assembly (Kearse et al., 2012).
flagged as a "younger heterotypic synonym" so that this strain is nowadays assigned to the species Hbt. salinarum (see above).
This result was further corroborated by MUMmer comparisons of the chromosome sequences as deposited in GenBank
3.2 | Detailed comparison of the chromosomes from the most closely related strains, 63-R2, NRC-1, and R1
The chromosomes from the three extremely closely related strains 63-R2, NRC-1, and R1 were compared in detail using BLASTn. Due to the combination of extremely similar chromosome sequences and a highly active mobilome, the mutations that carry the evolutionary signal may be outnumbered by MGE mobilization events, which are of little relevance for unraveling the deeper evolutionary history of these strains. To avoid this problem, chromosome sequences devoid of strain-specific MGEs were generated in silico ("core sequences") and then used for comparison.
For transparency, all strain-specific MGEs which were removed during this procedure are documented in Table A1. The final "core"

Note:
The core chromosomes of strains 63-R2 and R1 were compared using BLASTn (see also Figure 2). Core chromosomes are devoid of strain-specific mobile genetic elements (MGEs) (see Table A1 for the coordinates of strain-specific MGEs in the complete chromosome sequence). For all coordinates from the core chromosome, the corresponding coordinates from the complete chromosome are provided in Supplementary Table S4: https://doi.org/10.5281/zenodo.7780801. For HSPs (high-scoring pairs, i.e., BLASTn alignment blocks), the start and end base is given. Also, raw counts (matching bases and total bases) as well as the number of gap characters, as returned by BLASTn, are listed. The % nucleotide sequence identity was recomputed (to provide two decimal point accuracy). High-scoring pairs (HSP) are tagged R_HSP with a serial number (R to indicate comparison against strain R1). Regions that are not covered by HSPs are shown as breaks. The first and last base is given if there is an unaligned sequence. Otherwise, the term "directly adjacent" or, if applicable, the number of overlapping bases (bp; base pairs) is given. Breaks are tagged R_break with a serial number. For breaks, a comment briefly mentions the key aspect. In the core chromosome sequence of strain NRC-1, the R1 specific break 1 is within N-HSP1 (pos 170,773-180,248), the R1 specific break 4 is within N-HSP4 (pos 1,861,061-1,861,164).
The HSPs are separated by unique sequence breaks that are typically short (below 2 kb). Only a few of these are longer than 3 kb (11 of the unique regions). There are eight long breaks (3.2 kb to 9.4 kb), two very long pairs of unique sequences (47.0 to 78.2 kb), and one extremely long unique sequence (164.2 kb). Of the eight long breaks, six have features that are characteristic for proviruses (breaks 4, 12, 13 + 15, 25, 26). One long unique sequence codes for a type I restriction enzyme (break 2), and one long break is due to deletion of five genes, including nrdAB (break 34).
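How breaks follow from a sorted list of HSPs, and how a two-decimal identity can be recomputed from raw counts, is sketched below. The sketch is illustrative only: the function names, the tuple layout, and the "intermediate" size label are assumptions, while the 2 kb and 3 kb thresholds are taken from the text above.

```python
def percent_identity(matching_bases, total_bases):
    """Recompute % identity from raw counts, rounded to two decimals."""
    return round(100.0 * matching_bases / total_bases, 2)


def describe_breaks(hsps):
    """Describe what lies between consecutive HSPs.

    `hsps` is a list of (start, end) pairs, 1-based and inclusive, sorted
    along one core sequence. Returns one (kind, first, last, length) tuple
    per pair of neighbouring HSPs.
    """
    breaks = []
    for (_, end1), (start2, _) in zip(hsps, hsps[1:]):
        gap = start2 - end1 - 1
        if gap > 0:
            breaks.append(("unaligned", end1 + 1, start2 - 1, gap))
        elif gap == 0:
            breaks.append(("directly adjacent", None, None, 0))
        else:
            breaks.append(("overlap", start2, end1, -gap))
    return breaks


def size_class(length_bp):
    """Coarse size label; the 'intermediate' label is an assumption."""
    if length_bp < 2000:
        return "short"
    if length_bp > 3000:
        return "long"
    return "intermediate"


# Hypothetical HSP coordinates for illustration.
toy_hsps = [(1, 100), (101, 200), (251, 300), (296, 400)]
toy_breaks = describe_breaks(toy_hsps)
```

With the toy coordinates, the three junctions are reported as directly adjacent, as a 50 bp unaligned stretch (positions 201 to 250), and as a 5 bp overlap, mirroring the three cases distinguished in the table legends.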
Within two of the long breaks are a small number of homologous sequences which are short (typically less than 1 kb) and which show reduced sequence similarity (typically less than 90% nucleotide sequence identity have a well-known AT-rich island (61 kb) (Ng et al., 2000;Pfeifer & Betlach, 1985). This is replaced by a distinct AT-rich island in strain 91-R6 (47 kb) (Pfeiffer et al., 2020) (Table A7 and S1.1 in Supplementary Text S1;

Note:
The core chromosomes of strains 63-R2 and NRC-1 were compared by BLASTn (see also Figure 2). Core chromosomes are devoid of strain-specific mobile genetic elements (MGEs) (see Table A1 for the coordinates of strain-specific MGEs in the complete chromosome sequence). Tags start with N_ (to indicate comparison against strain NRC-1). For further explanations of the these are very closely related to plasmids pHS3 and pHS4 from strain R1. The correlation between the plasmids from strain 63-R2 and their most closely related counterparts from strains R1 and NRC-1 is illustrated in Figure 3a,b and Figure A1.
A detailed description of the similarities and differences of these sequences, from which one plasmid could be finalized (pHcu190), and another partially assembled (pHcu229). Long repeats constrained the assembly of the latter plasmid to 170 kb but it could be expanded in silico to its predicted full length of 229 kb. The correlation between the novel plasmids from strain 63-R2 and their most closely related counterparts from strains R1 and NRC-1 is illustrated in Figure 3c,d.
The subregions of the various episomal plasmids from strains 63-R2, NRC-1, and R1 are depicted in Figure A1.
Two additional contigs were obtained which reflect the integration of a plasmid into the chromosome. They share a large region in common but proved resistant to finalization despite detailed scrutiny.
They are reported as contigDRAFT1 and contigDRAFT2.
3.5.1 | Assembly of plasmid pHcu190 from strain 63-R2 and its comparison to pNRC100 from strain NRC-1 and pHS1 from strain R1
Plasmid pHcu190 was assembled as a complete, circularized plasmid.
It is closely related to plasmid pHS1 from strain R1 (Figure 3c) Table A1). The two colored stars in the center indicate strain-specific copies of the same MGE (ISH2) which are integrated at distinct but very closely spaced positions (872 bp apart, serials 21 and 22 in Table A1). The strain-specific ISH2 in the integrative element of strain NRC-1 is also indicated. Strain-specific MGEs of category CB are not indicated because they do not differ among the represented strains.
FIGURE 3 Correlation of the episomal plasmids of strain 63-R2 with those from strains NRC-1 and R1. Colored solid lines indicate colinear plasmid sequences (green: strain R1; blue: strain 63-R2; red: strain NRC-1). Strains and plasmids are indicated to the left and sequence length and accession are indicated to the right. The term "This study" is used for plasmids that have not been included in the original sequencing report and thus have not been deposited in GenBank. Loosely dashed lines indicate the absence of the corresponding sequence from the respective plasmid. All episomal plasmids are circular (not indicated). Plasmids and regions which are only partially displayed are indicated by a terminal slant line pair. Also shown are strain-specific mobile genetic elements (MGEs) (black line extending above: present, MGE type indicated by a tag; gray line extending below: MGEs absent). For MGEs that occur in two strains, tags are highlighted yellow. There is additional, panel-specific markup. (a) A 16 kb sequence that is duplicated between pHcu235 and pHcu190 from strain 63-R2 is highlighted. The copy from pHcu235 is shared with pHS3 from strain R1 but is absent from pNRC200 from strain NRC-1. For coordinates and details see Table 3. (b) For coordinates and details see Table 4. (c) A 16 kb sequence that is duplicated between pHcu235 and pHcu190 from strain 63-R2 is highlighted. The copy from pHcu190 is shared with pNRC100 from strain NRC-1 but absent from pHS1 from strain R1 due to a large deletion in that plasmid. The pHS1-specific sequence "M" is indicated at the end of pHS1 (gray). A 4.5 kb sequence (indicated by a short light red line "below") has been deleted in pHS1 and at the same position, a 19.3 kb sequence has been inserted (indicated by light green). This 19.3 kb sequence is also found in the same sequence context in pHS2/pHcu229 (indicated in [d] by a light green line "above"). The long inverted repeat is indicated by two pairs of arrows, the shorter (red, 32 kb) representing the extent of the duplication in pHcu229 and pNRC200, and the longer (orange, 40 kb) representing the extended duplication in pHcu190 and pNRC100. For coordinates and details see Tables 5a and 5b. (d) The sequence of pHcu229 could only be partially assembled due to extensive intra- and inter-plasmid duplications. The vertical dashed lines at the termini indicate this incompleteness. The plasmid could be completed in silico (indicated by a wavy line) as it is most likely identical to the corresponding region of pHcu190, allowing a sequence transfer. A 19.3 kb sequence (indicated by a light green line "above") has been deleted in pNRC200 and at the same position, a 4.5 kb sequence has been inserted (indicated by light red). This 4.5 kb sequence is also found in the same sequence context in pNRC100/pHcu190 (indicated in [c] by a light red line "below"). The inverted repeat is indicated by a pair of arrows (red, 32 kb) which correspond to the shorter arrows in (c). In pHcu229, the arrows traverse the termini of the assembled region and extend into the sequence added in silico. An ISH3D in pHcu229 (pink background) and an ISH3B in pNRC200 (light green background) are integrated into an identical sequence context (see Table A6). The vertical dashed line in the center indicates that the long deletion in pHcu229 partially overlaps with an independent long deletion in pNRC200. For coordinates and details see Tables 6a and 6b.
deviations, the extreme similarity between pHcu190 and R1 (99.98% sequence identity over 120.6 kb) supports our hypothesis that strain R1 is a direct descendent of the cultivated Lochhead strain. Even more remarkable is the extreme similarity to plasmid pNRC100 from strain NRC-1. After the removal of strain-specific MGEs (see Table A5), pHcu190 and pNRC100 could be fully described by a single 183.6 kb HSP with 99.99% nucleotide sequence identity (Table 5a, Figure 3c). Notably, pHcu190 also carries the long inverted duplication which is known from pNRC100 (Ng et al., 1998) but is absent from pHS1. The remarkable similarity between pHcu190 and pNRC100 makes it likely that strain NRC-1 is also a direct descendent from the cultured Lochhead strain 63-R2. In summary, the plasmids from strain 63-R2 display "hybrid characteristics" compared to the plasmids of strains NRC-1 and R1, and unify seemingly inconsistent characteristics of the lab twin plasmids. This is further corroborated by the 16 kb sequence that matches between the unrelated plasmids pNRC100 and pHS3, being seemingly "shifted." This 16 kb sequence is duplicated in strain 63-R2, occurring in pHcu190, the equivalent of pNRC100, and in pHcu235, the equivalent of pHS3. The most parsimonious interpretation is that each of the laboratory strains has inherited plasmid precursors with both copies and then deleted one copy upon laboratory cultivation.
Figure 3c). The first unique region is 4.5 kb in pHcu190 and 19.3 kb in pHS1. These strain-specific regions were described previously in the comparison of pNRC100 and pHS1 (Pfeiffer et al., 2008) and are illustrated in Figure 3c and Figure A1. The 19.3 kb sequence in pHS1 is absent from the plasmids of strain NRC-1, but present in another plasmid from strain 63-R2 (pHcu229, see below). The other unique region is 58.4 kb in pHcu190 and covers the long (40 kb) inverted duplication and a 16 kb sequence which also occurs in pHcu235 (Figure 3c). This is replaced by a 1.9 kb region, carrying a copy of ISH2, in pHS1.
Although the 1.9 kb sequence is absent from the plasmids of strain NRC-1, and from the assembled plasmids of strain 63-R2 (pHcu235, pHcu43, pHcu190, and pHcu229), it was detected in the PacBio reads of strain 63-R2 as a 1.3 kb sequence without ISH2 (see below, contigDRAFT1). With respect to the inverted duplication, it may be speculated that it was present when strain 63-R2 was cultivated by Lochhead, was retained in strain NRC-1, and was initially also present in the lineage to strain R1 but was subsequently lost upon laboratory cultivation.
3.5.2 | Detection and assembly of plasmid pHcu229 from strain 63-R2, which is related to plasmid pHS2 from strain R1 and plasmid pNRC200 from strain NRC-1
With the newly assembled pHcu190, three of the four plasmids from strain R1 (pHS1, pHS3, and pHS4) have an equivalent in strain 63-R2.
TABLE 3 Comparison of the core sequences of plasmids pHcu235 from strain 63-R2 and pHS3 from strain R1. Note: The core sequences of plasmids pHcu235 and pHS3 were compared using BLASTn (see also Figure 3a). Core plasmid sequences are devoid of strain-specific mobile genetic elements (MGEs) (see Table A3 for the coordinates of strain-specific MGEs in the complete plasmid). For further explanations of the table layout, see the legend of Table 2a.
TABLE 4 Comparison of the core sequences of plasmids pHcu43 from strain 63-R2 and pHS4 from strain R1. Note: The core sequences of plasmids pHcu43 and pHS4 were compared using BLASTn (see also Figure 3b). Core plasmid sequences are devoid of strain-specific mobile genetic elements (MGEs) (see Table A4 for the coordinates of strain-specific MGEs in the complete plasmid). For further explanations of the table layout, see the legend of Table 2a.

TABLE 5a Comparison of the core sequences of plasmids pHcu190 from strain 63-R2 and pNRC100 from strain NRC-1. Note: The core sequences of plasmids pHcu190 and pNRC100 were compared using BLASTn (see also Figure 3c). Core plasmid sequences are devoid of strain-specific mobile genetic elements (MGEs) (see Table A5 for the coordinates of strain-specific MGEs in the complete plasmid). For further explanations of the table layout, see the legend of Table 2a.
The PacBio reads from genome sequencing of strain 63-R2 were scrutinized for matches to unique regions of plasmid pHS2 from strain R1, and such matches were found. Using a supervised approach within Geneious, pHS2 as a reference, and subassembly walking (see Methods and Supplementary Methods), a contig was assembled that is longer than 170 kb. Despite considerable efforts, it was not possible to further extend this contig, which runs at both termini into the long inverted duplication known from pHcu190, pNRC100, and pNRC200 (Figure A1). Extensive attempts to detect reads which indicate additional heterogeneities between this plasmid and pHcu190 were not successful. Thus, we assumed that this plasmid is identical to pHcu190 in the overlapping region and completed its sequence in silico by inserting the corresponding region from pHcu190. The complete sequence is 229,124 bp long, and the plasmid was accordingly designated pHcu229.
After the removal of strain-specific MGEs (see Table A6), three HSPs are required and sufficient to describe the relation of plasmid pHcu229 from strain 63-R2 and pHS2 from strain R1 (Table 6a, Figure 3d). Also, three HSPs are required and sufficient to describe the relation of pHcu229 and pNRC200 from strain NRC-1 ( -1889 bp terminal region specific for pHS1; 58,471 bp region specific for pHcu190/pNRC100, which includes a 16 kb sequence and the long (40 kb) inverted duplication

Note:
The core sequences of plasmids pHcu190 and pHS1 were compared using BLASTn (see also Figure 3c). Core plasmid sequences are devoid of strain-specific mobile genetic elements (MGEs) (see Table A5 for the coordinates of strain-specific MGEs in the complete plasmid). For further explanations of the --Present in pHcu229 and pNRC200 but absent from pHS2 Note: The core plasmid sequences of pHcu229 (restricted to its assembled region) and pHS2 were compared using BLASTn (see also Figure 3d). Core plasmid sequences are devoid of strain-specific mobile genetic elements (MGEs) (see Table A6 for the coordinates of strain-specific MGEs in the complete plasmid). For further explanations of the table layout, see the legend of Table 2a.
TABLE 6b Comparison of the core sequences of plasmids pHcu229 from strain 63-R2 and pNRC200 from strain NRC-1.
-348,885-361,547 No downstream match because pHcu229 has only been partially assembled (in silico extension not considered)

Note:
The core plasmid sequences of pHcu229 (restricted to its assembled region) and pNRC200 were compared using BLASTn (see also Figure 3d). Core plasmid sequences are devoid of strain-specific mobile genetic elements (MGEs) (see Table A6 for the coordinates of strain-specific MGEs in the complete plasmid). For further explanations of the

4 | DISCUSSION
The origin of the widely used laboratory strains of Hbt. salinarum, R1 and NRC-1, was previously unclear, although genome comparisons had shown that their chromosomes and plasmids are extremely similar, confirming that they both came from the same parental strain (Pfeiffer et al., 2008). Our previous genomic comparison of strains R1 and NRC-1 with Lochhead strain 91-R6T (type strain of the species) excluded strain 91-R6T as the parent of the two laboratory strains (Pfeiffer et al., 2020). The current study, analyzing a genome sequence that has recently been published (DasSarma et al., 2022), now establishes that parent as being strain 63-R2, originally isolated from microbially spoiled buffalo hide by Lochhead and deposited in the NRC culture collection as NRC 34001 (Lochhead, 1934). Much of the previous confusion was caused by a combination of factors, including the difficulties in taxonomy before the sequencing era, changes in nomenclature, inadequate strain description in early research publications, and the closure of the Canadian culture collection without archiving strain documents.
The activity of the mobilome is known to dominate strain differences, especially for chromosomes, while nonmobilome-related differences are extremely rare. To focus on nonmobilome differences, all strain-specific MGEs were first removed in a clearly documented procedure that maximized transparency. The core genomes were then compared in detail.
Earlier insights gathered from changes observed in strains of Hqr. walsbyi (Dyall-Smith et al., 2011) had unraveled two processes leading to gross differences between very closely related strains: deletion-coupled insertion and repeat-mediated deletion. Multiple examples of both processes were encountered in the current study.
A deletion-coupled insertion results in a replacement so that unrelated sequences occur in an identical genomic context. Several differences between the two Lochhead strains can be attributed to deletion-coupled insertion. One case is the 61 kb AT-rich island of strain 63-R2 which was already known from strains NRC-1 and R1 (Ng et al., 2000; Pfeifer & Betlach, 1985) and which is replaced by an
The chromosomes of strains 63-R2 and NRC-1 contain the same integrative element (ca. 10 kb) which is absent from strain R1. This element has integrated into the pilB2 gene and is associated with an 8 bp terminal duplication as a direct repeat. Its removal in R1 could have occurred by precise self-excision of the element but more likely occurred by repeat-mediated deletion involving the 8 bp duplication.
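The outcome of the two processes can be made concrete with a toy sketch on plain strings. It models only the resulting sequence, not any biological mechanism, and the function names, coordinates, and example sequences are hypothetical.

```python
def repeat_mediated_deletion(seq, repeat):
    """Delete the segment between two direct repeat copies.

    One copy of the repeat is retained; the sequence between the copies
    and the second copy are removed.
    """
    first = seq.find(repeat)
    second = seq.find(repeat, first + len(repeat)) if first >= 0 else -1
    if second < 0:
        raise ValueError("two non-overlapping repeat copies are required")
    return seq[:first + len(repeat)] + seq[second + len(repeat):]


def deletion_coupled_insertion(seq, start, end, inserted):
    """Replace seq[start:end] (0-based, half-open) by an unrelated sequence."""
    return seq[:start] + inserted + seq[end:]


# Toy locus: an element flanked by an 8 bp direct repeat.
upstream, downstream = "AAAA", "GGGG"
direct_repeat = "ACGTTGCA"
element = "TTTTTTTTTT"
with_element = upstream + direct_repeat + element + direct_repeat + downstream
without_element = repeat_mediated_deletion(with_element, direct_repeat)

# Toy replacement: the flanks stay identical, the middle becomes unrelated.
replaced = deletion_coupled_insertion("AAAACCCCGGGG", 4, 8, "TT")
```

After repeat-mediated deletion, a single repeat copy remains between unchanged flanks, whereas after deletion-coupled insertion two unrelated sequences occupy the same genomic context in the two strains.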
The evolutionary signals conveyed by the plasmids of strains 63-R2, NRC-1, and R1 were also intriguing. While they are extremely well conserved in sequence, the architecture of the plasmids carried by these strains varies greatly. In a previous report (Pfeiffer et al., 2008), this was mistakenly taken as evidence of plasmid misassembly. However, the plasmids of both laboratory strains are well supported by experimental data, with the evidence for the plasmids of strain NRC-1 (detailed restriction analyses) (Bobovnikova et al., 1994; Kennedy, 2005; Ng, Kothakota, & DasSarma, 1991; Ng et al., 1993, 1998, 2000, 2008) being even stronger than that for the plasmids from strain R1 (cosmid end sequencing) (Pfeiffer et al., 2008).
plasmid, pHcu190, proved near-identical to plasmid pNRC100 from strain NRC-1 rather than to plasmid pHS1 from strain R1. The other plasmid, pHcu229, which could only be partially assembled due to extremely long perfect duplications, showed hybrid similarities. One part proved to be near-identical to pHS2 from strain R1, while another part proved to be near-identical to plasmid pNRC200 from strain NRC-1. The extreme similarity of the plasmid sequences, despite variations in their architecture, argues for direct genealogical descent rather than for independent isolates. The Lochhead strain 63-R2 is a well-documented original isolate and most likely the immediate ancestor of the laboratory strains NRC-1 and R1.
The major architectural difference between the plasmid pair pHcu190/pNRC100 and pHS1 is the presence of a 40 kb inverted duplication in the former and its absence in the latter. Because it is present in two strains, it can be assumed that the plasmid version including the inverted duplication is ancestral. The conversion of that presumed ancestral plasmid to pHS1 might well have been caused by deletion-coupled insertion. The inserted sequence is 1.9 kb in length, and the deleted sequence is 58.4 kb in length and covers the inverted copy of the 40 kb duplication plus a 16 kb sequence, which is duplicated only in strain 63-R2 (on pHcu235 and pHcu190), while only one copy is found in strain R1, on the pHcu235-related plasmid pHS3, and only one copy is found in strain NRC-1, on the pHcu190-related plasmid pNRC100.
Surprisingly, while attempting to confirm the absence of the pHS1-specific 1.9 kb sequence from strain 63-R2, we found PacBio reads that carry this sequence (lacking an ISH2 element and thus only 1.3 kb long). Extending that sequence uncovered remnants of a plasmid that is integrated into the chromosome and occurs in two variants. Thus, both architectures, that of plasmids pHcu190/pNRC100 and that of plasmid pHS1, must already have occurred in the common ancestor of strains 63-R2, NRC-1, and R1. Similarly, one variant of the integrated plasmid contains a junction that is specific to pNRC200 and joins sequences from pHS3 and pHS2 (and thus also from pHcu235 and pHcu229). Again, both architectures, that of plasmids pHS3 and pHS2 and that of pNRC200, must already have occurred in the common ancestor, which probably is the original isolate 63-R2.
This suggests the following hypothetical scenario for plasmid evolution from the common ancestor to 63-R2, the parent of the twin laboratory strains NRC-1 and R1 (Figure 4). The precursor of pHS1 and pNRC100 was duplicated in the common ancestor, one copy being converted from the pNRC100 architecture to the pHS1 architecture by deletion-coupled insertion. Also, the precursor of pNRC200, pHcu235, and pHcu229 was duplicated in the common ancestor. Strain 63-R2 has retained both versions, although the version with the pHS1 architecture integrated into the chromosome and has since been largely, but not yet completely, lost. Strain NRC-1 received both versions but eliminated the one with the pHS1 architecture. Strain R1 also received both versions but subsequently lost the one with the pNRC100 architecture. Corresponding events of duplication, partial loss in strain 63-R2 with chromosomal integration, and complete strain-specific deletion of the plasmid with one of the architectures have also shaped pNRC200 in strain NRC-1 and plasmids pHS2 and pHS3 in strain R1.

FIGURE 4 Hypothetical scenario for plasmid evolution from the common ancestor to strain 63-R2 and further to the twin laboratory strains NRC-1 and R1.
(a, left) We hypothesize that the ancestor contained four plasmids (pA to pD) which roughly correspond to the four strain R1 plasmids pHS3 (pA), pHS2 (pB), pHS1 (pC), and pHS4 (pD). Plasmid pB carries an inverted 32 kb repeat (outward-facing red arrows) while plasmid pC contains an extended version (40 kb) of the inverted repeat (additional outward-facing orange arrows). (a, center) For reasons described in the text, we hypothesize that Event A1 consists of two duplications, followed by modifications. This event occurred in the ancestor of strain 63-R2, which is the parent of the laboratory strains NRC-1 and R1. (i) Plasmid pC has been duplicated. One version was retained (pC1) while the other (pC2) suffered a deletion-coupled insertion (marked by a red arrowhead labeled "M" in pC2). The deletion amounts to 58 kb while the inserted sequence is less than 2 kb. The inserted sequence (region M in Supporting Information: Table S3.3 in Supplementary Text S3: https://doi.org/10.5281/zenodo.7780801, see also Figure 3c) has been previously reported as a sequence that occurs only in strain R1 and not in strain NRC-1 (Pfeiffer et al., 2008). The 58 kb deletion covers a 16 kb sequence (region R in Supplementary
(b, left) The plasmids hypothesized to occur in the immediate ancestor are drawn (see panel a center) but labeled to reflect the plasmids from strain 63-R2, with labels from panel a center being provided in parentheses. Two alternative and unrelated sequences enclosed in the same sequence context are indicated by colored pentamers. The green pentamer (in plasmids derived from pC) refers to a 4.5 kb sequence, and the blue pentamer (in plasmids derived from pB) refers to a 19.3 kb sequence. (b, right) This panel shows the four episomal plasmids of strain 63-R2 and the two plasmid integrations into the chromosome. Two of the four episomal plasmids have been reported (DasSarma et al., 2022), pHcu235 (derived from pA1) and pHcu43 (derived from pD). Plasmid pHcu235 differs from its precursor by two closely spaced long deletions (44.2 kb, 7.5 kb) (indicated by red triangles). One episomal plasmid has been assembled to completion and is first described in this report (pHcu190, derived from pC1). Another episomal plasmid could only be partially assembled but could be completed in silico, and is first described in this report (pHcu229, derived from pB1). Plasmid pHcu229 differs from its precursor by one long deletion (12.9 kb) (indicated by a red triangle). The alternative and unrelated sequences (colored pentamers) are retained from their precursors. The ancestral plasmid pC2 has been integrated into the chromosome (contigDRAFT1) while the free form of the plasmid has been lost. Also, parts of pC2 were lost upon chromosomal integration. The ancestral concatenated plasmid pAB2 has been integrated into the chromosome (contigDRAFT2) while the free form of the plasmid has been lost. Also, parts of pAB2 were lost upon chromosomal integration. Because plasmid integration occurred at only a single site in the chromosome, parts of pC2 and pAB2 may have been joined and further modified before their chromosomal integration.
(c) This panel shows the hypothesized events leading to the four plasmids of strain R1. Plasmid pHS3 corresponds to pA1, plasmid pHS2 to pB1, and plasmid pHS4 to pD. Plasmid pHS1 corresponds to pC2 while the ancestral pC1 (anc_pC1) has been lost. Also, the concatenated ancestral plasmid pAB2 (anc_pAB2) has been lost in this strain. In pHS1, a 4.5 kb sequence (green pentamer) has been replaced by a 19.3 kb sequence from pHS2 (blue pentamer) so that the 4.5 kb sequence has been lost from strain R1.
(d) This panel shows the hypothesized events leading to the two plasmids of strain NRC-1. Plasmid pNRC100 corresponds to pC1 while the ancestral pC2 has been lost. Plasmid pNRC200 corresponds to the concatenated plasmid (pAB2) while the ancestral plasmids pA1 (anc_pA1) and pB1 (anc_pB1) have been lost. The ancestral plasmid pD (anc_pD) has also been lost. In pNRC200, a 19.3 kb sequence (blue pentamer) has been replaced by a 4.5 kb sequence (green pentamer) from pNRC100 so that the 19.3 kb sequence has been lost from strain NRC-1.
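The retention and loss pattern of the Figure 4 scenario can be tabulated with a short Python sketch. This is an illustration only and not part of the original analysis: the dictionary and function names are hypothetical, the ancestral labels (pA1, pB1, pC1, pC2, pAB2, pD) are those used in the Figure 4 caption, and the sketch assumes that the chromosomally integrated remnants in strain 63-R2 count as "present" even though they are incomplete.

```python
# Ancestral plasmid versions after the hypothesized duplication event
# (labels as in the Figure 4 caption).
ANCESTRAL = {"pA1", "pB1", "pC1", "pC2", "pAB2", "pD"}

# Episomal plasmids of each strain, mapped to the ancestral version
# they are hypothesized to derive from.
EPISOMAL = {
    "63-R2": {"pHcu235": "pA1", "pHcu229": "pB1", "pHcu190": "pC1", "pHcu43": "pD"},
    "R1": {"pHS3": "pA1", "pHS2": "pB1", "pHS1": "pC2", "pHS4": "pD"},
    "NRC-1": {"pNRC100": "pC1", "pNRC200": "pAB2"},
}

# Versions that survive only as remnants integrated into the chromosome.
# Assumption of this sketch: such remnants still count as "present".
INTEGRATED = {"63-R2": {"pC2", "pAB2"}, "R1": set(), "NRC-1": set()}


def present(strain):
    """Ancestral versions found in a strain, episomal or integrated."""
    return set(EPISOMAL[strain].values()) | INTEGRATED[strain]


def lost(strain):
    """Ancestral versions with no counterpart in a strain."""
    return ANCESTRAL - present(strain)


if __name__ == "__main__":
    for strain in EPISOMAL:
        print(strain, "lost:", sorted(lost(strain)))
    # Every version found in either laboratory strain also occurs in 63-R2.
    print((present("R1") | present("NRC-1")) <= present("63-R2"))
```

Under these assumptions, strain 63-R2 has lost none of the six versions, strain R1 has lost pC1 and pAB2, and strain NRC-1 has lost pA1, pB1, pC2, and pD. Every version found in either laboratory strain is also found in strain 63-R2, which is the pattern expected if 63-R2 is their parent.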
Strain R1 was obtained in the Stoeckenius lab, working with "Hbt.
The Lochhead strain obtained from a buffalo hide was assigned the species epithet "cutirubrum," while the epithet "salinarum" that had been coined by Harrison and Kennedy in 1922 (Harrison & Kennedy, 1922) was assigned to the strain obtained from a cow hide (Lochhead, 1934). TYGS analysis confirms that all these strains belong to the same species, supporting the conclusion by Ventosa and Oren (1996) that Hbt. salinarum, Hbt. cutirubrum, and Hbt. halobium are the same.

ACKNOWLEDGMENTS
This article is dedicated to Dieter Oesterhelt (1940–2022). The authors wish to express their gratitude for his generous and long-lasting support. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

CONFLICT OF INTEREST STATEMENT
None declared.

This ISH8 is absent from pHcu229 due to a 12 kb deletion but is present on a minor plasmid sequence that has been integrated into the chromosome (contigDRAFT2, not shown). Additionally, two MGEs are drawn even though they are internal to a region. These are an ISH3 (highlighted yellow) which occurs exclusively in the reverse copy of the inverted duplication in pHcu229. An adjacent ISH8 is also indicated because it is present in the reverse copy of the inverted duplication of pHcu229 but absent from the forward copy (indicated by a small cross in red). This ISH8 is absent from the single copy of this region in plasmid pHS1 from strain R1 but present in all other copies of all other plasmids from strains 63-R2 and NRC-1.

DATA AVAILABILITY STATEMENT
Note: Mobile genetic elements (MGEs) of category CA were deleted when comparing these three strains to each other. MGEs of category CB are common to these three strains but are strain-specific when compared to strain 91-R6. In each case, the length of the removed sequence (MGE + TSD) is given, and the position in the affected strain. A dash is given for nonaffected strains. Coordinates refer to the original chromosome sequence. The type of strain-specific MGE is indicated. Nearly all strain-specific MGEs were found to be associated with a TSD, the length of which is specified. If strain-specific MGEs were very closely spaced, their relative positioning is specified in the comment column by their integration positions in the three core genomes. Special cases requiring an extended description are labeled as "special case" in the comment (see hereafter).
Special case A: A complete ISH2 and a partial ISH8A element (terminal 311 bp) were integrated as a cassette, which is concluded from the fact that the two elements are directly adjacent and are bounded by one common 10 bp target duplication. It should be noted that ISH2 is a MITE, which does not carry a transposase gene, has termini related to those of ISH8, and is mobilized in trans by the ISH8 transposase.
Special case B: This ISH2 is present in the integrative element insert of strain NRC-1 but absent from the corresponding element in strain 63-R2. The integrative element is completely absent from strain R1 (see Figure 2).
Special case C: not associated with a TSD; the sequence TC-GT-GT-AT-GT-CT (strains NRC-1 and R1) is replaced by TC-GT-GT-[ISH8E]-GT-CT (strain 63-R2).
TABLE A2 Strain-specific MGEs which were eliminated upon generation of the core chromosome sequence of strain 91-R6.
Note: Mobile genetic elements (MGEs) deleted from the chromosome of strain 91-R6 upon generation of the core sequence. These MGEs are strain-specific when compared to the chromosomes of the three strains 63-R2, NRC-1, and R1. See Table A1 for additional explanations.

TABLE A3 Strain-specific mobile genetic elements (MGEs) which were eliminated upon generation of the core plasmid sequences pHcu235 and pHS3.

TABLE A4 Strain-specific mobile genetic elements (MGEs) which were eliminated upon generation of the core plasmid sequences pHcu43 and pHS4.

TABLE A5 Strain-specific MGEs which were eliminated upon generation of the core plasmid sequences pHcu190, pNRC100, and pHS1.
Note: Several mobile genetic elements (MGEs) are shared between strains 63-R2 and NRC-1 but absent from strain R1. In the forward copy of the long (40 kb) inverted duplication, they are considered strain-specific. Their counterpart in the reverse copy of the inverted duplication is not considered to be strain-specific because the inverted copy is absent from strain R1 and thus only strains 63-R2 and NRC-1 are compared, both containing this MGE. See Table A1 for additional explanations.

TABLE A6 Strain-specific mobile genetic elements (MGEs) which were eliminated upon generation of the core plasmid sequences pHcu229, pNRC200, and pHS2.
Note: For pHcu229, the analysis is restricted to its assembled part and coordinates refer to the assembled version, not to the in silico completed version. See Table A1 for additional explanations.
TABLE A7 The extent of the AT-rich regions in strains 63-R2, NRC-1, and R1 and the equivalently positioned replacement region in strain 91-R6. Table 1).