Interlaboratory study to validate a STR profiling method for intraspecies identification of mouse cell lines

The Consortium for Mouse Cell Line Authentication was formed to validate Short Tandem Repeat (STR) markers for intraspecies identification of mouse cell lines. The STR profiling method is a multiplex polymerase chain reaction (PCR) assay comprised of primers targeting 19 mouse STR markers and two human STR markers (for interspecies contamination screening). The goals of the Consortium were to perform an interlaboratory study to–(1) validate the mouse STR markers to uniquely identify mouse cell lines (intraspecies identification), (2) to provide a public database of mouse cell lines with the National Institute of Standards and Technology (NIST)-validated mouse STR profiles, and (3) to publish the results of the interlaboratory study. The interlaboratory study was an international effort that consisted of 12 participating laboratories representing institutions from academia, industry, biological resource centers, and government. The study was based on 50 of the most commonly used mouse cell lines obtained from the American Type Culture Collection (ATCC). Of the 50 mouse cell lines, 18 had unique STR profiles that were 100% concordant (match) among all Consortium laboratory members, and the remaining 32 cell lines had discordance that was resolved readily and led to improvement of the assay. The discordance was due to low signal and interpretation issues involving artifacts and genotyping errors. Although the total number of discordant STR profiles was relatively high in this study, the percent of labs agreeing on allele calls among the discordant samples was above 92%. The STR profiles, including electropherogram images, for NIST-validated mouse cell lines will be published on the NCBI BioSample Database (https://www.ncbi.nlm.nih.gov/biosample/). Overall, the interlaboratory study showed that the multiplex PCR method using 18 of the 19 mouse STR markers is capable of discriminating at the intraspecies level between mouse cell lines. Further studies are ongoing to refine the assay including (1) development of an allelic ladder for improving the accuracy of allele calling and (2) integration of stutter filters to identify true stutter.


Introduction
The impact of misidentification and cross-contamination of animal cell lines in the scientific community continues to be a serious problem [1,2]. Although some journals require cell line authentication prior to publication [2], others only recommend it, allowing for non-validated materials and possible misinformation to enter the public domain. The scientific community and funding agencies are encouraging data transparency to improve reproducibility and the quality of research (https://grants.nih.gov/reproducibility/index.htm) [3][4][5][6]. One aspect of reproducibility is validating the biological reagents used in experiments to include cell lines which continue to be the main work horse in biological research. For human cell lines there are validated methods [7], kits, and databases available for intraspecies identity testing. For non-human cell lines there are validated methods for interspecies identification such as DNA barcoding [8] and the species-specific multiplex PCR assay [9] (ASN-0004: Species-Level Identification and Cross-Contamination Screening in Animal Cells by Multiplex PCR-Working group in progress); however, these methods are unable to distinguish between non-human cell lines of the same species.
Mouse models have been an important tool in biomedical research, especially with the advancement of transgenic mouse technologies, such as the CRISPR/Cas9 system [10]. With the increasing number of transgenic systems and potential new cell lines created, there is a need for intraspecies identification of mouse cell lines. Indeed, mouse cell lines are the second most frequently used cell lines in basic research when compared to human cell lines. The Mouse Cell Line Authentication Consortium was formed to validate a short tandem repeat (STR) multiplex PCR assay to distinguish among mouse cell lines. This assay improves upon the patented methods (US Patent 9,556,482 B2) previously described [11], which now includes additional primers that target a total of 19 mouse STR loci and two human STR markers for contamination detection (US patent application publication 2017/0101677 A1). This method is rapid, inexpensive, and uses related methods and instrumentation already in place for human cell line authentication by STR profiling. This validated method is the first step in authenticating a mouse cell line by determining its STR profile identity.
In October of 2016, a Federal Register Notice (FRN) was published inviting eligible laboratories to join the Mouse Cell Line Authentication Consortium (https://www.federalregister. gov; 81 FR 70665). Eleven laboratories responded to the FRN and formally signed a Cooperative Research and Development Agreement (CRADA) with NIST, the 12 th laboratory, to join the Consortium (see Table 1). Test kits were distributed to Consortium laboratory members; each kit contained DNA isolated from 50 mouse cell lines (coded samples), primers, PCR Master Mix, calibrants, and control human DNA. Each Consortium laboratory member was provided a detailed protocol and asked to perform the assay as written. The protocol described how to amplify the STR target regions of the mouse DNA from the 50 cell lines using PCR, separate the amplicons using capillary electrophoresis (CE), size the fragments based on calibrants, and submit the exported data file to NIST. NIST analyzed the data and submitted comments to the Consortium members. Consortium members met on several occasions via webinar and teleconference calls to discuss the interpretation and analysis of the submitted data. This multiplex PCR assay provides a unique STR profile for each mouse cell line tested, including closely related cell lines. Moreover, cell lines from within a mouse strain showed conserved alleles at particular loci that can be used to identify cell lines at the strain level, enabling the detection of misidentified cell lines. Here we describe the results of the study in detail and the path forward.

Initial data analysis
Upon observation of the first set of data submitted to NIST, it was apparent that interpretation of data was inconsistent among the Consortium laboratory members primarily due to PCR artifacts, and to a lesser extent, low signal and genotyping (bin) errors. PCR artifacts, mainly stutter, were a major factor in the discordant allele calls. The electropherograms for each cell line were very similar among Consortium laboratory members, but interpretation of allele calls, most notably peaks in the +4 and -4 stutter positions, varied. Stutter is a PCR artifact that is caused by strand slippage of the polymerase enzyme during amplification resulting in a small peak that occurs either one repeat before or after the true allele at a STR marker [12,13]. The STR loci targeted in this assay are tetranucleotide repeats; therefore, the stutter appears either four base pairs before or after the true allele. Traditionally, stutter filters are used in fragment analysis software to flag a peak in the stutter position as being a potential allele or an artifact. Since stutter filters were not yet established for the mouse cell line STR profiling assay, some Consortium laboratory members relied on default stutter filters validated for human cell lines in their software to call alleles for each sample. Consequently, some alleles were missed due to the lack of pre-established stutter filters.

Data analysis using stutter rules
Since stutter filters for the mouse cell line STR profiling assay were not yet established, the Consortium laboratory members in a consensus pronouncement, determined the stutter filter settings based on the data. The stutter ratios were calculated based on the following rules: (1) any peak in the -4 stutter position with a stutter ratio greater than 20% would be called an allele, and (2) any peak in the +4 stutter position with a stutter ratio above 10% would be called an allele. The Consortium laboratory members were also asked to ensure that each allele was properly labeled since the exported data file only included information for peaks that were assigned an allele call. Using these guidelines, the Consortium achieved full concordance in 18 out of 50 mouse cell lines (see Concordance section). A large part of discordance was due to variations in data interpretation, specifically the lack of stutter ratio calculations, and in some cases low instrument signal. Consortium laboratory members were asked to provide reasons for discordance including errors in stutter ratio calculations, artifacts present, or low signal. These determinations led to corrections in final allele calls. After establishing rules and offering several training webinars, the consensus allele calls among the Consortium laboratory members improved to over 92% (see Concordance section under Results).

Stutter ratio comparison across laboratories
To determine if stutter ratios vary across the different instrument platforms in the interlaboratory study, stutter ratios were analyzed in DNA samples from normal mouse tissue (diploid) for each represented allele in the calibrant samples (C1 though C5, see Materials and methods) from all Consortium laboratory members. As seen in literature, stutter ratios increased with increasing number of uninterrupted repeats (see Fig 1), except in some microvariant alleles where the longest uninterrupted repeat is shorter than what was found in the full repeat [14].
The stutter ratios across alleles were then normalized for repeat length, since this was the most prominent source of variation. That is, each observed stutter ratio was divided by the average of all stutter ratios observed for the same marker and allele combination (see Fig 2).
To see a trend or pattern across the relationships of the instruments, the average normalized stutter ratio was computed for each instrument within each marker. The Friedman test [15], an analysis looking for trends among ranks, was applied and showed that there was a statistically significant difference between the instruments (see Fig 3). Stutter ratios from either the ABI 3130xl or 3500xl instruments were found to be roughly 1% lower on average than stutter ratios from either the ABI 3730 and 3730xl instruments. Although statistically significant, the difference is negligible from a practical perspective, as it is roughly the same size as variability in stutter ratios observed from one CE run to another on different days using the same instrument and running the same sample.

Stutter filter recommendations determined from this study
Stutter filters for the mouse multiplex PCR assay were not available during the interlaboratory study. However, data from the study were used to create stutter filter recommendations for future studies. The stutter ratios calculated are allele-specific (see S1 Table). Currently, most available software only allows the addition of marker-specific stutter filters. For this study, marker-specific stutter was calculated across represented alleles from normal, diploid mouse DNA by taking the mean of the stutter ratios for all alleles at a specific locus and three standard deviations were added (see Table 2). This method has been used previously for marker-specific stutter filters for human STRs [16]. As seen in Table 2, there are two loci (18-3 and 13-1) that have stutter filter recommendations above the 20% stutter threshold that was used to make   allele calls in this study. Only DNA from normal, diploid samples was used in calculating stutter ratios. Therefore, some alleles that are present in cancer cell lines may not be represented in the marker-specific stutter recommendations. Application of this recommendation led to consistent allele calls. The allele-specific stutter filter would be useful in determining allele calls for these markers, especially at the lower repeat number (allele) that have a much smaller associated percent stutter. In the future, stutter filters will be incorporated into the data analysis software program and will be used to flag alleles that need to be examined more closely. If the sample is composed of a mixture of cell lines, then stutter filters cannot be applied to the data set, since multiple contributors may have varying peak heights in the profile that can invalidate the stutter filters. It is important to note that not all alleles were represented in the stutter filter calculations.

Signal intensity
Signal intensity was appropriate to allow for genotyping of the cell lines. The signals were above threshold limits and accurate allele calls were made for a nominal starting amount of 2 ng of DNA. Accurate quantitation to determine DNA concentration is important for optimal signal intensity (see Methods section). In few cases, low signal was observed in the red dye channel, the lowest fluorescence intensity of all the dyes, for markers X-1 and 13-1. This may be due to a lower DNA amount resulting from loss of DNA to the walls of the tube or pipetting error. The low signal can be remedied by increasing the amount of DNA in the PCR reaction mix, changing the injection time/voltage during CE, or increasing the number of cycles during PCR.

Genotyping (Bin)
Genotyping is the determination of allele calls among STR markers. In order to make accurate allele calls or genotyping, bins were created. Bins are place holders in the software for converting imprecise DNA fragments into alleles that have been determined based on sequenced calibrants (refer to S2 and S3 Files for protocol information regarding establishing bins in the different software platforms). Calibrants, which are sequenced diploid DNA extracts from known mouse strains, were used to adjust bins, optimize the assay on the different instrument platforms (see Table 3), and to aid in accurate allele calling (reference S2 and S3 Files). Optimized protocols for capillary electrophoresis were determined by each Consortium laboratory member and the parameters are described in Table 4. The calibrants covered 111 alleles (67%) of the 165 alleles observed in the 50 mouse cell lines. There was sufficient allele coverage by the calibrants for accurate allele calls of 49 of the 50 cell lines tested; consequently, identifying alleles in the test samples using extrapolation of alleles in the calibrants was successful. However, there was one exception for cell line C1498, when extrapolation of calibrants was unsuccessful for STR marker 18-3. Allele 28, at STR marker 18-3, fell outside the range of the calibrant at allele 22. Sanger sequencing at marker 18-3 confirmed the allele had 28 repeats. Increased coverage of alleles would improve allele calls with the application of an allelic ladder. These calibrants used for genotyping are not a renewable source. NIST and ATCC are currently developing an allelic ladder for the mouse STR markers.

Human cell line control sample
The human diploid palatal mesenchymal cell line, HEPM, was used as a control to detect contamination of human cells by confirming the presence of primers for two human STR markers, D8S1106 and D4S2408. The STR markers were incorporated in the multiplex PCR assay to test for human contamination of mouse cell lines. The electropherogram for the human control DNA obtained from HEPM is shown in S1 Fig. All human control samples resulted in amplified products exclusively sized in the human STR regions. In this study, no human cell line contamination was observed in the fifty mouse cell lines tested. Previous studies described sensitivity of the human STR markers [11] in an assay targeting nine mouse STR loci with levels  of human contamination detected as low as 62 pg of DNA. Similar results were found when DNA from HEPM cells were tested with the additional mouse STR markers used in this interlaboratory study.

Removal of STR marker 11-1
STR marker 11-1 was one of the original 19 markers analyzed by the Consortium and was ultimately removed. The removal of this marker did not change the informativeness of the remaining 18 STR markers to discriminate among the 50 mouse cell lines (see S2 Table). The marker for 11-1 was removed due to an abnormal peak morphology which appears as a "shoulder" either +1 or -1 bp from the true allele (see Fig 4), leading to discordant results. This phenomenon, most likely incomplete adenylation, was seen in STR genotyping from both diploid DNA samples and cell lines, and could not be resolved with a modification of the PCR that included an increased 4 h soak time at 60˚C at the end of the thermal cycling program or with added Pigtail sequences to the 5' end of the reverse primer [17]. The "shoulder" appears most prominently in full repeat alleles, such as allele 19, and is less of a problem for microvariants, such as allele 20.3 (Fig 4).

Concordant results
Review of the raw data (electropherogram) for each Consortium laboratory member showed concordance for the 50 cell lines. However, using 19 STR markers data misinterpretation resulted in discordance for 32 of the 50 cell lines. The reasons for discordance include high or low DNA amount, stutter calculation errors, missed allele, bin error, and artifacts. All of these interpretation errors can be resolved with training and experience. A sample was labeled discordant if one or more alleles was miscalled, for example, the NIH/3T3 cell line resulted in 311 out of 312 correct allele calls (99.6% consensus allele calls among the Consortium laboratory members) (see Table 5). In this example, one allele (20.3) was missed at STR marker 4-2, resulting in a discordant cell line.
With the removal of STR marker 11-1, the Consortium laboratory members had concordant STR profiles for 21 of the 50 cell lines tested (see Table 6). Allele calls for these 21 cell lines were in full concordance among the Consortium laboratory members at 18 STR loci.
Although the remaining 29 cell lines were discordant, the percent of accurately calling alleles among the Consortium laboratory members were above 92% for all cell lines. Each discordant cell line is summarized with an average percent allele agreement, the corresponding number of STR markers that were discordant, and the reasons for discordance (Table 7).

Cell lines that require further characterization
Eight cell lines, HT-2 clone A5E, Sarcoma 180, McCoy B, BALB/3T3 clone A31, J558, E. G7-OVA, EL4, and C1498 were removed for misidentification, cross-contamination, or lack of consensus of allele calls among Consortium laboratory members. These eight cell lines will not be included in the NCBI BioSample database entry. A spleen cell line, HT-2 clone A5E, reported in the literature as being derived from a BALB/c mouse strain [18] exhibited greater similarity with a C57BL strain by sharing over 50% alleles. This cell line also has three alleles or more at seven loci, high peak height imbalance, and is discordant at eight STR markers. The HT-2 clone A5E cell line needs more characterization to determine if it is contaminated or misidentified due to the inconsistencies with the strain of origin and the increase of alleles per marker. Sarcoma 180 is another cell line that has high peak height imbalance and greater than three alleles at 17 loci, making it difficult to apply stutter filter guidelines. Additionally, two cell lines J558 and BALB/3T3 clone A31, purported to be unrelated, have 95% allele matching based on pairwise comparison using Master's algorithm (see S3 Table). These three cell lines requires further characterization to determine authenticity. There is limited information in the literature regarding a clear origin for the McCoy B cell line, which was initially described as originating from the parental cell line McCoy A, derived from synovial fluid in the knee from a human patient [19]. McCoy B cell line is of mouse origin, not human, and is a clear indication of misidentification [20]. The additional cell lines, E.G7-OVA, EL4, and C1498 have less than 97% consensus agreement in allele calls (see Table 7), to include discordant calls at greater than four loci, and greater than three alleles present at a specific locus. These disagreements call for further characterization of these cell lines.

Pairwise comparison
A pairwise comparison using Master's algorithm (percent match = number shared alleles/total number of alleles in the questioned profile) [21] was calculated on the mouse cell lines and control diploid mouse strains from commercial sources (Jackson Laboratories and Zyagen, Inc.) (see S3 Table). When a cell line was compared against itself it showed 100% matching. However, cell lines compared to its derivatives demonstrated a percent matching greater than 85% (see Table 8). Moreover, cell lines derived from the same mouse strain, in some cases, showed high levels of relatedness (over 80%). Based on the matching algorithm data, there are several cell lines that need further studies due to high level of relatedness.

Discussion
The mouse multiplex PCR assay for intraspecies mouse cell line identity testing was validated by the Mouse Cell Line Authentication Consortium. Overall, each Consortium laboratory member, using various instrumentation, polymers and arrays, generated STR profiles for the 50 mouse cell lines. Review of the electropherograms from the Consortium laboratory members showed concordance for the 50 cell lines. Cell lines that were known to be related showed greater than 85% matching alleles using Master's algorithm. Several reported related and unrelated cell lines showed greater than 80% allele matching and require further investigation. Data misinterpretation resulted in discordance based on 18 of the 19 STR markers. All of these discordant allele calls can be resolved with training and experience. The Consortium started out with 19 STR markers to validate mouse cell lines and test for human contamination. It is recommended by the Consortium that marker 11-1 be removed from the multiplex PCR assay due to the abnormal peak morphology in the amplified products. The removal of this marker will not affect the overall informativeness of the remaining 18 mouse STR markers and their ability to discriminate between the fifty mouse cell lines tested.
Future plans include collaboration between NIST and ATCC to develop an allelic ladder to increase allele coverage and aid in more accurate allele calls. A written consensus standard for mouse cell line authentication would be useful to the community to establish rules and guidelines. Points to be considered for a future consensus standard include: allelic ladder, stutter filter recommendations, DNA quantitation methods, and troubleshooting. The NIST-validated STR profiles from this study have be added to the NCBI BioSample database (https://www. ncbi.nlm.nih.gov/biosample/?term=mouse+cell+line+STR+profile%5Battribute+name%5D) with BioProject accession number PRJNA539973. The cell lines listed in the database, having greater than 98% allele consensus among the Consortium laboratory members, have background information included, allele calls for 18 mouse STR markers, and their associated electropherograms.
Finally, it is important to note that concordance in STR profile among the Consortium determines the accuracy to perform and interpret the STR results and does not demonstrate complete authenticity of a particular cell line. However, there is enough confidence that a concordance value greater than 98% provides an accurate STR profile for the identity of mouse cell lines. The STR profiles published on the NCBI BioSample database are validated by the Consortium based on the cell lines provided by ATCC. These STR profiles are the first step in authentication of mouse cell lines.

Materials and methods
Several activities were performed by NIST in preparation of the interlaboratory study. Some of these activities included: 1. Preparation of DNA from 50 mouse cell lines and one human cell line

Preparation of cell line DNA samples
Fifty-one cell lines (50 mouse, one human) of seed stock material were provided by ATCC (see Table 9) to NIST for the interlaboratory study for mouse STR genotyping. Vials of frozen cell lines were stored in a -150˚C freezer upon receipt. DNA purity. DNA purity was determined by absorbance using the Synergy MX and the Take3 micro-volume plate (BioTek, Winooski, VT, USA) with an optical pathlength of 0.5 mm. A260/A280 ratios were between 1.83 to 1.92 (values greater than or equal to 1.8 are ideal for DNA purity).
Fifty-one cell line DNA samples were blinded and provided a code (mouse cell lines were coded M01 through M50 and the human cell line was coded H1). 20 μL of each DNA sample, concentrations ranging from 2 ng/μL to 2.7 ng/μL, were aliquoted into labeled 0.2 mL 8-strip tubes. Each of the Consortium laboratory members received duplicate DNA samples.

Diploid mouse DNA samples used to calculate stutter ratios
Genomic DNA samples from 48 mouse strains obtained from Jackson Laboratories (Bar Harbor, ME, USA), two mouse strains obtained from Zyagen (San Diego, CA), and 15 wild-caught mice (courtesy of Dr. Michael Nachman, University of Arizona, USA) were used in calculating stutter ratios for normal diploid mouse samples to create stutter filter recommendations.

Preparation of calibrants from diploid mouse DNA
Genomic DNA samples from 12 mouse strains obtained from Jackson Laboratories were used to create five calibrant DNA samples (C1 through C5). The DNA calibrant samples were prepared from the following 12 strains of mice: C58/J, CE/J, BTBR T<+> tf/J, NZW/LacJ, NOD/ ShiLtJ, WSB/EiJ, A/J, KK/HlJ, MA/MyJ, C3H/HeJ, I/LnJ, and NZB/BlNJ. The calibrants were used to assign alleles. Sanger sequencing was performed on each strain at all 19 STR loci to correlate the number of repeating units with fragment length. These highly characterized DNA samples were diluted to approximately 2 ng/μL in TE buffer (10 mM Tris-HCl, pH 8.0, 0.1 mM EDTA) (Ambion) and quantified using Qubit dsDNA HS Assay Kit (Invitrogen) and  Table 10 and an example of the associated electropherogram for Calibrant 1 can be found in S2 Fig.

Preparation of primers to target mouse STR markers
Primers for 19 mouse STR markers were developed for intraspecies identification of mouse cell lines. The multiplex PCR primers target the following 19 mouse loci: 18-3, 4-2, 11-1, 6-7, 19-2, 1-2, 7-1, 1-1, 3-2, 8-1, 2-1, 15-3, 6-4, 11-2, 17-2, 12-1, 5-5, X-1 and 13-1. The repeat motif for each STR marker was determined by sequencing to correlate fragment length with repeat number. In addition to the mouse STR markers, two well characterized human STR markers D8S1106 and D4S2408, previously described to identify human and African green monkey DNA [22,23], are also included in the kit to screen for contamination of mouse cell lines with human cells (Table 11). Table 11 contains the primer sequences that make up the multiplex PCR assay. Forward primers were labeled with one of the following fluorescent dyes at the 5' end: 6FAM (blue), VIC (green), NED (yellow), or PET (red) (Applied Biosystems). In some cases, an additional guanine base (G) or a ''PIGtail" sequence (GTTTCTT) was added to the 5' end of the unlabeled reverse primer to promote complete adenylation and are underlined in the reverse primer [17]. The final primer concentrations for the forward and reverse primers in a 20 μL reaction volume were optimized based on DNA concentration and the number of cycles in the PCR program.

Characterization of calibrants
PCR amplification for sequencing. Sequencing reactions for calibrant samples (C1 through C5) were prepared using the following 20 μL reaction mixture: 4-5 ng of template DNA, 1X GeneAmp PCR Gold buffer (Thermo Fisher), 1.5 mM MgCl 2 (Thermo Fisher), 1 U AmpliTaq Gold Polymerase (Thermo Fisher), 0.15 μM unlabeled forward and reverse primers (see S6 Table), 0.16 mg/mL non-acetylated BSA (Invitrogen), 200 μM dNTPs (Invitrogen), and PCR grade water. The SeqAmp program was run on the Veriti (Applied Biosystems) with the following parameters: denaturation for 10 min at 95˚C, amplification for 35 cycles of 1 min at 94˚C, 1 min at 55-60˚C (annealing temperature specific to individual primers), and 1 min at 72˚C, extension for 45 min at 60˚C, followed by a hold at 25˚C.

Sequencing STR markers in calibrants.
To determine the number of repeat units for each fragment length (allele) for the 19 mouse STR markers, each calibrant was subjected to Sanger sequencing. Each calibrant (C1 through C5) DNA sample was amplified with primers targeting all 19 mouse STR markers in single-plex. Amplicons were treated with 2 μL of Exo-Sap-IT (Applied Biosystems) per 5 μL of PCR product to remove unincorporated primers and deoxyribonucleotide triphosphates (dNTPs) by incubating samples for 90 min at 37˚C followed by 20 min at 80˚C to inactivate the enzymes. Samples were then sent to Eurofins MWG Operon (Louisville, KY, USA) for Sanger sequencing. Resulting sequence profiles were received after data analysis was performed by Eurofins MWG Operon. Lasergene 15.2 (DNA Star, Madison, Wisconsin, USA) alignment software, SeqMan Pro, was used to determine repeats for each allele at every STR marker.
PCR amplification for genotyping calibrants. PCR amplification of calibrant samples (C1 through C5) and optimization of CE parameters was completed by the Consortium laboratory members prior to running the full panel of mouse cell line DNA (M01 through M50). Detailed methods are described in S1 File but will be explained briefly here. Each PCR reaction contained 12.5 μL of Qiagen Type-It Microsatellite PCR Mix (MM), 2.5 μL of primers (P21), 9 μL of PCR grade water, and 1 μL of DNA template (approximately 2ng/μL). To improve signal in samples, many labs had to increase the input DNA volume to 2 μL in the reaction. PCR reactions were placed on a thermal cycler with the following parameters: denaturation at 95˚C for 5 min, 29 cycles of amplification step (95˚C for 30 sec, 59˚C for 90 sec, 72˚C for 30 sec), soak at 60˚C for 90 min, and a hold at 25˚C. After optimization of the assay was completed using calibrant DNA, PCR was performed on the full panel of mouse cell line DNA. PCR Master Mix for the 50 mouse DNA samples (M01 through M50), five Calibrants (C1 through C5), one human DNA control (H1), and one negative control (water) was prepared using the same conditions as described for the calibrants.
Separation and analysis of PCR products. PCR amplicons were prepared for capillary electrophoresis by adding 9.5 μL of Hi-Di Formamide (Applied Biosystems) and 0.5 μL of GeneScan 600 v2 LIZ internal size standard (Applied Biosystems) to 1 μL of PCR product to each well in a 96-well plate. The 96-well plate was briefly centrifuged at 2000 x g for one minute. The plate was loaded onto the Applied Biosystems 3500xl Genetic Analyzer (or equivalent) (use a STR cassette base if needed). A five-dye DS-33 Matrix Standard Kit (Applied Biosystems) was previously established under the G5 filter with the following dyes: 6FAM, VIC, NED, PET, and LIZ (as per manufacturer's instruction). Conditions for running an Applied Biosystems 3500xl 36cm array using POP-4 polymer was provided to the Consortium laboratory members and specific adjustments were made individually based on the instrument, array, and polymer being used. The conditions for capillary electrophoresis (CE) using a 3500xl, 36cm array, POP-4 were as follows: oven temp (60˚C), run time (1500 sec), run voltage (15kV), pre-run time (180 sec), pre-run voltage (15 kV), injection time (15 sec), injection voltage (1.2 kV), and data delay (1 sec). Consortium laboratory members were provided a specific set of files for their version of software (which may include bins and panels, analysis method, plot settings, size settings, and table settings).

Preparation of mouse STR validation kit and protocol distribution
The kits were prepared at NIST. Each kit provided to a Consortium laboratory member contained the following: • DNA from 50 mouse cell lines (two tubes of each mouse cell line DNA; each tube containing 20 μL of 2 ng/μL of DNA

Instructions to consortium laboratory members
The following instructions were given to the Consortium laboratory members. Upon receipt, kit should be stored at -20˚C. Once kit components were thawed for use, subsequent storage of kit components were as follows: Qiagen Type-It Microsatellite PCR Master Mix (MM) should be stored at -20˚C and all DNA samples (M01-M50), Calibrants (C1-C5), and Human Control (H1), and should be stored at 4˚C. Mouse Multiplex Primers (P21) are light sensitive and should be stored in the dark at 4˚C.
In addition to receiving the test kit provided by NIST/ATCC, each Consortium laboratory member was responsible for purchasing the internal size standard GeneScan LIZ 600 v2 (Applied Biosystems) and DS-33 G5 dye matrix (Applied Biosystems) to allow for proper sizing of PCR fragments and to calibrate the CE instrument for the fluorescent dyes used in the assay. The following files were distributed to each consortium member describing PCR sample preparation, thermal cycling conditions, CE conditions, software analysis and settings, and interpretation of data (see supplemental materials): • Protocol 1: STR profiling of mouse cell lines (S1 File)  Table) • Extrapolation of alleles for bin set (S5 Table) • Electropherograms for each of the five calibrants (S2-S6 Figs) • Bins and panels, analysis methods, table settings, plot settings, size settings (S4 File) Using the protocols, tables and bins/panels provided, each lab was asked to adjust the bins and panels in their respective software platform using the calibrant samples prior to running the 50 mouse DNA samples. Once the instruments and protocols were optimized for the calibrant samples, all samples were genotyped and analyzed. A successful run was determined by the presence of all alleles in the calibrant samples detected above threshold levels, amplification of the human control (H1) at the human STR markers, and no amplification of alleles in the negative control.

Stutter filter calculations
Stutter ratios were calculated by dividing the height of the stutter peak by the height of the allelic peak [24]. Only peak heights above 100 RFUs were used in stutter ratio calculations. The general stutter guidelines were borrowed from human cell line authentication [7] and are as follows: 1) A peak in the -4 stutter position with a stutter ratio greater than 20% would be called an allele, and 2) A peak in the +4 stutter position with a stutter ratio above 10% would be called an allele.
To assess whether stutter ratios were affected by measurement platform, a Friedman test [15] was conducted using average stutter ratio (normalized to account for variation due to allele length) from each marker for each instrument as input. For each marker, the four considered instruments are ranked in order of average stutter ratio. A Friedman test looks for patterns in these rankings to assess whether any instrument tends to produce higher or lower stutter ratios than the other instruments. A Friedman test does not make any assumptions about the specific distributional form of the stutter ratios, and accounts for differences in average stutter ratios across markers.
Stutter ratios were also calculated for each allele present in the collection of genomic DNA from Jackson Laboratories, Zyagen, and wild-caught mouse samples from Michael Nachman's lab at the University of Arizona. Allele-specific and marker-specific stutter ratios were determined for all alleles present in the samples tested. Allele-specific stutter filters were computed as the average of stutter ratios reported for the allele across samples plus three standard deviations. Marker-specific stutter filters were computed as the average of all reported stutter ratios across alleles within a marker plus three standard deviations [16]. Incorporating stutter filters in fragment analysis software will aid in future allele calling and flag samples with peak height imbalances that may need further review.

Concordance
A cell line was considered concordant if all Consortium laboratory members obtained the same allele calls at all 19 loci. For each allele call within each cell line, an agreement proportion was computed as the fraction of the other 11 Consortium laboratory members that also called that allele. For example, suppose for a given marker in a given cell line that 12 Consortium laboratory members called allele 17 and 11 Consortium laboratory members called allele 19 and no other allele calls were made for this marker. The allele agreement for each of the allele 17 calls would be 11/11 = 1, and the allele agreement for each of the allele 19 calls would be 10/ 11 = 0.909. Allele agreement for each marker was determined by calculating the average agreement proportion across all allele calls for that marker. The agreement score for the marker in the example above would be 12�ð 11 = 11 Þþ11�ð 10 = 11 Þ

23
¼ 0:957, where 23 appears in the denominator because that is the total number of allele calls made for this marker across all groups. Allele agreement was also assessed at the cell line level by computing the average agreement proportion across all alleles and markers. As the term, "agreement proportion" implies, this measure of correspondence conveys the average proportion of the other labs that would agree with a given allele call.  Table. Allele-specific stutter ratios. Average allele-specific stutter ratios were determined using normal diploid mouse strain DNA samples. The upper stutter limit was calculated as the mean plus 3 standard deviations, where the standard deviation corresponds to the collection of stutter ratios observed for that specific allele in that specific marker. (XLSX) S2  imply that the materials or equipment identified are necessarily the best available for the purpose.