Methanol-based fixation is superior to buffered formalin for next-generation sequencing of DNA from clinical cancer samples

High-quality tumour DNA is essential for any personalised treatment strategy based on NGS. Here we show that methanol fixation is superior to formalin, giving greater DNA yield, longer fragment size and more accurate copy-number calling using sWGS. We also provide a new approach to understanding fixation artefacts using non-negative matrix factorisation.

introduction
Although microscopic examination of formalin-fixed paraffin-embedded (FFPE) material remains crucial in cancer diagnosis, next-generation sequencing (NGS) of tumour DNA has emerged as a powerful diagnostic tool [1] and is a central component of personalised medicine initiatives. NGS relies heavily on high-quality DNA, and snap-frozen (SF) samples are preferred because formalin fixation induces chemical modifications and degradation of DNA [2,3].
Comprehensive diagnostic strategies and translational research protocols therefore currently demand two samples, one SF for molecular analysis and the other FFPE for routine haematoxylin and eosin staining (H&E) and immunohistochemistry (IHC). Processing of SF samples for NGS has several disadvantages, including reduced ability to microdissect tumour material and significantly increased costs [4,5]. In particular, there are significant barriers to obtaining SF material in large-scale clinical trials, where samples are typically collected from multiple hospitals in different countries. Alternatives to formalin-based fixation are therefore required to circumvent the need for fresh-frozen sampling.
Methanol-based fixation has emerged as a promising alternative [5-7] (supplementary Table S5, available at Annals of Oncology online). Universal molecular fixative (UMFIX) has been shown to be superior to neutral-buffered formalin (NBF) for IHC, and gives higher yield and molecular weight of extracted DNA and RNA [5,6,8]. In addition, prolonged exposure to methanol fixatives may have fewer deleterious effects on DNA/RNA quantity and quality than NBF [3,5]. However, potential NGS artefacts from methanol fixation have not been studied.
Here, we have tested the suitability of DNA extracted after methanol-based fixation for NGS assays compared with DNA from matched NBF and fresh-frozen tissues. We studied high-grade serous ovarian cancer (HGSOC) samples because they have ubiquitous TP53 mutation and TP53 sequences have been extensively studied for fixation artefacts [9,10]. HGSOC also has marked genomic rearrangement and copy-number abnormalities (CNAs), which allow stringent inspection of the effects of DNA fragment length on CNA profiling.

patients and methods

sample acquisition and processing
Three equal fragments were macrodissected from tumour specimens removed from 16 patients (median age 62 years) with HGSOC undergoing debulking surgery. In addition, mock biopsies of the tumour were taken from 12 cases with a 16G core biopsy gun. All samples were reviewed by at least two pathologists and were either fixed in 10% NBF (Genta Medical, York, UK) or UMFIX (Sakura Finetek, Thatcham, UK), or snap-frozen in liquid nitrogen (SF). Matched normal tissue controls were processed in parallel. Full clinical details are given in supplementary Table S1, available at Annals of Oncology online.

immunohistochemistry
5 µm sections of NBF- and UMFIX-fixed material were stained for CK7, p53, PAX8, WT1 and CK20 using established clinical protocols in the Department of Pathology, Queen Elizabeth University Hospital, Glasgow, with additional optimisation for WT1 staining of UMFIX tissues. Staining and image analysis protocols, as well as all histoscore data, are described in supplementary material, available at Annals of Oncology online.

tagged-amplicon sequencing (TAm-seq)
The coding regions of TP53, PTEN, EGFR, PIK3CA, KRAS and BRAF were sequenced by TAm-Seq as described previously [11] on an Illumina MiSeq using PE-125 bp protocols. Data analysis is described in supplementary material, available at Annals of Oncology online.

shallow whole-genome sequencing (sWGS)
WGS libraries were prepared from 100 ng DNA using a modified TruSeq Nano DNA LT Sample Prep Kit protocol. Library quality and quantity were assessed with the DNA-7500 kit on a 2100 Bioanalyzer and with the Kapa Library Quantification kit, respectively, according to the original protocols. Eighteen barcoded libraries were pooled in equimolar amounts and each pool was sequenced on a HiSeq2500 in SE-50 bp mode. Analysis methods are described in supplementary material, available at Annals of Oncology online.

mutation signature analysis
Non-negative matrix factorisation was carried out to identify mutation signatures [12] in relation to the different fixation methods (supplementary material, available at Annals of Oncology online). All non-reference base changes observed in the TAm-Seq and sWGS data were interrogated.
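The factorisation step can be illustrated with a minimal sketch. It assumes a small matrix of base-change counts (rows are base-change channels, columns are samples) and uses the standard multiplicative-update rule to minimise the squared reconstruction error. The toy counts, the function names and the choice of two signatures are illustrative assumptions only; the analysis actually used in this study is described in the supplementary material.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, k, n_iter=500, seed=0, eps=1e-9):
    """Factorise non-negative v (channels x samples) into w (channels x k)
    and h (k x samples) using multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # Update exposures: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # Update signatures: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


def frobenius_error(v, w, h):
    """Root of the summed squared differences between v and w @ h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2
               for row_v, row_a in zip(v, approx)
               for a, b in zip(row_v, row_a)) ** 0.5


# Hypothetical counts: 4 base-change channels (rows) x 3 samples (columns).
counts = [
    [40, 2, 22],
    [30, 4, 18],
    [5, 36, 20],
    [3, 28, 15],
]
signatures, exposures = nmf(counts, k=2)
```

Each column of `signatures` is one candidate signature across the channels, and each row of `exposures` gives the contribution of that signature to every sample. A real analysis would use the full set of trinucleotide channels and a numerical library rather than plain lists.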

results
Figure 1 summarises the study design and the flow of samples through the study. Additional REMARK data are provided in supplementary material, available at Annals of Oncology online.

methanol fixation gives higher DNA yield and fragment size than buffered formalin
There was no significant difference in tumour cellularity or TP53 allele fraction between UMFIX and NBF samples, thus allowing a direct comparison of DNA metrics (supplementary Figure S1, available at Annals of Oncology online). Quantification of extracted DNA showed similar yields of small (90 bp) fragments from UMFIX and SF samples, both of which were significantly higher than from NBF (Figure 2A). As expected, SF samples showed the highest yields of large fragments (129 bp, 305 bp), but yields from UMFIX samples were still significantly higher than from NBF (Figure 2B).

copy-number calling in methanol-fixed material is superior to formalin
Copy-number profiles from sWGS were compared for correlation and variance of copy-number abnormality (CNA) estimation, using SF as gold standard. UMFIX showed superior copy-number profiles compared with NBF, with 9 of 11 biopsies and 10 of 12 surgical samples showing higher correlation with the matched SF (Figure 2C). UMFIX also had lower noise for segmental copy-number estimation than NBF (Figure 2D).
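As an illustration of this kind of comparison, the sketch below correlates bin-level log2 ratios from each fixed sample with the matched SF profile and summarises noise as the mean within-segment standard deviation. The toy values, the segment boundaries and the noise definition are assumptions made for the example; the metrics actually used are described in the supplementary material.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def segment_noise(log_ratios, segments):
    """Mean standard deviation of bins within each (start, end) segment."""
    return mean(pstdev(log_ratios[start:end]) for start, end in segments)


# Hypothetical log2 ratios for nine genomic bins in three segments.
segments = [(0, 3), (3, 6), (6, 9)]
sf = [0.0, 0.0, 0.0, 0.5, 0.5, 0.5, -0.4, -0.4, -0.4]
umfix = [0.02, -0.03, 0.01, 0.48, 0.53, 0.49, -0.38, -0.42, -0.41]
nbf = [0.2, -0.25, 0.1, 0.3, 0.7, 0.35, -0.1, -0.6, -0.3]

for name, profile in (("UMFIX", umfix), ("NBF", nbf)):
    r = pearson(sf, profile)
    noise = segment_noise(profile, segments)
    print(f"{name}: correlation with SF = {r:.3f}, segment noise = {noise:.3f}")
```

In this toy data the UMFIX profile tracks SF more closely and is less noisy than the NBF profile, which mirrors the direction of the result reported above but says nothing about its magnitude.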

single-nucleotide sequencing noise from methanol-fixed material is comparable with SF and NBF
We analysed low-level sequence noise using 255 376 observed non-reference bases in the sWGS and TAm-Seq data. All analysed mutations were filtered using dbSNP specifically to exclude germline SNPs. Analysis of the flanking bases

single-nucleotide variant calling from methanol-fixed material is comparable with fresh-frozen
SNVs were called using TAm-Seq of 66 samples, yielding 546 variants. Manual curation of these variants revealed lower average sensitivity and specificity for NBF compared with SF and UMFIX, albeit not significantly (Figure 3C).
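For reference, sensitivity and specificity of SNV calling can be computed as in the sketch below. It assumes that manual curation provides a set of true variants and that a defined set of candidate sites was assessed in every sample; the site identifiers and the function name are hypothetical and are not taken from the study data.

```python
def sensitivity_specificity(called, truth, assessed):
    """Return (sensitivity, specificity) for one sample.

    called   -- set of sites called as variant in the sample
    truth    -- set of sites confirmed as true variants by curation
    assessed -- set of all candidate sites that were evaluated
    """
    tp = len(called & truth)             # true variants that were called
    fp = len(called - truth)             # calls that curation rejected
    fn = len(truth - called)             # true variants that were missed
    tn = len(assessed - called - truth)  # non-variant sites left uncalled
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical example: ten candidate sites, four curated true variants.
assessed = {f"site{i}" for i in range(1, 11)}
truth = {"site1", "site2", "site3", "site4"}
called = {"site1", "site2", "site3", "site9"}

sens, spec = sensitivity_specificity(called, truth, assessed)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Here three of the four true variants are called (sensitivity 0.75) and one of the six non-variant sites is called in error (specificity 5/6).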

methanol fixation permits high-quality H&E and IHC analyses
Tissue morphology (H&E staining) of UMFIX samples was comparable with NBF fixation. Overall, differences between UMFIX and NBF were not diagnostically significant (Figure 4A). Statistically significant correlation was found between quantitative IHC histoscores in UMFIX- and NBF-fixed samples for key HGSOC markers (p53, CK7, PAX8, WT1). CK20 was uniformly negative in all tumour samples, regardless of fixative (data not shown). There was no significant difference in median histoscore between the two sample sets for p53, CK7 and PAX8 (Figure 4B).

discussion
The most important variables for NGS assays are DNA quality and yield. Formalin fixation can have severe effects on the structure and integrity of DNA, causing C>T, A>G, G>T, G>C and A>T base changes, methylene bridge formation, DNA denaturation and DNA fragmentation [6,13-15]. After NGS, these chemical modifications result in more SNV artefacts, higher sequence duplication rates, smaller insert sizes and lower fractions of mappable reads [16,17]. We evaluated whether methanol-based fixation can reduce these detrimental artefacts when attempting to identify true somatic SNVs and accurate copy-number profiles from clinical material. We show that UMFIX fixation yields longer amplifiable DNA fragments, in agreement with previous reports [3,5,8], which improves our ability to call DNA copy-number accurately. We also show that SNV calling from UMFIX DNA has similar performance to DNA from SF tissues and that traditional H&E staining and IHC scoring can be carried out on UMFIX-embedded samples with minimal optimisation.
These findings are clinically highly important: although attempts have been made to reduce noise induced by formalin fixation (e.g. increasing targeted sequencing coverage or reducing C>T transitions with UDG treatment), these methods only mitigate some sources of noise when calling SNVs and do not improve the ability to call CNA [18,19]. CNA detection is more challenging than SNV detection and remains the major clinical need for personalised treatment approaches in HGSOC.
In addition, we have used a state-of-the-art computational approach to perform in-depth exploration of the low-level sequence noise introduced by fixation and sample processing. In an advance over previous approaches, we modelled the trinucleotide context of each base change and de-convolved distinct trinucleotide noise signatures. This computational approach has previously been used to identify signatures in collections of SNVs observed across thousands of tumours, and these signatures were used to infer underlying mutational processes [12]. In our data, we identified three distinct trinucleotide signatures. Signature 3 has high similarity to a previously identified CpG age-related cytosine deamination (C>T) signature (signature 1B [12]), and to a recently uncovered sequence error signature [20]. However, signatures 1 and 2 are novel and have no similarity to previously described signatures. In particular, they show high rates of C>T transition but not in CpG dinucleotide contexts. As expected, signature 3 contributed only to the set of base changes common to all samples across a patient. In contrast, signatures 1 and 2 contributed only to the base changes exclusive to SF, UMFIX or NBF samples. This suggests that the sequencing noise represented by these two signatures (C>T not at CpG) is induced through sample processing. The two fixative conditions showed a slightly increased contribution to signature 1 compared with SF, suggesting that fixation may have a specific effect. However, larger studies are required to achieve the power to discern this. This approach to modelling sequence noise provides powerful tools to explore sequencing artefacts and an analytical framework to understand the mechanisms behind their creation. Further studies with high-coverage WGS are now underway to refine these data.
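Modelling the trinucleotide context of a base change can be sketched as follows. Each change is assigned to one of 96 channels (six pyrimidine-referenced substitution types, each with 16 combinations of flanking bases), with purine-referenced changes mapped to the opposite strand. This follows the convention commonly used for mutational signature analysis and is assumed here for illustration; the function names and example changes are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trinucleotide_channel(context, alt):
    """Label a base change by its trinucleotide context, e.g. 'A[C>T]G'.

    context -- three reference bases centred on the changed base
    alt     -- the observed non-reference base
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT"):
        raise ValueError("expected a 3-base context and a single A/C/G/T base")
    if context[1] == alt:
        raise ValueError("alt base equals the reference base")
    if context[1] in "AG":
        # Report purine-referenced changes on the opposite strand.
        context, alt = revcomp(context), revcomp(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def is_cpg(channel):
    """True when the changed cytosine is followed by a guanine."""
    return channel[2] == "C" and channel[-1] == "G"


# All 96 channels: pyrimidine reference, three alternatives, 16 flank pairs.
ALL_CHANNELS = [
    f"{five}[{ref}>{alt}]{three}"
    for ref in "CT"
    for alt in "ACGT" if alt != ref
    for five, three in product("ACGT", repeat=2)
]

# Hypothetical observed base changes as (reference context, alt base).
observed = [("ACG", "T"), ("CGT", "A"), ("TCA", "T"), ("GTA", "C")]
channel_counts = Counter(trinucleotide_channel(c, a) for c, a in observed)
```

The first two example changes fall into the same channel, `A[C>T]G`, because a G>A change read on one strand is a C>T change on the other. A count vector over `ALL_CHANNELS` for each sample is the kind of input from which signatures can then be de-convolved, and `is_cpg` separates C>T changes at CpG from those in other contexts.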
There are no data on the effects of long-term methanol fixation on DNA quality or quantity, and this study utilised samples collected no more than 6 months before analysis. With FFPE material, it is possible to isolate DNA from long-term archived samples [21], although factors such as duration of fixation, age of the sample, exposure to heat and light, as well as the concentration, buffering and age of the formalin, can all influence DNA quality and extent of sequence artefact [22]. Careful longitudinal analyses will be required to ascertain whether similar problems emerge in UMFIX samples.
We specifically did not examine RNA in this study. There are several previous publications on the utility of RNA extracted from methanol-fixed specimens in PCR and microarray assays, including from samples stored at room temperature for up to 8 weeks [5,23]. However, we are not aware of any study assessing RNA sequencing or RNA profiling of samples extracted from methanol-fixed tissue. Again, future studies will be required to confirm whether RNA extracted from methanol-fixed tissue can be reliably used in such assays.
In summary, whilst SF samples remain the gold standard for nucleic acid extraction from tumour material at present, there are significant costs associated with such samples in clinical trials and NGS-based personalised medicine studies. A key advantage of methanol fixation is that it allows easy collection and embedding of tumour material with associated economies for pathological verification and microdissection. Based on our findings of superior DNA quality, we recommend that UMFIX be routinely adopted for collection and storage of clinical cancer specimens for large-scale genomic analysis.