Exome sequencing identifies a recurrent variant in SERPINA3 associating with hereditary susceptibility to breast cancer

Background: Breast cancer is strongly influenced by hereditary risk factors. Yet the known susceptibility genes and genomic loci explain only about half of the familial component of the disease. To identify novel breast cancer predisposing gene defects, we performed massively parallel sequencing of Northern Finnish breast cancer cases.
Methods: Ninety-eight breast cancer cases with indication of hereditary disease susceptibility were exome sequenced. The data filtering strategy focused on rare variants predicted to be deleterious that were nevertheless enriched in the sequenced cohort. Findings were confirmed in additional, geographically matched breast cancer cohorts.
Results: A recurrent heterozygous splice acceptor variant in SERPINA3, c.918-1G>C, was identified, and it was significantly enriched both in the hereditary cohort (6/201, 3.0%, p = 0.006, OR 5.1, 95% CI 1.7–14.8) and in the unselected breast cancer cohort (26/1569, 1.7%, p = 0.009, OR 2.8, 95% CI 1.3…).
Conclusion: These findings demonstrate that the c.918-1G>C germline variant in the SERPINA3 gene, encoding a member of the serine protease inhibitor class, is a novel breast cancer predisposing allele.
© 2020 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Since the identification of the major breast cancer susceptibility genes, BRCA1 and BRCA2, extensive efforts have been made to find additional inherited risk factors [1]. This has led to the discovery of several breast cancer-associated genes and genomic loci with variable levels of disease risk [2]. The majority of the moderate-to-high-risk breast cancer susceptibility genes, including BRCA1 and BRCA2 but also others such as PALB2, CHEK2 and ATM, encode essential DNA damage response (DDR) proteins. Even in the era of massively parallel sequencing, analyses have often been limited to the DDR pathway, resulting in the identification of rare breast cancer predisposing alleles in, e.g., the RECQL, FANCM and ERCC3 genes [3–6]. The moderate-to-high-risk susceptibility genes are all characterized by rare, mostly loss-of-function pathogenic variants that confer breast cancer predisposition. Despite these findings, the genetic susceptibility factors identified so far explain only about half of the familial component of breast cancer [7], making it imperative to identify additional inherited risk factors and to understand their contribution to disease onset. For this purpose, we performed exome sequencing of 98 Northern Finnish breast cancer patients with indication of hereditary disease susceptibility. Founder populations are advantageous for the rare-variant approach, as they harbor founder variants at higher prevalence than outbred populations. The contribution of a gene to the disease is easier to prove if several families with the same predisposing variant can be identified. This has been shown, for instance, for the PALB2 [8] and MCPH1 [9] founder variants identified in the Finnish population.
Using a filtering strategy not limited to any predefined functional pathway, we identified a recurrent splice acceptor variant in the SERPINA3 gene, encoding a member of the serine protease inhibitor class, that was significantly enriched in the analyzed patient cohorts. Based on these results, we propose a novel link between SERPINA3 and inherited breast cancer predisposition.

Discovery cohort in exome sequencing
The patient cohort selected for exome sequencing consisted of 98 index cases affected with breast cancer from Northern Finnish families negative for the BRCA1, BRCA2 and PALB2 pathogenic founder variants [10,11]. The following selection criteria, indicating an inherited predisposition to the disease, were used: 1) index cases from families with three or more breast and/or ovarian cancer cases in first- or second-degree relatives (n = 83), 2) index cases from families with two cases of breast, or breast and ovarian, cancer in first- or second-degree relatives, at least one of which had early disease onset (<35 years), bilateral disease or multiple primary tumors (n = 7), and 3) breast cancer cases diagnosed at or below the age of 40 (n = 8).
Variants identified in exome sequencing of the 98 index cases were filtered using the following criteria: 1) inclusion of variants with a predicted harmful effect on the protein: protein truncations (nonsense, frameshift and splice site variants), in-frame insertions/deletions, and amino acid changes predicted to be deleterious by two different algorithms (PolyPhen and SIFT); 2) inclusion of variants absent from, or with minor allele frequency <0.01 in, the dbSNP, Ensembl, ExAC and SISu databases; 3) exclusion of variants already known to be pathogenic or non-pathogenic; and 4) inclusion of variants observed in at least three individuals in the discovery cohort.
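The four filtering criteria above can be sketched as a small Python filter. This is an illustrative sketch, not the authors' pipeline: the `VariantCall` record, its consequence labels and the single `max_db_maf` field (standing in for the highest minor allele frequency seen across dbSNP, Ensembl, ExAC and SISu) are hypothetical, and the missense rule assumes that both PolyPhen and SIFT must call a change deleterious.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical consequence labels; real annotation tools use their own vocabularies.
TRUNCATING = {"nonsense", "frameshift", "splice_site"}
IN_FRAME = {"inframe_insertion", "inframe_deletion"}


@dataclass(frozen=True)
class VariantCall:
    sample: str                          # index case identifier
    variant: str                         # HGVS-style variant key
    consequence: str                     # a label above, or "missense"
    polyphen_damaging: bool = False      # PolyPhen prediction (missense only)
    sift_deleterious: bool = False       # SIFT prediction (missense only)
    max_db_maf: Optional[float] = None   # highest MAF across databases; None if absent
    previously_classified: bool = False  # already known as pathogenic or non-pathogenic


def predicted_harmful(call: VariantCall) -> bool:
    """Criterion 1: truncating, in-frame indel, or missense called deleterious by both tools."""
    if call.consequence in TRUNCATING or call.consequence in IN_FRAME:
        return True
    return call.consequence == "missense" and call.polyphen_damaging and call.sift_deleterious


def passes_per_call_filters(call: VariantCall, maf_cutoff: float = 0.01) -> bool:
    """Criteria 1-3: predicted harmful, absent or rare in databases, not previously classified."""
    rare = call.max_db_maf is None or call.max_db_maf < maf_cutoff
    return predicted_harmful(call) and rare and not call.previously_classified


def recurrent_candidates(calls, min_carriers: int = 3, maf_cutoff: float = 0.01):
    """Criterion 4: keep variants carried by at least `min_carriers` distinct index cases."""
    carriers = defaultdict(set)
    for call in calls:
        if passes_per_call_filters(call, maf_cutoff):
            carriers[call.variant].add(call.sample)
    return {v: len(s) for v, s in carriers.items() if len(s) >= min_carriers}
```

Counting distinct samples per variant (rather than raw calls) keeps duplicate records of the same carrier from inflating the recurrence count in criterion 4.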

Variant genotyping
Variants passing the filtering criteria were genotyped using Agena Bioscience MassARRAY System (Sequenom Inc., FIMM) and High-Resolution Melt (HRM) analysis (CFX96, Bio-Rad) with Type-It HRM reagents (Qiagen). Sanger sequencing (ABI3500xL Genetic Analyzer, Applied Biosystems) was used for confirmation of the variants.

Case-control cohorts
The frequency of the variants passing the filters was evaluated in a geographically matched cohort of unselected Northern Finnish breast cancer cases. This cohort consisted of 1569 consecutive breast cancer cases, unselected for family history of cancer and age at disease onset, diagnosed at Oulu University Hospital during 2000–2016. Clinical parameters for the unselected breast cancer cases were obtained from pathology reports and included Ki-67 status, tumor grade, TNM (tumor, nodes, metastasis) classification, tumor morphology, estrogen (ER), progesterone (PR) and HER2 receptor status, and tumor subtype.
An additional cohort of 103 breast cancer cases with indication of inherited predisposition was used for genotyping of SERPINA3 c.918-1G>C. This cohort consisted of index cases from breast cancer families negative for the BRCA1, BRCA2, PALB2 and MCPH1 pathogenic founder variants [9–11], with 1) three or more breast and/or ovarian cancer cases in first- or second-degree relatives (n = 42), 2) two cases of breast, or breast and ovarian, cancer in first- or second-degree relatives, at least one of which had early disease onset (<35 years), bilateral disease or multiple tumors (n = 22), or 3) two cases of breast cancer in first- or second-degree relatives (n = 39).
Finrisk

Statistical analyses
The χ² test or Fisher's exact test was used to compare allele frequencies between cases and controls, and also to compare tumor characteristics between SERPINA3 c.918-1G>C carrier and non-carrier patients. All p-values were two-sided. The Benjamini-Hochberg method was used to control the false discovery rate (FDR) for multiple comparisons across the tested germline variants [12]. After the Benjamini-Hochberg procedure (FDR = 0.05), p-values below 0.01 were considered statistically significant. The 5-year breast cancer-specific survival (BCSS) of SERPINA3 c.918-1G>C carriers (n = 26) and non-carriers (n = 1417) from the unselected breast cancer cohort was compared by univariate Kaplan-Meier analysis and Cox regression. Survival time was calculated from the date of diagnosis to the last follow-up or death. All statistical analyses were performed using IBM SPSS Statistics 26.0 for Windows (IBM Corp.).
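To illustrate the two-sided Fisher's exact test and the Benjamini-Hochberg step-up procedure named above, a standard-library Python sketch follows. The authors ran their analyses in SPSS; these functions are textbook re-implementations that assume a 2x2 table of carriers/non-carriers by cases/controls, and in practice `scipy.stats.fisher_exact` and `statsmodels.stats.multitest.multipletests(method="fdr_bh")` are tested equivalents.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value sums the hypergeometric probabilities of every table with the
    same row and column totals that is no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # P(top-left cell = x) given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one survive rounding
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues, fdr: float = 0.05):
    """Benjamini-Hochberg step-up procedure; returns True for each rejected hypothesis."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * fdr
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected
```

Because the procedure is step-up, `benjamini_hochberg([0.01, 0.04, 0.045, 0.049])` rejects all four hypotheses: the largest rank satisfying the bound is k = 4, so every smaller-ranked p-value is rejected too, even though ranks 2 and 3 fail their own thresholds.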

Loss of heterozygosity analysis
Genomic DNA was extracted from 14 formalin-fixed, paraffin-embedded (FFPE) tumors of SERPINA3 c.918-1G>C carriers using the GeneRead DNA FFPE Kit (Qiagen). Loss of heterozygosity (LOH) was evaluated by sequencing a 147 bp amplicon spanning the variant site. Peak heights from the sequence chromatograms were compared between tumor and corresponding normal DNA samples to assess allelic ratios. Allelic imbalance values >1.67 or <0.60 were considered indicative of LOH.
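The allelic imbalance calculation can be sketched as below, assuming the common definition AI = (T_wt/T_var)/(N_wt/N_var), where T and N are chromatogram peak heights of each allele in tumor and matched normal DNA. The text gives only the cut-offs, so the orientation of the ratio (wild-type allele in the numerator) and the function names are assumptions.

```python
def allelic_imbalance(tumor_wt: float, tumor_var: float,
                      normal_wt: float, normal_var: float) -> float:
    """Wild-type/variant peak-height ratio in tumor, normalized to matched normal DNA.

    Assumed orientation: wild-type allele in the numerator.
    """
    return (tumor_wt / tumor_var) / (normal_wt / normal_var)


def classify_loh(ai: float, upper: float = 1.67, lower: float = 0.60) -> str:
    """Apply the cut-offs from the text to an allelic imbalance value."""
    if ai < lower:
        return "LOH: wild-type allele lost"  # tumor signal shifted toward the variant allele
    if ai > upper:
        return "LOH: variant allele lost"    # tumor signal shifted toward the wild-type allele
    return "no LOH"
```

Normalizing to the matched normal sample cancels allele-specific differences in peak height that come from the sequencing chemistry rather than the tumor. Under this orientation, retention of the wild-type allele corresponds to values that do not fall below 0.60.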

Results
In total, 36 variants passed the filters (2 nonsense, 1 frameshift, 5 splice site, 1 in-frame insertion, 1 in-frame deletion and 26 predicted deleterious missense variants) and were analyzed further in additional geographically matched cohorts (Table S1). Of these, a splice acceptor variant in SERPINA3 (serpin peptidase inhibitor, clade A member 3; NM_001085.4:c.918-1G>C, rs199710314) was significantly enriched in the unselected breast cancer cohort used for validation and was therefore selected for more detailed investigation.
SERPINA3 c.918-1G>C was present in 4/98 (4.1%) of the exome-sequenced index cases. All four heterozygous carriers were negative for other previously reported breast cancer-associated variants. SERPINA3 encodes a 423-amino-acid protein, also known as α1-antichymotrypsin (α1-ACT), that acts as a plasma protease inhibitor [15]. Unlike other serpins, SERPINA3 can bind DNA (Fig. 1a), although the functional significance of this binding is unclear [15]. By binding target proteases to its reactive center loop (RCL) (Fig. 1a), SERPINA3 inhibits the activity of several serine proteases, including chymotrypsin, cathepsin G and mast cell chymases. In silico tools predicted that the c.918-1G>C variant abolishes the canonical splice acceptor and activates a new acceptor site immediately adjacent to the original one. This results in a two-nucleotide deletion and a frameshift, creating a premature stop codon at position 309 (Fig. 1b) and eliminating the RCL domain of the protein.
In total, the frequency of SERPINA3 c.918-1G>C was evaluated in 1770 breast cancer cases: 201 cases with suspected inherited susceptibility to the disease (hereafter referred to as the hereditary cohort) and 1569 cases unselected for family history of cancer (…).
All available information and additional DNA samples from the identified SERPINA3 c.918-1G>C carrier families were used to study the potential segregation of the variant with the cancer phenotype (Table S2). One-third (9/26) of the unselected carrier cases had at least one breast cancer case among their first- or second-degree relatives, further supporting an association with breast cancer. Besides the initially studied index cases, samples from four relatives affected with breast cancer were available for variant testing; three of these were positive for SERPINA3 c.918-1G>C. Relatives of SERPINA3 c.918-1G>C carriers were also reported to have several other types of malignancy, the most common being stomach cancer, occurring in 22% (7/32) of the families, and head and neck cancers (5/32, 16%).
The comparison of tumor characteristics (Table S3) between SERPINA3 c.918-1G>C carriers and non-carriers from the unselected cohort showed a significant enrichment of medullary breast cancer, a rare tumor subtype, among the carriers (4/26, 15.4%, p = 0.000014, OR 42.9, 95% CI 11.7–157.1). Although based on small numbers, this enrichment of a rare subtype supports a contribution of the germline variant to tumorigenesis in the carriers. No other associations with tumor characteristics or 5-year BCSS were detected (Fig. S1). There was no difference in the mean age at disease onset between carriers (mean 58 years, range 36–87 years) and non-carriers (mean 58 years, range 28–93 years) in the unselected cohort. LOH analysis of the SERPINA3 locus demonstrated that the wild-type allele was retained in the breast tumors (Fig. S2), a feature typical of moderate-risk breast cancer alleles [2].
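Odds ratios and 95% confidence intervals such as those reported above are commonly derived from a 2x2 table. A minimal sketch using the Woolf (logit) interval follows; the text does not state which interval method was used, so Woolf's method and the Haldane-Anscombe 0.5 correction for empty cells are assumptions. The per-cell counts behind OR 42.9 are not given in this section, so the sketch does not attempt to reproduce that value.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.959963984540054):
    """Odds ratio with a Woolf (logit) confidence interval.

    a: carriers with the outcome      b: carriers without it
    c: non-carriers with the outcome  d: non-carriers without it
    Returns (odds_ratio, lower, upper); the default z gives a 95% interval.
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell if any cell is empty
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the square root of the summed reciprocal cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

For example, `odds_ratio_woolf(10, 20, 5, 40)` gives an odds ratio of 4.0 with an interval of approximately 1.2 to 13.3, symmetric around 4.0 on the log scale. Because the reciprocals of the smallest cells dominate the standard error, a comparison resting on only four carrier tumors yields a wide interval, as seen for medullary carcinoma.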

Discussion
The current study provides strong genetic evidence for an association of SERPINA3 c.918-1G>C with inherited breast cancer predisposition in the Finnish population.
In the currently analyzed cohorts, SERPINA3 c.918-1G>C was identified in 3.0% of cases with indication of hereditary predisposition to the disease and in 1.7% of breast cancer cases unselected for family history of the disease or age at diagnosis. Based on the case-control comparisons, the risk conferred by the SERPINA3 c.918-1G>C allele falls in the range typical for moderate-risk alleles [16] (2.8-fold based on unselected cases and fivefold based on hereditary cases). The variant showed a significant association with medullary carcinoma, a rare subtype with relatively favorable prognosis that, interestingly, is also enriched among BRCA1 germline mutation carriers [17]. The SERPINA3 c.918-1G>C carrier families also had a history of several other cancer types, including stomach cancer and cancers of the head and neck, suggesting that the cancer spectrum associated with SERPINA3 variants might extend beyond breast cancer. The encoded SERPINA3 protein is an inhibitor of several serine proteases and acts as an acute-phase reactant. It has been reported to have roles in a variety of physiological processes, such as the inflammatory response [18], complement activation [19], regulation of lipid metabolism [20], wound healing, extracellular matrix remodeling [21] and apoptosis [15]. Variants of this gene can influence which proteases are targeted and may thereby have tissue-specific effects. Overexpression of SERPINA3 has been observed in several cancer types, including endometrial cancer, melanoma, glioma and breast cancer [22–25]. High SERPINA3 expression has been shown to correlate with poor prognosis in patients with colon [26], breast [27], lung [28,29] and gastric cancers [30]. Recently, SERPINA3 was also identified as a transcriptional regulator of genes related to hepatocellular carcinoma progression, inducing telomere elongation, cell proliferation, migration and invasion [31].
SERPINA3 has been reported to be estrogen-inducible, and its mRNA level has been suggested to be a significant predictor of good prognosis in hormone receptor (ER and/or PR)-positive breast cancer patients [25]. Taken together, several lines of evidence suggest that alterations in the multifunctional SERPINA3 protein have a role in malignancy development.
In conclusion, the current genetic data demonstrate that the germline variant c.918-1G>C in the SERPINA3 gene associates with breast cancer. Based on the case-control comparison, the associated risk is about threefold that of non-carriers. Although rare, this Finnish founder allele is enriched in Northern Finland and shows a statistically significant association with breast cancer. According to the gnomAD database (https://gnomad.broadinstitute.org/), this gene harbors other deleterious alleles that could be relevant in other populations. The addition of SERPINA3 to the growing list of genes with functions beyond the DDR pathway that harbor predisposing alleles underscores that diverse mechanisms are likely to be relevant to breast cancer pathogenesis. Which of the numerous functions of SERPINA3 is specifically relevant to breast cancer predisposition warrants further investigation.

Funding
This work was supported by the Academy of Finland (grant numbers 307808, 314183, 335242) and the Cancer Foundation of Finland sr. Funders had no role in the study design, collection, analysis and interpretation of data, writing of the report and the decision to submit the article for publication.