Frequent LPA KIV-2 Variants Lower Lipoprotein(a) Concentrations and Protect Against Coronary Artery Disease

Background Lipoprotein(a) (Lp(a)) concentrations are a major independent risk factor for coronary artery disease (CAD) and are mainly determined by variation in LPA. Up to 70% of the LPA coding sequence is located in the hyper-variable kringle IV type 2 (KIV-2) region. It is hardly accessible by conventional technologies, but may contain functional variants. Objectives This study sought to investigate the effect of the new, very frequent KIV-2 splicing variant 4733G>A on Lp(a) and CAD. Methods We genotyped 4733G>A in the GCKD (German Chronic Kidney Disease) study (n = 4,673) by allele-specific polymerase chain reaction, performed minigene assays, identified proxy single nucleotide polymorphisms and used them to characterize its effect on CAD by survival analysis in UK Biobank (n = 440,234). Frequencies in ethnic groups were assessed in the 1000 Genomes Project. Results The 4733G>A variant (38.2% carrier frequency) was found in most isoform sizes. It reduces allelic expression without abolishing protein production, lowers Lp(a) by 13.6 mg/dL (95% CI: 12.5-14.7; P < 0.0001) and is the strongest variance-explaining factor after the smaller isoform. Splicing of minigenes was modified. Compound heterozygosity (4.6% of the population) for 4733G>A and 4925G>A, another KIV-2 splicing mutation, reduces Lp(a) by 31.8 mg/dL and most importantly narrows the interquartile range by 9-fold (from 42.1 to 4.6 mg/dL) when compared to the wild type. In UK Biobank 4733G>A alone and compound heterozygosity with 4925G>A reduced HR for CAD by 9% (95% CI: 7%-11%) and 12% (95% CI: 7%-16%) (both P < 0.001). Frequencies in ethnicities differ notably. Conclusions Functional variants in the previously inaccessible LPA KIV-2 region cooperate in determining Lp(a) variance and CAD risk. Even a moderate but lifelong genetic Lp(a) reduction translates to a noticeable CAD risk reduction. (J Am Coll Cardiol 2021;78:437–49)

The 4733G>A variant is located 11 bp upstream of the 5' exon boundary. The length of the expected transcript if all four exons are spliced correctly is known (602 bp). Splicing aberrations are detected by PCR products that differ in length and/or sequence from the transcript that is expected.

Supplemental Figure 2: Dilution series of mutant in wild type plasmid for asPCR 4733G>A.
Varying percentages of pSPL3 containing the 4733G>A mutation were diluted with wild type pSPL3. 69 fg of DNA was used per reaction (corresponding approximately to the number of genome copies present in 20 ng genomic DNA). The percentage of mutant plasmid is given above each sample pair. The lane indicated as "0%" contains only the wild type plasmid. The primer does not produce an amplicon at 0% mutation level, while it shows an amplicon at 0.5% mutation level.
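The correspondence between 69 fg of plasmid and 20 ng of genomic DNA can be checked with a rough copy-number estimate. The following is only a sketch: the plasmid length (a pSPL3 backbone of about 6.0 kb plus the 2.6 kb LPA insert), the haploid genome size and the average base-pair mass are assumptions made for illustration and are not taken from the study.

```python
# Rough copy-number estimate for the plasmid dilution series.
# All constants below are illustrative ASSUMPTIONS, not values from the study.
AVOGADRO = 6.022e23           # molecules per mole
BP_MASS_G_PER_MOL = 650.0     # approximate mass of one double-stranded base pair
PLASMID_BP = 8_600            # ASSUMPTION: ~6.0 kb pSPL3 backbone + 2.6 kb LPA insert
HAPLOID_GENOME_BP = 3.1e9     # ASSUMPTION: size of one haploid human genome


def copies_from_mass(mass_g: float, length_bp: float) -> float:
    """Number of double-stranded DNA molecules of a given length in a given mass."""
    return mass_g * AVOGADRO / (length_bp * BP_MASS_G_PER_MOL)


plasmid_copies = copies_from_mass(69e-15, PLASMID_BP)         # 69 fg plasmid
genome_copies = copies_from_mass(20e-9, HAPLOID_GENOME_BP)    # 20 ng genomic DNA
mutant_copies_at_0_5_percent = 0.005 * plasmid_copies

print(f"plasmid copies per reaction: {plasmid_copies:.0f}")
print(f"haploid genome copies in 20 ng: {genome_copies:.0f}")
print(f"mutant copies at 0.5% mutation level: {mutant_copies_at_0_5_percent:.0f}")
```

Under these assumptions both inputs amount to several thousand template copies, so the 0.5% dilution would contain only a few dozen mutant molecules per reaction.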

Supplemental Figure 3: Lp(a) concentrations of 4733G>A carriers and non-carriers over the whole isoform range, restricted to individuals who do not carry the 4925G>A variant.
Lp(a) concentration in mg/dL in non-carriers of 4733G>A (light blue) and carriers (orange), grouped by the number of KIV repeats on the smaller expressed isoform. Levels above 210 mg/dL are not shown for better representation. Absolute numbers of carriers per isoform stratum are given below each boxplot. Isoform grouping was done to have n≥20 in each group prior to subsetting for non-carriers of 4925G>A.

Supplemental Figure 4 (see next page): Dot plots of isoform dominance patterns in carriers and non-carriers of 4733G>A. The variant 4733G>A induces a switch in the isoform dominance. In non-carriers the shorter isoform is mostly also the dominant one, as would be expected. Conversely, in carriers of 4733G>A, especially in the isoform size range 24-33, the shorter isoform is no longer the dominant one, because 4733G>A likely reduces its expression levels. The dominant isoform was deduced from Western blots (WB).
(A) Guide for interpreting panels B and C; (B) Isoform size and isoform dominance patterns in non-carriers of 4733G>A (n = 1216); (C) Isoform size and isoform dominance patterns in carriers of 4733G>A (n = 613).
For each individual, the dominant isoform is colored in red and the non-dominant one in blue. Plots only include individuals who express two isoforms in plasma and do not carry 4925G>A.

Panel A)
The dot plots show the size of the isoforms in each individual. The figures are restricted to individuals who presented two isoforms in the Western blot (isoform 1 and isoform 2). Only non-carriers of 4925G>A are shown because 4925G>A has been shown before to reduce allelic expression (1). The individuals are sorted along the x-axis; the y-axis shows the isoform size (two dots on the y-axis per individual). The individuals are ordered by the size of isoform 1 and, within each isoform 1 size group, by the size of isoform 2. The dominant isoform is colored in red, the non-dominant isoform in blue (in the example isoform 37 and 39, respectively).

Panel B)
This panel shows the non-carriers of 4733G>A. In this plot, isoform 1 is the dominant isoform for most individuals.
In contrast, in Panel C) (carriers of 4733G>A) the dominant isoform is mostly isoform 2. This indicates that in non-carriers the dominant isoform is usually the smaller isoform (as expected (2)). Conversely, in carriers of 4733G>A the smaller isoform is often the one that is expressed to a lower instead of a higher extent (i.e. it is the non-dominant isoform). Interestingly, this applies mostly when isoform 1 is in the size range 24-33, while in samples with the smaller isoform <24 and especially <23 this switch in the dominance patterns of the isoforms is not seen and the non-dominant isoforms are still those >24, which here correspond to isoform 2. We assume that this is because 4733G>A is predominantly (albeit not exclusively) associated with isoforms 24-33.

Supplemental Figure 5 (see next page): Isoform-specific Lp(a) concentrations of alleles of (A) 4733G>A and (B) 4925G>A carriers are reduced compared to alleles from non-carriers. However, these plots contain both alleles of variant carriers, whereas the 4733G>A and 4925G>A variants are mostly present only on one of the two alleles. Including only the allele of carriers that most likely carries the respective variant, based on the isoform dominance pattern, shows that both variants lower the isoform-specific Lp(a) concentration, irrespective of the isoform size.
Blue: alleles from non-carriers of 4733G>A and 4925G>A. All plots are restricted to heterozygous carriers of isoforms since true homozygous carriers cannot be discerned from heterozygous carriers with one null allele. Lines represent smoothed conditional means based on the loess method (local polynomial regression fitting) in the ggplot2 package in R and the gray shades represent 95% CIs. Values above 100 mg/dL are not shown for better visualization. Plots for 4733G>A (A, C) are restricted to non-carriers of 4925G>A to avoid confounding. Vice versa, plots for 4925G>A (B, D) are restricted to non-carriers of 4733G>A.
(A) Orange represents both alleles of 4733G>A carriers. (B) Purple represents both alleles of 4925G>A carriers. (C) Orange represents the 4733G>A carrying allele inferred by the dominance pattern. The other allele of 4733G>A carriers is excluded from this plot since homo- and heterozygous individuals cannot be discerned. (D) Purple represents the 4925G>A carrying allele inferred by the dominance pattern. The other allele of 4925G>A carriers is excluded from this plot since homo- and heterozygous individuals cannot be discerned.
The isoform-specific Lp(a) concentrations were calculated by multiplying the total Lp(a) concentration of an individual by the relative expression of isoform 1 and isoform 2 derived from the Western blot data. The dominance pattern was used to infer the allelic location of the two functional variants, since the non-dominant isoform is expected to carry the respective variant. If isoform 1 (smaller expressed isoform) is the dominant isoform in a variant carrier, the variant is assumed to be on isoform 2 (larger expressed isoform). If isoform 2 is the dominant isoform in a variant carrier, the variant is assumed to be on isoform 1.
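The calculation and the inference rule described above can be sketched as follows. This is a minimal illustration only: the record layout, the field names and the representation of the Western blot result as a single fraction for isoform 1 are hypothetical and were chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Individual:
    """Hypothetical record; field names are illustrative, not from the study."""
    total_lpa_mg_dl: float      # total Lp(a) concentration from ELISA
    isoform1_fraction: float    # relative expression of isoform 1 (smaller isoform), 0..1
    carrier: bool               # carrier of the variant of interest


def isoform_specific_lpa(ind: Individual) -> Tuple[float, float]:
    """Split total Lp(a) into the contributions of isoform 1 and isoform 2."""
    conc1 = ind.total_lpa_mg_dl * ind.isoform1_fraction
    conc2 = ind.total_lpa_mg_dl * (1.0 - ind.isoform1_fraction)
    return conc1, conc2


def inferred_variant_isoform(ind: Individual) -> Optional[int]:
    """Return the isoform (1 or 2) assumed to carry the variant, or None.

    The non-dominant isoform is assumed to carry the variant. Equal expression
    gives no dominance and is left unassigned (an ASSUMPTION of this sketch).
    """
    if not ind.carrier or ind.isoform1_fraction == 0.5:
        return None
    return 2 if ind.isoform1_fraction > 0.5 else 1


example = Individual(total_lpa_mg_dl=20.0, isoform1_fraction=0.25, carrier=True)
print(isoform_specific_lpa(example))      # contributions of isoform 1 and isoform 2
print(inferred_variant_isoform(example))  # isoform assumed to carry the variant
```

In this example isoform 2 is dominant, so the variant is assigned to isoform 1, the smaller isoform.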

Supplemental Figure 6 (see next page): Minigene assay indicates splicing modulation.
A) Explanation of the observed effects as deduced by Sanger sequencing. The Sanger sequencing trace with explanation is shown in Supplemental Figure 7. 4733G>A induced two different splice effects: splice product 1 uses a novel splice acceptor located one base downstream of 4733G>A, leading to a 9 bp intron retention. Splice product 2 enhances a cryptic splice acceptor 34 bases downstream of 4733G>A, deleting 24 bases of KIV-2 exon 2. Usage of this latter splice acceptor is also observed in the wild type products, but at a much lower level. This is in line with the bioinformatic predictions shown in Supplemental Figure 8. Both changes are in-frame.
B-E) Reverse transcriptase PCR images showing the effect of 4733G>A in five biological replicates with two technical replicates each. 4733G>A activates two cryptic splice sites. The upper band (blue dot) is splice product 1, which includes additional intronic bases. The lower band (magenta dot) is splice product 2, which deletes 24 exonic bases. The grey dot indicates the wild type product, whose length lies between the two aberrant splice products. WT: wild type, mut: mutant, +/-P: puromycin addition.
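The band order on the gel follows from simple length arithmetic. The sketch below assumes that the wild type RT-PCR product has the 602 bp length stated for the correctly spliced transcript; the variable and function names are illustrative.

```python
# Expected product lengths and reading-frame check for the minigene products.
# ASSUMPTION: the wild type RT-PCR product equals the 602 bp correctly spliced transcript.
WT_LENGTH_BP = 602

# Length change of each product relative to wild type (bp).
length_changes = {
    "wild type": 0,
    "splice product 1 (9 bp intron retention)": +9,
    "splice product 2 (24 bp exonic deletion)": -24,
}


def expected_length(delta_bp: int) -> int:
    """Expected product length for a given insertion (+) or deletion (-)."""
    return WT_LENGTH_BP + delta_bp


def is_in_frame(delta_bp: int) -> bool:
    """A length change preserves the reading frame if it is a multiple of 3."""
    return delta_bp % 3 == 0


for name, delta in length_changes.items():
    print(f"{name}: {expected_length(delta)} bp, in-frame: {is_in_frame(delta)}")
```

Under this assumption splice product 1 is slightly longer and splice product 2 slightly shorter than the wild type product, which matches the described band positions, and both length changes are multiples of three.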

Supplemental Figure 7: Minigene assay indicates splicing modulation (shown as Sanger trace).
cDNA Sanger sequence of the mutant minigene PCR product displays a simultaneous 9 bp intron retention and a 24 bp exonic deletion, as deduced from manual separation of the mixed Sanger traces. The explanation of the mixed sequence portions is shown below the Sanger trace.

Supplemental Figure 8: Prediction of the effects of 4733G>A on splicing by NetGene2.
Blue: exon sequence of LPA. Black: intron sequence. Uppercase: position of 4733G>A. Red pipe symbols: predicted acceptor splice sites (pipe symbols denoting the predicted splice event). Green: branchpoint predicted by LaBranchoR; lowercase underlined: branchpoint context sequence predicted by SVM-BPfinder (Supplemental methods). SVM-BPfinder predicts the AG exclusion zone to extend for 61 bp upstream of the predicted wild type acceptor site (red pipe nr 1).
For the wild type sequence the best scoring splice acceptor coincides with the acceptor splice site of the reference transcript (ENST00000316300, NetGene2 score 0.16; nr 1), with a second weaker splice acceptor predicted downstream (score 0.14; nr 3). For the mutated sequence the wild type splice acceptor is abolished, while an additional splice acceptor is created one base downstream of 4733G>A (score 0.25; nr 2). The second predicted splice event downstream of the wild type site is retained (score 0.14; nr 3) and a third weak event (score 0.07, unnumbered pipe symbol in the mutated sequence) is predicted even further downstream but was not observed in the minigene products.
Both splice events predicted in the wild type sequence were observed also in the wild type minigenes (see main results). In the mutated minigenes both the event 1 bp downstream of 4733G>A (event nr 2; splice product 1 in Supplemental Figure 6) and the stronger alternative splice event in the exon (event nr 3; splice product 2 in Supplemental Figure 6) were observed.

Supplemental Figure 9: Association between Lp(a) concentrations and proxy SNPs in UK Biobank
Lp(a) concentration (nmol/L) in carriers of rs75692336 (purple, proxy SNP for 4925G>A), rs6938647 (orange, proxy SNP for 4733G>A), both variants (green, compound heterozygosity) and none of the variants (blue) in UK Biobank (n=433,563). Both proxy SNPs associate with lower Lp(a) concentrations; in the double carrier group, the variance is reduced. IQR: interquartile range.

Supplemental Figure 10: Hazard ratio for CAD risk in UK Biobank with time-on-study.
Status refers to the presence (yes) or absence (no) of the proxy SNPs rs6938647 and rs75692336 for 4733G>A and 4925G>A, respectively. The reference group consists of non-carriers of the proxy SNPs of both variants. The model is adjusted for sex and age. Time-on-study is taken as time scale. In this plot, individuals with CAD events prior to enrolment are omitted (n=17,835).

Supplemental Tables
Supplemental Table 1
No. of cycles: 40
* Selective 4733G>A base is shown in red. The additional mismatch at position -2 from the 3' end, introduced to enhance the thermodynamic disadvantage of unspecific pairings, is marked in green.

Variant naming
Mapping of variants in the KIV-2 region to the genome is ambiguous and thus no rs identifier can be attributed. As for 4925G>A, we therefore name the variant 4733G>A according to its position in the reference sequence used for the NGS mutation screening (1). Possible genome coordinates are given in Supplemental Table 1. The base change is given relative to the coding strand of LPA NM_005577.4.

Assay development
Several allele-specific primers were designed containing the selective allele at the 3' end. An additional thermodynamic disadvantage for unspecific pairings was introduced by various base mismatches at position -2 from the 3' end of the allele-specific primers (4). No variants in the primer binding sites were observed in the European populations from the 1000G project (data from (3)). Performance of these primers was tested in mixtures of pSPL3 plasmids containing 4733 wild type or mutant, and the primer with the highest specificity and sensitivity was chosen. Primer concentrations were optimized using one sample containing the 4733G>A variant and one without, as confirmed by ultra-deep NGS (3). The final protocol was again tested for sensitivity on mixtures of pSPL3 plasmids containing 4733 wild type or mutant, ranging from 100% to 0% mutant fraction (Supplemental Figure 2). In addition, we also used 16 samples from SAPHIR (5) (8 confirmed non-carriers of the variant and 8 with variant levels ranging from 1.7% to 6.4%, as determined by ultra-deep NGS data from Coassin, Schönherr et al., 2019 (3)) to further validate the assay and determine the optimal Ct threshold. A PCR amplicon for PNPLA3 served as positive amplification control to detect PCR failures, e.g. due to insufficient DNA quality. PCR reactions were run on an Applied Biosystems QuantStudio 6 Flex system in 384-well format using SYBR green chemistry.

Assay validation
The allele-specific real-time PCR (asPCR) assay correctly classified all 59 GCKD samples in which 4733G>A carrier status had been confirmed previously by ultra-deep next generation sequencing (3) (23 positive and 36 negative samples). The positive samples showed a mutational level (representing the fraction of mutated KIV-2 units) from 1.86% to 10.04%, corresponding to Ct values ranging from 29.95 to 33.02. All 36 negative samples showed either no amplification or Ct values >38.3. Thus the separation of the two groups was very clear. A Ct >35 was considered negative in GCKD typing. Of 4,909 GCKD samples typed, only two samples had to be excluded due to failure of the amplification control product (indicating e.g. insufficient DNA quality or quantity).
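The calling rule described above can be written as a small decision function. This is a sketch only: the function name, the use of None for "no amplification" and the handling of the PNPLA3 control as a boolean flag are illustrative assumptions.

```python
from typing import Optional

CT_NEGATIVE_ABOVE = 35.0  # a Ct above this value is considered negative


def call_4733_status(ct_variant: Optional[float], control_amplified: bool) -> str:
    """Classify one sample from the allele-specific PCR.

    ct_variant: Ct of the 4733G>A-specific amplicon, or None if no amplification
                (representing "no amplification" as None is an ASSUMPTION of this sketch).
    control_amplified: whether the amplification control product was obtained.
    """
    if not control_amplified:
        return "excluded"        # PCR failure, e.g. insufficient DNA quality
    if ct_variant is None or ct_variant > CT_NEGATIVE_ABOVE:
        return "non-carrier"
    return "carrier"


# Validation samples: positives had Ct 29.95-33.02, negatives Ct >38.3 or no signal.
print(call_4733_status(29.95, True))
print(call_4733_status(33.02, True))
print(call_4733_status(38.4, True))
print(call_4733_status(None, True))
print(call_4733_status(31.0, False))
```

The threshold of 35 lies between the highest Ct observed in confirmed carriers and the lowest Ct observed in confirmed non-carriers, which is why both validation groups are called correctly.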

Description of the GCKD study
The GCKD study comprises Caucasian patients with moderate CKD. Moderate CKD is defined as an estimated glomerular filtration rate (eGFR) according to the CKD-EPI equation of 30-60 mL/min per 1.73 m² (corresponding to CKD Stage 3) or overt proteinuria and eGFR >60 mL/min per 1.73 m². Overt proteinuria is defined as an albumin to creatinine ratio in the urine of >300 mg/g or a protein to creatinine ratio in 24-h urine of >500 mg/g. Exclusion criteria were active malignancy, NYHA IV heart failure, renal or any other transplantation, non-Caucasian origin and legal attendance. The study has been approved by the review boards of the participating institutions and informed consent was obtained from all the participants.
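The renal inclusion definition can be expressed as a simple predicate. The sketch below covers only the eGFR and proteinuria criteria, not the exclusion criteria; the function and parameter names are hypothetical, and treating the 30-60 range as inclusive at both ends is an assumption.

```python
from typing import Optional


def has_overt_proteinuria(acr_mg_g: Optional[float] = None,
                          pcr_mg_g: Optional[float] = None) -> bool:
    """Overt proteinuria: urinary ACR >300 mg/g or 24-h urine PCR >500 mg/g."""
    acr_positive = acr_mg_g is not None and acr_mg_g > 300
    pcr_positive = pcr_mg_g is not None and pcr_mg_g > 500
    return acr_positive or pcr_positive


def meets_renal_inclusion(egfr: float,
                          acr_mg_g: Optional[float] = None,
                          pcr_mg_g: Optional[float] = None) -> bool:
    """Moderate CKD as defined in the text (eGFR in mL/min per 1.73 m²).

    ASSUMPTION: the 30-60 range is treated as inclusive at both ends.
    """
    if 30 <= egfr <= 60:
        return True
    return egfr > 60 and has_overt_proteinuria(acr_mg_g, pcr_mg_g)


print(meets_renal_inclusion(45))                 # eGFR within 30-60
print(meets_renal_inclusion(75, acr_mg_g=450))   # eGFR >60 with overt proteinuria
print(meets_renal_inclusion(75, acr_mg_g=120))   # eGFR >60 without overt proteinuria
```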

ELISA and Western blot in GCKD
For determination of the Lp(a) concentrations by ELISA, plates were coated using a polyclonal affinity-purified rabbit anti-human apo(a) antibody to immobilize the Lp(a) particles. The horseradish peroxidase-conjugated monoclonal anti-apo(a) antibody 1A2 (6) was used for detection. Each sample was measured twice at 1:150 and 1:1500 dilution and the OD reading that was in the linear range of the 7-point standard curve was used. The lower detection limit of the assay is 0.1 mg/dL. A detailed protocol has been published in (6).
For isoform determination, 150 ng Lp(a) of each sample were loaded on a 1.46% agarose gel with 0.08% SDS. Electrophoresis time was 18 h at 0.04 A constant current. A size standard containing five plasma samples with only one apo(a) isoform each (13, 19, 23, 27, 35 KIV repeats) was applied in every seventh lane of the gel. The gel was semi-dry blotted onto a PVDF membrane, which was blocked with 1% BSA, 85 mM NaCl, 10 mM TRIS, 0.2% Triton X-100 for 30 min at 37°C. The membrane was incubated with horseradish peroxidase-conjugated 1A2 antibody and washed extensively, and signals were detected using ECL substrate (WesternBright Chemilumineszenz Spray, Biozym, Vienna, AT; Amersham Hyperfilm™ ECL™, GE Healthcare, Chicago, IL, USA). A detailed protocol has been published in (6).
Individuals without any isoform expressed but Lp(a) values available are included in the analysis, but excluded from plots showing isoform data.

Look up in public datasets
The three best proxy SNPs for 4733G>A were selected based on R². Additional selection criteria were a high D' and a MAF similar to that of the 4733G>A variant. The direction of the OR reported in the dataset was adjusted to match the tagging allele. For assessing the coronary artery disease (CAD) risk for carrying both 4733G>A and 4925G>A, rs6938647 was used as proxy SNP for 4733G>A, since the combination of rs6938647 (for 4733G>A) and rs75692336 (for 4925G>A) was the proxy SNP combination with the highest specificity (Supplemental Table 10).
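For orientation, the two linkage disequilibrium measures used as selection criteria can be computed from allele and haplotype frequencies with the standard textbook definitions for two biallelic loci. The sketch below is not the procedure used in the study, which is not specified here; the function name and the input frequencies are hypothetical.

```python
from typing import Tuple


def ld_measures(p_a: float, p_b: float, p_ab: float) -> Tuple[float, float]:
    """Return (r_squared, d_prime) for two biallelic loci.

    p_a:  frequency of the allele of interest at locus A (e.g. the variant)
    p_b:  frequency of the allele of interest at locus B (e.g. the candidate proxy)
    p_ab: frequency of the haplotype carrying both alleles
    Standard definitions; how LD was actually estimated in the study is not assumed.
    """
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r_squared, d_prime


# Hypothetical frequencies: a proxy that always co-occurs with the variant,
# and a more frequent proxy that tags the variant only incompletely.
perfect = ld_measures(0.2, 0.2, 0.2)
partial = ld_measures(0.2, 0.4, 0.2)
print(perfect, partial)
```

The second example illustrates why a similar MAF was considered in addition to D': a proxy can reach a high D' while its R² stays low if its allele frequency differs strongly from that of the tagged variant.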

Frequency in 1000 Genomes
The frequency of LPA KIV-2 4733G>A in the 1000 Genomes (1000G) data was assessed as described previously (1,3). In brief, the 1000G phase 3 high coverage exome data that mapped to the LPA KIV-2 region (GRCh37, chr6:161,033,785-161,066,618) were downloaded as BAM files from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data and submitted to our LPA Server Pipeline (3). Since all reads are aligned to one single repeat, variants present in one or a few KIV-2 repeats are detected as low-level mutations, resembling the calling of somatic mutations. A coverage of 780X is required for high confidence calls (i.e. variant calling at ≥1% mutation level with the 95% confidence interval (95% CI) of a binomial distribution not crossing zero) in paired-end sequencing data (3).

Survival analysis and Lp(a) concentrations in UK Biobank
The UK Biobank study recruited 503,325 individuals between 2006 and 2010 (age at enrolment 40-69 years) and was approved by the North West Multi-centre Research Ethics Committee. The Lp(a) concentrations were measured by an immunoturbidimetric assay (Randox Laboratories; Crumlin, County Antrim, United Kingdom) using a Beckman Coulter AU5800 Platform. Concentrations falling within the reportable range of the assay (3.8-189 nmol/L) are available in the UK Biobank showcase. Lp(a) levels outside the reportable range were requested separately. The numbers of UK Biobank participants with Lp(a) levels below and above the reportable range are 46,836 and 32,953, respectively. Genetic data available from UK Biobank were generated as described elsewhere (7).
We restricted our analyses to Caucasians by selecting participants with British, Irish or any other white ethnic background (Data-Field 21000, n=472,125). First, we investigated in UK Biobank the Lp(a)-lowering effect of the proxy SNPs for 4925G>A (rs75692336) and 4733G>A (rs6938647) as well as their combination (n=433,563 with Lp(a) concentrations available; among them 45,533 (10.5%) and 31,051 (7.16%) individuals had Lp(a) levels below and above the reportable range, respectively). Subsequently, we also investigated their protective effect on CAD (n=440,234 with genotyping data available). The incidence of CAD was assessed based on first reported occurrences of diagnoses falling within the codes I21-I25 (8) of the International Classification of Disease version 10 (ICD-10), covering acute myocardial infarction, subsequent myocardial infarction, certain current complications following acute myocardial infarction, other acute ischaemic heart diseases and chronic ischaemic heart diseases (Data-Fields 131298, 131300, 131302, 131304, 131306). The first reported occurrence was provided by mapping the earliest date recorded through either self-report at any assessment centre, inpatient hospital data, primary care or death record data. One individual was excluded due to an implausible date of CAD event. The hazard ratio for CAD was estimated as a function of the carrier status for the proxy SNPs, adjusted for sex, by Cox proportional hazards regression with age as time scale (9), with censoring at the 1st of January 2020 and year of birth as beginning of observation time. In total, 35,863 individuals developed an event. Of those, 17,835 events occurred before the recruitment date. The proportional hazards assumption was verified by visual assessment of Schoenfeld residuals.
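The construction of the survival outcome described above (earliest I21-I25 date, age as time scale, censoring at the 1st of January 2020) can be sketched as follows. The function names and input layout are hypothetical; placing the start of observation on the 1st of January of the birth year and treating events recorded after the censoring date as censored are assumptions of this sketch. The Cox model itself is not fitted here.

```python
from datetime import date
from typing import Dict, Optional, Tuple

CENSOR_DATE = date(2020, 1, 1)  # administrative censoring date stated in the text


def earliest_cad_date(first_occurrence: Dict[str, Optional[date]]) -> Optional[date]:
    """Earliest first-occurrence date across the ICD-10 codes I21-I25, if any."""
    dates = [d for d in first_occurrence.values() if d is not None]
    return min(dates) if dates else None


def survival_record(birth_year: int,
                    first_occurrence: Dict[str, Optional[date]]) -> Tuple[float, bool]:
    """Return (age at event or censoring in years, event indicator).

    ASSUMPTIONS: observation starts on 1 January of the birth year, and events
    dated after the censoring date are treated as censored.
    """
    start = date(birth_year, 1, 1)
    event_date = earliest_cad_date(first_occurrence)
    event = event_date is not None and event_date <= CENSOR_DATE
    end = event_date if event else CENSOR_DATE
    age_years = (end - start).days / 365.25
    return age_years, event


with_event = survival_record(1950, {"I21": date(2010, 1, 1), "I25": date(2012, 5, 1)})
censored = survival_record(1950, {"I21": None, "I25": None})
print(with_event, censored)
```

With age as the time scale, each individual contributes the time from birth to the first CAD diagnosis or to censoring, so events that occurred before recruitment remain in the main analysis.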

Minigene assay
We used the previously described LPA-pSPL3 minigene (1). The structure of the minigene is given in Supplemental Figure 1. The 4733G>A mutation was introduced by site-directed PCR mutagenesis (Supplemental Table 14) and the complete LPA insert (2.6 kb) was verified by Sanger sequencing.
HepG2 hepatoblastoma cells (ATCC_HB-8065) were cultured in DMEM high glucose, pyruvate (Gibco, UK) supplemented with 10% fetal calf serum and 1% final concentration PenStrep (100 U/mL Penicillin and 0.1 mg/mL Streptomycin; Sigma Aldrich). 4 × 10⁵ cells/well were seeded in 6-well culture plates 24 hours prior to transfection and were incubated (5% CO2 at 37°C). The cells were transfected with 2.5 µg plasmid using the Transporter™ 5 Transfection Reagent (Polysciences Europe GmbH) in two technical and five biological replicates according to the manufacturer's protocol and incubated with and without puromycin (10 µg/mL final concentration) for 5 hours prior to the harvesting of cells. Transfection success was assessed by fluorescence microscopy and transfection efficiencies were examined using a LUNA-FL Dual Fluorescence Cell Counter (Logos Biosystems). Cells were harvested 48 hours after transfection and RNA extraction was performed using TRI Reagent Solution (Invitrogen). The RNA quality was assessed by automated fragment analysis (Agilent Fragment Analyzer, kit DNF471). The RQN ranged from 7.6 to 10 (mean ± SD: 9.7 ± 0.6). 1 µg of RNA was reverse transcribed using AMV Reverse Transcriptase and random hexamers (Promega) and analyzed by PCR and Sanger sequencing as described (1).
Splicing patterns were assessed by PCR using primers that bind in the constitutive minigene exons that flank the LPA insert as described in (1) and Supplemental Figure 1.