Comprehensive classification of TP53 somatic missense variants based on their impact on p53 structural stability

Abstract
Somatic variation is a major type of genetic variation contributing to human diseases including cancer. Of the vast quantity of somatic variants identified, the functional impact of many, in particular the missense variants, remains unclear. The lack of functional information prevents the translation of rich variation data into clinical applications. We previously developed a method named Ramachandran Plot–Molecular Dynamics Simulations (RP-MDS) to predict the function of germline missense variants based on their effects on protein structure stability, and successfully applied it to predict the deleteriousness of unclassified germline missense variants in multiple cancer genes. We hypothesized that, regardless of their different genetic origins, somatic missense variants and germline missense variants could have similar effects on the stability of the affected protein structure. As such, the RP-MDS method designed for germline missense variants should also be applicable to predicting the function of somatic missense variants. In the current study, we tested our hypothesis by using the somatic missense variants in TP53 as a model. Of the 397 somatic missense variants analyzed, RP-MDS predicted that 195 (49.1%) were deleterious as they significantly disturbed p53 structure. The results were largely validated by a p53–p21 promoter–green fluorescent protein (GFP) reporter gene assay. Our study demonstrated that deleterious somatic missense variants can be identified by referring to their effects on protein structural stability.


Introduction
Deleterious genetic variation causes functional defects of the affected genes and contributes to various types of human diseases including cancer. Deleterious genetic variation is also a valuable biomarker for clinical prevention, diagnosis, treatment, and prognosis of related diseases. There are two types of genetic variation: somatic variation and germline variation. Somatic variation originates during the lifetime of the carrier, is present only in the affected cells, and is not heritable, whereas germline variation arises during the evolutionary process, is present in every cell of the carrier, and is inheritable across generations. The prevalence of somatic variation is much higher than that of germline variation in the affected genes and related diseases. For example, deleterious somatic variation in EGFR contributes to over 60% of non-small cell lung cancer [1], whereas deleterious germline variation in BRCA1 and BRCA2 associated with hereditary breast and ovarian cancer (HBOC) contributes to only 5%-10% of breast cancer and 15% of ovarian cancer [2].
The rapid progress of genomic studies has led to the identification of a vast quantity of genetic variants. Knowing the functional impact of the variants is a precondition for translating the knowledge into clinical applications. Different approaches have been developed to determine the function of germline variation and somatic variation. The American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines are widely used as a standard to classify germline variants [3]. They set detailed criteria to classify a germline variant into one of the functional categories of Pathogenic, Likely Pathogenic, VUS (variant of uncertain significance), Likely Benign, and Benign, and the ClinVar database is the largest database for curated germline variants in different genes and diseases [4]. For somatic variation, however, the classification criteria are less unified than those for germline variation, and current efforts are largely devoted to documentation rather than functional classification. For example, the COSMIC (Catalogue of Somatic Mutations in Cancer) database is currently the largest resource hosting the somatic variants identified in various types of cancer [5]. Under the scope of COSMIC, the Cancer Mutation Census (CMC) project 'integrates all coding somatic mutations collected by COSMIC with biological and biochemical information from multiple sources, combining data obtained from manual curation and computational analyses' (https://cancer.sanger.ac.uk/cmc/home). Under the CMC curation system, a somatic variant is classified into one of five groups: 'Tier One', 'Tier Two', 'Tier Three', 'Other mutations', and 'Synonymous mutations'. The variants in Tier 1, Tier 2, and certain variants in Tier 3 are considered to be similar to the 'Pathogenic' and 'Likely pathogenic' variants in germline variant classification, whereas those grouped in 'Other mutations' and 'Synonymous mutations' are not (https://cancer.sanger.ac.uk/cmc/help). Instead of making a conclusive classification on its own, the CMC classification system relies on the classification information provided by the original submitters [5]. However, the classifications from different sources are often highly redundant, inconsistent, and even contradictory, such that it is up to the users to make their own interpretation of whether a somatic variant is pathogenic or nonpathogenic. Compared with germline variation, therefore, functional classification of somatic variants remains a big challenge. Furthermore, functional classification of somatic missense variation is even more troublesome, as a missense variant does not disrupt the translation of the full-length protein but changes a single codon. The lack of conclusive classification for somatic variants leaves high uncertainty about their clinical relevance and usage. This is particularly important when considering the much higher prevalence of somatic variants than germline variants in human diseases.
We recently developed the RP-MDS (Ramachandran Plot-Molecular Dynamics Simulation) method for the functional classification of germline missense variants [6]. Molecular Dynamics Simulation (MDS) is a computation-based atomistic simulation method [7]. By analyzing the physical movement of atoms and molecules over a given time period, the trajectories generated by MDS are used to measure the macroscopic thermodynamic properties of the protein structure. The Ramachandran Plot (RP) maps the backbone torsion angles (ϕ and ψ) of each residue, which are constrained by the rigidity of the peptide bond [8]. By ruling out the unfavourable structural conformations caused by collisions between nonbonded atoms, RP provides an accurate description of the protein structure. The combination of MDS and RP provides a powerful tool to measure the effects of a genetic variant on protein structure, and the information is applicable to predicting the deleteriousness of genetic variants. Using the RP-MDS method, we were able to predict the deleterious germline missense variants in multiple cancer genes, including TP53, BRCA1, and BRCA2 [6, 9-11]. We reasoned that somatic missense variants and germline missense variants should have the same effects on local protein structure, regardless of their biological and evolutionary differences. As such, the RP-MDS method designed for characterizing germline missense variants should also be applicable to characterizing deleterious somatic missense variants.
In the current study, we used TP53 somatic missense variants as a model to test the potential of using RP-MDS to identify deleterious somatic missense variants based on their impact on p53 structure. We identified a total of 397 TP53 somatic missense variants from the COSMIC database. By using RP-MDS, we measured the impact of each variant on p53 structure stability. We predicted 195 (49.1%) of the variants as deleterious, as evidenced by their distortion of p53 structure stability. Using an in vitro p53-p21 promoter-GFP reporter assay, we validated the reliability of the prediction. Our study demonstrates that deleterious somatic missense variants can be identified by directly referring to their impact on the stability of protein structure.

Sources of missense benign, pathogenic, and unclassified variants
From the COSMIC database, we identified 26 pathogenic missense variants in Tier 1 and Tier 2 and 18 unclassified missense variants under 'Other mutations' and used these to train the RP-MDS model. To balance the number of variants between Tier 1/Tier 2 and 'Other mutations', eight wildtype bases randomly selected from the p53 DBD were also included to increase the number of variants under 'Other mutations'. All variants selected in Tier 1 and Tier 2 were also present as pathogenic, and the variants in 'Other mutations' were also present as benign, in the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/, accessed 9 April 2020). We downloaded 29 900 TP53 variants from the TP53 database (https://tp53.isb-cgc.org, R20 of July 2019, accessed 21 April 2021). After filtering out the germline variants present in ClinVar and in our previous studies [9-11], as well as the duplicated variants, we identified 397 TP53 somatic missense variants (Table S1).

Molecular Dynamics Simulation and Ramachandran Plot
The process was performed following the procedures detailed previously [9]. In MDS, the MODELLER software in the Chimera package was used to build the missing atoms in the p53 DBD template (PDB ID: 2OCJ, covering residues 94-313, resolution 2.05 Å), and Rotamer software was used to replace the wildtype amino acid residues with the residues altered by the missense variants [20]. The p53 structure altered by the missense variant was used as the starting configuration for MDS in GROMACS (version 2021) [21]. The altered p53 structure was placed at the centre of a 10 × 10 × 10 nm simulation box, solvated with TIP3P water, and neutralized with Cl⁻ ions. The AMBER03 force field was used to model the protein complex and ions. The steepest descent algorithm was applied to the system before a 1 ns equilibration run at 298 K and 1 bar in the NPT ensemble using the Berendsen thermostat and barostat. Afterwards, the V-rescale thermostat and Parrinello-Rahman barostat were used for a 40 ns production run for the altered structure [22]. The trajectories from the last 10 ns of the production run were used for further analysis. A 2 fs timestep was set as the basis for the velocity Verlet algorithm. The particle mesh Ewald method was used to treat the long-range electrostatic potentials, and the cutoff distance was set at 1.0 nm.
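The run lengths above translate directly into integration step counts. The following is a minimal sketch of that arithmetic only; the class and field names are hypothetical and are not GROMACS parameters.

```python
from dataclasses import dataclass

FS_PER_NS = 1_000_000  # femtoseconds in one nanosecond


@dataclass(frozen=True)
class RunPlan:
    """Hypothetical container for the run lengths quoted in the text."""
    timestep_fs: int = 2        # 2 fs timestep
    equilibration_ns: int = 1   # 1 ns equilibration
    production_ns: int = 40     # 40 ns production run
    analysis_ns: int = 10       # last 10 ns used for analysis

    def steps(self, duration_ns: int) -> int:
        """Number of integration steps needed to cover duration_ns."""
        return duration_ns * FS_PER_NS // self.timestep_fs

    @property
    def analysis_window_ns(self) -> tuple:
        """Start and end (in ns) of the trajectory segment analysed."""
        return (self.production_ns - self.analysis_ns, self.production_ns)


plan = RunPlan()
equilibration_steps = plan.steps(plan.equilibration_ns)  # 500,000 steps
production_steps = plan.steps(plan.production_ns)        # 20,000,000 steps
window = plan.analysis_window_ns                         # (30, 40)
```

With a 2 fs timestep, the 40 ns production run corresponds to 20 million steps, and the analysed segment is the 30-40 ns window.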
The Ramachandran scatter plot (RSP) for each structure was generated by using the "gmx rama" command in the GROMACS software [21]. The RSP for each variant structure was generated from the last 10 ns of the MDS production run and converted into the Ramachandran density plot (RDP) by kernel density estimation using the Python "SciPy" module [23,24]. A grid dimension of 32 × 32 was used for each RDP, and each grid point was used as a base for comparison. Each grid point was compared with the corresponding point of the based file, $\bar{\chi}_{b,n}$, which was the average of the RDP grid points of the 18 benign variants, $bV_i$, and the eight wildtype bases, $W_j$, where i and j are the numbers of benign and wildtype RDPs.
A binary system (1, 0) was used to determine whether the difference between the variant and the based file at each grid point was beyond the standard deviation, S, of the average of the 18 benign variants and the wildtype bases: a grid point was assigned "1" when the difference was beyond S and "0" otherwise. The natural log of the grid-point total of the binary system, G (in percentage), was used to determine the deleteriousness of the variant. 'Deleterious' or 'non-deleterious' was used to classify somatic variants.
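The grid comparison described above can be sketched in Python with NumPy and SciPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the standard deviation is assumed to be taken per grid point across the reference RDPs, and the kernel density estimate ignores the periodicity of the torsion angles.

```python
import numpy as np
from scipy.stats import gaussian_kde

GRID = 32  # grid dimension stated in the text


def density_grid(phi, psi, grid=GRID):
    """Estimate a Ramachandran density plot on a grid x grid lattice (degrees)."""
    kde = gaussian_kde(np.vstack([phi, psi]))
    axis = np.linspace(-180.0, 180.0, grid)
    xx, yy = np.meshgrid(axis, axis)
    return kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(grid, grid)


def structural_deviation(variant_rdp, reference_rdps):
    """ln(% of grid points differing from the reference mean by more than 1 SD)."""
    refs = np.asarray(reference_rdps, dtype=float)
    mean = refs.mean(axis=0)        # the "based file": per-point average
    sd = refs.std(axis=0, ddof=1)   # assumption: per-point sample SD
    binary = np.abs(variant_rdp - mean) > sd  # True counts as 1, False as 0
    percent = 100.0 * binary.sum() / binary.size
    return float(np.log(percent)) if percent > 0 else float("-inf")


# Toy reference set: three flat grids with values 0, 1 and 2 (mean 1, SD 1).
toy_refs = [np.full((GRID, GRID), v) for v in (0.0, 1.0, 2.0)]
toy_variant = np.full((GRID, GRID), 1.0)
toy_variant[: GRID // 2, :] = 5.0  # half of the grid deviates strongly
toy_score = structural_deviation(toy_variant, toy_refs)  # ln(50)
```

In the toy case, half of the grid points deviate beyond one standard deviation, so the score is ln(50), about 3.91. In practice, `density_grid` would be fed the ϕ and ψ angles extracted from the "gmx rama" output for the last 10 ns of each trajectory.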

GFP-based functional assay
A GFP-based reporter assay was applied to validate the predicted deleterious effects of somatic missense variants on the regulation of gene expression [19,25]. The assay consisted of two key components: the p53 expression plasmid pLX313-TP53 and the p53 binding plasmid pGL2-p21-promoter-GFP [25]. The variant-modified p53 expressed from pLX313-TP53 binds to the p53 binding motif in the p21 promoter of pGL2-p21-promoter-GFP to regulate GFP expression, which reflects the impact of the variant on the ability of p53 to bind the p53 binding motif in the p21 promoter [26]. The plasmid pLX313-TP53 containing wildtype TP53 was used as the normal control. Single-base variants were introduced into TP53 in pLX313-TP53: the variant-containing fragments were generated by polymerase chain reaction (PCR) using primers containing the variants and were used to replace the wildtype TP53 in pLX313-TP53 by restriction digestion and ligation, transformation, and purification. Each variant-containing TP53 construct was confirmed by Sanger sequencing. The criteria for variant selection were as follows: (i) the benign and pathogenic variants were the somatic and germline variants annotated in both the COSMIC database and the ClinVar database; (ii) the predicted deleterious variants were also the somatic and germline variants present in the COSMIC database and the ClinVar database. The H1299 TP53−/− epithelial-like cell line was used to assess the effects of the variant-containing TP53 on gene expression by cotransfection of the variant-modified pLX313-TP53 and pGL2-p21-promoter-GFP plasmids. A total of 1 × 10⁴ H1299 cells in RPMI1640 medium with 10% fetal bovine serum were cotransfected with 100 ng p53 plasmids and 100 ng p21-promoter-GFP plasmids using Lipofectamine 3000 Transfection Reagent (Thermo Fisher Scientific, MA, USA). Forty-eight hours after the transfection, GFP activities in the transfected cells were read using the PerkinElmer Victor X3 Microplate Reader. The relative fluorescence value (RFU) was used to define p21 promoter activities in the H1299 cells, and the value from the negative control (cells transfected with no expression plasmids) was subtracted to obtain the final RFU. Three repeats were performed for each test, and the values were averaged and presented as mean ± SEM. A one-tailed t-test was used to compare the RFU between the wildtype and variant-containing TP53, with P-value < .05 considered as deleterious and P-value > .05 as nondeleterious. Statistical analyses were performed using Prism 9 software (GraphPad).
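The background subtraction, mean ± SEM summary, and one-tailed comparison can be sketched with SciPy as follows. This is only an illustration: the analyses in the study were run in Prism, the readings below are hypothetical, and the sketch assumes an unpaired, equal-variance t-test with the alternative that the variant RFU is lower than wildtype (the `alternative` argument requires SciPy 1.6 or later).

```python
import numpy as np
from scipy import stats


def final_rfu(raw_rfu, negative_control_rfu):
    """Subtract the negative-control reading from each raw reading."""
    return np.asarray(raw_rfu, dtype=float) - float(negative_control_rfu)


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate readings."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(stats.sem(values))


def reduces_gfp(wildtype_rfu, variant_rfu, alpha=0.05):
    """One-tailed test of whether the variant mean is lower than wildtype."""
    res = stats.ttest_ind(variant_rfu, wildtype_rfu, alternative="less")
    return bool(res.pvalue < alpha), float(res.pvalue)


# Hypothetical triplicate readings (arbitrary units), not data from the study.
background = 100.0
wildtype = final_rfu([1100.0, 1150.0, 1080.0], background)
variant_a = final_rfu([500.0, 520.0, 490.0], background)     # strongly reduced
variant_b = final_rfu([1090.0, 1140.0, 1110.0], background)  # close to wildtype

call_a, p_a = reduces_gfp(wildtype, variant_a)
call_b, p_b = reduces_gfp(wildtype, variant_b)
wt_mean, wt_sem = mean_sem(wildtype)
```

Here `variant_a` would be called deleterious (P < .05) and `variant_b` nondeleterious under the criterion described above.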

The degree of germline and somatic missense variants altering the same codon
Using the DNA damage repair (DDR) genes as the model, we analyzed the degree to which germline variants and somatic variants cause the same codon change by referring to the germline variants in the ClinVar database and the somatic variants in the COSMIC database. In the 170 DDR genes [27], we identified a total of 128 657 germline variants and 45 271 somatic variants, of which 12 421 were shared between germline and somatic, accounting for 11% of the germline variants and 27% of the somatic variants (Table S2). For example, 658 (78%) of the 841 PALB2 somatic variants, 922 (62%) of the 1495 TP53 somatic variants, and 989 (61%) of the 1634 BRCA2 somatic variants were shared with the germline variants in the same genes (Table 1). The data highlight that somatic and germline missense variants altering the same codon are widely present, particularly in functionally important genes as represented by DDR genes.
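The sharing analysis amounts to a set intersection between germline and somatic variant records. The sketch below assumes each variant is keyed by a hypothetical (gene, protein change) tuple; the toy sets are illustrative and do not reproduce ClinVar or COSMIC content. The last lines recompute the per-gene percentages from the counts quoted above.

```python
def shared_summary(germline, somatic):
    """Count variants present in both sets and express them as percentages."""
    shared = germline & somatic
    return {
        "shared": len(shared),
        "pct_of_germline": 100.0 * len(shared) / len(germline),
        "pct_of_somatic": 100.0 * len(shared) / len(somatic),
    }


# Toy records keyed by (gene, protein change); set membership is illustrative only.
germline = {
    ("TP53", "p.R175H"), ("TP53", "p.R175C"),
    ("TP53", "p.R249S"), ("TP53", "p.A159D"),
}
somatic = {("TP53", "p.R175H"), ("TP53", "p.Q192L")}
toy = shared_summary(germline, somatic)

# Per-gene counts quoted in the text: (shared, total somatic variants).
reported = {"PALB2": (658, 841), "TP53": (922, 1495), "BRCA2": (989, 1634)}
pct_shared = {gene: round(100 * s / total) for gene, (s, total) in reported.items()}
```

Keying on the protein-level change rather than the nucleotide change is a design choice of this sketch; it groups different nucleotide substitutions that produce the same amino acid change.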

Determination of threshold between deleterious and nondeleterious variants
We identified the existing pathogenic variants, benign variants, and wildtype bases located in the p53 DBD and used those as the training materials to set the threshold to differentiate deleterious and nondeleterious variants. The Ramachandran density plots (RDP) from benign variants and wildtype bases were combined to create the based files and used to compare with the RDP of pathogenic variants. Figure 1 shows the flowchart of the RP-MDS process.
To set the cut-off thresholds for non-deleterious and deleterious variants, we first performed MDS for the 25 pathogenic variants, 18 benign variants, and 8 wildtype alleles. The known benign and pathogenic missense variants used to train RP-MDS affect p53 codons G105, C141, A159, A161, H168, E171, C275, and P278. They were also germline variants present in the ClinVar database. RSP was extracted from the last 10 ns of the MDS trajectories, converted to the RDP, and used to calculate the structural deviation for benign and pathogenic variants (Fig. 2A). Kolmogorov-Smirnov (K-S) and Anderson-Darling (A-D) statistical tests showed that the data did not reject a log-normal distribution (Fig. 2B) [28,29].
The results from a two-sided t-test showed that the mean values of the benign and pathogenic data were significantly different (P < .0001), indicating that the benign and pathogenic missense variants had distinct effects on p53 structure. Based on the pathogenic log-normal distribution curve, we set the structural deviation >3.46 as deleterious and ≤3.46 as non-deleterious.
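Because the structural deviation is defined in the methods as the natural log of a percentage, the 3.46 cut-off can be read back as a share of deviating grid points. The following minimal sketch applies the threshold rule and that conversion; the function names are hypothetical.

```python
import math

THRESHOLD = 3.46  # cut-off stated in the text


def classify(structural_deviation, threshold=THRESHOLD):
    """Apply the rule: > threshold is deleterious, <= threshold is non-deleterious."""
    return "deleterious" if structural_deviation > threshold else "non-deleterious"


def percent_changed(structural_deviation):
    """Invert the natural log to recover the percentage of deviating grid points."""
    return math.exp(structural_deviation)


threshold_percent = percent_changed(THRESHOLD)  # roughly 31.8%
labels = [classify(score) for score in (3.10, 3.46, 3.88)]
```

Under this reading, a variant is called deleterious when roughly more than 31.8% of the grid points deviate from the based file; a score exactly at 3.46 falls on the non-deleterious side.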

Classification of somatic missense variants
We identified 397 unclassified somatic missense variants. Of the 397 variants, 384 (96.7%) were classified as 'Other mutations', 11 (2.8%) as Tier 3, 1 (0.25%) as Tier 2, and 1 (0.25%) had no classification in the COSMIC database. Using the wildtype p53 as the template, we generated the variant-altered p53 structures for the 397 unknown somatic missense variants. We performed MDS and RP to classify the variants. Under the cut-off threshold of >3.46, we observed that 195 (49.1%) variants had a deleterious impact on the p53 structure (Table S3). These variants were enriched at the residue positions G105, C141, P153, A159, A161, H168, E171, E180, D186, D208, N210, P223, D228, C275, and P278. Figure 3 shows the distribution and frequency of the residues altered by the variants in p53. Figure 4 shows the representative distribution of a group of variants measured by RMSD and RMSF in MDS, which indicated the likelihood of deleteriousness for the classified variants [9]. The predicted deleterious variants were concentrated more in the 150-170 and 200-240 residue regions, whereas the predicted nondeleterious variants were concentrated more in the 130-160, 210-230, and 260-300 residue regions (Suppl. Fig. 1A). However, the difference between the deleterious and non-deleterious data was not significant (Fig. S1). p.A159D, p.Q192L, p.N210K, and p.T231N had the highest deleterious impacts on p53 structure, as reflected by their high structural deviations in the range of 3.88-3.91, corresponding to 48.3%-49.9% of structural change. Figure 5 shows the distribution of RP images. Using the last 10 ns of the MDS trajectories, it shows the overall protein structural change by quantifying the protein backbone torsion angles (ϕ and ψ) for each residue (Fig. 5A). Each variant was compared with the based files from 18 benign variants and seven wildtype bases. The results showed that for all predicted deleterious variants, the density peaks for the α-helix region [ϕ, ψ = (−60, −30)], PII-spiral regions, and β-sheet region significantly increased, reflecting their significant deviation from the benign and wildtype structures (Fig. 5B) and evidencing the deleterious impact of the missense variants on p53 structure stability.
Figure 6 further shows the interactions of each altered residue with its local amino acid residues:
• Wildtype A159 interacted with S215, V216, Y234, I254, and I255 to maintain the β-sheet structure, whereas the altered residue D159 interacted with I195, V197, V216, Y234, I254, and I255, which destabilized the β-strand and led to an additional bend at residues 165-169;
• Wildtype Q192 interacted with R174, R175, E180, and H214 to maintain the α-helix in the L2 loop, whereas the altered residue L192 extended the interaction to the additional residues V172, V173, P190, H193, and D207 and destabilized the α-helix in the L2 loop;
• Wildtype N210 interacted with F212 and D208 to maintain the bend of the β-sheet, whereas the altered residue K210 interacted with T211 in addition to F212 and D208, causing destabilization of β-strands S6 and S7 through greater flexibility;
• Wildtype T231 interacted with V143, T230, and I232 to maintain a stable β-strand, whereas the altered residue N231 interacted additionally with D144, causing the β-sheet to bend away from the structure core.

Functional validation for the predicted deleterious variants
We used a p53-p21 promoter-GFP reporter assay to validate the effects of the predicted deleterious somatic missense variants on gene expression. In the assay, the p53 expressed from pLX313-TP53 bound to the p53 binding motif in the p21 promoter to control the expression of GFP in pGL2-p21-promoter-GFP [25]. The missense variant-containing pLX313-TP53 constructs were co-transfected with pGL2-p21-promoter-GFP in the H1299 TP53−/− cell line. With the wildtype p53 as the control, the changed GFP level represented the effects of the altered p53 on gene expression regulation. We tested a total of 23 variants, including three benign and two pathogenic controls and 18 RP-MDS-predicted deleterious somatic missense variants. The results showed that 12 of the 18 (66.7%) predicted deleterious variants significantly reduced GFP expression, including S99F, Y107D, G117R, T125P, H214P, P219R, T253N, and I255S (P < .05), P98L, G262V, and P278R (P < .01), and C135W (P < .001) (Fig. 8). Of the six tested variants showing no significant changes, P153L and M169V were located in the p53 DBD region without defined secondary structure.

Discussion
Determination of the deleteriousness of somatic variation remains a challenge. Using the somatic missense variants in TP53 as a model and RP-MDS as the tool, our study shows that deleterious somatic missense variants can be identified by referring to their impact on protein structural stability.
The RP-MDS method uses protein structure as the reference to determine the function of missense variants. It avoids many uncertainties often present in other approaches. For example, the evidence listed by CMC/COSMIC largely originates from other in silico prediction methods, many of which use evolutionary conservation-based sequence homology to determine the function of somatic variants. However, somatic variants are not heritable. As such, the same codon changes by the same somatic variants across different species would not be possible unless they occurred by coincidence. In contrast, RP-MDS uses protein structure as the only reference. Protein structure is determined by intramolecular electrostatic, hydrophobic, hydrogen bonding, and van der Waals interactions. During the RP-MDS process, MDS relaxes and equilibrates the dynamic conformational changes of protein structure by following the Newtonian equations of motion for a given simulation period [30,31]; RP then uses the MDS trajectories to quantify the macroscopic properties of the protein structure by referring to the backbone torsion angles, which are restricted by sterically unfavourable structural conformations. Therefore, RP-MDS can precisely detect the effects of missense variants on protein structure [8,9,11,32].
In our previous studies, we extensively compared RP-MDS with other in silico methods for variant classification (Table 1B in [6]; Table 1B in [9]; Table 2 and Fig. 5 in [11]). The results from these studies show that RP-MDS is superior to each method compared in several respects. For example, in the TP53 germline missense variant study [9], we compared the classification of 340 germline missense VUS between RP-MDS and 10 in silico methods: Polyphen2_HDIV, Polyphen2_HVAR, SIFT, M-CAP, MutationTaster, LRT, PROVEAN, FATHMM, MetaSVM, and MetaLR. The results showed that RP-MDS had the lowest rate for the 'deleterious' classification and the highest rate for the 'undefined' classification among all methods tested (Table 1B in [9]), demonstrating that RP-MDS provides high specificity in classifying germline missense variants. Low specificity is an inherent problem in current in silico prediction methods, in that they 'tend to have low specificity, resulting in over-prediction of missense changes as deleterious and are not as reliable at predicting missense variants with a milder effect', as indicated by the ACMG/AMP guidelines [3]. Our current study indicates that the high specificity of RP-MDS demonstrated in germline variant analysis is also present in somatic variant analysis. This is because the structural instability caused by germline and somatic variants producing the same codon change is similar.
The cut-off to differentiate deleterious and non-deleterious variants is based on the statistical differences between the pathogenic variants and the benign variants [6]. We used the 95th percentile of the lower mean, 3.46, to compensate for possible statistical error. We postulated that although most pathogenic missense variants would have an impact on protein structure, not all pathogenic missense variants cause structural change. These could be missed by RP-MDS. In addition, the threshold can be arbitrary for variants scoring slightly above or below it. Therefore, the threshold may need to be adjusted in different studies to minimize its effects on such variants.
Under the selection criteria, only 26 pathogenic and 18 benign missense variants and eight wildtype alleles were used for the training. This might seem to be a small sample size. However, each variant had 333 conformation points, as a conformation was saved every 30 ps over the 10 ns sampling period. The total combined points from the training process reached 17 316 [(26 pathogenic + 18 benign + 8 wildtype alleles) × 333]. The rich information allows RP-MDS to distinguish the conformations of deleterious and non-deleterious structures. However, the information may not cover all possible deleterious conformations. When possible, increasing the number of pathogenic and benign variants in the training model to provide more deleterious conformations may provide better distinction between deleterious and non-deleterious structures. Further, the running time for the MD simulation may also be a factor to consider. Different studies may use different simulation times; for example, 30 ns was used in one study [33]. Our study used 40 ns. As long as computational power allows, a longer simulation time may provide better equilibration and therefore improve the accuracy of detection. We also took the following measures to ensure the accuracy of classification: (i) used known pathogenic variants as controls to see if these can be predicted as deleterious; (ii) used known benign variants as controls to see if these can be predicted as non-deleterious; and (iii) used the GFP reporter gene assay to validate whether the predicted deleterious variants can affect gene expression.
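The sample-size arithmetic in this paragraph can be checked in a few lines of Python. The variable names are hypothetical; the numbers are those quoted in the text.

```python
SAMPLING_WINDOW_PS = 10 * 1000  # last 10 ns of the production run, in ps
SAVE_INTERVAL_PS = 30           # one conformation saved every 30 ps

# Whole conformations that fit into the sampling window.
conformations_per_structure = SAMPLING_WINDOW_PS // SAVE_INTERVAL_PS

# Training structures listed in the text.
training_structures = {"pathogenic": 26, "benign": 18, "wildtype": 8}

total_points = sum(training_structures.values()) * conformations_per_structure
```

This gives 333 conformations per structure and 52 × 333 = 17 316 combined points, matching the figure quoted above.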
The deleterious variants classified by RP-MDS should cause significant change of p53 structure, as defined by the classification criteria. However, in the validation test, 6 of the 18 tested deleterious variants did not show significant GFP signal reduction (Fig. 8). Possible explanations could be: (i) the test was performed in only a single cell line, which may not reflect the deleterious impact of the variants in other lineages due to possible lineage specificity; (ii) GFP expression relies on the interaction between p53 and the p21 promoter, and the expression level of the altered p53 could be affected by the variants, causing lower presence of the altered p53; (iii) the transient co-transfection of the altered p53 and p21 constructs may not allow synergistic expression of the proteins. The structure-function relationship is one of the essential issues in TP53 variation [12,13]. Deleterious variants can disturb p53 structural stability to different degrees, from low distortion to global unfolding [14]. For example, R175 is located at the zinc-binding site and the DNA-binding interface; variants such as R175H can destabilize the structure, causing the loss of p53 transactivation and tumour-suppression function and the gain of oncogenic function under certain situations [34]. In contrast, R175C causes no structural change and therefore retains the cell cycle regulation function of the wildtype p53 [18]. Another example is R249S, located at the S4-S5 turn of p53. It distorts the alpha helix loop, leading to the disruption of the DNA binding and tumour-suppression function of p53 [19].
RP-MDS may also be applicable to classifying the somatic variants in other somatically highly mutated tumor suppressors, oncogenes, and DDR genes (Table 1, Table S1). Taking BRCA1 as an example: germline pathogenic variation in BRCA1 is traditionally considered the major contributor to BRCA1-related oncogenesis. A total of 400 (44%) of the 916 somatic variants in BRCA1 were the same as germline variants (Table 1). The rich somatic variation data in BRCA1 suggest that somatic variation in BRCA1 may also play important roles in BRCA1-related oncogenesis. Applying RP-MDS to determine the functional significance of the somatic variants in BRCA1 may enhance our understanding of BRCA1-related oncogenesis.
While our study shows that somatic and germline missense variants can change protein structure, this similarity should not be interpreted automatically as meaning that they have the same function, even for the somatic and germline missense variants causing the same codon change, as the biology behind germline variation and somatic variation is very different: (1) Different mechanisms: a germline variant is inherited, is present in every cell of the carrier, and has functional impact from fertilization across the entire lifetime, whereas a somatic variant occurs during the lifetime of the carrier and is not inheritable; (2) Different diseases: the same germline and somatic variant can cause different diseases. For example, germline p53-R175H is a genetic predisposition for Li-Fraumeni syndrome [34], but somatic p53-R175H is present in multiple cancer types including breast, lung, colorectal, and pancreatic cancers [35]; (3) Different timing of diseases caused: germline variation-related diseases usually occur at a young age, whereas somatic variant-related diseases often occur at a later stage of life; (4) Different arising time: a germline variant in humans can have originated up to a thousand years ago [36], whereas a somatic variant is generated in a specific tissue/cell type during the lifetime; and (5) Different heritage: a germline variant can be shared in
In conclusion, our present study shows that deleterious somatic missense variants can be identified by referring to their impact on the stability of protein structure, and the RP-MDS method can be a powerful means to identify deleterious somatic missense variants.

Figure 1. Outline of the RP-MDS process. RDP from benign variants and wildtype bases were combined to create the based file and used to compare with the RDP from pathogenic variants and to create the parameters to classify somatic missense variants into deleterious or nondeleterious categories.

Figure 2. Structural deviation distribution of benign and pathogenic variants. (A) Relative frequency of structural deviation between benign variants and pathogenic variants. The graph shows the collective structural deviation of benign and pathogenic variants, and log-normal distribution curves were fitted against the data. (B) Statistical data showing the distribution. Teal: benign variants; red: pathogenic variants.

Figure 3. Classification of 397 TP53 somatic missense variants. The lollipop graph shows the positions and frequencies of the RP-MDS-predicted deleterious and non-deleterious somatic missense variants in the p53 DBD region. Green: non-deleterious variants; red: deleterious variants.

Figure 5. Ramachandran density plot of TP53 deleterious somatic missense variants. (A) RDPs for deleterious variants (p.A159D, p.Q192L, p.N210K, and p.T231N). The colours from blue to red represent low to high density. (B) The differences between the variants and the based files. The colours from blue to red represent diminished to increased density.

Figure 6. Examples of the effects of deleterious somatic missense variants on p53 structure. Left: wildtype residues; right: altered residues. The figure shows the altered interactions between the altered residues (p.A159D, p.Q192L, p.N210K, and p.T231N) and their neighbouring amino acid residues. Red: amino acid residue changed by the variants; teal: interaction between residues.

Figure 8. Validation of the effects of the predicted deleterious somatic missense variants on gene expression regulation. (A) Location of the predicted deleterious variants along the p53 DBD region. Red: significantly reduced GFP expression; green: non-significantly changed GFP expression. (B) Comparison of GFP expression between benign, pathogenic, and predicted deleterious variants. The variants to the right of the red dotted line had significantly reduced GFP expression. Blue: benign; red: pathogenic.

the University of Macau (MYRG2022-00023-FHS) and the Faculty of Health Sciences of University of Macau (Startup fund, SRG2024-00014-FHS) to C.L.L. B.T. is the recipient of the University of Macau Postdoctoral Fellowship Class A of the Macao Talent Program and Postdoctoral Fellowship of Macau Science and Technology Development Fund (UMMTP-FDCT/0027/APD/2021).

Conflict of interest: None declared.

Table 1. Examples of DDR genes with highly shared germline and somatic missense variants.