Longitudinal plasma protein profiling of newly diagnosed type 2 diabetes

Background Comprehensive proteomics profiling may offer new insights into the dysregulated metabolic milieu of type 2 diabetes, and in the future, serve as a useful tool for personalized medicine. This calls for a better understanding of circulating protein patterns at the early stage of type 2 diabetes as well as the dynamics of protein patterns during changes in metabolic status. Methods To elucidate the systemic alterations in early-stage diabetes and to investigate the effects on the proteome during metabolic improvement, we measured 974 circulating proteins in 52 newly diagnosed, treatment-naïve type 2 diabetes subjects at baseline and after 1 and 3 months of guideline-based diabetes treatment, while comparing their protein profiles to that of 94 subjects without diabetes. Findings Early stage type 2 diabetes was associated with distinct protein patterns, reflecting key metabolic syndrome features including insulin resistance, adiposity, hyperglycemia and liver steatosis. The protein profiles at baseline were attenuated during guideline-based diabetes treatment and several plasma proteins associated with metformin medication independently of metabolic variables, such as circulating EPCAM. Interpretation The results advance our knowledge about the biochemical manifestations of type 2 diabetes and suggest that comprehensive protein profiling may serve as a useful tool for metabolic phenotyping and for elucidating the biological effects of diabetes treatments. Funding This work was supported by the Swedish Heart and Lung Foundation, the Swedish Research Council, the Erling Persson Foundation, the Knut and Alice Wallenberg Foundation, and the Swedish state under the agreement between the Swedish government and the county councils (ALF-agreement).

T a g g e d E n d T a g g e d P data-driven discoveries that may offer new insights into the dysregulated metabolic milieu of diabetes [5,6]. There is also a potential for comprehensive protein profiling in personalized medicine, by detecting early signs of disease development and providing simultaneous information on multiple cardiometabolic health indicators in individual patients [7]. In addition, protein profiling during diabetes treatments such as diet, physical activity and pharmacotherapy could potentially help to broaden our understanding of the therapeutic mechanisms [8].T a g g e d E n d T a g g e d P While efforts have been made to study protein alterations in diabetes [5,6], little is known about proteomic alterations at the very onset of the disease and before any diabetes treatment has been initiated. Patients at this stage of the disease can only be reached using screening programs since they lack classic symptoms of diabetes. Furthermore, there is limited information about the relative importance of hyperglycemia versus other metabolic aberrations for the protein signatures in blood. Identifying the main cardiometabolic drivers of protein patterns in blood has implications not only for the basic understanding of diabetes, but also for the potential of protein profiling as a cardiometabolic health indicator in these patients. Therefore, proteomic profiling could be a future approach to monitor diabetes interventions and it is therefore of major interest to understand to what extent the protein signatures of diabetes subjects are sensitive to the clinical improvement that occur during diabetes treatment.T a g g e d E n d T a g g e d P Recently, we have conducted a large research program to analyze "wellness" in the general population involving the molecular phenotypes of a longitudinal cohort, the Swedish SciLifeLab SCAPIS Wellness Profiling (S3WP) program. This has led to several articles regarding "wellness", including Zhong et al., [9], Dodig-Crnkovic et al.
T a g g e d E n d T a g g e d P [10] and Tebani et al. [11]. Here, we have used the same approach to target type 2 diabetes and describe for the first time a comprehensive analysis of plasma protein profiles in newly diagnosed type 2 diabetes patients before and after diabetes treatment, while comparing their protein profiles to that on non-diabetes controls.T a g g e d E n d T a g g e d P Fifty-two individuals with previously undiagnosed and treatmentnaïve type 2 diabetes were identified from large population-based screening programs and selected for a longitudinal study. Plasma protein profiles, based on 974 unique proteins, were analyzed using targeted affinity proteomics. The protein profiles were analyzed at baseline and after one and three months of guideline-based diabetes treatment, and the protein profiles of the diabetes group were compared to that of 94 subjects without diabetes in the S3WP program. In this way, we were able to conduct a comprehensive protein profiling to unveil systemic alterations of early-stage of diabetes and to investigate effects on the proteome during glucose lowering treatment.T a g g e d E n d T a g g e d H 1 2. MethodsT a g g e d E n d T a g g e d H 2 2.1. Study design and subjectsT a g g e d E n d T a g g e d P The diabetes group consisted of 52 subjects, age 50À65 years, with no history of diabetes who were diagnosed during populationbased screening examinations at the Sahlgrenska University Hospital, Gothenburg, and consecutively invited to the current study. The diagnosis was based on fasting p-glucose and oral glucose tolerance tests (OGTT). Presence of diabetes was defined according the Swedish standard, corresponding to the American Diabetes Association standards [1]: A fasting p-glucose 7.0 mmol/L or an 2-hour OGTT p-glucose 11.1 mmol/L (12.2 mmol/L when measured capillary). Subjects who met diabetes criteria were scheduled for a second glucose measurement on a separate occasion and enrolled if diabetes diagnosis was confirmed. To identify latent autoimmune diabetes in adults (LADA), glutamic acid decarboxylase (GAD), tyrosine phosphatase IA-2 (IA-2) and zinc transporter 8 (ZnT8) antibodies were measured. Exclusion criteria were severe hyperglycemia requiring hospitalization or immediate insulin treatment, presence of any clinically significant disease which, in the opinion of the investigator, may interfere with the subjects ability to participate in the study, or any major surgical procedure or trauma within four weeks of the first study visit. The diabetes group was examined at baseline and after one and three months of guideline-based diabetes treatment according to first-line therapy with lifestyle change including weight management and physical activity, with or without metformin as judged by the treating physician. Of the 52 subjects, 51 (98%) completed the 3-month follow-up visit. The non-diabetes control group consisted of 94 participants that completed the second year in the longitudinal Swedish SciLifeLab SCAPIS Wellness Profiling (S3WP) program [9À11] and did not have diabetes as judged from repeated fasting glucose and HbA1c measurements as well as baseline OGTT. The control group were examined twice during the same time-period and at the same site as the diabetes subjects (2016À2018, Wallenberg Laboratory, Gothenburg), and the mean values from these examinations were used in the data analysis.T a g g e d E n d T a g g e d H 2 2.2. EthicsT a g g e d E n d T a g g e d P The study conforms to the ethical guidelines of the 1975 Declaration of Helsinki and was approved by the Ethical Review Board of Gothenburg,. All participants provided written informed consent.T a g g e d E n d T a g g e d H 2 2.3. Clinical dataT a g g e d E n d T a g g e d P All study visits were performed after an overnight fast of at least 8 h. Study assessments in both groups included anthropometry,

Research in Context
Evidence before this study Circulating protein signatures may provide important information about the molecular phenotype of diabetes and the cardiometabolic health state of individuals. Our knowledge about the protein alterations of diabetes have grown considerably over the recent years, however this knowledge is mainly based on cross-sectional data from patients with different durations of the disease and with ongoing diabetes treatment.

Added value of this study
This study adds to previous proteomic studies since it describes the protein signatures of newly diagnosed, treatment naive type 2 diabetes and investigates the relative importance of diabetesrelated metabolic features for these signatures. Most importantly, the study adds the longitudinal aspect by performing repeated plasma profiling during standard diabetes treatment so that the dynamics during metabolic improvement can be elucidated. The comprehensive protein analyses revealed previously unknown associations with diabetes, as well as confirming previously published associations, thus contributing to our knowledge about the biochemical manifestations of diabetes.

Implications of all the available evidence
A broad range of blood-borne proteins are altered in newly diagnosed type 2 diabetes and protein profiling show promising potential as a cardiometabolic health indicator. In addition, protein patterns are sensitive to changes in metabolic status as well as to metformin medication, indicating that protein profiling can help to elucidate the molecular effects of diabetes treatments.
T a g g e d E n d T a g g e d P blood pressure, clinical chemistry and life-style questionnaires. Weight was measured with participants in light clothing using calibrated scales, and the body mass index (BMI) was calculated by dividing the weight (kg) by the square of the height (m). Waist circumference was measured midway between the palpated iliac crest and the palpated lowest rib margin in the left and right midaxillary lines. Total body fat was measured using a bioelectrical impedance scale (Tanita MC780MA, Tanita Corporation, Tokyo, Japan) according to manufacturers instructions. Systolic and diastolic blood pressure (SBP, DBP) was registered in supine position and after 5 min of rest, using the automatic Omron P10. Clinical chemistry and hematology measurements included fasting glucose, hemoglobin A1c (HbA1c), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), triglycerides (TG), apolipoprotein A1 (ApoA1), apolipoprotein B (ApoB), creatinine, high sensitive C-reactive protein (CRP), alanine aminotransferase (ALAT), gamma glutamyl transferase (GGT), urate, cystatin C, N-terminal pro-brain natriuretic peptide (NT-proBNP), hemoglobin (Hb), white blood cell count (WBC), red blood cell count (RBC) and platelet count. Estimated glomerular filtration rate (eGFR) was calculated from age, gender, creatinine and cystatin C according to the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) 2012 formula [12]. Insulin and C-peptide was measured in the diabetes group and the homeostatic model assessment of insulin resistance (HOMA-IR) was calculated according to the formula: fasting insulin (mU/L) x fasting glucose (mmol/L) / 22.5 [13]. Baseline measurements of liver fat content and visceral adipose tissue area (VAT) were performed in the diabetes group using a dedicated dual-source CT scanner equipped with a Stellar Detector (Siemens, Somatom Definition Flash, Siemens Medical Solution, Forchheim, Germany) as previously described [14].T a g g e d E n d T a g g e d H 2 2.4. Plasma protein measurementsT a g g e d E n d T a g g e d P All plasma samples were collected after an overnight fast and at the same visit as the clinical examinations. For three subjects, plasma samples for protein measurements were not available from the 1month visit. Multiplex proximity extension assays (PEA, Olink T a g g e d E n d T a g g e d P Bioscience, Uppsala, Sweden) were used to measure the relative concentrations of plasma proteins. Each kit provides a microtiter plate for measuring 92 protein biomarkers in all prepared samples. Each well contains 96 pairs of DNA-labeled antibody probes. Samples were incubated in the presence of proximity antibody pairs tagged as previously described [15]. To minimize inter-and intra-run variation, samples from the diabetes and control group were mixed and randomized across plates. Both internal control (extension control) and inter-plate control were used for normalization and then transformed using a pre-determined correction factor. The pre-processed data were provided in the arbitrary unit Normalized Protein eXpression (NPX) on a log2 scale, where a high NPX value represents high protein concentration. The analyses were performed at SciLifeLab's Plasma Profiling facility on eleven Olink panels including Cardiometabolic, Cell Regulation, Cardiovascular II, Cardiovascular III, Development, Immune Response, Oncology II, Inflammation, Metabolism, Neurology, and Organ Damage. Quality control was performed at both sample and protein levels and resulted in using a total of 974 unique proteins in 340 samples.T a g g e d E n d T a g g e d P The validation of epithelial cell adhesion molecule (EPCAM) was analysed in EDTA plasma diluted 1:3 using human EPCAM ELISA Kit (ab155442, Abcam, Cambridge, GB). Samples below detection limit were considered as 50% of the sensitivity of the ELISA (22,5 pg/ml) for statistical analysis.T a g g e d E n d T a g g e d H 2 2.5. StatisticsT a g g e d E n d T a g g e d P R version 3.6.1 was used for all statistical analyses. Imputation of protein abundances were performed using the function rfImpute in the package randomForest [16]. A protein was not imputed but instead excluded from the analysis if more than 20 percent of the values were missing (which was the case for n = 1 protein). In total, less than 0.2% of the data was imputed. To study the overall protein profile of newly diagnosed type 2 diabetes, we applied linear discriminant analysis (LDA) on the proteomic dataset from the diabetes groups baseline visit and the control group to maximize the component axes for group separation. We subsequently applied the LDA to T a g g e d E n d Table 1 Comparison of clinical characteristics of diabetes group and control group. T a g g e d E n d T a g g e d P identify diabetes status from the proteomic data, and to test robustness of the LDA model we also used prediction models based on both random forest and support vector machine learning since these methods are well established and represent different approaches to T a g g e d E n d T a g g e d P prediction. Linear discriminant analysis (LDA) was performed using the package "MASS" in R [17], random forest prediction modeling using "randomForest" [16], and support vector machines using the "e1071" [18] with the default radial kernel and the default parameter T a g g e d E n d T a g g e d F i g u r e This plot shows the separation trend between the diabetes and control groups along the linear discriminant dimension 1 (LD1 group ) which is a one-dimensional representation of the proteomic group differences. (b) Three machine learning algorithms (LDA, Random forest and support vector machine) were benchmarked to evaluate the classification performance of the protein signature to discriminate diabetes subjects from controls. The algorithms were trained on half of the cohort and internally validated on half of the cohort. Classification accuracy is presented as area under the curve (AUC) of the receiver operating characteristics (ROC) from the validation sample. (c) LD1 group correlated against clinical variables within the diabetes group. The order of clinical variables is based on the strength of correlation, the highest correlation shown to the left. Note that a low liver attenuation value corresponds to high liver fat content. (d) Relative importance analysis of clinical variables reveal that HOMA-IR explains most of the independent variability in LD1 group , followed by HbA1c, ALAT, liver fat and waist circumference. Insulin, glucose, BMI/VAT and GGT were excluded from the analysis due to high covariance with HOMA-IR, HbA1c, waist and ALAT, respectively. (e) Variable importance for the random forest diabetes classification model reveals NOS3, HGF, PON3, IGSF3 and ADGRG1 as the strongest predictors of diabetes status.T a g g e d E n d T a g g e d E n d T a g g e d P settings. Training set (50%) and test set (50%) were utilized to evaluate the performance based on the area under the receiver operating characteristic curve (AUROC) of the machine learning algorithms. Relative importance analysis was performed using the package "relaimpo" with the method "lmg" [19], and variable importance ranks (Gini coefficients and accuracy decreases) using "randomFores-tExplainer" [20]. Correlation coefficients refer to Spearman's rank correlation coefficients, and p-values were calculated using the Mann-Whitney U test for non-paired samples or the Wilcoxon signed-rank test for paired samples. Effect sizes were calculated from the Z-scores and the sample sizes of the respective significance tests. Mixed-modeling was performed using the package "lme4" [21] with metformin dose and visit (which adjusts for other effects related to the intervention) as random effects and subject as fixed effect. The control group was only used to determine the diabetes-associated protein profile and all other analyses were performed within the diabetes group to minimize the risk of bias being carried over from the group-wise comparisons. To correct for multiple comparisons, p-values were adjusted to a false discovery rate (FDR) of 0.05, based on the total number of proteins studied (i.e. 974).T a g g e d E n d T a g g e d H 2 2.6. Role of funding sourceT a g g e d E n d T a g g e d P The funders did not have any role in study design, data collection, data analyses, interpretation, or writing of report.T a g g e d E n d T a g g e d H 1 3. ResultsT a g g e d E n d T a g g e d H 2 3.1. Clinical characteristics at baselineT a g g e d E n d T a g g e d P Screening for diabetes was done in ongoing population studies at the Sahlgrenska University Hospital. The frequency of newly T a g g e d E n d T a g g e d P diagnosed diabetes based on repeated fasting capillary plasma glucose measurements and 2-hour OGTT was 1.7% and 0.5%, respectively. Of the 52 subjects that were included in the diabetes group, 35 (67%) had fasting glucose 7.0 at two separate occasions before inclusion, and for the remaining 17 subjects the 2-hour OGTT was required for diagnosis on at least one occasion. In addition to having higher fasting glucose and HbA1c, the diabetes group differed from the control group regarding the classical features of the metabolic syndrome, i.e. the diabetes group was more obese, had higher blood pressure, higher serum triglycerides and lower serum HDL levels. Liver function tests and inflammatory markers were also increased (Table 1).T a g g e d E n d T a g g e d H 2 3.2. Protein profiles at baselineT a g g e d E n d T a g g e d P To investigate the protein signature of the diabetes group, LDA was applied to determine the first linear discriminant for group separation (LD1 group ). The LDA showed that the overall protein profile clearly differed between the diabetes groups baseline visit and the control group (Fig 1a). When LDA, random forest and support vector machine were used to build prediction models based on the overall proteome, all three methods were able to identify diabetes status from the protein signature. The AUROCs were 0.90 (CI: 0.83À0.97), 0.94 (CI: 0.90À0.99) and 0.92 (CI: 0.86À0.98) for LDA, random forest and support vector machine learning, respectively (Fig. 1b). Variability in LD1 group correlated with several features of the metabolic syndrome including insulin resistance (HOMA-IR), insulin homeostasis (insulin, C-peptide), NAFLD (Liver fat, GGT, ALAT), glucose control (fasting glucose, HbA1c) and obesity (waist circumference, BMI), high triglycerides, and low HDL-C (Fig 1c). In a relative importance analysis that included variables with the highest correlations with LD1 group , HOMA-IR remained most important for variance in LD1 group (Fig 1d). In total there were 44 out of 973 (4.5%) proteins that T a g g e d E n d T a g g e d F i g u r e T a g g e d E n d T a g g e d P contributed significantly to the prediction of diabetes (FDR-corrected p<0.05) in the random forest model, the most important being NOS3, HGF, PON3, IGSF3, and ADGRG1 (Fig 1e).T a g g e d E n d T a g g e d P To identify plasma proteins that are altered in early-stage diabetes, we compared the plasma levels of each protein in the diabetes and control group using Mann-Whitney U test and found that 293 (30%) of the 974 proteins differed significantly between groups (Fig. 2a). The three proteins with the lowest p-value in the group comparison were PON3, HGF and NOS3 (FDR-corrected p<10 À8 ). The top 30 most significant proteins from the group-wise comparison are listed in Supplemental Table S1 along with a brief summary of their implication in cardiometabolic disease, and all 293 significant proteins are listed in Supplemental Table S2. Correlations between the top 30 proteins and clinical variables are visualized in Fig. 2b. The highest correlations were found with measures related to NAFLD (e.g. r = 0.70 for HGF versus liver fat content and r = 0.71 for ERBB2 versus GGT) and there was also a pattern of several proteins correlating T a g g e d E n d T a g g e d P with measures of insulin homeostasis (e.g. r = 0.65 for IGSF3 versus insulin and r = 0.62 for ADGRG1 versus C-peptide). Correlations with measures of hyperglycemia, adiposity, lipids, blood pressure and inflammation were generally weaker, exceptions being FABP4 and ADM which correlated with body fat content (r = 0.75 and r = 0.69, respectively), and IL6 which correlated with CRP (r = 0.63) and WBC (r = 0.62). All top 30 proteins were associated with diabetes independently of age and sex when examined in linear regression models, and all but one (FABP4) of the associations were also independent of BMI (Supplemental Table S1).T a g g e d E n d T a g g e d H 2 3.3. Metabolic improvement during diabetes treatmentT a g g e d E n d T a g g e d P The diabetes subjects were followed over three months of guideline based diabetes treatment, and during this period the mean reduction in HbA1c was 5.2 mmol/mol and the mean weight loss 3.3 kg, corresponding to a mean BMI reduction of 1.1 kg/m 2 . Of the T a g g e d E n d T a g g e d F i g u r e HbA1c that explain the variation in the changing proteome during treatment (represented by change in LD1 treat ).T a g g e d E n d T a g g e d E n d T a g g e d P 51 subjects completing the study, 23 subjects had a HbA1c reduction of 5 mmol/mol or more and 22 subjects lost 3 kg or more in weight. The distributions of HbA1c and BMI-change are shown in Fig. 3a and 3b, respectively. There were also significant improvements in most of the other metabolic variables including blood pressure, serum lipids, liver function tests and CRP (Supplemental Table S3). At 3 months, 13 subjects had a low dose (0.5À1 g) of metformin and 29 had a high dose (1.5À2 g), whereas 9 subjects were not treated with metformin.T a g g e d E n d T a g g e d H 2 3.4. Changes in the plasma proteome during treatmentT a g g e d E n d T a g g e d P To test if the proteomic alterations of the diabetes group at baseline were attenuated during treatment, we used the previously determined first linear discriminant for group separation (LD1 group ) and analyzed how the diabetes subjects were distributed at the 3-month follow-up visit. The signatures of the diabetes group changed in the direction of the non-diabetic group during treatment and this shift in the distribution was significant at 3 months compared to baseline (paired t-test p = 0.0092) (Fig. 3c). To estimate the importance of improved glucose control, insulin sensitivity, weight loss and metformin medication for the proteomic variance during treatment, we determined the first linear discriminant of the comparison between baseline and 3 months of diabetes treatment (LD1 treat ) and performed a relative importance analysis (Fig. 3d). The results indicated that weight change and metformin medication had the largest importance, each explaining 14% of the proteome variance during treatment, whereas change in HbA1c and HOMA-IR only accounted for 4.8% and 1.1%, respectively.T a g g e d E n d T a g g e d P Changes during treatment in the top 30 diabetes-associated proteins are shown in Fig. 4. Five (17%) of these proteins reached statistical significance for change from baseline in a FDR-adjusted paired Wilcoxon signed rank test, all changing in the direction of the control group. To visualize changes during treatment at a proteome level, we compared the 3-month visit with baseline for all proteins and plotted the effect-sizes in relation to the group-wise comparisons (Fig. 5). There was an overall trend towards normalization of the initial protein alterations during treatment, although only 22 (7.5%) of the 293 diabetes-associated proteins reached statistical significance for change from baseline. Notably, some of the most significant changes during treatment occurred in proteins that did not differ between groups at baseline. Furthermore, GDF15 showed a unique pattern with elevated levels in the diabetes group at baseline which were further increased during treatment. A mixed model analysis revealed that the five proteins presenting the most pronounced changes during treatment were all independently associated with metformin medication, including EPCAM (p = 2.6 £ 10 À9 ), GDF15 (p = 5.6 £ 10 À8 ), REG4 (p = 6.4 £ 10 À5 ), PCDH17 (p = 2.6 £ 10 À4 ) and CPA2 (p = 6.7 £ 10 À4 ) (Fig 5 and Supplemental Table S4).T a g g e d E n d T a g g e d H 2 3.5. Validation of the metformin-EPCAM associationT a g g e d E n d T a g g e d P Due to the apparently large importance of metformin for the proteomic shift during treatment, we performed a validation study on EPCAM which was the protein that showed the strongest association with metformin in our data (Fig. 6a). From a separate population study we identified 22 metformin-treated subjects (mean § SD; age T a g g e d E n d T a g g e d F i g u r e Fig. 4. Radarplot showing levels of top 30 diabetes-associated proteins in the diabetes group at each timepoint and in the control group. The symbol * denotes FDR-corrected p<0.05 for change between baseline and 3 months in a paired Wilcoxon signed rank test.T a g g e d E n d T a g g e d E n d T a g g e d P 63.9 § 4.0 years, BMI 30.4 § 3.7 kg/m 2 , fasting glucose 6.9 § 0.9 mmol/L) and control group of 44 subjects without metformin, matched based on age, sex, BMI and fasting glucose concentrations (mean § SD; age 62.7 § 4.5 years, BMI 29.6 § 4.4 kg/m 2 , fasting glucose 6.8 § 0.9 mmol/L). The proportion of men was 59% in both groups. The validation study confirmed the hypothesis that metformin medication is associated with reduced plasma EPCAM levels (p-value=0.001 [MannÀWhitney U test, 1-sided], Fig. 6b).T a g g e d E n d T a g g e d H 1 4. DiscussionT a g g e d E n d T a g g e d P Results from the present study show that subjects with screening detected early type 2 diabetes, void of classic diabetes symptoms, display wide-ranging alterations in the plasma proteome as compared to non-diabetic controls, to the extent that diabetes status can be predicted with high accuracy from the protein signature. These findings support the notion that broad biochemical alterations are present already at the onset of type 2 diabetes and that protein profiling could deliver individualized health assessments of cardiometabolic diseases.T a g g e d E n d T a g g e d P Our study represents the most comprehensive PEA proteomics study of type 2 diabetes so far, measuring 974 unique proteins at multiple time points, revealing several plasma proteins not previously associated with diabetes. One example is ADGRG1 which is the most abundant G protein-coupled receptor in human pancreatic islets and plays an important role in pancreatic b-cell function [22], T a g g e d E n d T a g g e d P but has not previously been reported to be a circulating biomarker of diabetes. Another example is IGSF3 which is a little studied member of the immunoglobulin superfamily of proteins that appears to be completely unknown in the context of diabetes and cardiometabolic diseases. Although the mechanism that links IGSF3 to diabetes is unclear, there were several observations that makes IGSF3 an interesting candidate to study further, including strong correlations with insulin and liver fat, as well as the responsiveness to diabetes treatment. These examples, together with several other proteins that have not been described previously to associate with diabetes (e.g. HNMT, SIT1, RTN4R, CDCP1, SIGLEC10, IFNLR1 and VSIG4) expand the knowledge about the biochemical manifestations of type 2 diabetes and provides a resource for new candidate biomarkers in this disease area. This study also confirms several previously published associations between key proteins and prevalent diabetes and/or diabetes progression, including PON3, HGF, CTSD, IL1RA, SIGLEC7, LPL, IL6, FGF21, ERBB2, ALDH1A1, GAL4,ADM and FABP4 [5,23À32].T a g g e d E n d T a g g e d P Cardiovascular disease (CVD) is the primary cause of morbidity and mortality in people with diabetes, and already at this early stage of the disease the diabetes subjects displayed alterations in a range of CVD-associated proteins. Proteins implicated in the atherosclerotic process and suggested as potential blood-borne biomarkers of CVD include HGF, CTSD, IL6, TNFR1, IL1RA, FABP4, FGF21, and LPL [24À27,32À34]. Notably, the protein most predictive of diabetes status in the random forest model was NOS3 (also known as endothelial NOS) which is known to play a key role in CVD-protection via the T a g g e d E n d T a g g e d F i g u r e T a g g e d E n d T a g g e d P generation of the vasodilator nitric oxide in blood vessels [35]. To our knowledge, there are no previous studies showing that diabetes is associated with elevated circulating NOS3. Our findings encourage future studies that evaluate integrated proteomics approaches to improve CVD risk stratification in diabetes subjects.T a g g e d E n d T a g g e d P The proteomic variance that separated diabetes subjects from controls reflected key metabolic syndrome features including insulin resistance, hyperglycemia, fatty liver and adiposity. The importance of fatty liver was particularly striking among the top 30 diabetesassociated proteins (e.g. HGF, IGSF3, IL1RA, ALDH1A1, HNMT, ERBB2 and CDCP1). NAFLD is an important risk factor for liver disease as well as CVD [36], yet often goes undetected in the clinical routine due to the limited sensitivity of current liver function tests [37], and therefore improved biomarkers are needed. Our observation that NAFLD is a strong driver of plasma protein patterns is in line with two recent studies suggesting that protein profiling could potentially serve as a biomarker for NAFLD-screening [7,38].T a g g e d E n d T a g g e d P The large majority of plasma proteins are stable over time in humans while healthy, and deviations from the individual s trajectory could serve as a comprehensive indicator of changes in the health state [11]. A previous study in insulin resistant subjects indicated that the proteome is sensitive to periods of weight gain and loss [39], however plasma protein changes during improvements in glucose control are not well studied, nor is the potential influence of diabetes medication on the proteome. To better understand the dynamics of protein signatures in diabetes, it was therefore of key interest to investigate if metabolic improvement is reflected in the overall protein profile. Our data showed that during treatment, the overall proteome in the diabetes group shifted significantly towards the control group. This indicates that diabetes-associated protein patterns are responsive to treatment and hence might serve as a tool to elucidate the systemic effects of diabetes treatments in a broad and data-driven manner. This notion was further supported by our findings that metformin medication overshadowed the importance of glucose control for the overall proteome, and that the proteins that changed the most during treatment correlated strongly with metformin medication. This is interesting since the mechanisms by which metformin T a g g e d E n d T a g g e d P regulates blood glucose are only partly understood [40], and the data provided here might provide important clues to its pharmacological effects. Among the top metformin-associated proteins in our study was GDF15, which was initially discovered to be metformin-associated from screening 237 serum biomarkers [41], followed by further studies showing that GDF15 mediates the effect of metformin on body weight [42,43]. Our findings confirm that metformin treatment is associated with GDF15 levels, but also clarifies that diabetes is associated with increased GDF15 levels even in the absence of metformin. The protein most strongly associated with metformin in our data was EPCAM, a protein that is implicated in cancer pathophysiology and suggested as a circulating cancer biomarker [44]. This finding was intriguing in view of the growing body of evidence that metformin may prevent cancer [45] by mechanisms that are poorly understood. Some evidence that metformin reduces EPCAM expression exists from in vitro studies of cancer cells [46], but to our knowledge it has not previously been shown in humans that circulating EPCAM is associated with metformin. Whether EPCAM mediates any of the metabolic effects of metformin remains to be investigated.T a g g e d E n d T a g g e d P One limitation of this study is the relatively small sample size. Even so, the corrected p-values for the associations discussed here were very low and the validity of our group-wise comparison is supported by the large overlap with the results in a recent cross-sectional proteomics study that searched for diabetes-associated proteins based on three of the 11 PEA panels used in the present study [5]. Novel findings should be verified in other cohorts to test external validity, given that our study population consists of middleaged subjects of mainly European descent. Another limitation is that it is observational and hence cause-effect relationships cannot be inferred. The key strengths of this study are that it captures the very early phase of type 2 diabetes, that the group-wise comparisons are not confounded by diabetes treatment, the detailed phenotyping that enabled us to link protein signatures to various cardiometabolic features, and the longitudinal aspect showing the dynamics of protein signatures during treatment.T a g g e d E n d T a g g e d P In conclusion, a broad range of blood-borne proteins are altered already at the very early stage of screening-detected type 2 diabetes, T a g g e d E n d T a g g e d F i g u r e To validate the finding, plasma EPCAM was measured with ELISA in a separate cohort of 22 metformin-treated subjects and 44 non-treated controls that were matched according to age, sex, BMI and HbA1c. The metformin-treated subjects had significantly lower EPCAM levels than the controls (p-value=0.001 [Mann-Whitney U test, 1-sided], n = 66).T a g g e d E n d T a g g e d E n d T a g g e d P reflecting key metabolic syndrome features such as insulin resistance, fatty liver, hyperglycemia and adiposity. The comprehensive protein analyses revealed previously unknown associations with diabetes, as well as confirming previously published associations, thus contributing to our knowledge about the biochemical manifestations of diabetes. The overall proteomic alteration observed at baseline was significantly attenuated during metabolic improvement and appeared to be modified by metformin medication independent of metabolic effects. The results suggest that comprehensive protein profiling may serve as a useful tool for metabolic phenotyping and to elucidate the biological effects of diabetes treatment.T a g g e d E n d