A longitudinal plasma lipidomics dataset from children who developed islet autoimmunity and type 1 diabetes

Early prediction and prevention of type 1 diabetes (T1D) are currently unmet medical needs. Previous metabolomics studies suggest that children who develop T1D are characterised by a distinct metabolic profile already detectable during infancy, prior to the onset of islet autoimmunity. However, the specificity of persistent metabolic disturbances in relation to T1D development has not yet been established. Here, we report a longitudinal plasma lipidomics dataset from (1) 40 children who progressed to T1D during follow-up, (2) 40 children who developed a single islet autoantibody but did not develop T1D and (3) 40 matched controls (6 time points: 3, 6, 12, 18, 24 and 36 months of age). This dataset may help other researchers in studying the age-dependent progression of islet autoimmunity and T1D as well as the age-dependence of lipidomic profiles in general. Alternatively, this dataset could be used more broadly for the development of methods for the analysis of longitudinal multivariate data.


Background & Summary
Type 1 diabetes (T1D) is a chronic autoimmune disease caused by progressive loss of insulin-secreting capacity due to the selective death of beta cells in the islets of Langerhans, of which there are more than one million in the human pancreas 1. The age of onset is usually between 5 and 15 years but, in recent years, many children under 5 years of age have also been affected 2. Although approximately 80% of subjects with T1D carry defined risk-associated genotypes at the human leukocyte antigen (HLA) locus, only 3-7% of the carriers of such genetic risk markers go on to develop overt disease. Seroconversion to islet autoantibody positivity is the first detectable signal demonstrating initiation of autoimmunity and risk of progression towards diabetes 3,4. However, whilst seroconversion to autoantibody positivity precedes clinical disease by months to years, the point at which seroconversion occurs may already be too late for therapeutic approaches aimed at preventing progression to overt diabetes.
Previous metabolomic studies suggest that children who develop islet autoimmunity and/or T1D are characterised by specific metabolic disturbances prior to the first appearance of islet autoantibodies [5][6][7]. Metabolic profiling may thus be of clinical relevance by providing a complementary tool for estimating the risk of progression to T1D. However, the underlying causes of these early metabolic disturbances and their link to disease progression are still largely unknown. It is also not yet established whether the observed, persistent metabolic changes are specifically associated with progression to T1D or, more broadly, whether they are associated with progression to islet autoimmunity irrespective of disease outcome.
Here we provide a longitudinal plasma lipidomics dataset obtained from children participating in the prospective birth cohort Type 1 Diabetes Prediction and Prevention (DIPP) study. Three study groups were examined: children who (1) progressed to T1D (PT1D), (2) developed at least a single islet autoantibody (Ab) during follow-up but did not progress to T1D (P1Ab), and (3) controls (CTR) who remained autoantibody negative and healthy during the follow-up until 15 years of age. We analysed 428 plasma samples from 120 children (40 PT1D, 40 P1Ab and 40 CTR). The samples were collected at up to six different time points corresponding to the ages of 3, 6, 12, 18, 24, and 36 (or above) months (Fig. 1). These age groups were selected with the objective of understanding the changes in lipidomic profile preceding overt T1D. We performed untargeted lipidomics using ultra-high-performance liquid chromatography combined with quadrupole time-of-flight mass spectrometry (UHPLC-QTOF-MS). Both raw and pre-processed datasets were deposited in the MetaboLights repository (Data Citation 1). Along with the aforementioned lipidomic data, we provide information on the type of islet autoantibodies observed in this longitudinal setting. In this study, sphingomyelins (SMs) were found to be persistently downregulated in PT1D as compared to the P1Ab and CTR groups 8. Triacylglycerols (TGs) and phosphatidylcholines (PCs) were mainly downregulated in PT1D as compared to P1Ab at the age of 3 months. These results suggest that distinct lipidomic signatures characterise children who progressed to islet autoimmunity or clinical T1D. Lipidomic profiling may thus be helpful in the identification of at-risk children before the initiation of autoimmunity.
This data descriptor presents one of the first longitudinal lipidomic datasets allowing the investigation of progression to islet autoimmunity/T1D during the early prodromal phases of disease development. Given its longitudinal study design, this clinical dataset may have several uses. Firstly, it may assist other researchers in their studies of the early pathogenesis of T1D and, potentially, other immune-mediated inflammatory diseases. It may also allow other researchers to study the age-dependent progression of lipidomic profiles during infancy. Finally, the dataset has great potential for use in the development and testing of algorithms for the analysis of multivariate longitudinal/prospective data.

Methods
These methods are expanded versions of descriptions in our related work 8 .

Study design
The plasma samples for lipidomics analysis were obtained from the Finnish Type 1 Diabetes Prediction and Prevention (DIPP) study 9. The DIPP study has screened more than 220,000 newborn infants for HLA-conferred susceptibility to T1D at three university hospitals in Finland: Turku, Tampere and Oulu 10. Over 25,000 infants were identified as having an increased genetic risk and approximately 17,000 families joined the follow-up study, which involves regular study centre visits (at intervals of 3-6 months). The Ethics and Research Committee of the participating Universities and Hospitals approved the study protocol. The study was conducted according to the guidelines in the Declaration of Helsinki. All families provided written informed consent for participation in the study. At every visit, blood samples were collected to measure the titre of T1D-associated islet autoantibodies. Non-fasting blood samples were collected into sodium citrate tubes. Plasma was separated within 30 min of collection by centrifugation at 1600 × g for 20 min at room temperature, aliquoted, and stored at −80°C until analysed. To date, 1,663 of those children (9.8%) have seroconverted to positivity for one autoantibody during the follow-up, 808 (4.8%) have developed multiple autoantibodies, and 510 (3.1%) have progressed to clinical T1D.
The 120 infants included in the current data descriptor were selected from a subset of DIPP children from the city of Tampere. Up to six longitudinal samples per child were collected between 1998 and 2012, corresponding to the ages of 3, 6, 12, 18, 24 and 36 months (Fig. 1, Table 1). The details of the ages and selected characteristics of the current study subjects are given in the metadata (Data Citation 1). The three study groups were matched by HLA-associated diabetes risk, gender and the period of birth. In total, 428 plasma samples were selected and analysed for this study.
www.nature.com/sdata/ SCIENTIFIC DATA | 5:180250 | DOI: 10.1038/sdata.2018.250

Detection of beta-cell autoimmunity
The participants with HLA-conferred genetic susceptibility were monitored for the appearance of T1D-associated autoantibodies: islet cell antibodies (ICA), insulin autoantibodies (IAA), islet antigen 2 autoantibodies (IA-2A), and glutamic acid decarboxylase autoantibodies (GADA). These autoantibodies were measured in the Diabetes Research Laboratory (University of Oulu) from the plasma samples taken at each follow-up visit 13. ICA were detected by indirect immunofluorescence, whereas the other three autoantibodies were quantified using specific radiobinding assays 14. We used cutoff limits for positivity of 2.5 Juvenile Diabetes Foundation (JDF) units for ICA, 3.48 relative units (RU) for IAA, 5.36 RU for GADA, and 0.43 RU for IA-2A. The disease sensitivity and specificity of the assay for ICA were 100% and 98%, respectively, in the fourth round of the international workshops on standardisation of the ICA assay. According to the Diabetes Autoantibody Standardisation Program (DASP) and Islet Autoantibody Standardisation Program (IASP) workshop results in 2010-2015, disease sensitivities for the IAA, GADA and IA-2A radiobinding assays were 36-62%, 64-88% and 62-72%, respectively. The corresponding disease specificities were 94-98%, 94-99% and 93-100%, respectively. The metadata in MetaboLights contains information on the types of autoantibodies detected in the longitudinal plasma samples (Data Citation 1).
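As an illustration of how the stated cutoff limits could be applied to the autoantibody metadata, the following minimal Python sketch flags which autoantibodies are positive for one sample. The function name, the input layout (a dictionary of titres per sample) and the choice to treat a value equal to the cutoff as positive are assumptions made for this example; they are not taken from the DIPP assay protocols.

```python
# Cutoff limits for positivity as stated in the text
# (ICA in JDF units; IAA, GADA and IA-2A in relative units).
CUTOFFS = {"ICA": 2.5, "IAA": 3.48, "GADA": 5.36, "IA-2A": 0.43}


def positive_autoantibodies(titres):
    """Return the sorted names of autoantibodies at or above their cutoff.

    `titres` maps an autoantibody name to its measured value.
    Assumption: a titre equal to the cutoff counts as positive.
    """
    return sorted(
        name
        for name, value in titres.items()
        if name in CUTOFFS and value >= CUTOFFS[name]
    )


# Hypothetical titres for a single plasma sample (not real data).
example = {"ICA": 3.0, "IAA": 1.0, "GADA": 5.36, "IA-2A": 0.1}
print(positive_autoantibodies(example))
```

Counting the entries of the returned list per visit would give a simple way to distinguish single from multiple autoantibody positivity in downstream analyses.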
The following internal standards were purchased for quality control (QC) and calibration purposes:
A total of 428 plasma samples were extracted in randomised order using a modified version of the Folch procedure: 10 μL of 0.9% NaCl, 40 μL of CHCl3:MeOH (2:1, v/v) and 80 μL of the 3.5 μg mL−1 working standards solution were added to 10 μL of each plasma sample. The samples were vortex mixed and incubated on ice for 30 min, after which they were centrifuged (9400 × g, 3 min, 4°C). From the lower layer of each sample, 60 μL was transferred to a glass vial with an insert and 60 μL of CHCl3:MeOH (2:1, v/v) was added to each sample. The samples were re-randomised and stored at −80°C until analysis on the UHPLC-QTOF-MS system.
The mass spectrometer coupled to the UHPLC was a 6550 iFunnel QTOF-MS from Agilent Technologies interfaced with a dual jet stream electrospray (dual ESI) ion source. Nitrogen generated by a nitrogen generator (PEAK Scientific, Scotland, UK) was used as the nebulising gas at a pressure of 21 psi, as the drying gas at a flow rate of 14 L min−1 (at 193°C) and as the sheath gas at a flow rate of 11 L min−1 (at 379°C). Pure nitrogen (6.0) from Praxair (Fredericia, Denmark) was used as the collision gas. The capillary voltage and the nozzle voltage were kept at 3643 V and 1500 V, respectively. The reference mass solution including ions at m/z 121.0509 and 922.0098 was prepared according to instructions by Agilent and was introduced to the mass spectrometer through the other nebuliser in the dual ESI ion source using a separate Agilent series 1290 isocratic pump at a constant flow rate of 4 mL min−1 (split to 1:100 before the nebuliser). The acquisition mass range was m/z 100-1700 and the instrument was run in extended dynamic range mode with an approximate resolution of 30,000 FWHM measured at m/z 1521.9715 (which is included in the tune mixture) during calibration of the instrument. MassHunter B.06.01 software (Agilent) was used for data acquisition.

Data pre-processing
The data pre-processing was performed using MZmine 2.18.2 15. Here, we have adhered to the data processing steps as suggested by the metabolomics standards initiative 16. The typical pre-processing workflow includes raw file import, filtering/smoothing, detection of peaks, peak list de-isotoping, alignment, gap filling, integration of peaks, normalisation and, finally, peak/feature identification (Fig. 2; Supplementary Table S1). The following steps were applied in the processing: (1) crop filtering with an m/z range of 350-1700 and a retention time (RT) range of 2.5 to 21.0 min, (2) mass detection with a noise level of 750, (3) chromatogram builder with a minimum time span of 0.08 min, a minimum height of 2250 and an m/z tolerance of 0.006 m/z or 10.0 ppm, (4) chromatogram deconvolution using the local minimum search algorithm with a 70% chromatographic threshold, 0.05 min minimum RT range, 5% minimum relative height, 2250 minimum absolute height, a minimum ratio of peak top/edge of 1 and a peak duration range of 0.08 to 5.0, (5) isotopic peak grouper with an m/z tolerance of 5.0 ppm, an RT tolerance of 0.05 min, a maximum charge of 2 and with the most intense isotope set as the representative isotope, (6) peak filter with a minimum of 12 data points, a FWHM between 0.0 and 0.2, a tailing factor between 0.45 and 2.22 and an asymmetry factor between 0.40 and 2.50, (7) peak list row filter keeping only rows with a minimum of 1 peak, (8) join aligner with an m/z tolerance of 0.006 or 10.0 ppm and an m/z weight of 2, an RT tolerance of 0.1 min and an RT weight of 1, with no requirement of charge state or ID and no comparison of isotope pattern, (9) peak list row filter with a minimum of 53 peaks in a row (=10% of the samples), (10)

Code availability
All pre-processing analyses were performed using the publicly available software package MZmine 2.18.2, with parameters as described in Supplementary Table S1.

Data records
The raw data files in .mzML format are deposited in the MetaboLights repository (Data Citation 1). Additionally, the deposited data contain the pre-processed lipidomic data as obtained from MZmine data processing, including the identified lipids. The associated data were captured using the ISA-creator package available from MetaboLights. This manuscript describes the samples, data collection, processing steps and overall study design.

Figure 2. The typical workflow for processing mass spectrometry data using MZmine 2. The workflow includes raw file import, filtering/smoothing, detection of peaks, peak list de-isotoping, alignment, gap filling, and integration of peaks, normalisation, and, finally, peak/feature identification.

Ethical approval and informed consent
The ethics and research committees of both the participating university and hospital at the University of Tampere, Tampere, Finland, approved the study protocol. The study was conducted according to the guidelines in the Declaration of Helsinki. All families provided written informed consent for participation in the study.

Technical validation
This study followed the best practices of analytical methodologies for the global profiling of lipids in plasma samples as described by Hyötyläinen et al. 17,18. For quality monitoring, QC samples (pooled plasma samples), blank samples and pure standards (standards in solvent) were run at regular intervals after every 4-16 study samples. To assure data quality, relative standard deviations (%RSDs) were calculated for the retention times and the peak areas for each QC standard in the QC samples. To assess experimental error, nine internal standards were spiked into the plasma samples during sample preparation. The plot of peak areas from the nine QC standards (Fig. 3a) clearly shows that two samples had aberrant internal standard profiles. This was traced to an error during sample preparation, wherein the QC standard was added to these two samples twice. Therefore, these two outlier samples were not included in downstream analysis (and are not included in the data descriptor). The representative quality plots after removal of the aberrant samples (Fig. 3b) demonstrate that the peak areas of the QC standards have very low variability between the measured samples. This indicates that the use of proper internal standards and careful follow-up of the response of these internal standards critically reduces the chances of unexpected variation, which may arise in LC-MS analysis. Based on this, we conclude that the samples in this data descriptor were successfully analysed following generally accepted analytical quality guidelines.
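The %RSD check and the screening for aberrant internal standard responses described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the fold-change threshold of 1.5 relative to the median, and the example peak areas are assumptions for this example and do not reproduce the procedure or values used to generate Fig. 3.

```python
from statistics import mean, median, stdev


def percent_rsd(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; %RSD is undefined")
    return 100.0 * stdev(values) / m


def flag_aberrant(areas_by_sample, fold=1.5):
    """Return sample IDs whose internal standard peak area differs from the
    median by more than `fold` in either direction.

    Assumption: `fold` is an arbitrary illustrative threshold.
    """
    med = median(areas_by_sample.values())
    return sorted(
        sample
        for sample, area in areas_by_sample.items()
        if area > fold * med or area < med / fold
    )


# Hypothetical peak areas of one internal standard; "s3" mimics a sample
# to which the standard was added twice.
areas = {"s1": 100.0, "s2": 105.0, "s3": 210.0, "s4": 98.0}
print(flag_aberrant(areas))
print(round(percent_rsd([90.0, 100.0, 110.0]), 2))
```

A median-based rule is used here because a doubled standard addition inflates the mean and standard deviation, which would make a mean-based threshold less sensitive.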
During LC-MS analyses, it is also highly important to keep track of blank samples inserted into the sample sequence. This is done to detect possible contamination/interference, which could affect the actual sample data. We analysed 293 blank or solvent blank samples throughout this sample set. Solvent blank samples (consisting of a mixture of CHCl3:MeOH, 2:1, v/v) were analysed after every 4th sample and blank samples (the same solvent mixture, which underwent the normal sample preparation procedure except for the addition of internal standards) after every 16th sample. Additionally, a set of QC samples (186 samples) was analysed at the same frequency as the blank samples (i.e. after every 16th sample). This set of QC samples consisted of: (1) extracted and non-extracted standard samples (the chosen QC standards in solvent), (2) a pooled plasma sample originating from a larger plasma pool from actual diabetes patients for following up on long-term method performance and to facilitate batch correction if needed, and (3) a Standard Reference Material 1950 sample from the National Institute of Standards and Technology (NIST), which serves as a community-wide benchmark for intra- and inter-laboratory QC and method validation 19.
Next, for quantitative analysis, calibration curves of each analyte must be generated. For global lipidomics analyses this is, however, logistically and practically impossible due to the high number of analytes present in the biological sample matrix. To overcome this limitation, lipid-class-specific compounds were used for creating a semi-quantitative method including six different concentrations of the lipid-class-specific standards PC (16:0e/18:1(9Z)), TG ( successfully quantified within pre-defined limits. Achievement of this standard is evaluated statistically (global linear models) based on the value for the coefficient of determination (R²). Figure 4 shows coefficients of determination higher than 0.9 for TG (17:0/17:0/17:0) and LPC(16:0) at six different concentrations using a global linear model with a 1/x weighting. The other standards show similar values (i.e. R² > 0.9). This statistical analysis supports the reliability of the results presented in this data descriptor. Furthermore, the assurances provided by the QC methods herein derive from analytical reproducibility, within or between batches of the LC-MS data. These inter- and intra-batch variations can be assessed and corrected using the QC samples. The principal component analysis (PCA) score plot (Supplementary Figure S1) highlights the aforementioned reproducibility across all batches obtained from the pre-processed dataset in this data descriptor. From this score plot, it is apparent that there are no patterns or clusters that correlate with the batch run during LC-MS. Together, the dataset in this data descriptor reflects the robustness of lipid measurement over time, which, in turn, indicates that data analysis performed using this dataset is unlikely to be affected by systematic error arising from instrumental factors.
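To make the calibration check concrete, the following Python sketch fits a straight line by least squares with 1/x weighting and reports a weighted coefficient of determination. It is a sketch under stated assumptions, not the procedure behind Figure 4: the function name is hypothetical, the weighted definition of R² is one common convention and may differ from that used by the authors, and the six concentration and response values are invented for illustration.

```python
def weighted_linear_fit(x, y):
    """Fit y = intercept + slope * x by least squares with weights 1/x.

    Returns (slope, intercept, r2), where r2 is the weighted coefficient of
    determination, 1 - SS_res / SS_tot, both sums using the 1/x weights.
    Assumption: all concentrations x are strictly positive.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired calibration points")
    if any(xi <= 0 for xi in x):
        raise ValueError("1/x weighting requires positive concentrations")

    w = [1.0 / xi for xi in x]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y

    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("all concentrations are identical")

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    ss_res = sum(
        wi * (yi - (intercept + slope * xi)) ** 2 for wi, xi, yi in zip(w, x, y)
    )
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    if ss_tot == 0:
        raise ValueError("all responses are identical; R2 is undefined")
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical six-point calibration series (concentration, peak area ratio).
conc = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
resp = [3.0, 5.5, 8.5, 17.0, 34.0, 64.0]
slope, intercept, r2 = weighted_linear_fit(conc, resp)
print(slope, intercept, r2)
```

The 1/x weighting gives the low-concentration points more influence than an unweighted fit would, which is the usual motivation for it when the response variance grows with concentration.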

Usage notes
The pre-processed data are available from MetaboLights (Data Citation 1). The detailed parameters for data pre-processing are available in the supplementary material (Supplementary Table S1). The pre-processed data were normalised using the nine different class-specific internal standards. In addition, the zero values in the lipidomics dataset were imputed with half of that row's minimum value for each lipid (the code for this imputation was written in-house using R, and is available on request). Due to the non-normal distribution of the data, it is recommended that the data are log-transformed before analysis (log-transformation performed in MZmine 2.18.2). Other required guidelines for data pre-processing are available from the MZmine website (http://mzmine.github.io/documentation.html) and the authors can be contacted for further help if required.
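For readers who wish to reproduce the zero-value imputation and log-transformation on the pre-processed data, a minimal Python sketch is given below. It is not the in-house R code mentioned above. It assumes that each row holds the values of one lipid across samples, that "minimum value" means the smallest non-zero value in that row, and that a base-2 logarithm is acceptable; the function names and the example values are hypothetical.

```python
import math


def impute_half_min(row):
    """Replace zeros in one lipid's row with half of the row's smallest
    non-zero value. Rows without any positive value are returned unchanged.
    """
    nonzero = [v for v in row if v > 0]
    if not nonzero:
        return list(row)
    fill = min(nonzero) / 2.0
    return [v if v > 0 else fill for v in row]


def log2_transform(row):
    """Log2-transform a row of strictly positive values.

    Assumption: base 2 is used here for illustration only.
    """
    return [math.log2(v) for v in row]


# Hypothetical intensities of one lipid across four samples.
lipid_row = [0.0, 4.0, 8.0, 0.0]
imputed = impute_half_min(lipid_row)
print(imputed)
print(log2_transform(imputed))
```

Imputing before transforming matters because the logarithm of zero is undefined; a row that contains only zeros would still fail at the log step and should be filtered out beforehand.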