A genome-wide association study to identify candidate genes for metabolic disorders in offspring by in vitro fertilization

Background: In vitro fertilization (IVF) processes increase offspring's short-term and long-term health risks, but the underlying mechanisms remain unclear. Methods: We conducted a bibliometric analysis to determine the landscape of IVF offspring health. Subsequently, a bioinformatics method was used to identify the properties and biological functions of the genes shared between IVF and type 2 diabetes mellitus (T2DM) (co-genes). Finally, we predicted compounds against key targets and performed multiple validations of the mechanisms underlying IVF offspring health risks. Results: We identified 15 genes associated with T2DM, and their biological functions are primarily associated with lipid metabolism. We also identified the properties of the co-genes and their modification characteristics, identified three SNP sites, and determined three core genes, APOA1, APOB, and APOE, which were mainly correlated with metabolic and cardiovascular diseases. In addition, we predicted drugs that may improve metabolic abnormalities in IVF offspring. Conclusions: The impact of aberrant lipid metabolism in offspring after IVF therapy warrants additional investigation, particularly in terms of long-term health consequences and possible mechanisms.


Introduction
Although the success of IVF holds promise for infertile couples, whether its series of non-physiological operations affects the health of the offspring remains controversial. Assisted reproductive technology (ART), especially IVF, includes a series of non-physiological processes such as superovulation, embryo manipulation, in vitro culture, and embryo transfer before implantation. During this critical period of genetic recombination, gametes and embryos are exposed to altered hormonal environments and to changes in temperature, pH, and oxygen tension [1]. According to the Developmental Origins of Health and Disease (DOHaD) hypothesis [2], early environmental changes shape phenotypic plasticity and affect adult health, and children born through ART have an increased risk of metabolic diseases. Although most children born through IVF are healthy, the safety of this technique requires additional validation, particularly regarding long-term health risks. Animal and follow-up studies have reported that IVF offspring have an increased risk of tumors [3], early-onset diabetes [4], cardiovascular disease [5], dyslipidemia [6], and long-term neurologic morbidity [7].
Numerous studies have indicated that IVF is strongly associated with low birth weight (LBW) in offspring [8,9]. Fresh embryo transfer after ovarian hyperstimulation is associated with LBW. Although frozen embryo transfer (FET) is expected to reduce the adverse consequences of maternal hormone levels, it has been reported to be linked to large for gestational age (LGA) infants [10]. There is robust and well-documented observational evidence that abnormal birth weight (LBW and LGA) increases the risk of cardiovascular disease and T2DM in later life [11]. Birth weight is a critical measure for assessing neonatal health. Prenatal fetal growth restriction leads to "catch-up" growth in the early postnatal period, predisposing adults to metabolic abnormalities such as visceral obesity, insulin resistance, and glucose intolerance [12]. Since these changes are related to T2DM, this study focused on the risk associated with IVF-related changes in embryo metabolism and T2DM.
Sun's team collected villi from patients with fresh IVF embryo transfer and from those with natural conception for transcriptome analysis and found that the two groups differed significantly in multiple metabolic pathways, including insulin resistance, fatty acid metabolism, steroid biosynthesis, and other T2DM-related pathways [13]. In addition, there is sufficient evidence that excessive oxidative stress during IVF treatment affects the long-term health risk of offspring and has adverse effects on fatty acid oxidation and mitochondrial function in offspring [14]. Abnormal fatty acid oxidation is one of the factors that increase the metabolic risk of offspring [15]. Since metabolic abnormalities early in embryonic development lead to an increased risk of metabolic diseases in adulthood, we explored the association between differences in gene expression in IVF offspring and T2DM.
In this study, we used data mining methods to compare villous transcriptome data from IVF and early natural pregnancies in order to identify genes associated with generational diseases and with the long-term effects of IVF treatment on offspring health, and to explore the mechanisms by which IVF affects offspring health. Understanding the effect of IVF treatment on offspring metabolism and identifying related biomarkers and mechanisms are necessary to reduce treatment-related risks.

Data mining
Increasing attention has been paid to the safety of IVF treatment, especially its effect on offspring health. We therefore conducted a bibliometric analysis using the VOSviewer tool to explore the current research landscape and identify relevant keywords on this issue. Furthermore, we used a bioinformatics method to identify genes differentially expressed after IVF and subsequently investigated their biological functions.
Bibliometric analysis. VOSviewer is a freely available programme developed at Leiden University that displays large bibliometric maps in an easy-to-interpret manner [16]. We constructed and reviewed bibliometric maps of IVF offspring health. All literature data were retrieved from the Web of Science Core Collection for the period 1980 to 2021. The network map developed by VOSviewer eliminated various words that represented different keywords associated with offspring health.
Microarray data processing. IVF-related mRNA expression data were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). After a systematic review, the profile GSE122214 was selected since it contained four IVF patients and four normal pregnancy volunteers. In the IVF group, twin-to-singleton selective fetal reduction was performed 30-35 days after embryo transfer, equivalent to 7-8 gestational weeks. Villi collected from dichorionic twin pregnancies during routine abortion were used as the control group. Genes associated with T2DM were obtained from the Phenopedia database [17] (https://phgkb.cdc.gov/PHGKB/startPagePhenoPedia.action). GEO2R was used to identify differentially expressed genes (DEGs) between IVF and natural conception. An adjusted p < 0.05 and |log FC| ≥ 2 were set as the cut-off criteria for DEG screening between the two groups using Bioconductor (http://www.bioconductor.org/). Subsequently, T2DM-related genetic data were downloaded from the website [18]. All the selected genes were examined using a Venn diagram web tool (https://bioinfogp.cnb.csic.es/tools/venny/index.html).
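The screening and overlap steps described above can be sketched in Python. This is a minimal illustration, not the GEO2R or Bioconductor workflow itself: it assumes the differential-expression table has already been exported and parsed, and the record type `GeneStat`, its field names, the function names, and the toy gene labels and values are all hypothetical.

```python
from typing import Iterable, NamedTuple


class GeneStat(NamedTuple):
    """One row of a differential-expression table (field names are assumptions)."""
    gene: str
    log_fc: float
    adj_p: float


def screen_degs(rows: Iterable[GeneStat],
                p_cutoff: float = 0.05,
                lfc_cutoff: float = 2.0) -> dict[str, str]:
    """Return {gene: 'up' | 'down'} for rows with adj_p < p_cutoff and |logFC| >= lfc_cutoff."""
    degs = {}
    for row in rows:
        if row.adj_p < p_cutoff and abs(row.log_fc) >= lfc_cutoff:
            degs[row.gene] = "up" if row.log_fc > 0 else "down"
    return degs


def co_genes(degs: dict[str, str], disease_genes: Iterable[str]) -> list[str]:
    """Intersect the DEGs with a disease gene list (the Venn-diagram step)."""
    return sorted(set(degs) & set(disease_genes))


# Toy input: labels and numbers are invented for illustration only.
toy = [
    GeneStat("GENE_A", 2.4, 0.001),   # passes both cut-offs, upregulated
    GeneStat("GENE_B", -3.1, 0.020),  # passes both cut-offs, downregulated
    GeneStat("GENE_C", 1.2, 0.001),   # fails the fold-change cut-off
    GeneStat("GENE_D", 2.9, 0.300),   # fails the adjusted p cut-off
]
toy_degs = screen_degs(toy)
toy_overlap = co_genes(toy_degs, ["GENE_B", "GENE_C", "GENE_X"])
```

Keeping the direction of change alongside each gene makes it easy to count upregulated and downregulated DEGs separately before the overlap is taken.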

Properties of selected genes
We first identified the expression of the co-genes in villi from IVF-treated and naturally conceived patients and then analyzed the characteristics and interaction networks of these genes in detail.

Network establishment and analysis
Network establishment and hub gene analysis. Protein-protein interactions (PPIs) of the studied genes were assessed using the STRING v11 tool [30] and visualized with Cytoscape [31]. To further analyze the PPI network, the cytoHubba [32] and MCODE [33] applications were used to identify the key nodes, and the correlation between genes was explored through gene expression. We then performed reverse validation by searching for diseases associated with the key genes. DISEASES is a web resource that integrates evidence on disease-gene associations derived from automated text mining, manually curated literature, disease information, and genome-wide association studies [34]. The diseases related to the key genes were obtained from DISEASES, and a gene-disease interaction network was constructed to reversely verify the enriched key targets.
The connection between hub genes and diabetes was examined using the Attie Lab Diabetes database (http://diabetes.wisc.edu), an open-source database displaying gene expression profiles of different experimental groups (lean and obese BTBR mice at 4 and 10 weeks of age) in different tissues [35]. We compared lean and obese BTBR mice at both ages (4 and 10 weeks) to verify the mRNA levels of the hub genes in different tissues (significance was set at P < 0.05).
Identification of potential drugs. DrugBank is a bioinformatics and cheminformatics database, provided by the University of Alberta, that combines detailed drug data with comprehensive drug target information [36]. We used DrugBank to explore compounds related to the key targets to provide theoretical guidance for improving the health status of offspring.

Gene set enrichment analysis (GSEA)
GSEA is a free software package developed initially to discover changes in metabolic pathways correlated with human diabetes. Rather than applying an arbitrary cut-off in terms of fold change or significance score, GSEA evaluates all genes in an experiment and determines significance by permuting the class labels, which preserves gene-gene correlations and results in a more realistic null model [37]. In addition, GSEA can be employed to identify the pathways correlated with gene expression. Gene sets were considered significantly enriched at the predefined p-values and FDR < 0.25. After the gene annotation files were imported, the GSEA program ranked the genes according to its algorithm, analyzed the positions of all genes in the ranked list, and accumulated them to obtain enriched pathways.
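The accumulation over the ranked list can be illustrated with the running-sum enrichment score. The following is a minimal sketch of that statistic only, assuming a list of genes already ranked by a correlation-like score; it does not reproduce the permutation test or the FDR estimation, and the function name, argument names, and toy gene labels are hypothetical rather than part of the GSEA software.

```python
def enrichment_score(ranked: list[tuple[str, float]],
                     gene_set: set[str],
                     weight: float = 1.0) -> float:
    """Running-sum enrichment score for one gene set.

    ranked: (gene, score) pairs sorted from most positive to most negative score.
    Genes in the set raise the running sum in proportion to |score| ** weight;
    genes outside the set lower it by a constant step. The score returned is the
    running-sum value with the largest absolute deviation from zero.
    """
    in_set = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(in_set)
    if n_hit == 0 or n_hit == len(ranked):
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(s) ** weight for (_, s), hit in zip(ranked, in_set) if hit)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a zero score")
    miss_step = 1.0 / (len(ranked) - n_hit)

    running = 0.0
    best = 0.0
    for (_, score), hit in zip(ranked, in_set):
        running += abs(score) ** weight / hit_norm if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list: labels and scores are invented for illustration only.
toy_ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0),
              ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
es_top = enrichment_score(toy_ranked, {"g1", "g2"})     # set at the top of the list
es_bottom = enrichment_score(toy_ranked, {"g5", "g6"})  # set at the bottom of the list
```

A set concentrated at the top of the ranking gives a positive score and a set concentrated at the bottom gives a negative one; in a full analysis the observed score would be compared against scores obtained after permuting the class labels.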

Results
The current research landscape of IVF offspring health
records that met the threshold. Finally, 53 keywords related to health problems of offspring after IVF implantation were chosen (Figure 1C), which mainly include keywords correlated with metabolism such as birth weight, BMI, diabetes, blood pressure, cardiovascular disease, long-term health, and obesity.

Identification of common genes between IVF and T2DM
The transcriptome profile dataset GSE122214 was downloaded from the NCBI GEO database. The analysis showed that the villus transcriptome following IVF treatment was significantly different from that following natural conception. We identified 1,806 DEGs, including 1,064 upregulated and 742 downregulated genes (Figure 2A). In addition, the potential targets associated with T2DM were retrieved from the Phenopedia database. Further analysis of these DEGs using a Venn diagram revealed 15 common genes, which were taken as hub genes for identifying IVF infants at risk of developing T2DM as adults; the expression of the common genes is displayed as a heatmap (Figure 2B).

Functional annotation for DEGs via DAVID and Metascape
To further explore the biological function of the target genes, the online database DAVID was used to conduct functional analysis of the selected genes. The KEGG pathway results from DAVID indicated that the top canonical pathways associated with the target genes included the PPAR signaling pathway, the HIF-1 signaling pathway, vitamin digestion and absorption, and fat digestion and absorption (Figure 3A). GO analysis revealed that, for biological processes, the common genes were mainly enriched in cholesterol metabolic process, positive regulation of PI3K signaling, blood pressure regulation, lipoprotein metabolic process, and so on (Figure 3B).
For cellular components, the target genes were significantly enriched in plasma membrane, extracellular exosome, very low-density lipoprotein particle, and chylomicron (Figure 3C). Regarding molecular function, these genes were significantly enriched in cholesterol transporter activity, identical protein binding, phospholipid binding, and lipid binding (Figure 3D). Furthermore, functional enrichment analysis with Metascape revealed that the target genes were significantly enriched in regulation of lipid metabolic process, small molecule metabolic process, oxidoreductase activity, and so on (P < 0.05, Figures 2E-G).

Properties of selected genes
Gene expression in the sample. A total of 15 common genes were differentially expressed between the villi of IVF and naturally conceived patients. Compared with the villi of naturally conceived patients, eight genes were upregulated and seven genes were downregulated in the villi of IVF patients (Figure 4A).
Physicochemical properties of selected genes. The physicochemical features of the 15 common genes are listed in Table 1. The table lists the gene name, gene ID, protein length, molecular weight (MW), isoelectric point (pI), instability index, and predicted N-glycosylation sites. The selected genes are distributed on chromosomes 1, 2, 3, 6, 7, 11, 12, and 19. The protein sequence length ranged from 167 to 4653 amino acids, with LEP having the shortest sequence at 167 amino acids and APOB having the longest at 4653 amino acids. The MW of the encoded proteins ranged from 18.64 to 515.6 kDa, and the pI ranged from 5.56 to 8.93. Based on the protein instability index, most of the studied proteins (10/15) were unstable. In addition, most proteins had a negative grand average of hydropathicity (GRAVY), indicating that they are hydrophilic.
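To make the GRAVY criterion concrete, the score can be computed as the mean Kyte-Doolittle hydropathy value over all residues of a protein sequence, with negative values indicating an overall hydrophilic protein. The sketch below is a minimal illustration under that definition; the source does not state which tool produced the values in Table 1, and the function name and toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Toy peptides, not sequences of the genes studied here.
hydrophobic_example = gravy("AILV")   # positive: dominated by hydrophobic residues
hydrophilic_example = gravy("KDEK")   # negative: dominated by charged residues
```

Rejecting non-standard residues rather than skipping them keeps the average from being silently computed over a shorter sequence than the one supplied.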
Protein modification information. Post-translational modifications such as phosphorylation and glycosylation are involved in regulating protein stability and protein interactions. Therefore, the potential phosphorylation and glycosylation sites of the genes shared between IVF and T2DM were predicted from the amino acid sequences of the encoded proteins (Table 1, Figure 4B). APOB was predicted to be an N-hyperglycosylated protein that may be heavily glycosylated (Table 1). The results also showed that APOB has more phosphorylation sites than the other proteins (Figure 4B).
Other characteristics. Three common SNP sites were associated with diabetes mellitus, including two SNP sites of APOE: rs7412 and rs429358 (Figure 4C). Next, we identified three diabetes-associated SNP sites and the conserved sites for miRNA families. Concurrently, multiple miRNAs were identified as being involved in the regulation of gene expression, and the miRNAs shown in green, as central miRNAs, were involved in regulating multiple genes (Figure 4D).

Network establishment and analysis
Protein-protein interactions (PPIs) are involved in most biological processes, such as DNA transcription and replication, protein transport, protein degradation, and cell cycle regulation. In this study, STRING v11 was used to construct a PPI network (Figure 5A), which was visualized using Cytoscape 3.5.1.
First, the Maximal Clique Centrality (MCC) of each node was calculated with cytoHubba, a Cytoscape plugin. The genes with the top six MCC values were considered hub genes. Concurrently, another Cytoscape plugin, MCODE, was used to analyze the network and determine the central key nodes, resulting in seven hub genes. Subsequently, an online Metascape enrichment analysis was used to obtain a key target complex: APOA1, APOB, and APOE (Figure 5B). Next, gene correlation analysis was used to further investigate the expression relationship between genes: the stronger the correlation, the darker the color (Figure 5C). Finally, a Venn diagram was used to identify the genes common to these analyses; APOE and APOA1 were identified as two such genes (Figure 5D). Taking the correlation between genes into account, APOA1, APOE, and APOB are strongly and positively correlated. We therefore regarded APOA1, APOE, and APOB as the key genes.
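As an illustration of the MCC ranking, the score of a node can be written as the sum of (|C| - 1)! over all maximal cliques C that contain the node. The sketch below is a minimal stand-alone version under that definition; it is not the cytoHubba implementation, the clique enumeration is a basic Bron-Kerbosch recursion suitable only for small networks, and the function names and toy node labels are hypothetical.

```python
from math import factorial


def maximal_cliques(adj: dict[str, set[str]]) -> list[frozenset[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r: set[str], p: set[str], x: set[str]) -> None:
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}   # v has been fully explored
            x = x | {v}   # exclude v from later branches

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(edges: list[tuple[str, str]]) -> dict[str, int]:
    """MCC(v) = sum over maximal cliques C containing v of (|C| - 1)!."""
    adj: dict[str, set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    scores = dict.fromkeys(adj, 0)
    for clique in maximal_cliques(adj):
        for node in clique:
            scores[node] += factorial(len(clique) - 1)
    return scores


# Toy network: a triangle N1-N2-N3 with one extra edge N3-N4 (labels are placeholders).
toy_edges = [("N1", "N2"), ("N2", "N3"), ("N1", "N3"), ("N3", "N4")]
toy_scores = mcc_scores(toy_edges)
top_nodes = sorted(toy_scores, key=toy_scores.get, reverse=True)[:1]
```

Because the factorial grows quickly with clique size, nodes that sit in large, densely connected modules dominate the ranking, which is why taking the top-scoring nodes tends to select members of a tightly interacting complex.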
Gene complex-related diseases. Disease correlation analysis was conducted on the gene complex composed of APOA1, APOB, and APOE to find the diseases related to the complex for reverse verification, and a gene-disease network was constructed (Figure 5E). As the figure shows, diseases closely related to the three genes include metabolic diseases, cardio-cerebrovascular diseases, nervous system diseases, and immune-related diseases. Metabolic diseases include obesity, diabetes, non-alcoholic fatty liver disease, and lipid metabolism disorders. Cardiovascular and cerebrovascular diseases include hypertension, heart disease, coronary artery disease, cerebrovascular disease, etc. Immune-related diseases mainly include leukocyte chemotactic factor 2 amyloidosis and apolipoprotein C-III-associated amyloidosis, and nervous system diseases include Alzheimer's disease.
Expression levels of hub genes in obesity. We used the Attie Lab Diabetes database to find the correlation between the hub genes and diabetes; obese BTBR mice become severely diabetic by 10 weeks of age. We found that APOA1, APOB, and APOE were differentially expressed in different tissues of 4-week-old and 10-week-old obese diabetic BTBR mice (Figure 6). In 4-week-old obese diabetic BTBR mice, APOA1 was downregulated in adipose tissue, and APOE was downregulated in adipose and islet tissue. In 10-week-old obese diabetic BTBR mice, APOA1 was significantly downregulated in the liver, while APOB and APOE were significantly upregulated in islet tissue.
Identification of potential predictive drugs. From DrugBank, we obtained ten drug-mRNA interaction pairs for the three key targets. The basic information and structural formulas of the drugs are presented in Table 2. There were nine drugs regulating lipid metabolism and one antioxidant drug. Rosuvastatin, pitavastatin, and gemfibrozil were approved for use, whereas gamolenic acid, lovastatin, and mipomersen were listed as approved but still under investigation.
GSEA revealed offspring's safety and potential health effects through IVF-ET.
Functional differences between the two groups were determined from a genome-wide perspective rather than from the DEGs alone. The most significantly enriched gene sets correlated with naturally conceived subjects were maturity-onset diabetes of the young, glycine, serine and threonine metabolism, fatty acid metabolism, and peroxisome (Figure 7).

Discussion
Since IVF affects the long-term health and safety of offspring, it has attracted increasing attention in reproductive medicine and has gradually become a research hotspot. Based on the bibliometric analysis, we found that metabolism-related issues, including birth weight, are the focus of research on IVF offspring health. Whether IVF affects newborn birth weight remains uncertain. Some studies have indicated that IVF causes LBW [38], some have found that the IVF process does not affect fetal birth weight, and others have indicated that frozen embryo transfer leads to LGA fetuses [39]. A multicenter cohort study identified significant differences in body weight among IVF centers, with several confounding factors unrelated to the IVF process [40]. However, it is well established that adverse intrauterine environments affect children's health, including the risk of metabolic diseases [41]. LBW, commonly a result of fetal growth restriction (FGR), is a major public health problem and carries a higher risk of adipose tissue metabolic abnormalities, insulin resistance, and T2DM in adulthood compared with normal birth weight [42].
Fatty acid metabolism is closely related to adult metabolic diseases; abnormal fatty acid metabolism leads to lipid deposition, obesity, hepatic insulin resistance, and glucose overproduction [43]. Our basic research has also demonstrated that excess palmitic acid (PA) enrichment in the decidua causes glutamine oxidation through the TLR4/JNK/NF-kB pathway, leading to decidual dysfunction, and is associated with several adverse pregnancy outcomes, such as gestational diabetes mellitus, preeclampsia, preterm delivery, and intrauterine growth restriction [44]. To avoid research bias, the impact of IVF needs to be studied from a global perspective, so we performed GSEA on all transcribed genes from the villi of IVF and naturally conceived patients. Consistent with earlier findings, the IVF group showed significant variations in metabolism, particularly fatty acid metabolism. These results suggest that abnormal fatty acid metabolism is an important factor in abnormal embryonic development and increased long-term health risk following IVF treatment.
We examined IVF-induced DEGs and T2DM-related genes and discovered 15 co-genes that may contribute to long-term metabolic risk. We then investigated their properties and functions to better understand the mechanisms underlying the increased risk of metabolic disease in IVF offspring. Most of the selected genes were predicted to encode unstable proteins, suggesting that their stability is low in complex cellular responses, and most of the studied proteins have a negative GRAVY value, indicating that they are hydrophilic [45]. Furthermore, functional analysis of the 15 differentially expressed genes revealed that they were mainly concentrated in the PPAR signaling pathway, the HIF-1 signaling pathway, vitamin digestion and absorption, and fat digestion and absorption.
Lipid metabolism plays an important role in pregnancy outcomes and affects fetal growth and development. The PPAR signaling pathway mainly regulates lipid metabolism and anti-inflammatory pathways in vivo, is involved in maternal and fetal metabolic disorders, the intrauterine pro-inflammatory environment, and developmental defects, and reduces inflammation and insulin resistance in liver, skeletal muscle, and adipose tissue [46]. In a rat model of gestational diabetes mellitus (GDM), PPARγ expression in fetal rats differed significantly by sex, with elevated levels in the liver of male fetuses but not of female fetuses, which may help explain the considerable sexual dimorphism in adult IVF phenotypes [47].
The 15 genes are also associated with lipid metabolism and diabetes. SLC2A1 (GLUT1) is mainly expressed in the rodent and human placenta, where it mediates maternal-fetal glucose transport and is critical for placental glucose transport. Excessive and exogenous endoplasmic reticulum stress may lead to abnormal placental function or may partly result in changes in GLUT and blood vessel-related gene expression, leading to LBW [48]. The analysis of the physicochemical characteristics, protein modifications, and SNPs of these genes revealed the SNPs NOS3 rs1799983, APOE rs7412, and APOE rs429358, which were previously reported to be related to diabetes [49]. APOE polymorphisms are associated with complex metabolic changes, including abnormalities in several novel biomarkers associated with elevated cardiometabolic risk and the risk of all-cause mortality [50].
After analyzing these protein interaction networks, we discovered a gene complex composed of APOA1, APOB, and APOE. All three proteins are apolipoproteins, which mainly carry lipids, stabilize lipoprotein structure, and participate in lipid metabolism and transport. APOA1 is related to embryo quality, development, and survival. Studies have indicated that APOA1 expression is significantly increased in the chorionic villi, decidua, and serum of patients with IVF and early abortion, and Verma's results also showed that APOA1 expression was increased in the villi of IVF patients [51].
Furthermore, APOE plays a major role in antiphospholipid functional attenuation (APL), inducing pregnancy complications [52]. Subsequently, the expression of the three genes in different tissues of diabetic mice was analyzed to verify our conclusion. Finally, we predicted potential compounds targeting the three genes, including rosuvastatin, pitavastatin, gemfibrozil, and simvastatin, which are approved for clinical use, as well as gamolenic acid, lovastatin, and mipomersen, which are approved for clinical use and also listed for investigational use.
These are clinical studies, and there are significant variances between clinical individuals and no shared mechanisms. However, animal experiments have established that IVF mice do exhibit metabolic and cardiovascular abnormalities [53]. To ensure offspring health, it is vital to investigate the influence of IVF-ET on embryo metabolism and the related mechanisms. ROS production in assisted reproduction technology not only predisposes IVF offspring to abnormal lipid metabolism but also plays an important role in adverse health outcomes in later life. Our previous studies have implied that oxidative stress during IVF operations can lead to DNA damage in the fertilized egg, affecting offspring health [54,55]. Since fatty acids are an important energy source for oocyte and embryo development, their abnormal metabolism can affect the normal development of the embryo [56]. Oxidative stress impairs embryonic fatty acid oxidation (FAO) and mitochondrial function and increases lipid storage [57]. In addition, oocyte quality is a critical determinant of embryo quality; for example, high lipid levels and increased ROS result in poor oocyte and embryo development.

Conclusion
Studying embryo metabolism is critical because the metabolic mechanism is now understood to be complex, multi-molecular, multi-target, and networked. Therefore, we studied the physicochemical properties, modifications, polymorphisms, and miRNAs of 15 targets to determine the mechanisms underlying the increased risk of metabolic diseases in offspring after IVF treatment. Embryo metabolism is a series of complex processes that requires additional research and validation to gain a deeper understanding of the health problems of babies born through assisted reproduction and of possible means of intervention.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
If the manuscript is accepted, we approve it for publication in Diabetology & Metabolic Syndrome.

Availability of data and materials
Not applicable.

Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding
This study received funding from the following sources: the National Natural Science Foundation of China

Tables
Tables 1-2

Figure 6
The key gene expression in different tissues (liver, islet, adipose, and soleus) of 4-week and 10-week BTBR obese diabetic mice.

Figure 7
GSEA-based KEGG-enrichment plots of representative gene sets for activated pathway from GSE122214 data series: maturity-onset diabetes of the young; glycine serine and threonine metabolism; fatty acid metabolism; peroxisome.

Supplementary Files
This is a list of supplementary files associated with this preprint. Tables.docx