2.1 LUAD patient data collection
Gene expression profiles and clinical data for the lung adenocarcinoma (LUAD) dataset were downloaded from The Cancer Genome Atlas (TCGA) database. The mRNA data comprised 594 samples, including 59 normal lung tissue samples and 535 LUAD tissue samples. Clinical data were available for 522 patients with LUAD. The Genomic Data Commons (GDC) data transfer tool was used to aggregate the mRNA expression data into an expression matrix, and the IDs were converted into gene symbols based on the annotation file. The Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) was systematically searched using the keywords "Lung adenocarcinoma" and "survival". Nine microarray datasets (GSE32863, GSE31210, GSE7670, GSE10072, GSE8894, GSE11969, GSE14814, GSE41271 and GSE42127) were selected for the final analysis. The raw (.cel) and platform (GPL) files were downloaded. All matrix data were background-corrected, normalized, and log2-transformed. Missing values were imputed using the "affy" and "impute" packages in R (version 4.0.3). LIPG expression data were extracted using the "limma" package in R. In addition, the TCGA-LUAD dataset was analyzed using the GEPIA2 website (http://gepia2.cancer-pku.cn/#index) [16], and several GEO datasets (GSE50081, GSE30219, GSE31210, GSE37745) were analyzed using the Kaplan-Meier (K-M) plotter website (https://kmplot.com/analysis/) [17]. Details of the included GEO datasets are shown in Table 1.
In parallel, patients with pathologically diagnosed, surgically resectable LUAD treated at our hospital from July 2004 to June 2009 were selected as the clinical subjects of the present study. Inclusion criteria: (1) pathologically diagnosed lung adenocarcinoma without prior or co-existing cancer; (2) first diagnosis, with no previous anti-cancer treatment such as radiotherapy or chemotherapy; (3) thoracoscopic radical resection of lung cancer performed by the same group of physicians; (4) available patient data including age, sex, clinical stage, lymph node metastasis, and distant metastasis; (5) age 18-85 years; (6) expected survival time of more than 3 months; (7) conscious, willing to undergo examination, and with good compliance; (8) Eastern Cooperative Oncology Group (ECOG) performance status 0-3. Exclusion criteria: (1) incomplete medical history; (2) other primary malignant tumors; (3) age under 18 or over 85 years; (4) concomitant diseases associated with elevated lipid levels (such as diabetes, hyperlipidemia, or metabolic syndrome); (5) hormone replacement therapy or use of drugs known to affect lipid metabolism; (6) poor compliance or explicit refusal of follow-up visits. Finally, a total of 142 patients with LUAD were enrolled, including 75 males and 67 females, aged 20 to 84 years.
2.2 Immunohistochemistry (IHC)
Immunohistochemical staining was performed on LUAD tissues and the corresponding non-cancerous tissues. Paraffin-embedded tissue sections were dewaxed in xylene and rehydrated through a graded alcohol series. The sections were placed in freshly prepared 0.3% hydrogen peroxide in methanol for 15 min at room temperature to inactivate endogenous peroxidase. For antigen retrieval, the sections were heated in 0.01 mol/L citrate buffer (pH 6.0; 95°C, 15–20 min) and then cooled. Non-specific binding was blocked with 10% normal goat serum at 37°C for 30 min. The sections were then incubated with diluted primary antibody at 4°C overnight, followed by incubation with biotin-labeled secondary antibody at 37°C for 30 min. Streptavidin-peroxidase complex working solution was then added, and the sections were incubated at 37°C for 30 min. Serum from the same species as the primary antibody, or 1× phosphate-buffered saline in place of the primary antibody, was used as a negative control. Tissue sections with known high expression of LIPG were used as positive controls. Diaminobenzidine was used to visualize antibody binding. Staining was scored independently by two observers who were blinded to the clinical data. Staining intensity was graded as 0 (-), 0.5 (+/-), 1 (+), 2 (++), or 3 (+++). Intensities of 0 and 0.5 were defined as low expression, and intensities of 1, 2, and 3 were defined as high expression.
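The grading rule above can be written down as a small lookup. The following is a minimal Python sketch for illustration only (the study itself used R); the function names are hypothetical, and how the two observers' scores were reconciled is not stated in the text, so it is not modeled here.

```python
# Staining-intensity grades and their labels, as listed in the text.
INTENSITY_LABELS = {0: "-", 0.5: "+/-", 1: "+", 2: "++", 3: "+++"}


def intensity_label(score):
    """Return the label for a staining-intensity score (0, 0.5, 1, 2 or 3)."""
    if score not in INTENSITY_LABELS:
        raise ValueError(f"unexpected staining intensity: {score}")
    return INTENSITY_LABELS[score]


def expression_group(score):
    """Dichotomize a score: 0 and 0.5 are 'low'; 1, 2 and 3 are 'high'."""
    intensity_label(score)  # reuse the validation above
    return "low" if score < 1 else "high"


if __name__ == "__main__":
    for s in (0, 0.5, 1, 2, 3):
        print(s, intensity_label(s), expression_group(s))
```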
2.3 Expression of LIPG gene
LIPG expression levels in LUAD and normal lung tissues were compared in TCGA-LUAD and in four microarray datasets (GSE10072, GSE32863, GSE31210, and GSE7670) from the GEO database, and clinical specimens were analyzed for further verification. In addition, immunohistochemical images of LUAD and normal lung tissue were downloaded from the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/), and the differences between them were examined. The "limma" and "beeswarm" packages in R were used for the analysis. The Wilcoxon test was used to compare differences between the two groups.
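To make the two-group comparison concrete, the following is a minimal Python sketch of a Wilcoxon rank-sum (Mann-Whitney) test. It assumes the tumor and normal groups are treated as unpaired, and it uses a tie-corrected normal approximation without continuity correction, so its p-values may differ slightly from those of R's wilcox.test. It is not the authors' code, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return (U statistic of x, two-sided p-value) by normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    normal = [2.1, 2.4, 1.9, 2.2, 2.0]  # made-up expression values
    tumor = [3.5, 2.9, 3.8, 3.1, 2.3]
    print(wilcoxon_rank_sum(normal, tumor))
```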
2.4 Survival and clinical characteristics analysis in LUAD public database
The online analysis tools GEPIA2 and K-M plotter were used to comprehensively analyze the relationship between LIPG expression and the prognosis of LUAD patients in public databases, and this analysis was supplemented with GEO datasets not covered by these websites. Datasets with fewer than 50 cases were excluded. Finally, the TCGA-LUAD database and nine datasets (GSE50081, GSE30219, GSE31210, GSE37745, GSE8894, GSE11969, GSE14814, GSE41271 and GSE42127) from the GEO database were included in the present study.
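For readers unfamiliar with the survival curves produced by tools such as GEPIA2 and K-M plotter, the following is a minimal Python sketch of the Kaplan-Meier product-limit estimator. It is an illustration under simple assumptions (one follow-up time and one event flag per patient), not the implementation used by either website, and the function name is hypothetical. Comparing the high and low expression groups would additionally require a test such as the log-rank test, which is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


if __name__ == "__main__":
    # Made-up follow-up times (months) and event flags.
    print(kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0]))
```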
In addition, clinical data of LUAD patients were used for further validation. These data were used to analyze the correlation between LIPG expression level (high vs. low) and the clinical characteristics of LUAD patients, including age, sex, pathological grade, tumor-node-metastasis (TNM) stage, and distant metastasis.
2.5 Analyzing genes co-expressed with LIPG
Two-sided Pearson correlation coefficients (r values) and the Z-test were used to assess the correlation between the expression of LIPG and that of other genes in TCGA-LUAD. Genes positively or negatively correlated with LIPG (|r|>0.4, p<0.001) were considered LIPG-related genes, also referred to as co-expressed genes.
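The co-expression filter can be sketched as follows in Python. This is a hedged illustration, not the authors' R code: it assumes that the "Z-test" refers to a two-sided test based on the Fisher z-transformation of r, which the text does not specify, and all function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0 (assumed Fisher z-test)."""
    if n <= 3:
        raise ValueError("the Fisher z-test needs more than 3 samples")
    if abs(r) >= 1:  # perfect correlation (or rounding slightly past 1)
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def is_coexpressed(lipg, gene, r_cut=0.4, p_cut=0.001):
    """Apply the thresholds from the text: |r| > 0.4 and p < 0.001."""
    r = pearson_r(lipg, gene)
    return abs(r) > r_cut and fisher_z_pvalue(r, len(lipg)) < p_cut
```

In practice this check would be repeated for every gene in the expression matrix against the LIPG expression vector.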
2.6 Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Set Enrichment Analysis (GSEA) enrichment analyses
GO and KEGG enrichment analyses were performed on LIPG and its co-expressed genes. GO enrichment analysis covers three aspects: biological process (BP), cellular component (CC), and molecular function (MF). BP typically describes an ordered biological process consisting of multiple steps, CC describes the location of gene products in a cell, and MF describes the function of gene products. The "clusterProfiler" package in R was used for the enrichment analysis. GSEA is a computational method that determines whether a predefined set of genes shows statistically significant, concordant differences between two biological states. GSEA software (version 4.1.0) was used, with c2.cp.kegg.v6.2 selected as the gene set database. Significant enrichment was determined on the basis of the false discovery rate (FDR).
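Over-representation analyses of the kind performed for GO and KEGG terms are commonly based on the hypergeometric distribution, followed by multiple-testing adjustment. The following Python sketch illustrates that calculation with Benjamini-Hochberg adjustment; it is a simplified stand-in rather than the clusterProfiler implementation, it does not cover the permutation-based FDR reported by GSEA, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_background, n_term, n_list, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_background : genes in the background set
    n_term       : background genes annotated to the term
    n_list       : genes in the query list (e.g. co-expressed genes)
    n_overlap    : query genes annotated to the term
    """
    upper = min(n_term, n_list)
    total = comb(n_background, n_list)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_list - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A term would then be reported as enriched when its adjusted p-value falls below the chosen cutoff; the cutoff used in the study is not stated in this section.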
2.7 Evaluating the relationship between LIPG and immune cell infiltration (ICI) in the microenvironment
CIBERSORT (http://cibersort.stanford.edu/) is a gene expression-based deconvolution algorithm that estimates the relative abundance of cell types in a sample from its expression profile. The CIBERSORT algorithm was used to estimate the proportions of different infiltrating immune cell types in LUAD samples from the TCGA database, to compare immune cell abundance between the high and low LIPG expression groups, and to assess the correlation between LIPG expression and immune cell abundance.
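The group comparison described above requires dividing samples into high and low LIPG expression groups. The text does not state the cutoff, so the following Python sketch assumes a median split purely for illustration; the function name is hypothetical, and the immune cell fractions are taken as given (for example, as estimated by CIBERSORT).

```python
from statistics import median


def split_by_median(lipg_expression, cell_fraction):
    """Group one immune cell type's fractions by LIPG expression.

    Assumption: samples above the median LIPG expression form the
    'high' group and all others the 'low' group.
    Returns (low_group_fractions, high_group_fractions).
    """
    if len(lipg_expression) != len(cell_fraction):
        raise ValueError("inputs must have one value per sample")
    cutoff = median(lipg_expression)
    low = [f for e, f in zip(lipg_expression, cell_fraction) if e <= cutoff]
    high = [f for e, f in zip(lipg_expression, cell_fraction) if e > cutoff]
    return low, high
```

The two returned lists could then be compared with a two-group test such as the Wilcoxon test named in the statistical analysis section.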
2.8 Statistical analysis
Data were analyzed using R (version 4.0.3), and p<0.05 was considered statistically significant. The Wilcoxon test was used to compare variables between two groups, and the Kruskal-Wallis test was used to compare variables among multiple groups.