Repairing the in situ hybridization missing data in the hippocampus region by using a 3D residual U-Net model

The hippocampus is a critical brain region. Transcriptome data provides valuable insights into the structure and function of the hippocampus at the gene level. However, transcriptome data is often incomplete. To address this issue, we use a convolutional neural network model to repair the missing voxels in the hippocampus region of the Allen Institute coronal in situ hybridization (ISH) dataset. We also analyze the correlation of gene expression between the coronal and sagittal datasets in the hippocampus region. The results demonstrated that the trend of gene expression correlation between the coronal and sagittal datasets remained consistent after the missing data in the coronal ISH dataset were repaired. Finally, we use the repaired ISH dataset to identify novel genes specific to hippocampal subregions. Our findings demonstrate the accuracy and effectiveness of using a deep learning method to repair missing ISH data. Once repaired, the ISH data have the potential to improve our understanding of the hippocampus's structure and function.

This supplement published with Optica Publishing Group on 1 May 2024 by The Authors under the terms of the Creative Commons Attribution 4.0 License in the format provided by the authors and unedited. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.
Supplement DOI: https://doi.org/10.6084/m9.figshare.25680018
Parent Article DOI: https://doi.org/10.1364/BOE.522078

Step 1 (Download data): Use the API provided by the Allen Institute together with the Linux wget command to batch-download the gene expression files, with expression energy as the analysis indicator. The downloaded files have the .raw and .mhd suffixes.
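A minimal sketch of the batch-download step. It assumes the Allen Brain Atlas grid-data service, `http://api.brain-map.org/grid_data/download/<SectionDataSetID>?include=energy`, which returns a zip archive containing the expression-energy `.mhd`/`.raw` pair. The helper `wget_commands`, the `grid_data` output folder and the dataset IDs are hypothetical. The script only builds the `wget` command lines, so they can be reviewed or saved as a shell script before anything is downloaded.

```python
# Sketch: generate wget commands for Allen grid-data downloads (expression energy only).
# Assumption: the grid-data endpoint below; the dataset IDs passed in are placeholders.
GRID_URL = "http://api.brain-map.org/grid_data/download/{dataset_id}?include=energy"


def wget_commands(dataset_ids, out_dir="grid_data"):
    """Return one wget command per SectionDataSet ID (hypothetical helper)."""
    commands = []
    for dataset_id in dataset_ids:
        url = GRID_URL.format(dataset_id=dataset_id)
        # Quote the URL so the shell does not treat '?' as a glob character.
        commands.append(f'wget -O {out_dir}/{dataset_id}.zip "{url}"')
    return commands


if __name__ == "__main__":
    for cmd in wget_commands([123456, 234567]):  # placeholder IDs
        print(cmd)
```

Each archive would then be unpacked to obtain the .mhd/.raw files used in the following steps.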
Step 2 (Downsample): Viewing the gene expression files in ITK-SNAP shows that they have a resolution of 200 microns, whereas the lowest-resolution mouse brain reference template provided by Allen has a resolution of 100 microns. To extract all voxel information for the whole brain and for specific brain regions on the same grid as the expression data, the reference template therefore needs to be downsampled to 200 microns. This part of the work is done using the SimpleITK library in Python.
Step 3 (Format conversion): Because the gene expression files and the tissue template (a file with the .nrrd suffix) are in different formats, we convert both into TIFF files. This is done using the SimpleITK, Libtiff and PIL libraries in Python.
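A minimal sketch of writing a volume to a multi-page TIFF with PIL alone (the authors also used SimpleITK and Libtiff), assuming the expression or template volume has already been loaded as a NumPy array, for example via `sitk.GetArrayFromImage`. The helpers `volume_to_tiff` and `tiff_to_volume` are hypothetical. Integer templates are written as 32-bit integers because float32 cannot represent every large structure ID exactly.

```python
import os
import tempfile

import numpy as np
from PIL import Image


def volume_to_tiff(volume, path):
    """Save a 3D array as a multi-page TIFF, one page per slice along axis 0."""
    vol = np.asarray(volume)
    # Integer label volumes -> 32-bit signed ints (PIL mode "I") so large structure IDs
    # are not rounded; everything else -> 32-bit floats (PIL mode "F").
    dtype = np.int32 if np.issubdtype(vol.dtype, np.integer) else np.float32
    pages = [Image.fromarray(np.ascontiguousarray(s, dtype=dtype)) for s in vol]
    pages[0].save(path, save_all=True, append_images=pages[1:])


def tiff_to_volume(path):
    """Read every page of a multi-page TIFF back into a 3D array."""
    with Image.open(path) as im:
        pages = []
        for i in range(im.n_frames):
            im.seek(i)
            pages.append(np.array(im))
    return np.stack(pages)


if __name__ == "__main__":
    energy = np.random.default_rng(0).random((3, 4, 5)).astype(np.float32)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "energy.tif")
        volume_to_tiff(energy, path)
        print(np.array_equal(tiff_to_volume(path), energy))  # True
```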
Step 4 (Extract expression information): Using the tissue template as a mask, extract the three-dimensional voxel coordinates and expression energy values of the whole brain and of specific brain regions, and save them to a mat file. This is done using MATLAB.
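The masking step was done in MATLAB; the sketch below shows the same idea in Python with NumPy and SciPy, as an illustration rather than the authors' script. `region_ids` stands for whatever set of template labels defines the brain region of interest (hypothetical here), and the coordinates are shifted to 1-based indices before saving so that they follow MATLAB's convention.

```python
import os
import tempfile

import numpy as np
from scipy.io import savemat


def extract_region(expression, annotation, region_ids):
    """Coordinates and expression energies of voxels whose template label is in region_ids."""
    mask = np.isin(annotation, region_ids)
    coords = np.argwhere(mask)   # (n_voxels, 3), 0-based, row-major order
    values = expression[mask]    # boolean indexing uses the same row-major order
    return coords, values


def save_region_mat(path, coords, values):
    # +1 converts to MATLAB-style 1-based indices (a convention chosen for this sketch).
    savemat(path, {"coords": coords + 1, "values": values})


if __name__ == "__main__":
    annotation = np.array([[[0, 5], [5, 0]]])  # toy template with a single slice
    expression = np.array([[[0.1, 0.2], [0.3, 0.4]]])
    coords, values = extract_region(expression, annotation, [5])
    with tempfile.TemporaryDirectory() as tmp:
        save_region_mat(os.path.join(tmp, "region.mat"), coords, values)
    print(coords.tolist(), values.tolist())  # [[0, 0, 1], [0, 1, 0]] [0.2, 0.3]
```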
Step 5 (Integrate expression information): Read the mat file containing each gene's expression energy information and integrate it into the Allen mouse brain coronal and sagittal gene expression matrices. This is done using R. The scripts and code used in this part of the work are available on GitHub (https://github.com/ZjjLab/Processing-and-analysis-of-Allen-mouse-brain-ISHdata/tree/main/3DResUNet_ISH/Get%20gene%20expression%20data/Script).
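The integration step was done in R; a pandas analogue might look like the sketch below. It assumes every gene's vector lists the same voxels in the same order (as produced by the masking step), places voxels in rows and genes in columns, and uses a hypothetical `x_y_z` string as the voxel identifier.

```python
import numpy as np
import pandas as pd


def build_expression_matrix(gene_values, coords):
    """Voxel x gene matrix from per-gene expression vectors that share one voxel order."""
    coords = np.asarray(coords)
    index = ["_".join(map(str, c)) for c in coords]  # e.g. "12_30_41"
    columns = {}
    for gene, values in gene_values.items():
        values = np.asarray(values, dtype=float).ravel()
        if values.shape[0] != len(index):
            raise ValueError(f"{gene}: {values.shape[0]} values for {len(index)} voxels")
        columns[gene] = values
    return pd.DataFrame(columns, index=index)


if __name__ == "__main__":
    matrix = build_expression_matrix(
        {"GeneA": [0.5, 1.5], "GeneB": [2.0, 0.0]},  # placeholder gene names and values
        [[10, 20, 30], [10, 21, 30]],
    )
    print(matrix)
```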

Data analysis processing
Step 1 (Prepare dataset): First, we separate the genes whose hippocampal expression contains missing voxels (Raw) from the genes with complete expression (Label). The missing-voxel pattern of each Raw gene across the series of hippocampal coronal slices is then mapped onto the Label data, which gives the Miss dataset. The three datasets are stored together in HDF5 files for model training. This part of the work is done using R.
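The construction of Miss can be sketched in NumPy with h5py, under several assumptions: missing voxels are marked with the value -1, Raw and Label volumes share one grid, and each Label gene receives the missing pattern of a randomly chosen Raw gene (the source does not state the pairing rule, and the authors did this step in R). The function names and the HDF5 dataset names `raw`, `label` and `miss` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

MISSING = -1.0  # assumed sentinel marking voxels without ISH signal


def make_miss(raw, label, missing_value=MISSING):
    """Copy the missing-voxel pattern of one Raw volume onto one complete Label volume."""
    miss = np.array(label, dtype=np.float32, copy=True)
    miss[raw == missing_value] = missing_value
    return miss


def write_training_file(path, raw_set, label_set, seed=0):
    """Pair every Label gene with a randomly chosen Raw pattern and store Raw/Label/Miss."""
    rng = np.random.default_rng(seed)
    # Hypothetical pairing rule: the source does not say how Raw and Label genes are matched.
    picks = rng.integers(0, len(raw_set), size=len(label_set))
    raw = np.asarray(raw_set, dtype=np.float32)[picks]
    label = np.asarray(label_set, dtype=np.float32)
    miss = np.stack([make_miss(r, l) for r, l in zip(raw, label)])
    with h5py.File(path, "w") as f:
        for name, data in (("raw", raw), ("label", label), ("miss", miss)):
            f.create_dataset(name, data=data, compression="gzip")
    return raw, label, miss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw_set = rng.random((2, 4, 4, 4)).astype(np.float32)
    raw_set[0, 0] = MISSING  # first slice along axis 0 missing in Raw gene 0
    label_set = rng.random((3, 4, 4, 4)).astype(np.float32)
    with tempfile.TemporaryDirectory() as tmp:
        _, _, miss = write_training_file(os.path.join(tmp, "train.h5"), raw_set, label_set)
    print(miss.shape)  # (3, 4, 4, 4)
```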
Step 2 (Model training): The data are divided into training, validation and test sets, which are used to train the model. After the model has been fitted, it predicts the Miss data of the test set to obtain the Predict data. This part of the work is done using PyTorch and other Python libraries.
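For orientation, a deliberately small 3D residual U-Net in PyTorch, with one down-sampling level, is sketched below together with a random train/validation/test split and one MSE training step. The depth, channel widths, loss, split fractions and all class and function names are assumptions for illustration; they are not the architecture or settings reported by the authors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResBlock3d(nn.Module):
    """Two 3x3x3 convolutions with a residual (identity or 1x1x1) shortcut."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
        )
        self.shortcut = nn.Identity() if in_ch == out_ch else nn.Conv3d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return F.relu(self.body(x) + self.shortcut(x))


class TinyResUNet3d(nn.Module):
    """One-level 3D residual U-Net: encoder, bottleneck, decoder with a skip connection."""

    def __init__(self, ch=16):
        super().__init__()
        self.enc = ResBlock3d(1, ch)
        self.down = nn.MaxPool3d(2)
        self.bottleneck = ResBlock3d(ch, 2 * ch)
        self.up = nn.ConvTranspose3d(2 * ch, ch, kernel_size=2, stride=2)
        self.dec = ResBlock3d(2 * ch, ch)
        self.head = nn.Conv3d(ch, 1, kernel_size=1)

    def forward(self, x):  # x: (batch, 1, D, H, W) with D, H, W even
        e = self.enc(x)
        b = self.bottleneck(self.down(e))
        d = self.dec(torch.cat([self.up(b), e], dim=1))  # skip connection
        return self.head(d)


def split_indices(n, fractions=(0.7, 0.15), seed=0):
    """Random train/validation/test index split; the fractions are placeholders."""
    perm = torch.randperm(n, generator=torch.Generator().manual_seed(seed)).tolist()
    n_train, n_val = int(fractions[0] * n), int(fractions[1] * n)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


def train_step(model, optimizer, miss, label):
    """One optimisation step mapping Miss volumes to Label volumes with an MSE loss."""
    model.train()
    optimizer.zero_grad()
    loss = F.mse_loss(model(miss), label)
    loss.backward()
    optimizer.step()
    return loss.item()
```

After training, calling the model on the test-set Miss volumes (in `model.eval()` mode, inside `torch.no_grad()`) would give the Predict volumes.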
Step 3 (Model performance evaluation): To compare the repair performance of different models, we analyze the similarity between the predicted image data and the original fully expressed image data from two perspectives: the entire hippocampus, and the coronal slices that contain the hippocampus. Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) are used as image evaluation metrics. For the coronal-slice perspective, we identify, for each test-set gene, the hippocampal coronal slices that are missing in the Raw data, and extract all voxels of the corresponding slices from the Predict and Label data for analysis.
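The slice-level selection can be sketched as follows, assuming missing voxels carry the value -1, a boolean hippocampus mask on the same grid, and a known index for the coronal (anterior-posterior) axis. Here a slice counts as missing if any hippocampal voxel on it is missing; the source does not say whether partially missing slices are included, so this rule and the function names are assumptions.

```python
import numpy as np

MISSING = -1.0  # assumed sentinel for missing ISH voxels


def missing_coronal_slices(raw, hippo_mask, coronal_axis=0, missing_value=MISSING):
    """Indices of coronal slices on which at least one hippocampal voxel is missing in Raw."""
    missing = (raw == missing_value) & hippo_mask
    other_axes = tuple(a for a in range(raw.ndim) if a != coronal_axis)
    return np.flatnonzero(missing.any(axis=other_axes))


def slice_voxels(volume, slice_idx, coronal_axis=0):
    """All voxels of the selected coronal slices, flattened in a fixed order."""
    return np.take(volume, slice_idx, axis=coronal_axis).ravel()
```

The same slice indices are then applied to both Predict and Label, so the metrics compare identical voxels.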
For the entire-hippocampus perspective, we extracted all hippocampal voxels from the Predict and Label data for analysis. To avoid bias from a single random allocation of the test set, we randomly sampled the test set 100 times, calculated the average of the three metrics for each sample, and finally visualized the averages over the 100 samples. This part of the work is done using Python libraries such as scikit-image, scikit-learn and SciPy.
*Raw: genes whose expression data have missing voxels; Label: genes whose expression data are complete; Miss: Label data with the missing-voxel pattern of Raw mapped onto the corresponding positions; Predict: Miss data whose missing voxels have been completed by the deep neural network (DNN).
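A sketch of the metric computation with `skimage.metrics`, plus one possible reading of the 100-fold random sampling: each repeat draws a random subset of test genes (half of them by default, a hypothetical choice) and averages their per-gene MSE, PSNR and SSIM. Using a single `data_range` shared by all genes is also an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np
from skimage.metrics import (
    mean_squared_error,
    peak_signal_noise_ratio,
    structural_similarity,
)


def gene_metrics(label, predict, data_range):
    """MSE, PSNR and SSIM between one Label volume and its Predict volume."""
    return (
        mean_squared_error(label, predict),
        peak_signal_noise_ratio(label, predict, data_range=data_range),
        # 3D SSIM uses a 7x7x7 window by default, so each axis needs at least 7 voxels.
        structural_similarity(label, predict, data_range=data_range),
    )


def repeated_subset_means(labels, predicts, n_repeats=100, subset_size=None, seed=0):
    """Mean (MSE, PSNR, SSIM) over random subsets of test genes, repeated n_repeats times."""
    rng = np.random.default_rng(seed)
    labels = [np.asarray(l, dtype=np.float64) for l in labels]
    predicts = [np.asarray(p, dtype=np.float64) for p in predicts]
    # One shared data range for all genes (an assumption of this sketch).
    data_range = max(l.max() for l in labels) - min(l.min() for l in labels)
    per_gene = np.array([gene_metrics(l, p, data_range) for l, p in zip(labels, predicts)])
    k = subset_size or max(1, len(labels) // 2)
    means = np.empty((n_repeats, 3))
    for r in range(n_repeats):
        chosen = rng.choice(len(labels), size=k, replace=False)
        means[r] = per_gene[chosen].mean(axis=0)
    return means  # columns: MSE, PSNR, SSIM
```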
Step 4 (Correlation analysis of coronal and sagittal data): Because the expression values are not normally distributed overall, we used a nonparametric measure: Spearman correlation was used to evaluate how the similarity between the coronal and sagittal hippocampal data of the same gene changes before and after repair. First, because the sagittal ISH experiments only contain left-hemisphere data, the expression values of the left-hippocampus voxels were extracted from the Raw and Predict data. Second, both Raw (the coronal data before repair) and Sagittal (the sagittal hippocampus data) contain voxels with missing expression. Therefore, to compare the coronal-sagittal correlation before and after model prediction, the hippocampal voxels co-expressed with the sagittal data (observed in both datasets) were extracted separately for Raw and for Predict and analyzed together. Finally, Spearman correlation coefficients were computed on the processed data and visualized. This part of the work is done using R. The scripts and code used in this part of the work are available on GitHub (https://github.com/ZjjLab/Processing-and-analysis-of-Allen-mouse-brain-ISHdata/tree/main/3DResUNet_ISH/Data%20analysis%20process/Script).
*Sagittal: left hippocampus in the sagittal dataset.
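The co-expression filtering and Spearman correlation were done in R; the Python sketch below illustrates the same logic with `scipy.stats.spearmanr`. It assumes the Raw, Predict and Sagittal vectors already hold the left-hippocampus voxels of one gene in matching order and that missing voxels are marked with -1; the function names are hypothetical. Because Predict has no missing voxels, its co-expressed set is limited only by the sagittal data.

```python
import numpy as np
from scipy.stats import spearmanr

MISSING = -1.0  # assumed sentinel for voxels without expression


def coexpressed_spearman(coronal, sagittal, missing_value=MISSING):
    """Spearman rho over voxels observed in both vectors; also returns the voxel count."""
    coronal = np.asarray(coronal, dtype=float)
    sagittal = np.asarray(sagittal, dtype=float)
    keep = (coronal != missing_value) & (sagittal != missing_value)
    rho, _ = spearmanr(coronal[keep], sagittal[keep])
    return float(rho), int(keep.sum())


def before_after(raw_left, predict_left, sagittal_left):
    """Correlation with the sagittal data before (Raw) and after (Predict) repair for one gene."""
    return {
        "raw": coexpressed_spearman(raw_left, sagittal_left),
        "predict": coexpressed_spearman(predict_left, sagittal_left),
    }
```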