Superpixel Segmentation-Enabled Transmission Electron Microscopy Images for Rapid and Accurate Detection of Coronavirus

Worldwide, SARS-CoV-2 has been responsible for millions of deaths and extensive disability. To stop the spread of novel viruses such as SARS-CoV-2 and its variants of concern, including Omicron, rapid and accurate diagnostic techniques are needed to identify symptomatic and asymptomatic carriers as early as possible. Early recognition and diagnosis are essential for effective epidemic management. However, the shapes and spatial characteristics of different viral strains are similar, which complicates image classification, especially in medical virology. This study applies a superpixel segmentation technique to transmission electron microscopy (TEM) images to differentiate SARS-CoV-2 from SARS-CoV. The aim is to develop a method that enables virologists to detect and diagnose viral infections more accurately. In the results, SARS-CoV-2 had a median area of 25,145.54 pixels and SARS-CoV a median area of 38,591.35 pixels. The model can help researchers better understand how viruses develop and spread, and can support the diagnosis and containment of outbreaks. Furthermore, a low root mean square error (RMSE) of 0.0275 was obtained between manual and automated segmentation of the viral area, which indicates the accuracy of this automated measurement technique. Finally, the developed superpixel segmentation technique provides quick and reliable identification of coronaviruses and promises to contribute to medical virology and epidemic management by simplifying prompt viral diagnosis.


INTRODUCTION
Many people have lost their jobs and businesses have had to cut operations due to the ongoing global Coronavirus Disease 2019 (COVID-19) outbreak. According to reported data, 6,911,896 people have died from the disease and more than 694 million have been infected. Despite efforts to contain it, the rapid spread of the virus has negatively affected public spaces, markets, schools, and airports. Its rapid and widespread reach has highlighted the need for more efficient environmental detection systems to control further spread and to improve the care of patients (Ai et al. 2020; Engineering 2021; Rahman et al. 2023; Bakr Ahmed Taha et al. 2020; Bakr Ahmed Taha, Al Mashhadany, et al. 2021; Bakr Ahmed Taha, Al-Jubouri, et al. 2022; Bakr Ahmed Taha, Ali, et al. 2021; Bakr Ahmed Taha, Mehde, et al. 2022). Artificial intelligence (AI) encompasses methods that enable machines to imitate or exceed human intelligence, particularly in cognitive functions. The main branches of AI include machine learning, computer vision, and natural language processing (Joonas 2020; Mamat et al. 2023; Manea et al. 2023; Robertson et al. 2018; Bakr Ahmed Taha, Al Mashhadany, Al-Jubouri, Haider, et al. 2023; Bakr Ahmed Taha, Al Mashhadany, Al-Jubouri, Rashid, et al. 2023a).
Machine learning techniques have frequently been employed to segment microorganisms present in the environment. Machine learning can be divided into traditional methods and artificial neural networks. Traditional algorithms such as Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Random Forest (RF), and others have previously been used to detect viruses.
AI can potentially expedite even the most time-consuming tasks, making it particularly valuable in microorganism image analysis (Abd Alkarim et al. 2023; Li et al. 2015; Quality 2023; Rashid et al. 2023).
Researchers recommend various technologies for virus identification, such as intelligent computer systems that use deep learning algorithms to classify and organize microscopic images of medical and food components (Fang et al. 2020; Kang et al. 2023; Kohler & Farr 1966; Rivenson et al. 2019; Schwartz 2020). Radiographs and computed tomography (CT) scans have been praised as useful diagnostic tools in recent research on respiratory disorders such as COVID-19. These methods are particularly helpful for determining the severity of COVID-19 infections, monitoring the most seriously ill patients, and forecasting the course of the disease.
However, in such emergencies, relying solely on conventional manual diagnostic procedures is often impractical. Computer-aided detection systems that use deep learning algorithms offer one solution, producing more accurate diagnoses in less time (Maghdid et al. 2021; Sahiner et al. 2019; Shi et al. 2021; Bakr A. Taha 2021).
Transmission electron microscopy (TEM) imaging is essential for studying viral replication and discovering new infections, which in turn supports efficient illness prevention, accurate diagnosis, control of viral outbreaks, and understanding of the biology of pathogens and the causes of viral diseases (Richert-Pöggeler et al. 2019). Researchers have been trying to decipher viral architecture since the late 19th century, when it became clear that viruses played a significant role in disease transmission. TEM resolution is well suited to nanoscale research, so TEM can provide direct images of viruses for diagnostic and investigative purposes (Athirah et al. 2023; Mettenleiter 2017).
Although studies show that Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is about 80% similar to Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV), the two enveloped viruses differ in the size, length, and density of their spike area. These structural differences significantly affect how the viruses are identified and diagnosed (Z. Zhu et al. 2020). In addition, coronavirus structures and morphologies in Vero cell cultures were characterized using conventional thin-section TEM, with a median particle size of 100 nm (Laue et al. 2021). Zhao et al. used TEM images to detect changes in cell shape caused by SARS-CoV-2 in order to understand how the virus causes disease (Zhao et al. 2020).
The literature on automated virus recognition in TEM images is scarce, and a disproportionate share of the contributions comes from a small group of researchers (Gustaf Kylberg et al. 2011; Matuszewski & Sintorn 2018; Roingeard et al. 2019). Kylberg et al. presented an automatic segmentation method to detect different types of virus particles in TEM images. The process consists of an analysis of the local neighborhood of every pixel in the image, a broader object-discrimination step, and a border-correction phase for extended objects (G. Kylberg et al. 2012).
Image labelling has been studied at the pixel level; however, deep learning methods have performed poorly at separating visual objects. In particular, they do not explicitly consider the possibility that neighboring pixels belong to the same object category (De Geus et al. 2019; Jha et al. 2020; Siam et al. 2018). Deep recurrent neural networks have been used to track coronavirus particles in TEM images and to calculate missed-detection probabilities: multiple detections were combined to generate assignment probabilities, and the network also calculated the likelihood that tracks exist, accounting for track creation and termination (Spilger et al. 2020; Bakr Ahmed Taha, Al Mashhadany, Al-Jubouri, Rashid, et al. 2023b; Bakr Ahmed Taha, Al-Jubouri, et al. 2023; Bakr Ahmed Taha, Mashhadany, et al. 2022).

MORPHOLOGY MEASUREMENT
In image processing and computer vision, a morphology measurement method assigns numerical values to the geometric properties of an object. The process involves assessing the spatial distribution of objects and their relative sizes and shapes (Al-Kinani et al. 2020; Yousif et al. 2012). Such measurements can identify patterns in images and extract data in a categorized way, or count objects, gauge their sizes, and identify their shapes. In this study, the brightness and contrast of the thresholded images were adjusted to reveal the details of the envelope and spike proteins. SARS-CoV-2 is roughly spherical, with a diameter of 60 to 140 nm, and its spikes give it the appearance of a solar corona when viewed under a TEM (Ogando et al. 2020; Ou et al. 2020; N. Zhu et al. 2020). Many studies measure particle sizes with the geometric selection tools of ImageJ, such as freehand and segmented lines, to determine the outer viral envelope and the length of the spikes (Abràmoff et al. 2004; Laue et al. 2021; Yao et al. 2020).
Area selections make it possible to calculate maximum and minimum diameters as well as shape descriptors such as roundness, circularity, diameter, and area. The geometric selection used to find the maximum diameter of each virus particle is then extended stepwise, in nanometre increments, to identify the spikes associated with that particle. The accuracy of the technique was verified by using the adjusted scale bar tool to check the spike lengths and diameters of various particles from various images, which was necessary to establish the average envelope diameters for both viruses. Because fixation artifacts in clinical specimens made the particles non-circular, their diameters were measured along two crossing axes. Using images of virus particles from cell cultures reduced the morphological variability caused by fixation artifacts (Aggarwal et al. 2012).
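The roundness and circularity descriptors mentioned above have simple closed forms. As a hedged sketch, the function below uses the definitions documented for ImageJ's shape descriptors (circularity = 4π·area/perimeter², roundness = 4·area/(π·major axis²)); the function name and the test particle are hypothetical, and in practice the inputs would come from the area selections described here.

```python
import math

def shape_descriptors(area, perimeter, major_axis):
    """Return (circularity, roundness) for one particle.

    Definitions follow ImageJ's shape descriptors:
      circularity = 4*pi*area / perimeter**2   (1.0 for a perfect circle)
      roundness   = 4*area / (pi * major_axis**2)
    All inputs must use consistent units (e.g. nm and nm^2).
    """
    circularity = 4.0 * math.pi * area / perimeter ** 2
    roundness = 4.0 * area / (math.pi * major_axis ** 2)
    return circularity, roundness

# Hypothetical check: an ideal circular particle with a 100 nm diameter
d = 100.0
area = math.pi * (d / 2) ** 2
perimeter = math.pi * d
circ, rnd = shape_descriptors(area, perimeter, major_axis=d)
print(round(circ, 6), round(rnd, 6))  # 1.0 1.0
```

Real particles with fixation artifacts would give values below 1, which is one way to quantify the non-circular shapes described above.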
Virus particle areas were determined using a freehand selection tool in the software. Depending on the type of selection made, this method reports location, line length, angle, and point values (T. Ferreira 2012). Particle areas were analysed by selecting a specific region and scanning it until the outline of a virus was detected, repeating the process until the boundary of the selection allowed the characteristics of each virus particle to be extracted precisely.
Figure 1 shows coronavirus particle morphology in TEM images: Figure 1(A) shows SARS-CoV and Figure 1(B) shows SARS-CoV-2, with the scale bar adjusted to 100 nm. A consistent scale bar allows virus particle sizes to be measured accurately and the results to be interpreted through a precise comparison between the two images.
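Because the scale bar fixes the physical size of a pixel, areas measured in pixels (such as the median areas reported in the abstract) can be converted into physical units. A minimal sketch, assuming the length of the 100 nm scale bar has been measured in pixels; the 200-pixel bar length and the function names below are hypothetical. Note that area scales with the square of the pixel size.

```python
def nm_per_pixel(scale_bar_nm, scale_bar_px):
    """Physical length of one pixel edge, from a calibrated scale bar."""
    return scale_bar_nm / scale_bar_px

def area_px_to_nm2(area_px, scale_bar_nm, scale_bar_px):
    """Convert an area in pixels to nm^2 (pixel size enters squared)."""
    return area_px * nm_per_pixel(scale_bar_nm, scale_bar_px) ** 2

# Hypothetical: a 100 nm scale bar spanning 200 pixels -> 0.5 nm per pixel
print(area_px_to_nm2(10_000, scale_bar_nm=100, scale_bar_px=200))  # 2500.0
```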

MATERIALS AND METHODS
Superpixels are fundamental for identifying and extracting the visual features that represent objects, and for this purpose they should meet the following requirements. First, the process must be fast enough for near-real-time use. Second, clusters should show low intra-cluster variance and high inter-cluster variance. Third, superpixels should be distributed consistently across the image. Finally, superpixel boundaries should be defined accurately, matching object borders at the level of individual pixels. Figure 2 shows the automated extraction of area features for SARS-CoV-2 and SARS-CoV using superpixel techniques. Emphasis is placed on preserving boundary information and obtaining localized data, which are essential for robust feature extraction and quantitative analysis in this work. Superpixel techniques are widely used in computer vision applications, but many of them process all pixels against fixed criteria, which can produce unnecessary superpixel boundaries and affect their regularity (Yuan et al. 2021). Simple linear iterative clustering (SLIC) performs better than pixel-level approaches, and it has therefore recently been used to segment medical images (Cong et al. 2014).
Image segmentation can be viewed as partitioning an image into a set of segments, or equivalently the boundaries between them, that together span the entire image. Pixels within a segment tend to share colour, brightness, and texture, which differ markedly between adjacent segments. Because brightness changes sharply at segment boundaries, segment borders tend to coincide with image edges. A segmentation technique was therefore developed that combines edge extraction with k-means clustering and SLIC segmentation to group nearby pixels. First, the desired number of superpixels, $K$, is chosen, with the aim of creating approximately equally sized superpixels. In an image with $N$ pixels, each superpixel then covers approximately $N/K$ pixels. To obtain superpixels of roughly equal size, a superpixel centre is placed at every grid interval $S = \sqrt{N/K}$. In the k-means algorithm, data points are grouped according to their proximity to the cluster centroids ($C_k$): each point is assigned to the cluster whose centroid is closest in Euclidean distance, and this assignment is repeated iteratively until convergence, with points shifting between clusters to find the most appropriate grouping. At the start of the process, the superpixel cluster centres $C_k = [l_k, a_k, b_k, x_k, y_k]^T$, $k = 1, \dots, K$, are sampled at regular grid intervals $S$. Because the spatial extent of each superpixel is approximately $S \times S$, the pixels associated with a given cluster centre are assumed to lie within a $2S \times 2S$ region around it. This region is the search area for the pixels closest to each cluster centre, and distances are measured in the CIELAB colour space together with the pixel coordinates. SLIC uses CIELAB because it is perceptually uniform for small colour differences. The distance $D$ between two pixels $i$ and $j$ in SLIC combines two distances, $d_c$ and $d_s$, which express colour and spatial proximity, respectively, as shown in Eqs. 1, 2, and 3 (Achanta et al. 2010):

$$ d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2} \qquad (1) $$

$$ d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2} \qquad (2) $$

$$ D = \sqrt{d_c^2 + \left(\frac{d_s}{S}\right)^2 m^2} \qquad (3) $$

Here $d_s$ combines the distances in the $x$ and $y$ directions and is normalized by the grid spacing $S$. The compactness of the superpixels is controlled by the variable $m$: a higher $m$ places more emphasis on spatial proximity, producing more compact clusters and giving more weight to the pixel-distance term.
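The grid initialization and distance-weighted assignment described above can be sketched as a simplified SLIC in NumPy. This is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: because TEM images are single-channel, pixel intensity stands in for the CIELAB triple, the combined distance is taken as D = sqrt(d_c² + (d_s/S)² m²) with S = sqrt(N/K), and the connectivity-enforcement step is omitted. The function name `slic_gray`, the parameter values, and the test image are hypothetical.

```python
import numpy as np

def slic_gray(image, n_segments=100, m=0.2, n_iter=10):
    """Simplified SLIC for a 2-D grayscale image (illustrative sketch).

    Intensity replaces the CIELAB triple (an assumption for single-channel
    TEM images). Distance: D = sqrt(d_c**2 + (d_s / S)**2 * m**2).
    The connectivity-enforcement post-processing step is omitted.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    S = max(int(np.sqrt(h * w / n_segments)), 1)   # grid interval S = sqrt(N/K)
    # Initial centres on a regular grid, stored as [intensity, row, col]
    ys, xs = np.mgrid[S // 2:h:S, S // 2:w:S]
    centres = np.stack([img[ys, xs].ravel(),
                        ys.ravel().astype(float),
                        xs.ravel().astype(float)], axis=1)
    yy, xx = np.mgrid[0:h, 0:w]
    labels = np.full((h, w), -1, dtype=int)
    for _ in range(n_iter):
        dist = np.full((h, w), np.inf)
        # Assignment: each centre only competes inside a window of about 2S x 2S
        for k, (ck, cy, cx) in enumerate(centres):
            y0, y1 = max(int(cy) - S, 0), min(int(cy) + S + 1, h)
            x0, x1 = max(int(cx) - S, 0), min(int(cx) + S + 1, w)
            d_c = np.abs(img[y0:y1, x0:x1] - ck)
            d_s = np.hypot(yy[y0:y1, x0:x1] - cy, xx[y0:y1, x0:x1] - cx)
            D = np.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)
            window = dist[y0:y1, x0:x1]          # view into dist
            better = D < window
            window[better] = D[better]
            labels[y0:y1, x0:x1][better] = k
        # Update: move each centre to the mean intensity and position of its pixels
        for k in range(len(centres)):
            mask = labels == k
            if mask.any():
                centres[k] = [img[mask].mean(), yy[mask].mean(), xx[mask].mean()]
    return labels

# Hypothetical test image: bright square on a dark, slightly noisy background
rng = np.random.default_rng(0)
img = np.zeros((60, 60))
img[15:45, 15:45] = 1.0
img += 0.05 * rng.standard_normal(img.shape)
labels = slic_gray(img, n_segments=36, m=0.2)
print(labels.shape, len(np.unique(labels)))
```

The compactness m should be chosen relative to the intensity range: with intensities in [0, 1], values well below 1 let intensity dominate, while larger m values suit the much wider CIELAB range.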
In the edge-aware variant, an edge term $E_{ij}$ indicates the edges that exist between two pixels $i$ and $j$, that is, the likelihood that an object boundary lies between these locations. To initiate superpixel generation, an edge detection algorithm assigns each pixel $k$ a value $p_k$, the probability that it lies on a boundary. The edge term between two pixels is then the highest edge probability among all pixels on the line connecting them:

$$ E_{ij} = \max_{k \in \overline{ij}} p_k $$

Equations 4, 5, and 6 give the revised distance calculation:

$$ d_{rgb} = \sqrt{(R_i - R_j)^2 + (G_i - G_j)^2 + (B_i - B_j)^2} \qquad (4) $$

$$ d_{xy} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \qquad (5) $$

$$ D_{ij} = \sqrt{\left(\frac{d_{rgb}}{N_c}\right)^2 + \left(\frac{d_{xy}}{N_s}\right)^2 + E_{ij}^2} \qquad (6) $$

where $d_{rgb}$ is the colour distance between two pixels with red ($R$), green ($G$), and blue ($B$) values: the squared differences in each channel are summed and the square root of the sum is taken. $d_{xy}$ is the spatial distance between pixels $i$ and $j$ based on their image coordinates $(x, y)$. The complete distance $D_{ij}$ combines the colour difference, the spatial distance, and the edge component $E_{ij}$, which reflects the strength of the edge between the pixels. The spatial distance is scaled by a factor $N_s$, the colour difference is normalized by a factor $N_c$, and the total is the square root of the sum of the squared colour, spatial, and edge terms. To produce coherent superpixels, the superpixels in the image must be interconnected by edges. The decision to merge two neighbouring superpixels is guided by a similarity metric, the chi-square histogram distance in Equation 7, which measures the disparity between two histograms $P$ and $Q$:

$$ \chi^2(P, Q) = \frac{1}{2} \sum_{k} \frac{(P_k - Q_k)^2}{P_k + Q_k} \qquad (7) $$

More generally, superpixel-based segmentation partitions images into regions corresponding to distinct objects or semantic sections. The objective is to obtain superpixels that align with object boundaries and significant image features. Such superpixels can be generated by algorithms such as normalized cuts, which optimizes an energy function over a graph representation of the image, and SLIC. Figure 3 shows the identification and detection of coronaviruses in TEM images by superpixel segmentation with the SLIC algorithm, implemented in MATLAB 2020.
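The edge term and the merge criterion can be sketched in NumPy under stated assumptions. The edge-probability map is taken as given from some edge detector, the line between two pixels is rasterized by rounding evenly spaced samples, the combined distance is assumed to be sqrt((d_rgb/N_c)² + (d_xy/N_s)² + E_ij²), and the chi-square distance uses the common 0.5·Σ(P−Q)²/(P+Q) form. All function names and the toy inputs are hypothetical.

```python
import numpy as np

def line_edge_term(edge_prob, p, q):
    """E_ij: largest boundary probability on the straight line from p to q.

    edge_prob: 2-D array of per-pixel boundary probabilities (assumed given).
    p, q: (row, col) pixel coordinates. The line is rasterized by rounding
    evenly spaced samples, one per pixel step along the longer axis.
    """
    n = max(abs(q[0] - p[0]), abs(q[1] - p[1])) + 1
    rows = np.rint(np.linspace(p[0], q[0], n)).astype(int)
    cols = np.rint(np.linspace(p[1], q[1], n)).astype(int)
    return float(edge_prob[rows, cols].max())

def edge_aware_distance(img_rgb, edge_prob, p, q, n_c, n_s):
    """Assumed form: sqrt((d_rgb / n_c)**2 + (d_xy / n_s)**2 + E_ij**2)."""
    d_rgb = np.linalg.norm(img_rgb[p].astype(float) - img_rgb[q].astype(float))
    d_xy = np.hypot(p[0] - q[0], p[1] - q[1])
    e = line_edge_term(edge_prob, p, q)
    return float(np.sqrt((d_rgb / n_c) ** 2 + (d_xy / n_s) ** 2 + e ** 2))

def chi_square(P, Q, eps=1e-12):
    """Chi-square histogram distance 0.5 * sum((P - Q)**2 / (P + Q))."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return float(0.5 * np.sum((P - Q) ** 2 / (P + Q + eps)))

# Hypothetical toy data: black left half, white right half, boundary at column 5
img = np.zeros((10, 10, 3))
img[:, 5:] = 255.0
edges = np.zeros((10, 10))
edges[:, 5] = 0.9
print(line_edge_term(edges, (2, 1), (2, 8)))   # 0.9 (line crosses the boundary)
print(line_edge_term(edges, (2, 1), (7, 3)))   # 0.0 (same side of the boundary)
print(chi_square([4, 0, 6], [4, 0, 6]))        # 0.0 (identical histograms)
```

Under these assumptions, two neighbouring superpixels would be merged when the chi-square distance between their histograms falls below a user-chosen threshold; the threshold itself is not given in the text.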

K-MEANS CLUSTERING
In machine learning, the k-means clustering algorithm is often used for data clustering. The data are automatically grouped into a chosen number of clusters, with high similarity within each cluster and low similarity between clusters. Cluster-based superpixels are formed by merging similar pixels with clustering algorithms such as k-means and mean shift. The idea is to partition the image into regions with similar colors, textures, or other pixel properties. However, clustering algorithms must align the borders between adjacent superpixels to preserve boundaries and capture fine details accurately. Post-processing techniques, such as graph-based optimization or edge-based refinement, can achieve this alignment (Kumar et al. 2016). Researchers have previously used a k-means-based multi-objective text clustering strategy, which outperformed the alternatives on well-known metrics such as precision and the F-measure (Abualigah et al. 2016). The method was tested thoroughly on large datasets and consistently showed superior performance, and its versatility and efficiency make it suitable for a wide range of text clustering tasks (Z. Chen et al. 2020; Luo 2016). k-means organizes N-dimensional data into meaningful clusters; the superpixel clustering process is summarized in the following pseudocode:
1. Specify $k$ initial cluster centers, denoted $c_1, c_2, \dots, c_k$.
2. Assign each data point $x_i$ to the nearest cluster center: $x_i$ belongs to cluster $j$ if $\lVert x_i - c_j \rVert \le \lVert x_i - c_l \rVert$ for all $l = 1, \dots, k$.
3. Update each cluster center as the average of the points assigned to it: $c_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$, where $C_j$ is the set of points in cluster $j$.
4. Repeat steps 2 and 3 until the cluster centers stop changing, indicating that the algorithm has converged.
5. Ensure that neighboring pixels are in the same cluster by enforcing connectivity through a post-processing step that merges spatially close but unconnected clusters.
6. Return the resulting clusters.
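The assignment and update steps above can be sketched with NumPy as plain k-means (Lloyd's algorithm). This is an illustrative sketch, not the implementation used in the study: the initial centers are k distinct samples chosen at random (the text does not say how they are initialized), and the connectivity post-processing step is left out. The function name and toy data are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on an (n_samples, n_features) array.

    Initial centers are k distinct samples drawn at random (an assumption).
    Connectivity post-processing is not included.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # initial centers
    for _ in range(n_iter):
        # Assignment: each point goes to its nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update: each center moves to the mean of its assigned points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Convergence check: stop when the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical toy data: two well-separated groups of 2-D points
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, centers = kmeans(X, k=2)
print(labels, centers)
```

For superpixels, each row of X would hold a pixel's color (or intensity) and scaled coordinates, which is what ties this generic clustering to the SLIC formulation.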

EXTRACTION FEATURE
Extracting features from the superpixels generated by the SLIC algorithm captures the distinctive attributes of each superpixel. These feature descriptions cover properties such as color, texture, and shape, and they support image segmentation, object recognition, and scene understanding. Analyzing the features of individual superpixels gives researchers a deeper understanding of the visual content of an image and allows more accurate and efficient analysis. The same descriptions can be used to compare superpixels across images, enabling algorithms that recognize and classify objects by their visual attributes. Gabor filters, for example, apply a set of spatial-frequency filters to capture the texture information in each superpixel, supporting texture-based analysis. Local binary patterns (LBP) can also be used to encode the local arrangement of pixel intensities within a superpixel, providing complementary texture information for further processing and analysis (Ren & Malik 2003).
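Local binary patterns are straightforward to compute. The sketch below implements the basic 3×3 LBP operator (each of the eight neighbours contributes one bit, set when the neighbour is at least as bright as the centre) and builds a normalized 256-bin histogram for one superpixel. It is an illustration under stated assumptions: the function names are hypothetical, border pixels are ignored, and rotation-invariant or uniform LBP variants are not shown. Histograms of this kind could be compared with the chi-square histogram distance mentioned in the text.

```python
import numpy as np

def lbp_8(image):
    """Basic 3x3 local binary pattern codes (0-255) for interior pixels.

    Each of the 8 neighbours contributes one bit: 1 if neighbour >= centre.
    Border pixels are dropped, so the output shape is (h-2, w-2).
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    c = img[1:-1, 1:-1]
    # Neighbour offsets in clockwise order starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (nb >= c).astype(np.uint8) << bit
    return codes

def superpixel_lbp_histogram(image, labels, k):
    """Normalized 256-bin LBP histogram of superpixel k (interior pixels only)."""
    codes = lbp_8(image)
    mask = labels[1:-1, 1:-1] == k
    hist = np.bincount(codes[mask].ravel(), minlength=256).astype(float)
    return hist / max(hist.sum(), 1.0)

# Hypothetical checks: a flat patch gives code 255, an isolated peak gives 0
flat = np.full((5, 5), 7.0)
peak = np.ones((3, 3))
peak[1, 1] = 9.0
print(np.unique(lbp_8(flat)), lbp_8(peak))
```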
Combining relational analysis with heterogeneous superpixels and deep features therefore enables the identification of objects within images. Capsule networks have demonstrated the feasibility of identifying superpixel attributes, highlighting the potential value of structured image analysis; this deep learning architecture has been tested in an image classification setting, with emphasis on the explainability of the network's decisions (Smith et al. 2010; Toth et al. 2018). Consider the set of pixels of an image $I$, partitioned into disjoint segments $R$ that together cover all pixels of $I$. Each segment is associated with a combined value representative of region $R$, which gives a superpixel representation in feature space, as outlined in Eq. 8 (Yang et al. 2019):

$$ f_R = \frac{1}{|R|} \sum_{i \in R} f_i \qquad (8) $$

where $i$ is the summation index over the individual pixels in region $R$, $f_i$ is the feature vector of pixel $i$, and $|R|$ is the number of pixels in $R$. Because the representation is built from regions rather than isolated pixels, it accounts for how the different parts of the image connect as a whole. This restructuring of features aims to capture substructures and thereby improve awareness of the overall structure of the image, while preserving the correspondence between the feature vectors and their positions in the image. Segmentation algorithms can be classified into four types: region-based, edge-constrained, classification or clustering, and hybrid approaches (Y. Chen et al. 2016).
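Assuming the "combined value" for each region R is the mean of the per-pixel features over that region, which is one common choice rather than something the text confirms, the superpixel representation can be computed in a vectorized way. The function name and the toy inputs below are hypothetical.

```python
import numpy as np

def superpixel_features(features, labels):
    """Mean feature vector per superpixel (assumes the region value is a mean).

    features: (h, w, c) per-pixel feature array; labels: (h, w) integer
    superpixel ids in 0..n-1. Returns an (n, c) array whose row k is the
    mean of f_i over the pixels i in region R_k.
    """
    feats = np.asarray(features, dtype=float)
    feats = feats.reshape(-1, feats.shape[-1])
    lab = np.asarray(labels).ravel()
    n = int(lab.max()) + 1
    sums = np.zeros((n, feats.shape[1]))
    np.add.at(sums, lab, feats)                       # sum of f_i over each R_k
    counts = np.bincount(lab, minlength=n)[:, None]   # |R_k|
    return sums / np.maximum(counts, 1)

# Toy example: 2 x 2 image, one feature channel, two superpixels (hypothetical)
features = np.array([[[1.0], [3.0]], [[5.0], [7.0]]])
labels = np.array([[0, 0], [1, 1]])
out = superpixel_features(features, labels)
print(out)  # region means 2.0 and 6.0
```

`np.add.at` is used instead of fancy-index addition because it accumulates correctly when the same superpixel id appears many times.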
k-means clustering can be applied to n-dimensional data sets, provided that natural groupings exist in the data, and the user must choose the value of k. Here, the superpixels are color-segmented to classify the region occupied by the virus. The segmentation result is obtained by iteratively adding neighboring pixels to cluster centers according to a similarity score. Common region-based strategies use region-growing algorithms, which require a similarity measure and growth criteria. To classify unknown regions, these segmentation methods first cluster them and then train classifiers on the characteristics of individual superpixels (Soltaninejad et al. 2017). The MATLAB function imsegkmeans was used for image segmentation based on the k-means clustering algorithm. It divides an image into regions whose pixels have similar colours, textures, or other characteristics. imsegkmeans takes an image and the number of clusters as input and returns a label matrix in which each pixel is assigned the label of its cluster; it uses the k-means algorithm to find the cluster centres and assigns each pixel to the closest one.
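imsegkmeans is a MATLAB function, so it is not reproduced here. As a rough analogue of the idea (not a port of imsegkmeans), the NumPy sketch below clusters grayscale intensities with 1-D k-means, returns a label image, and counts the pixel area of one cluster. It assumes the virus is the darkest cluster, which depends on the staining and is a hypothetical choice, as are the function name and the synthetic test image.

```python
import numpy as np

def segment_kmeans_gray(image, k=2, n_iter=50):
    """Label pixels of a grayscale image by 1-D k-means on intensity.

    Illustrates the idea behind MATLAB's imsegkmeans; it is not a port of it.
    Centers start evenly spaced between the minimum and maximum intensity;
    output labels are ordered so that 0 is the darkest cluster.
    """
    img = np.asarray(image, dtype=float)
    x = img.ravel()
    centers = np.linspace(x.min(), x.max(), k)
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    rank = np.empty(k, dtype=int)
    rank[np.argsort(centers)] = np.arange(k)    # cluster index -> brightness rank
    return rank[labels].reshape(img.shape), np.sort(centers)

# Hypothetical image: dark disc (radius 10 px) on a brighter background
yy, xx = np.mgrid[0:64, 0:64]
disc = (yy - 32) ** 2 + (xx - 32) ** 2 <= 10 ** 2
img = np.where(disc, 0.2, 0.8)
label_img, centers = segment_kmeans_gray(img, k=2)
area_px = int((label_img == 0).sum())   # assumes the virus is the darkest cluster
print(area_px, centers)
```

Pixel areas obtained this way are in the same units as the median areas reported in the abstract.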

RESULTS AND DISCUSSION
Developing insight into the unique characteristics of the SARS-CoV-2 virus can shed light on the complex relationships between viruses and their hosts. Critical phenomena such as virus entry, reproduction, mutation, escape mechanisms, viral abundance, and virus structure are all part of this web of interdependence.
Despite the complexities involved, many parallels exist between the techniques and methods used to study different viruses (Laue 2010). Figure 4 illustrates a manual analysis performed with ImageJ software to examine the morphological characteristics of the SARS-CoV-2 and SARS-CoV viruses. Figure 4(A) shows the original SARS-CoV-2 image with its segmentation, and Figure 4(B) shows the original SARS-CoV images with their segmentations. The segmentation was carried out so that the geometric parameters of each virus could be isolated and assessed rigorously, allowing a thorough comparison of their size and shape. Several factors can explain variations in size among SARS-CoV-2 particles within the same image, such as the stage of maturation and/or activation. The point in the virus's replication cycle also affects particle density and size, and particles at different positions within the cell may vary in size because of other environmental factors. Table 1 compares sample results of automatic and manual area segmentation. Automated methods have become increasingly prevalent in many research fields, particularly in data analysis. Their primary objective is faster and more efficient data processing than traditional manual methods, and their scalability lets them handle large datasets that would be daunting to process manually. Figure 6 shows the relationship between the test samples and the measurement method used to determine the segmented area. The errors, calculated as the difference between the manual and automatic measurements, are plotted as individual points, giving a clear visual representation of the accuracy of the automated method.

FIGURE 7. The error between the manual and automated virus area segmentation

Figure 7 shows the discrepancy between the manual and automated measurements of the tested samples. The error was determined by subtracting the automatic measurements from the manual ones. The error analysis between manual and automated segmentation of the virus area gave a low root mean square error (RMSE) of 0.0275, indicating little variance between the two measurement methods. This low error highlights the accuracy of the automated measurement method and suggests that it can be used effectively in practical applications. It also represents a crucial step towards a robust deep learning model that accurately detects and classifies SARS-CoV-2 in TEM images. Future work should incorporate more diverse data and fine-tune the model parameters to improve performance. Finally, the findings of this study contribute to the advancement of automated virus detection in TEM images, giving researchers and healthcare professionals a tool for efficient virus analysis, early detection, and improved understanding of viral diseases.
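The error and RMSE calculation described above is short enough to state directly. The text does not give the units or any normalization behind the reported RMSE of 0.0275, so the values below are hypothetical placeholders, not the study's data; the sign convention (manual minus automatic) follows the text, and the function name is illustrative.

```python
import numpy as np

def segmentation_errors(manual, auto):
    """Per-sample error (manual - auto) and the RMSE over all samples."""
    manual = np.asarray(manual, dtype=float)
    auto = np.asarray(auto, dtype=float)
    err = manual - auto
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return err, rmse

# Hypothetical normalized areas (NOT the study's measurements)
manual = [0.50, 0.62, 0.71, 0.45]
auto = [0.48, 0.65, 0.70, 0.45]
err, rmse = segmentation_errors(manual, auto)
print(err, rmse)
```

Because RMSE squares each error, a few large disagreements raise it more than many small ones, so plotting the individual errors alongside the RMSE shows whether the agreement is uniform.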

FIGURE 2. Flow chart of estimating the density of spike proteins and areas of SARS-CoV-2 using the superpixel segmentation technique

FIGURE 3. Identification of the superpixel segmentation based on the SLIC algorithm: A. SARS-CoV-2 images; B. SARS-CoV images

FIGURE 4. Manual morphometry analysis of virus images: A. Original and segmented images of SARS-CoV-2; B. Original and segmented images of SARS-CoV.

Figure 5 shows the results of the automatic detection of the SARS-CoV-2 and SARS-CoV areas, in pixels, based on image segmentation with the superpixel algorithm.
FIGURE 5. Automated morphometry analysis of coronavirus images using superpixel segmentation: A. SARS-CoV-2; B. SARS-CoV

FIGURE 6. Morphometry analysis of virus images: A. Original and segmented images of SARS-CoV-2; B. Original and segmented images of SARS-CoV.

CONCLUSION
In conclusion, this paper has presented a new model for quickly and accurately detecting and classifying SARS-CoV-2 based on superpixel segmentation of TEM images. It also provides area measurements of coronavirus particles. The methodology offers useful information on the life cycle and progression of the virus while reducing the laborious manual work required of experts. In addition, the tool has significant potential for outbreak management and infection control by facilitating timely virus diagnosis. It can also be extended to investigate other unknown viruses, improving early diagnosis and understanding of viral replication mechanisms and pathogenesis.

TABLE 1. Comparison of manual and automated area segmentation results for SARS-CoV-2 and SARS-CoV