A review of current aspects of the SARS family: genome, databases, drugs, vaccines and its pathogenic member SARS-CoV-2

Coronaviruses (CoVs) show great diversity in their genomic structures and in their ability to infect animals and humans. Multiple omics studies have been conducted to characterize their genome organization, the immunological responses they provoke, and molecular markers for vaccine and drug development. However, constant mutational changes in the CoV-2 genome make it challenging to develop drugs and vaccines that target new variants. A tremendous amount of research on vaccine development is being carried out using bioinformatics and immunoinformatics; however, most of the developed vaccines are still under trial. In this review, members of the SARS family are systematically compared with respect to their differences, infections and mechanisms of action, and recommendations are given for countering new variants of CoV-2 through multi-omics, machine learning and structural bioinformatics techniques.


Introduction
Coronaviruses (CoVs) are a highly diverse group of viruses with a marked ability to mutate (Cui et al., 2019). They infect many animals and induce respiratory syndromes in humans. CoVs are positive-sense single-stranded (ss) RNA viruses belonging to the family Coronaviridae (Cui et al., 2019). Based on phylogenetic analysis and genome organization studies, CoVs are grouped into the subfamily Orthocoronavirinae (formerly Coronavirinae), which has four genera (Cui et al., 2019): Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus. Evolutionary analysis of these genera indicated that αCoV and βCoV originated from rodents and bats (Kumar et al., 2020). CoV transmission is species independent, and the viruses spread rapidly among hosts. The focus of this review is βCoV, the genus that includes Middle East respiratory syndrome CoV (MERS-CoV) and severe acute respiratory syndrome CoV (SARS-CoV) (Fung et al., 2021). SARS-CoV emerged in 2002, whereas MERS-CoV emerged in 2012 (Cui et al., 2019). CoVs are zoonotic pathogens responsible for fatal respiratory disease in humans that has made

Review Article
SARS-CoV-2 was performed using electron microscopy. SARS-CoV has a size of 70-90 nm (Park et al., 2020). The high sequence identity between the two viruses suggests that they share the same structure (Kumar et al., 2020). Coronaviruses carry spike proteins on their surface, and the viral RNA is enclosed in a lipid bilayer envelope derived from the host membrane (Yang et al., 2020). Research on the structure of the viral spike has opened further opportunities for therapeutic treatments.

Genome organization
SARS-CoV-2 has been reported to share about 79% genome sequence identity with SARS-CoV and about 50% with MERS-CoV (Deslandes et al., 2020). Coronavirus genomes typically contain between 6 and 11 open reading frames (ORFs).
Sixteen non-structural proteins (nsps) are encoded by the first ORF, which covers about 67% of the genome, while the remaining ORFs code for structural and accessory proteins (Guo et al., 2020). The SARS-CoV-2 genome has two untranslated regions (UTRs): the 5' UTR is 265 nucleotides long, whereas the 3' UTR is 358 nucleotides long. No substantial sequence variation has been found between the two viruses. The SARS-CoV-2 genome encodes two viral cysteine proteases, the papain-like protease (nsp3) and the main protease (nsp5). Several other nsps are present, such as the helicase (nsp13), which plays a role in replication (Chan et al., 2020). Besides nsps, the ORFs encode accessory proteins and four major structural proteins: the spike surface glycoprotein (S), envelope (E), membrane (M), and nucleocapsid (N) proteins. The M protein contains an N-terminal glycosylated domain, three transmembrane (TM) domains, and a long C-terminal (CT) domain (Kumar et al., 2020).
In coronaviruses, the M and E proteins are required for morphogenesis, assembly, and budding. The S protein consists of two subunits (S1 and S2) and functions as the viral fusion protein. The bat SARS-CoV S1 subunit has been reported to share 70% sequence identity with human CoV. The S1 subunit has a signal peptide and two domains, the receptor-binding domain (RBD) and the N-terminal domain (NTD) (Walls et al., 2020). The main difference was found in the external subdomain, which is specifically involved in association with the ACE2 receptor. The structure of the spike protein's ectodomain was determined to understand the glycoprotein. Overall, the glycoprotein is structurally similar to the SARS-CoV spike protein, whereas the receptor-binding domain shows a high percentage of structural differences (Wrapp et al., 2020). The bat SARS-CoV S2 subunit shares 99% sequence identity with human SARS-CoV (Coutard et al., 2020). A comparison of genome organization among the three coronaviruses is given in Table 1.
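The identity percentages quoted in this section are obtained by comparing aligned sequences column by column. The following is a minimal sketch of that calculation, assuming the two sequences have already been aligned to equal length by an external tool; the function name and the toy sequences are illustrative assumptions and are not real viral data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (one common convention among several)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only).
toy_a = "ATGGC-TTAC"
toy_b = "ATGACGTTAC"
identity = percent_identity(toy_a, toy_b)
print(f"identity over ungapped columns: {identity:.1f}%")
```

Reported identity values depend on how gaps and ambiguous bases are treated, so figures from different studies are only comparable when the same convention is used.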

Pathogenesis

Endocytosis and replication
Endocytosis occurs when the glycoprotein of the viral spike binds to its receptor on the cell and a host protease activates the S protein. For internalization, both viruses use the ACE2 receptor, and S-protein priming is carried out by the serine protease TMPRSS2 (Hoffmann et al., 2020). Higher expression of ACE2 receptors is associated with a higher risk of COVID-19. The binding affinity of the SARS-CoV-2 spike protein for the ACE2 receptor is about 20 times higher than that of SARS-CoV (Wrapp et al., 2020). Once the spike protein binds to the ACE2 receptor, a conformational change takes place. Fusion of the viral envelope with the host cell membrane allows the virus to enter through the endosomal pathway (Coutard et al., 2020). Translation of the viral genome produces the replicase polyproteins pp1a and pp1ab, and virus-encoded proteinases cleave these polyproteins into smaller proteins. During the viral replication phase, multiple copies of sub-genomic RNA are produced by discontinuous transcription alongside full-length genomic RNA, and ribosomal frameshifting takes place during translation. Virion assembly takes place when the viral RNA interacts with the endoplasmic reticulum and Golgi complex, and the assembled virions are released from the cell in vesicles.

Symptoms
The incubation period of SARS is usually 2-7 days but can extend to almost 10 days. Symptoms generally begin with a fever (>38°C), often quite high and accompanied by rigors and chills (Kumar et al., 2020). Other symptoms include headache, malaise, muscle pain, mild respiratory issues and, in some cases, diarrhea. After 3-7 days, a dry cough and eventually shortness of breath are observed. These can progress to hypoxemia, i.e., a low level of oxygen in the blood. In about 10-20% of infected patients, the respiratory issues are severe enough to require ventilation. In many, but not all, patients, chest radiographs appear normal during the illness phase. In some patients, the white blood cell count also decreases, and the platelet count may be low at the peak of the disease, which causes further complications (Kumar et al., 2020).

Omics Aspect of SARS family

Phylogenetic analysis
Phylogenetic analysis is an important technique for determining the relationship of a new microorganism to existing ones. It allows a new strain to be placed in a specific family and subfamily, so that preventive measures can be formulated from what is known about that family.
Phylogenetic analysis clusters SARS-CoV-2 within the betacoronaviruses together with SARS-CoV, which led researchers to include it in the subgenus Sarbecovirus of the genus Betacoronavirus. SARS-CoV-2 is placed in a separate lineage that also contains four horseshoe bat coronaviruses (RaTG13, RmYN02, ZC45, and ZXC21) along with one identified in pangolins (Gorbalenya et al., 2020; Yang et al., 2020).
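Distance-based phylogenetic clustering of this kind starts from a matrix of pairwise distances between aligned sequences. As a rough sketch, the code below computes the simple p-distance (the fraction of differing sites) and reports the closest relative of a query sequence. The sequence labels and toy sequences are hypothetical, and real analyses would use substitution models and dedicated tree-building software rather than this simplification.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def distance_matrix(aligned: dict) -> dict:
    """Symmetric nested dict of pairwise p-distances keyed by sequence label."""
    dist = {name: {name: 0.0} for name in aligned}
    for first, second in combinations(aligned, 2):
        d = p_distance(aligned[first], aligned[second])
        dist[first][second] = d
        dist[second][first] = d
    return dist


def nearest_neighbor(dist: dict, name: str) -> str:
    """Label of the sequence with the smallest distance to `name`."""
    others = [other for other in dist[name] if other != name]
    return min(others, key=lambda other: dist[name][other])


# Hypothetical labels and toy sequences, for illustration only.
toy_alignment = {
    "query": "ACGTACGTAC",
    "relative_1": "ACGTACGTAT",
    "relative_2": "ACGAACGGAT",
    "outgroup": "TTGAACGGCA",
}
toy_dist = distance_matrix(toy_alignment)
print("closest to query:", nearest_neighbor(toy_dist, "query"))
```

A clustering method such as neighbor joining or UPGMA would then repeatedly merge the closest pairs from such a matrix to build the tree.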

SARS-CoV-2 functional genomics analysis
The source, spread and intermediate host of the virus can be investigated using SARS-CoV-2 sequences. During early research, 96% sequence identity with RaTG13 was observed, supporting the use of the ACE2 receptor by both (Zhou et al., 2020). It was also suggested that pangolins were carriers of SARS-CoV-2 (Zhang et al., 2020). Variant analysis has identified important mutations and their associated risks (Plante et al., 2021), and several mutation analysis studies have provided an in-depth molecular understanding of SARS-CoV-2 genome function. Researchers found three crucial mutations, each associated with the geographical distribution of one of three subtypes (Zhou et al., 2020). The virus relies significantly on the binding affinity between ACE2 and the spike glycoprotein for transmission and replication kinetics (Zhu et al., 2022). One study showed that the S protein carrying N439K binds human ACE2 receptors with quite high affinity, while N439K variants were similar to the wild type in other respects such as symptoms, infection cycle, and replication (Thomson et al., 2021). In addition, Benton and colleagues found that the open conformation of G614 increases the infectivity and predominance of SARS-CoV-2 (Benton et al., 2021), and Hou and coworkers studied the pathogenesis and transmission of the D614G variant, which was found to enhance infectivity and transmission of the virus in human and mouse model cells (Hou et al., 2020). Most of this work demonstrated the functional effect of the mutations, and it was observed that the variants gain in capacity to spread rather than in pathogenicity (Zhou et al., 2021). Furthermore, clinical data suggested that the D614G substitution has no substantial association with disease severity and would not change the effectiveness of the vaccine targets under development (Korber et al., 2020). It was also found that N501Y contributes to a 40 to 70% higher rate of transmission (Gu et al., 2020), and that E484K and K417N affect the antibody response (Wibmer et al., 2021). Furthermore, two coronavirus variants carrying E484Q and L452R were found in India and reported to be the main cause of the surge in cases in April 2021 (Zhu et al., 2022). Most of these mutations occur in the spike protein, and they have been analyzed with bioinformatics tools to understand the associated functional changes. Several bioinformatics tools have been employed to predict the effect of these mutations on the structural stability of viral proteins (Pereira et al., 2020). The variants N439K, N501Y, and K417N have been reported to have a neutral impact on protein stability. The PPA-Pred web server, which uses mutational data to predict the effect of mutations on binding between two proteins, has also been used to predict ACE2-spike protein binding (Brielle et al., 2020). All the mutations except N439K and N501Y were found to increase binding ability, leading to increased viral infection (Zhu et al., 2022).
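Substitution labels such as N501Y or D614G encode the wild-type residue, the 1-based position in the protein, and the replacement residue. The sketch below parses such labels and applies one to a short toy peptide. The helper names and the toy peptide are illustrative assumptions; the peptide is not the real spike sequence, and the code does not reproduce any of the tools cited above.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # residue expected in the reference sequence
    position: int   # 1-based residue position
    mutant: str     # residue observed in the variant


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'N501Y' into its three components."""
    match = _LABEL.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a single-residue substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(protein: str, sub: Substitution) -> str:
    """Return a copy of `protein` with the substitution applied.

    Raises ValueError if the position is out of range or the reference
    residue does not match, which usually signals a numbering mismatch.
    """
    index = sub.position - 1
    if index < 0 or index >= len(protein):
        raise ValueError(f"position {sub.position} is outside the sequence")
    if protein[index] != sub.wild_type:
        raise ValueError(
            f"expected {sub.wild_type} at position {sub.position}, "
            f"found {protein[index]}"
        )
    return protein[:index] + sub.mutant + protein[index + 1:]


# Toy peptide (hypothetical), mutated at its third residue.
toy_peptide = "MKDNG"
toy_variant = apply_substitution(toy_peptide, parse_substitution("D3G"))
print(toy_peptide, "->", toy_variant)
```

Checking the wild-type residue before substituting is a deliberate design choice: it catches labels that were numbered against a different reference sequence.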

Table-2: Integration of SARS-CoV-2 Specific Databases and Genomics Resources
Resource Name Web Link

Transcriptomics data and immune response
Transcriptomics data analysis provides a quantitative picture of RNA expression in an organism (Castro et al., 2022). RNA sequencing data from infected COVID-19 patients have revealed that the viral infection causes a reduced antiviral defense accompanied by high production of cytokines. Different studies have also identified immunological characteristics in blood, lung, and bronchoalveolar lavage fluid that link SARS-CoV infection to increased cytokine production (Blanco-Melo et al., 2020). Profiling the immunological response of patients to viral infection with single-cell technology is currently an emerging field of research. Immune samples taken from infected COVID-19 patients have revealed diverse blood profiles, and single-cell technology has revealed a diverse landscape and pathway of immune response to COVID-19 (Huang et al., 2021). The immune abnormalities caused by COVID-19 can also lead to bacterial infections. Thus, many studies have been performed to understand the role of the microbiome in COVID-19 patients (Cao et al., 2021). These studies also recognized interactions between the virus and the microbiota and might be beneficial for diagnosis, treatment, and prognosis. In addition, a tremendous amount of research is being conducted on the use of probiotics as adjunctive therapy for the alleviation and prophylaxis of symptoms (Bottari et al., 2021).
Transcriptomic data from COVID-19 were also analyzed using clinical bioinformatics, and it was found that SARS-CoV-2 can affect the expression pattern of many genes, including immune-related molecules such as IL6, IFIH1, DDX58, STAT1, IRF7, ISG15, IFIT3, TNF, IRF9, and MX1. Studies have shown that alterations in the expression of IL6 and TNF are closely associated with increased plasma cytokine levels in coronavirus-infected patients (Darif et al., 2021). Enrichment analysis also demonstrated that the expression changes are closely linked to neurological symptoms, cardiac dysfunction, and coagulation dysfunction in SARS-CoV-2-infected patients (Zhang et al., 2021).
Other evidence has revealed that disturbances in signaling pathways such as the TLR4 (Toll-like receptor 4) signaling cascade can affect immune cells such as B and T cells, generating cytokine-releasing Th17 cells and leading to illness that involves both the acquired and innate immune systems.
In a study by Li and colleagues, it was revealed that molecules such as NF-κB and IL-6 induce innate immune antagonism in COVID-19 (Attiq et al., 2021). SARS can also increase cell apoptosis through the MAPK and STAT3 signaling pathways (Hemmat et al., 2021). Thus, SARS-CoV-2 can affect many types of immune-associated molecules and signaling cascades. SARS-CoV-2 infection has a distinct impact on immune cell proliferation, differentiation, chemotaxis, and migration (Zhang et al., 2021). Further research is therefore needed to elucidate the underlying molecular mechanisms and factors.
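Changes in gene expression between infected and control samples, as discussed in this section, are commonly summarized as a log2 fold change. The sketch below shows the arithmetic on made-up expression values. The numbers, the pseudocount of 1, and the threshold are assumptions chosen for illustration and are not taken from any of the cited studies, which would normally rely on dedicated statistical packages.

```python
import math


def log2_fold_change(infected, control, pseudocount=1.0):
    """log2 ratio of mean expression (infected over control).

    The pseudocount avoids division by zero for unexpressed genes.
    """
    if not infected or not control:
        raise ValueError("both groups need at least one sample")
    mean_infected = sum(infected) / len(infected)
    mean_control = sum(control) / len(control)
    return math.log2((mean_infected + pseudocount) / (mean_control + pseudocount))


def changed_genes(expression, threshold=1.0):
    """Genes whose absolute log2 fold change reaches the threshold."""
    result = {}
    for gene, (infected, control) in expression.items():
        lfc = log2_fold_change(infected, control)
        if abs(lfc) >= threshold:
            result[gene] = lfc
    return result


# Made-up normalized expression values: (infected samples, control samples).
toy_expression = {
    "IL6": ([31.0, 31.0], [7.0, 7.0]),
    "MX1": ([15.0, 15.0], [15.0, 15.0]),
}
toy_changed = changed_genes(toy_expression)
print(toy_changed)
```

A fold change alone says nothing about statistical significance; in practice it is reported together with an adjusted p-value from replicate samples.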

Drug, Vaccine, and Therapeutics for SARS family

Drug and therapeutics
Target selection and identification are very important steps in the drug development pathway. They make it possible to identify targets that existing drugs and treatments bind with high specificity, so that these can then be used to treat COVID-19 (Chen et al., 2020; Hodgson, 2020).

Vaccines
In the last decade, significant progress has been made on distinct types of vaccines, such as recombinant protein, whole virus, and subunit vaccines, through techniques such as subtractive genomics and structural vaccinology. With the outbreak of the COVID-19 pandemic, the main goal of the scientific community was to develop a vaccine that is biologically safe, has few side effects, and curtails subsequent waves (Kaur and Gupta, 2020). Because of the genetic similarity of SARS-CoV-2 to SARS-CoV and MERS-CoV, vaccines against those viruses were considered potentially useful against COVID-19 (Ahmed et al., 2020). Furthermore, SARS-CoV and MERS-CoV infections are effectively controlled by neutralizing antibodies raised against the S protein, which was also considered a prospective approach to control COVID-19 (Zhou et al., 2020). Genome comparison of the two viruses, however, showed that the spike protein of CoV-2 is quite variable in its S1 subunit, so that such antibody neutralization is ineffective against COVID-19 (Ishack and Lipner, 2021).
Currently, several teams are working on vaccine development. Among the approaches being pursued are inactivated, protein subunit, vector-based, virus-like particle, and RNA- and DNA-based vaccines against COVID-19 (Grifoni et al., 2020).
Previously, different types of vaccines developed for SARS-CoV were able to protect animals against infection, but their use caused lung damage and eosinophil infiltration in mouse models and liver infections in ferrets (Abdelmageed et al., 2020). Typically, a new vaccine passes through research and development (R&D), clinical trials, and final approval by a regulatory authority. This process usually requires 12-18 months and sometimes even longer (Ishack and Lipner, 2021). Chen and colleagues grouped vaccines for COVID-19 into three broad categories (Ishack and Lipner, 2021): subunit vaccines, nucleic acid vaccines, and whole virus vaccines.
The vaccine platforms for COVID-19 are divided into recombinant protein, RNA, DNA, viral vector-based, inactivated, and live attenuated vaccines. The main considerations for a COVID-19 vaccine are minimization of unwanted immunopotentiation, suitability for stockpiling, and suitability for patients with hypertension or diabetes (Han et al., 2021). Most organizations developing vaccines against SARS-CoV-2 infection aim to induce antibodies against the S protein in humans by delivering the S antigen through vaccination. The first vaccine used for COVID-19 was mRNA-1273, which delivers mRNA encoding the S protein in lipid nanoparticles for injection (Fang et al., 2022). It was developed by Moderna Therapeutics. After injection, host cells are thought to take up the foreign mRNA, and the immune system then generates a response against the S protein that prevents infection and invasion. Similarly, another vaccine is under research by Clover Biopharmaceuticals, China (Ishack and Lipner, 2021).

Role of Bioinformatics in SARS Drug Discovery
In silico studies and analyses have gained wide acceptance over the last two decades. Computational analysis is one of the most rapidly growing disciplines, with applications in almost all fields. One of its important roles in the medical sciences is to make vaccine and drug discovery more efficient through computational tools. Several in silico studies have been carried out to predict potential drugs and chemical compounds for the treatment of COVID-19 patients (Ray et al., 2021). Many machine learning techniques have been applied to screen chemical compounds suitable for a given target protein.
Molecular docking, virtual screening and molecular dynamics simulation are among the most widely used computational techniques for drug discovery against pathogens, including viruses. Molecular docking predicts the orientation of one molecule relative to another when they are bound in a stable complex. It helps to predict the interaction of chemical compounds or ligands with a target biomolecule in order to estimate the activity and affinity of a potential drug molecule (Broni et al., 2021). Molecular dynamics simulation, on the other hand, is a computational simulation technique: predicted structures of drug targets or docked complexes are simulated in silico inside a virtual host environment to analyze their stability. Besides these, homology modeling, drug-likeness analysis, absorption, distribution, metabolism, and excretion (ADME) analysis, and network-based identification are used extensively to screen potential drug candidates against pathogens (Manne, 2021).
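Drug-likeness screening is often approximated with simple descriptor thresholds before more expensive docking or simulation is attempted. The sketch below uses Lipinski's rule of five as one widely used heuristic; the review does not state which drug-likeness criteria the cited studies applied, so this choice is an assumption. The compound names and descriptor values are hypothetical, and in practice the descriptors would be computed by a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    name: str
    mol_weight: float  # molecular weight in daltons
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = (
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    )
    return sum(checks)


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Common convention: tolerate at most one violation."""
    return lipinski_violations(d) <= max_violations


# Hypothetical candidates with made-up descriptor values.
toy_candidates = [
    Descriptors("compound_a", 350.0, 2.1, 2, 5),
    Descriptors("compound_b", 720.0, 6.3, 7, 12),
]
toy_shortlist = [c.name for c in toy_candidates if is_drug_like(c)]
print(toy_shortlist)
```

Such a filter only narrows the candidate list; compounds that pass would still need docking, ADME profiling, and experimental validation.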

Recommendations and limitations
This review focused primarily on multi-omics approaches applied to the CoV-2 genome to elucidate the evolutionary behavior of drug and vaccine targets. This focus is a limitation, as it restricts the insights that can be gained from research on the proteomics of CoV-2. It is therefore recommended that research be designed and executed to identify drug and vaccine targets that remain conserved over time in CoV-2. Structural bioinformatics and machine learning techniques should be applied to predict the probabilities of mutations that may arise in the CoV-2 genome. Such insights can allow researchers to design drugs and vaccines that target current and future mutations in CoV-2 target proteins.
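One simple way to look for positions that remain conserved across variants is to score each column of a multiple sequence alignment by its Shannon entropy, where zero entropy means every sequence carries the same residue. The following sketch assumes pre-aligned sequences of equal length; the toy alignment is hypothetical, and the entropy cutoff is an illustrative choice rather than a recommended value.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_positions(aligned, max_entropy=0.0):
    """1-based positions whose column entropy does not exceed `max_entropy`."""
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    positions = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        if column_entropy(column) <= max_entropy:
            positions.append(i + 1)
    return positions


# Toy alignment of four hypothetical variant fragments.
toy_variants = ["MKTAY", "MKSAY", "MRTAY", "MKTAY"]
toy_conserved = conserved_positions(toy_variants)
print("fully conserved positions:", toy_conserved)
```

Positions that stay conserved over many sampled genomes are natural starting points when searching for targets that are less likely to be lost to mutation, although conservation in a sample does not guarantee future stability.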

Conclusion
SARS-CoV-2 is a novel pathogenic virus that has caused many fatalities worldwide. Multiple potential drugs and vaccines have been developed for CoV-2. However, because of rapid mutational changes in CoV-2, developing vaccines for its new variants is becoming a challenging task. Incorporating bioinformatics and immunoinformatics can therefore enhance vaccine development processes to counter new CoV-2 variants.