Prediction of specific virus outbreaks made from the increased concentration of a new class of virus genomic peptides, replikins.

Advance warning of pathogen outbreaks has not been possible heretofore. A new class of genomic peptides associated with rapid replication was discovered and named replikins. Software was designed to analyze replikins quantitatively. Replikin concentration changes were measured annually prior to, and in “real time” every few days during, the 2009 H1N1 influenza pandemic. Replikins were visualized by both linear sequence representation and three-dimensional X-ray diffraction, and were found to expand on the virus hemagglutinin surface prior to and during the H1N1 pandemic. A highly significant increase in the concentration of virus replikins was found a) retrospectively in three pandemics from 1918 to 1999 (14,227 sequences; p<0.001), and b) prospectively before the 2009 H1N1 pandemic (12,806 sequences). In the hemagglutinin gene (N=8,046), p values were: by t-test, 1/10^130; by linear regression, 1/10^24 and 1/10^29; by Spearman correlation, <2/10^16; by Wilcoxon rank sum, <1/10^16; and by multiple regression adjusting for correlation between consecutive years, 2/10^22. Rising replikin concentration in H1N1 from 2006 to 2008 predicted, one year in advance, the H1N1 outbreak of 2009; in H5N1, it predicted the lethal outbreaks of 1997-2010.


Introduction
No structures of infectious organisms have been described to date which correlate quantitatively and temporally with epidemic outbreaks, course, and lethality, and permit early or advance warning of such outbreaks. Replikins are genomic structures related to rapid replication defined by the authors' algorithm: peptides 7 to 50 amino acids long, containing two or more lysines, six to ten amino acids apart, at least one histidine, and a lysine concentration of 6% or more (8).
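The definition above can be illustrated with a minimal Python sketch. It is not the authors' software; the function name `is_replikin`, the reading of "six to ten amino acids apart" as a difference of 6 to 10 in residue position between two lysines, and the toy peptides in the usage note are assumptions made purely for illustration.

```python
def is_replikin(peptide: str) -> bool:
    """Check a peptide (one-letter amino acid codes) against the stated criteria.

    Assumed reading of the definition:
      - length of 7 to 50 residues
      - at least two lysines (K), with some pair 6 to 10 positions apart
      - at least one histidine (H)
      - lysines make up 6% or more of the residues
    """
    seq = peptide.upper()
    n = len(seq)
    if not 7 <= n <= 50:
        return False
    lysines = [i for i, aa in enumerate(seq) if aa == "K"]
    if len(lysines) < 2 or "H" not in seq:
        return False
    if len(lysines) / n < 0.06:
        return False
    # Look for any pair of lysines whose positions differ by 6 to 10.
    return any(
        6 <= later - earlier <= 10
        for idx, earlier in enumerate(lysines)
        for later in lysines[idx + 1:]
    )
```

For example, the hypothetical peptide `"KAAAAAKH"` satisfies every condition in this sketch, whereas `"KKAAAAAH"` does not, because its two lysines are adjacent rather than six to ten positions apart.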
Replikins are the first reported conserved virus structures whose increasing concentration correlates quantitatively with, and predicts, strain-specific virus outbreaks.
In the case of the 2009 H1N1 pandemic, an advance Replikins warning was published in April 2008 (12), one year before the outbreak of April 2009. Because of the time taken to produce vaccine, none was available when the brunt of the pandemic struck in April 2009. Only 20% of the world's population at risk had vaccine available, and that was eight months after the outbreak, when the virus Replikin Count had already indicated that the lethal aspects of the pandemic would soon be over except for a brief recurrence in December 2010, which was also predicted (Figure 1a). Fortunately, so far, the H1N1 pandemic has been less severe than the three pandemics of the last century. The 2011 outbreaks have begun for both H1N1 and H5N1. Initial 'scout' virus outbreaks of H1N1 have occurred, again in Mexico, with Replikin Counts of the Infectivity Gene up to a record 16.7 (see below) and a human mortality rate of 10.7% (2); and outbreaks of H5N1 in Egypt, better established, have begun with a current cumulative mortality rate of 34.7% (3,4) to 37.8% (5). It is generally agreed that new approaches are required for the control of acute emergent diseases (6,7). Five-year plans that hope to have a vaccine available in five years may not be relevant to the current threat (24).
The benefits of having more time to prepare for and to respond to acute lethal environmental events have been demonstrated in satellite warnings for hurricanes. The consequences of having little or no advance warning have recently been demonstrated in earthquake-tsunamis and in the 2009 H1N1 influenza pandemic. To date there have been no reliable technologies to predict the emergence of specific virus strains. The only global surveillance available is post-outbreak and based solely on epidemiological data (1). Acute emergent infectious diseases, coupled with current global travel, pose challenges to the timely implementation of public health measures such as tracking and isolation of cases, and the design, testing and distribution of specific effective vaccines and therapeutics to the world's population.

Methods
Software based on the authors' algorithm (9) first identified and then counted the replikin peptides in each genomic sequence (Replikin Count = number of replikins per 100 amino acids). For each group of specimens, the mean Replikin Count and its standard deviation (SD) were calculated and compared over the past 93 years. Highly statistically significant increases and decreases were examined, for example by strain, host, country, history, year, month or week, and by substitution, morbidity, and lethality. The terms 'increase' and 'decrease' of Replikin Counts were used only when the p level was less than 0.001. Counts in H1N1, H5N1 and other influenza strains were each monitored separately, retrospectively from 1918 to 1999 and prospectively from 2000 to 2011, for all countries reporting to PubMed. During outbreaks, Replikin Counts were compared to Counts for the same strain in non-outbreak ('resting') time periods. Statistical analyses of rate of change, trend, pattern, and growth models in the evolution of each virus strain were initiated. Replikin genes were isolated in silico by scanning the virus genome and identifying those areas which had the highest concentration of replikins. When the eight H5N1 genomic areas were examined year by year, gene areas were found which became upregulated in association with particular outbreaks. When the upregulation was found to be associated with high infectivity (morbidity over a time period), the high Count area was named the Replikin Infectivity Gene. When a high Count area in a sequence was found to be associated with high mortality rates, the high Count area was named the Replikin Lethality Gene. The Replikin Counts of these two genes in the H1N1 virus were determined annually from 2001 to 2008 before the pandemic, then during the pandemic in 'real time' every few days, then weekly, from April 2009 to February 2011. When Replikin Counts exceeded 5 per 100 amino acids, stacking was sought and found.
Counts were expressed in two ways: 1) for the entire gene, e.g. for the entire hemagglutinin gene in H1N1 (as employed in Figures 1-4); and 2) for the highest concentration of Replikins within each infectivity or lethality gene, which was designated the Replikin Peak Gene (Figures 4 and 5). Human morbidity and mortality rate data for particular time periods (CDC and WHO) (10,11) were compared with Replikin Counts. Replikin peptides were visualized by two means: a) by linear display of sequences of contiguous numbered amino acids in the primary structure and b) by X-ray diffraction analysis of the three-dimensional folded structure.
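The Replikin Count and its per-group summary can be sketched as follows. This is only an illustration of the arithmetic stated in Methods, not the authors' software: the function names `replikin_count` and `summarize_by_group`, the record layout, and the use of the sample standard deviation for "SD" are all assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def replikin_count(n_replikins: int, n_amino_acids: int) -> float:
    """Replikin Count = number of replikins per 100 amino acids."""
    if n_amino_acids <= 0:
        raise ValueError("sequence length must be positive")
    return 100.0 * n_replikins / n_amino_acids


def summarize_by_group(records):
    """Summarize Replikin Counts per group (e.g. per year or per strain).

    records: iterable of (group, n_replikins, n_amino_acids) tuples
             (hypothetical layout chosen for this sketch).
    Returns {group: (mean_count, sd_count, n_sequences)}; sd_count is None
    when a group holds a single sequence, since a sample SD is undefined.
    """
    groups = defaultdict(list)
    for group, n_rep, n_aa in records:
        groups[group].append(replikin_count(n_rep, n_aa))
    summary = {}
    for group, counts in groups.items():
        sd = stdev(counts) if len(counts) > 1 else None
        summary[group] = (mean(counts), sd, len(counts))
    return summary
```

Under these assumptions, a sequence of 400 amino acids containing 12 replikins has a Replikin Count of 3.0, and grouping the records by year yields the mean and SD that are then compared between years.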

Prospective Prediction of Outbreak Solely from Replikins Analysis
An increase in H1N1 virus Replikin Counts was found before the outbreak of the 2009 H1N1 pandemic.
A rising mean Replikin Count of the hemagglutinin area of H1N1 between 2002 and 2009, globally from 3.5 to 9.7, was suggested by data from just one country, Mexico, and confirmed in Peru-Argentina, Austria, Japan-Vietnam and globally (Table I).
Replikin count was analyzed as a continuous variable, although frequencies of some specific values are also reported because the distribution exhibits a discreteness suggesting that it is a mixture distribution. Statistical analyses consisted of linear regression of Replikin count on year and Spearman correlations, as well as two-sample t-tests and Wilcoxon rank sum tests to compare means between two different years. These analyses assume that Replikin counts are independent between specimens. It is possible that there is a temporal correlation (e.g. correlation of specimens taken close together in time), but assessment of temporal correlation is difficult without the day, or at least the month, of each specimen. The existence of a temporal correlation would lower the level of significance. For the analysis of a slope between the years 2001 and 2008, multiple regression was used to test for a slope over time while adjusting for the mean level of Replikin counts in the preceding year, in order to remove any temporal correlation between successive years.
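Two of the analyses named above, linear regression of Replikin count on year and Spearman correlation, can be sketched in plain Python. This is an illustration only, not the independent statistician's code: all function names are hypothetical, no p values are computed, and the t-test, Wilcoxon rank sum test and multiple regression are not covered.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` would hold the year of each specimen and `y` its Replikin count. Averaging tied ranks matters for data of this kind, because many specimens share identical Replikin counts.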

Conclusion of Independent Statistical Analysis
Replikin counts increased two-fold between 2001 and 2008, with much of the increase occurring between 2006 and 2008, and they increased significantly between 2008 (the year before the pandemic) and 2009 (the year of the pandemic). They decreased slightly between 2009 and 2010. The distribution of Replikin counts appears to be dominated by a handful of modes, or 10 to 20 classes. This suggests that there are a handful of variations of the virus, such that the Replikin count is identical for all specimens within a class. The increase in the proportion of classes with high Replikin counts accounts for the increase in mean Replikin counts leading up to the pandemic.

Virus Replikins Visualized Before and During the H1N1 Pandemic of 2009
The presence and position of the replikins in the H1N1 hemagglutinin gene area prior to and during the course of the H1N1 pandemic were visualized as well as counted. Table II shows that as the Replikin Count increased from 3.2 in 2002 to 11.7 in 2010, a substantial structural change occurred, most evident in the HA1 area, from amino acid 1 to 276, in which more Replikins appeared, increasing from 0 to 34 per 100 amino acids. At the same time, the number of 'free' lysines (K) and histidines (H), those not within replikin structures, in the same HA1 area decreased from 20 to 0. As the number of replikins increased, they were found to increase on the surface of the hemagglutinin gene.

Increased Strain-Specific Replikin Counts Correlate with Pandemics and Outbreaks 1918-2007: Retrospective Analysis 1917-2000; Prospective Analysis 2001-2011
Increases in Replikin Count were observed specifically for each strain, but not for other strains, in those years in which each strain was shown to be responsible for the outbreak (p<0.001) (Figure 4). Replikin Counts of Replikin Peak Genes were examined in all PubMed records for sequences of H1N1, H2N2, H3N2, H5N1 and influenza B. An important natural control is seen in the rarely lethal influenza B, for which the mean Replikin Count rarely exceeded 4 over 69 years. When one or both of the infectivity and lethality replikin genes are upregulated, in preparation for or in the midst of an outbreak, the Replikin Count increases above 4 (p<0.001) (Figures 1-3, 5; Table I) and replikins increase their presence on the surface of the hemagglutinin gene (Figure 3).

Evolution, trans-flu strain sharing, and conservation of Replikins
The substitutions which have occurred in the Goose Replikin from 1917 and the 1918 H1N1 pandemic to 2011, shown in Table IV, appear to be selective and retained (conserved) rather than random. The conservation data from 1917-2004 (1) are updated to 2010 in Table IV.

Prediction of Geographic Location of H5N1 (Avian flu) Outbreak in Indonesia
Instead of comparing neighboring genes, neighboring countries were compared for the Replikin Counts of H5N1 scout infections in humans over several years. In the replikin prediction of 2005-2006, Indonesia was predicted to be the country that would be worst affected in terms of increased human mortality (18). Following the replikin prediction, 277 human H5N1 cases were reported and the human mortality rate increased in Indonesia from 40% to 82% (WHO) (11).

Concurrent H5N1 and H1N1 Build-Up in 2011
Because of the increase in Replikin Count in birds from 2002 to 2008 (Figure 5), and the increased Counts in its precursor H9N2 in chickens (19,20), the authors issued a warning in January of 2009 that H5N1 outbreaks would surge (19,20). By January of 2010, these H5N1 outbreaks had occurred, with outbreaks in birds and chickens in 63 countries (21,22). In Figure 6a, the persistent increase in the Replikin Count after the 2009 pandemic had been declared over by WHO (13) suggests that the pandemic is not over. In Figure 6b, as seen in Figure 1a above, the Lethality Gene peak increase occurred 2 years before the pandemic outbreak, then declined promptly to its pre-outbreak level. However, this has been followed in 2011 by the recurrence of a marked increase in Replikin Count, again to pre-pandemic levels.

Discussion
The Risk of a Combined H1N1-H5N1 Pandemic. The risk of a combined H1N1 (high infectivity) and H5N1 (high lethality) pandemic may have increased because of the simultaneous rise in the Replikin Counts of each of these two virus strains to their highest levels recorded since 1918, and the appearance of 'scout' virus outbreaks of H1N1, again in Mexico, and of H5N1 in Egypt. The simultaneous emergence of two virus strains with record high Replikin Counts has not been observed previously. All increases observed in the past 93 years have been in only one strain (see Figure 4). Simultaneous pandemic outbreaks could bring the two virus strains into contact with each other more frequently, facilitating transfer of genomic material to form a hybrid with the pathogenic capability of each strain. An H1N1-H5N1 combination is now only a risk; it has not yet occurred, and it is not certain that it will occur. However, with simultaneous increases in the Replikin Counts of each strain, outbreaks of H1N1 and H5N1 are now in progress in Mexico and Egypt respectively (2,41,44). In preparation to meet this threatened combination, the authors have prepared the first completely synthetic TransFlu™ Replikins Vaccine against these two and the other common influenza strains. TransFlu™ was successful in blocking H5N1 (23) in the first of its continuing independent trials in the U.S. and elsewhere.

Pan-Influenza Vaccines. The Replikins technology presented here rests on genomic peptides which act as epitopes and which are conserved and shared between strains. The conservation and sharing of the newly defined Replikins epitopes were not previously known, and the believed absence of conserved, shared epitopes was the reason given for decades for the inability to make pan-influenza vaccines which could be used from year to year. However, having reviewed Replikins technology since 2003 from data provided by the authors, the National Institutes of Health (NIH) first confirmed the influenza epitopes earlier defined by specific Replikins by landing anti-flu antibodies on them (16). These NIH data were simultaneously and independently confirmed by studies at Scripps (17). NIH has now announced that it hopes to have a pan-influenza vaccine, possibly in 5 years, based on these epitopes (24). As noted above, the Replikins TransFlu™ pan-influenza vaccine, based on the Replikins epitopes confirmed by the NIH, is available now and is being tested (23).
Improved vaccine production. The need for new vaccines in influenza is widely recognized (6,9). The conservation of replikins over time and the sharing of influenza trans-strain replikins (ref. 1 and Table V) have been used to design and produce effective solid-phase synthetic vaccines in as little as seven days, permitting a more rapid response to newly appearing strains (23). The first synthetic replikins pan-influenza vaccine was successful against H5N1; it also blocked virus excretion, providing for the first time the potential to block the development of H5N1 reservoirs.
The increase of replikins on the hemagglutinin surface (Table II and Figure 3a) is here shown to occur in advance of and during the pandemic. This change is concomitant with changes from neutral to acidic conformation assisting fusion of the virus with the host membrane during virus entry (27-30). The actual visualization of Replikins, with the ability to count them, supports the reality and practical clinical relevance of Replikin structures. Consequences of the surface increase may be: 1) The replikins themselves may contribute to the increased infectivity and lethality of the virus by encircling and supporting the essential active site of the virus for entry via sialic acid (Figure 3a). The central role of the sialic acid receptor for influenza virus entry into host cells was demonstrated a) in 1959 by the authors, by blocking virus entry into brain cells with sialic acid conjugates from brain gangliosides (25), and b) by the release of similar sialic acid conjugate decoys (sialoresponsins) by the chorioallantoic membrane of the chick egg under influenza attack (26). The molecular definition of the reaction between the hemagglutinin unit and the sialic acid receptor pocket of the host membrane was recently demonstrated (27-30).
2) The increased surface coverage by replikins may represent an increased 'shield' or 'armour' against the immune system of the host (Figures 3a and 3b).

Replikins in other lethal zoonotic organisms and other pathogenic states. The 2010-2011 Foot and Mouth Disease virus outbreaks were predicted by the Replikin Count in 2009, one year previously (31). Lethal pathogens other than viruses also contain replikins; for example, in bacteria, such as tuberculosis, and in trypanosomes (malaria) (8). Replikins also play a key role in human cancer (8,32,33,34). The authors have postulated that viruses, bacteria, trypanosomes, cancer cells and other biological organisms may be carriers or vectors for mobile pathogenic replikin sequences associated with rapid replication (communication in preparation). Some replikins are associated with rapid replication in healthy growth throughout biology, as in algae and food plants (8).