Discrete Groups and Internal Symmetries of Icosahedral Viral Capsids

Abstract. A classification of all possible icosahedral viral capsids is proposed. It takes into account the diversity of hexamer compositions, each leading to a definite capsid size. We show how the self-organization of observed capsids during their production results from definite symmetries of the constituent hexamers. The division of all icosahedral capsids into four symmetry classes is given. New subclasses implementing the action of the symmetry groups Z_2, Z_3 and S_3 are found and described. They concern special cases of highly symmetric capsids whose T-number, T = p^2 + pq + q^2, is of a particular type corresponding to the cases (p, 0) or (p, p).


Capsid viruses
One of the outstanding features of nature is its extraordinary economy of means. Nature seems to be very parsimonious, whether the economy concerns fundamental laws and processes or more complicated phenomena, including life and evolution.
In the inanimate world this parsimony can be found in the least action principle and in the minimization of free energy or free enthalpy in thermodynamical equilibrium. In biology one can often observe the extraordinary efficiency of organisms which adapt themselves to various conditions by minimizing their energy loss and maximizing their survival probability.
One of the important ways to ensure a minimal or maximal value of essential parameters is the use of symmetries. It is not a coincidence that, among convex three-dimensional bodies, the least surface area for a given volume is attained by the most symmetric one, the perfect sphere. In living organisms one is often amazed by their utter economy and efficiency, by the perfect hydrodynamical properties of fish and the aerodynamics of birds' wings.
There is also a fantastic ability to pack a maximal amount of information into the smallest volumes available, as is done by the dense packing of DNA in chromosomes. This spectacular ability to store information in an optimal way can be observed in viruses, in particular in the so-called capsid viruses, and especially in their icosahedral variety. Capsids play an essential role in the survival of viruses, protecting the DNA, which is the active part of the virus, from external dangers like chemical attacks or solar radiation. The capsids' sizes are optimally adapted to the size of the DNA to be packed within, and when a mutation occurs leading to a longer DNA chain, the size of the capsid must follow in order to be able to contain it. The information ruling the construction and build-up of capsids is contained in the DNA, or in both DNA and RNA, depending on the virus type. It is encoded in the particular type of coat proteins, and can be reduced using the possibilities offered by the high degree of symmetry displayed by the icosahedral shape.
In what follows, we present a generalization of the model of assembly schemes of viral capsid symmetries successfully applied to icosahedral capsids in our previous papers ([21], [2], [3], [4]). It is based on the analysis of the protein content of the elementary building blocks, the five- and six-fold capsomers displaying different internal symmetries.
A very large class of viruses builds protective shells called capsids, made of special coat proteins and produced during the reproduction cycle inside an infected cell.

Figure 1: Papilloma viruses; a herpes virus with its tegument.
During infection, the capsid is left behind, while the DNA strand is injected into the cell's nucleus. Later on, the DNA multiplies itself using the genetic material of the cell; in parallel, special coat proteins are synthesized, with which new capsids are constructed inside the infected cell. Then the newly produced complete DNA strands are packed into the empty capsids, and the newly born viruses leave the cell, infecting its neighbors. A better knowledge of the structure and symmetries of capsids can be helpful in understanding evolutionary trends and kinship between different virus species. The building schemes for capsids made of coat proteins are ruled by and encoded in the RNA and DNA molecules.

Triangular number
The triangular number T is determined by two non-negative integers, (p, q), via the simple formula:
T = p^2 + pq + q^2.
All possible icosahedral structures made of 12 perfect pentagons and an appropriate number of perfect hexagons were found by Coxeter in the middle of the last century. Coxeter's classification was based on the notion of an elementary triangle, defined as follows.
Given two numbers (p, q) and a perfect hexagonal grid, we start by picking a hexagon and transforming it into a perfect pentagon. Then we make p steps to the right, then q steps at an angle of 120°, and place the second pentagon there. Repeating the same operation once more, we get the elementary triangle. The figures show examples of how this prescription works, together with schematic representations of icosahedral capsids. Due to the symmetry T(p, q) = T(q, p), it is enough to consider pairs with p ≥ q. The smallest icosahedral capsids correspond to the triangular numbers T = 1, T = 3 and T = 4. The next three icosahedral capsids are generated by the triangular numbers T = 7, T = 9 and T = 12. It is also useful to observe that the total number N_6 of hexamers in a capsid with a given T-number is given by the formula N_6 = 10(T − 1).
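The relations T = p^2 + pq + q^2 and N_6 = 10(T − 1) are easy to tabulate. The following is a minimal Python sketch, not part of the original analysis; the function names are chosen here for illustration only.

```python
from math import isqrt


def triangulation_number(p, q):
    """T = p^2 + pq + q^2 for non-negative integers (p, q)."""
    return p * p + p * q + q * q


def hexamer_count(t):
    """Total number of hexamers, N_6 = 10 (T - 1), as quoted in the text."""
    return 10 * (t - 1)


def smallest_t_numbers(limit):
    """Map each T <= limit to one generating pair (p, q) with p >= q."""
    found = {}
    for p in range(1, isqrt(limit) + 1):
        for q in range(0, p + 1):
            t = triangulation_number(p, q)
            if t <= limit and t not in found:
                found[t] = (p, q)
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for t, (p, q) in smallest_t_numbers(25).items():
        print(f"T = {t:2d}  (p, q) = ({p}, {q})  N_6 = {hexamer_count(t)}")
```

Restricting the search to p ≥ q uses the symmetry T(p, q) = T(q, p) mentioned above, so each T-number is listed once.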

Random versus programmed agglomeration
The growth of icosahedral capsids via random agglomeration of capsomers seems highly improbable. Were it really so, the final yield would be close to zero (something like 2^{−23} ≈ 10^{−7}). This is so because in a random agglomeration process the error rate at each elementary step, which consists in adding a new capsomer, would be close to 50%, as shown in figure (7). The observed efficiency of capsid construction from capsomers produced in infected cells is close to 100%. The most natural conclusion is that instead of random agglomeration of capsomers, what takes place is a very strict assembly process with exclusive sticking rules, at least in the species whose triangular number is not too high, up to Adenoviridae (T = 25). This is the most plausible explanation of the observed full use of capsomers, with almost no waste.
For very large icosahedral capsids the agglomeration is less successful and displays a less ordered character. In this case the driving forces for assembly in a particularly symmetric way are most probably of entropic and energetic nature.
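The order of magnitude of the random-agglomeration yield can be checked directly. The sketch below assumes that the assembly consists of 23 independent elementary steps, each succeeding with probability 1/2; the step count is read off the exponent quoted above, and the independence of the steps is an assumption of this illustration.

```python
def random_assembly_yield(steps, success_probability=0.5):
    """Probability that every one of `steps` independent additions is correct."""
    return success_probability ** steps


if __name__ == "__main__":
    # Assumed: 23 steps, 50% error rate at each step.
    print(f"{random_assembly_yield(23):.2e}")  # prints 1.19e-07
```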
As already stated above, the information about the structure of an icosahedral capsid is encoded in its basic triangle, twenty identical copies of which form an icosahedron. Consider the simplest case (besides a dodecahedron, with no hexamers at all, observed in Microviridae). The T = 3 capsid is quite common, namely in Cowpea and in the group Paroviridae.

Capsomer differentiation
Sometimes the capsids are assembled with dimers and trimers, but the resulting pattern is the same, as shown in the figure. In the Adenovirus capsid, hexamers occupying different positions in the capsid must differ from each other. There are four different types, as dictated by the symmetry of the capsid, plus the twelve pentagons.
We must therefore conclude that hexamer differentiation is necessary in order to ensure the right agglomeration scheme. Had the hexamers all their sides equivalent, nothing would stop the formation of undesirable clusters which cannot lead to the correct construction of the Adenovirus capsid, as shown in the figure.
Capsomers, which are the building blocks from which capsids are assembled, display various internal symmetries due to the differentiation of the coat proteins forming them. Pentamers and hexamers can be made of one, two, three or more different proteins. The internal symmetries of capsomers can be analyzed with the help of the simplest discrete groups, known as permutation groups. Denoted by S_n, they consist of all permutation operations acting on any set containing n items. The order (number of elements) of the S_n group is therefore equal to n!. Cyclic permutations of n elements form a subgroup of S_n of order n, denoted by Z_n.
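The element counts of S_n and Z_n can be verified by direct enumeration. This is a minimal Python sketch with illustrative function names; permutations are stored as tuples giving the image of each index.

```python
from itertools import permutations
from math import factorial


def symmetric_group(n):
    """All permutations of range(n), each as a tuple of images."""
    return list(permutations(range(n)))


def cyclic_subgroup(n):
    """The n cyclic shifts i -> (i + k) mod n, for k = 0, ..., n - 1."""
    return [tuple((i + k) % n for i in range(n)) for k in range(n)]


if __name__ == "__main__":
    for n in (2, 3, 4):
        s_n = symmetric_group(n)
        z_n = cyclic_subgroup(n)
        # Z_n is contained in S_n; S_n has n! elements, Z_n has n.
        print(n, len(s_n), len(z_n), set(z_n) <= set(s_n))
```

For n = 2 the two lists coincide, which is the statement that S_2 is the same group as Z_2.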
The S_2 group contains only two elements, the identity keeping two items unchanged, and the only non-trivial permutation of two items, (ab) → (ba). This permutation is cyclic, so the S_2 group coincides with its Z_2 subgroup. The simplest representations of the Z_2 group are realized via its actions on the complex numbers, C^1. Three different inversions can be introduced, each of them generating a different representation of Z_2 in the complex plane C^1: i) the sign inversion, z → −z; ii) the complex conjugation, z → z̄; iii) the combination of both, z → −z̄. One should not forget about the fourth possibility, the trivial representation attributing the identity transformation to the two elements of the group, including the non-trivial one: iv) the identity transformation, z → z.
The Z_2 group can be implemented on a plane with two different actions: an inversion, i.e. a mirror reflection with respect to an axis lying in the plane, and a rotation by 180°. We shall denote the first realization by Z_2^I, and the second by Z_2^R.

Figure 13: Inversion; rotation by 180°.
Two simple discrete groups next in line after Z_2 are of particular interest to us: the symmetric group S_3 and its cyclic subgroup Z_3.
The symmetric group S_3, containing all permutations of three different elements, is a special case among all the symmetric groups S_N. It is exceptional because it is the first in the series to be non-abelian, and the last one that possesses a faithful representation in the complex plane C^1.
It contains six elements, and can be generated with only two elements, corresponding to one cyclic and one odd permutation, e.g. (abc) → (bca), and (abc) → (cba). All permutations can be represented as different operations on complex numbers as follows.
The cyclic group Z_3 has a natural representation on the complex plane. Let us denote the primitive third root of unity by j = e^{2πi/3}. Let the permutation (abc) → (bca) be represented by multiplication by j. Then the three cyclic permutations can be represented via multiplication by j, j^2 and j^3 = 1 (the identity), corresponding to the rotations by 120°, 240° and 360° (the identity transformation, equivalent to the rotation by 0°).
Odd permutations must be represented by involutions, i.e. by operations whose square is the identity operation. We can make the following choice: let the odd permutation (abc) → (cba) be represented by the complex conjugation z → z̄, i.e. the reflection with respect to the real axis.
Then the six S_3 symmetry transformations contain the identity, two rotations, one by 120° and another by 240°, and three reflections: in the x-axis, in the j-axis and in the j^2-axis. The Z_3 subgroup contains only the three rotations, as shown in figure (14).
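The representation of S_3 on the complex plane described above can be checked numerically. The sketch below is an illustration under stated assumptions: rotations act as z → j^k z, and the reflection in the axis through 0 and j^k is written as z → j^{2k} z̄ (the standard formula for reflection in a line through the origin). The helper names and the two probe points are choices made here, not taken from the text.

```python
import cmath
from itertools import product

J = cmath.exp(2j * cmath.pi / 3)  # primitive third root of unity
PROBES = (complex(1.0, 0.3), complex(-0.7, 2.0))  # generic test points


def rotation(k):
    """Rotation by k * 120 degrees: z -> j^k z."""
    return lambda z: (J ** k) * z


def reflection(k):
    """Reflection in the axis through 0 and j^k: z -> j^(2k) * conj(z)."""
    return lambda z: (J ** (2 * k)) * z.conjugate()


ROTATIONS = [rotation(k) for k in range(3)]      # the Z_3 subgroup
REFLECTIONS = [reflection(k) for k in range(3)]  # x-, j- and j^2-axis
ELEMENTS = ROTATIONS + REFLECTIONS               # the six S_3 maps


def compose(f, g):
    return lambda z: f(g(z))


def same_map(f, g):
    """Compare two maps numerically on the probe points."""
    return all(cmath.isclose(f(z), g(z), abs_tol=1e-12) for z in PROBES)


def is_closed(elements):
    """Every composition of two elements is again one of the elements."""
    return all(
        any(same_map(compose(f, g), h) for h in elements)
        for f, g in product(elements, repeat=2)
    )


def is_abelian(elements):
    return all(
        same_map(compose(f, g), compose(g, f))
        for f, g in product(elements, repeat=2)
    )


if __name__ == "__main__":
    print("closed:", is_closed(ELEMENTS))
    print("S_3 abelian:", is_abelian(ELEMENTS))
    print("Z_3 abelian:", is_abelian(ROTATIONS))
```

The comparison on two generic points is a numerical check rather than a proof, but it is enough to see that the six maps are distinct, closed under composition and non-commuting.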

Internal symmetries of hexamers
The symmetry of a hexamer is dictated by the place it occupies in the icosahedral capsid. It may lie on one of the symmetry axes, which may be two- or three-fold. A three-fold symmetry is realized by hexamers with the (ababab) coat protein scheme, while a two-fold symmetry can be realized in many ways, e.g. with the (abcabc) scheme.
Let us proceed to the analysis of discrete symmetries that can occur in various hexamers. We shall start with the least symmetric one, which has totally differentiated sides according to the scheme (abcdef), then proceed to the more symmetric cases. The next example is provided by a hexamer admitting only one symmetry, Z_2^R. Its sides are labeled according to the scheme (abcabc). Another simple example is a hexamer admitting only one symmetry, Z_2^I. Its sides are labeled according to the scheme (bdffdb). A third possibility is another hexamer admitting only one symmetry, Z_2^I, with sides labeled according to the scheme (abcdcb). More elaborate hexamer schemes can admit two Z_2^I symmetries. In such a case the hexamer sides are labeled according to the scheme (bbccbb). The next example is provided by a hexamer admitting the Z_3 symmetry. Its sides are labeled according to the scheme (ababab). With this in mind, we can
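The symmetries of a side-labeling scheme can be found mechanically by treating the scheme as a cyclic string of six letters. The following Python sketch looks only at the label sequence: a rotation by k sides corresponds to an angle of k × 60°, and a reflection is an index map i → (c − i) mod 6. Whether a mirror image of a labeling is physically admissible for real proteins is not addressed by this illustration, and the function names are chosen here.

```python
def rotational_symmetries(scheme):
    """Shifts k (in units of one side, i.e. 60 degrees) preserving the labels."""
    n = len(scheme)
    return [k for k in range(n) if scheme[k:] + scheme[:k] == scheme]


def reflection_symmetries(scheme):
    """Offsets c such that the flip i -> (c - i) mod n preserves the labels."""
    n = len(scheme)
    return [
        c for c in range(n)
        if all(scheme[(c - i) % n] == scheme[i] for i in range(n))
    ]


if __name__ == "__main__":
    for scheme in ("abcdef", "abcabc", "bdffdb", "abcdcb", "ababab"):
        print(scheme, rotational_symmetries(scheme), reflection_symmetries(scheme))
```

In this encoding (abcabc) is preserved by the shift k = 3 only (rotation by 180°), (ababab) by the shifts k = 2 and k = 4 (rotations by 120° and 240°), and (bdffdb) and (abcdcb) by a single flip each.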

Affinity matrices
In what follows, we shall mark the five pentamer-forming proteins with the letter "p", and the sides of chosen hexamers that stick to the pentamers' sides with the letter "a".
The information concerning the agglomeration scheme can be encoded in a corresponding affinity matrix, which is a square table whose rows and columns are labeled with letters denoting all the different protein types. At the intersection of a row and a column we put 1 if the corresponding protein types stick together, and 0 if the corresponding agglomeration is forbidden. This can also be interpreted as a matrix of probabilities: 1 for a 100% probability of sticking together, and 0 when sticking together is totally excluded. The first example is provided by agglomerating pentamers with only one type of hexamer, containing only two types of coat proteins. New hexamers are needed to continue the game. For larger capsids, in which the proportion of pentamers is lower, one cannot obtain proper assembly rules unless more than one type of hexamer is present, out of which only one is allowed to agglomerate with pentamers. In the case of two different hexamer types one obtains either the T = 9 capsid, or, with one totally differentiated hexamer and two hexamers, one of the (ababab) type and another of the (abcabc) type, the T = 12 capsid. To get the T = 25 adenovirus capsid, one must introduce no fewer than four different hexamers, all of them being of the (abcdef) type, out of which only one type can agglomerate with pentamers. As 25 is the square of a prime number, each of these four types contains 6 different sides (proteins).
Counting in the unique protein found in pentamers, we get the result that the triangular number defines at the same time the number of different proteins participating in the construction.
The "affinity matrices" giving the links between various proteins belong to the set of so-called permutation matrices, characterized by the fact that there is only one unit in each row and in each column, the rest of the entries being equal to 0. Because sticking is mutual, such a matrix is also symmetric, so that its square is the unit matrix and its eigenvalues are 1 or −1. Due to this circumstance, each protein appears exactly 60 times in the fully built capsid.
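As an illustration, the stated properties can be checked on a small affinity matrix. The sketch below assumes the simplest case of pentamers and one (ababab) hexamer type, with protein types ordered as (p, a, b): p sticks to a as stated above, while the b-b affinity is an assumption made here for the sake of the example, not a rule quoted from the text.

```python
LABELS = ("p", "a", "b")

# Hypothetical affinity matrix: rows and columns follow LABELS.
AFFINITY = [
    [0, 1, 0],  # p sticks to a
    [1, 0, 0],  # a sticks to p
    [0, 0, 1],  # b sticks to b (assumed for this example)
]


def matmul(x, y):
    """Plain matrix product of two square lists of lists."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def has_single_unit_per_line(m):
    """Exactly one 1 in each row and each column, all other entries 0."""
    entries_ok = all(v in (0, 1) for row in m for v in row)
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(col) == 1 for col in zip(*m))
    return entries_ok and rows_ok and cols_ok


def is_symmetric(m):
    return all(m[i][j] == m[j][i] for i in range(len(m)) for j in range(len(m)))


if __name__ == "__main__":
    print(has_single_unit_per_line(AFFINITY), is_symmetric(AFFINITY))
    print(matmul(AFFINITY, AFFINITY) == identity(len(LABELS)))
```

A matrix whose square is the unit matrix can only have eigenvalues 1 or −1, which is why the check of the square suffices here.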

General scheme with three hexamer types
In this section we consider only three hexamer types, (ababab), (abcabc) and (abcdef). We can organize all the capsids obtained with these hexamers in a single table. To each value of the triangular number T corresponds a unique partition into 1 + (T − 1), where the "1" represents the unique pentamer type and (T − 1) is partitioned into a sum of a certain number of different hexamer types according to the formula:
(T − 1) = 6α + 3β + 2γ,
with non-negative integers α, β and γ, the numbers β and γ taking on exclusively the values 0 or 1. This leads to four different classes, according to the choices: A: β = 0, γ = 0; B: β = 1, γ = 0; C: β = 0, γ = 1; D: β = 1, γ = 1. This classification is based on the fact that the corresponding hexamers are centered on a three-fold or a two-fold symmetry axis, so that the first type must be found at the center of the icosahedron's triangular face, whereas the second type must be found at the center of an edge between elementary triangles. The number α of maximally differentiated hexamers then follows from the corresponding partition of a given triangular number, as shown in the table (Fig. 26).
A good example of the predictive ability of our scheme is the analysis of the Herpes virus T = 16 capsid.
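The partition (T − 1) = 6α + 3β + 2γ with β, γ ∈ {0, 1} can be computed directly, which also makes its uniqueness visible: the four values of 3β + 2γ, namely 0, 3, 2 and 5, are all different modulo 6. This is a minimal Python sketch; the function name and the return convention are chosen here for illustration.

```python
def classify(t):
    """Return (class label, alpha, beta, gamma) with T - 1 = 6a + 3b + 2c."""
    remainder = t - 1
    for label, beta, gamma in (("A", 0, 0), ("B", 1, 0), ("C", 0, 1), ("D", 1, 1)):
        rest = remainder - 3 * beta - 2 * gamma
        if rest >= 0 and rest % 6 == 0:
            return label, rest // 6, beta, gamma
    raise ValueError(f"T = {t} admits no partition of this form")


if __name__ == "__main__":
    for t in (3, 4, 7, 9, 12, 16, 25):
        print(t, classify(t))
```

For example, T = 16 falls into class B with α = 2, and T = 25 into class A with α = 4, in agreement with the hexamer counts quoted for the Herpes virus and Adenovirus capsids.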
The total number of major capsid proteins (capsomer-forming blocks) can be obtained by direct counting. The basic triangle of the T = 16 capsid contains two totally differentiated hexamers (blue and yellow), and one "axially symmetric" hexamer (red). It is now quite easy to proceed to the following protein counting. Inside the basic triangle there are three copies of the "6A" and three copies of the "6B" hexamers, which amounts to 3 × (6 + 6) = 36 proteins, and three halves of type "3" hexamers, i.e. 3 × 3 = 9 proteins, giving altogether 45 proteins; there are also three "p" proteins coming from the pentamer fragments (3/5 of a pentamer), giving a total of 48. The capsid contains 20 such triangles, i.e. 20 × 48 = 960 major proteins.
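The counting above is simple arithmetic and can be reproduced in a few lines. In this Python sketch the function name and its parameters are illustrative; the numbers passed in are the ones quoted in the text for the T = 16 capsid.

```python
def proteins_in_basic_triangle(full_hexamers, half_hexamers, pentamer_proteins=3):
    """Six proteins per full hexamer, three per half hexamer, plus the 'p' proteins."""
    return 6 * full_hexamers + 3 * half_hexamers + pentamer_proteins


if __name__ == "__main__":
    # T = 16: three "6A" and three "6B" hexamers, three halves of type "3".
    per_triangle = proteins_in_basic_triangle(full_hexamers=6, half_hexamers=3)
    total = 20 * per_triangle
    print(per_triangle, total, total == 60 * 16)
```

The total agrees with 60 × T for T = 16.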

Higher symmetries
The classification scheme presented above is based on the exclusive use of three types of hexamer symmetries, (ababab), (abcabc) and (abcdef). However, among the capsids with all possible values of the triangular number T there are two classes that display an extra internal symmetry. These are the ones corresponding to a particularly symmetric choice of the two integers (p, q). Capsids with T-numbers generated by the combinations (p, 0) and (p, p) display an additional symmetry of the edges, which may contain hexamers of types other than the three used so far: (abbabb), (abccba), etc.
New internal symmetries are displayed in the figure (28) below. Let us consider the edge symmetry. The first case when such symmetry occurs is in T = 4 capsids. The second case with additional symmetry on the edge can be realized under the condition that the b-sides are polarized so that

Reduction of coat proteins number
Almost all non-chiral capsids, especially the "perfect" ones corresponding to triangular numbers given by the pairs (p, 0) or (p, p), admit higher degrees of symmetry and, therefore, a reduction of the number of different coat proteins needed for their construction. This can be seen in the following examples.
Figure 31: Two versions of the T = 9 capsid: the first one with the Z_3 symmetry and 9 different coat proteins, the second one with the S_3 × Z_2 symmetry and 6 different coat proteins.

The T = 25 example
The best way to see how the reduction due to the additional symmetry works is to consider a gradual construction respecting the full symmetry group, which in this case will be Z_3 × Z_2 × Z_2. Let us do it on the example of the T = 25 viruses: the Adenovirus and the PRD1 virus, both having the same T-number equal to 25, but not the same number of different coat proteins, as can be seen in figures (34) and (35). The adenovirus structure is well known, and it is based on four species of totally differentiated hexamers. Its symmetry group is Z_3.
The complete analysis of the PRD1 capsid symmetry shows how the number of different coat proteins can be reduced by the consistent use of symmetry.

Conclusions
From the point of view of symmetrical reduction, icosahedral capsids can be divided into three categories, the first being the irreducible ones, with T a prime number of the form 6n + 1. The total number of basic coat proteins remains the same, and in all cases is equal to 60T. As a consequence, when the number of different proteins is reduced by symmetry, each protein that appears not three but six times in the elementary triangle is present in 120 copies. It seems plausible that the three types are genetically related, because they display a similar hexamer structure: all hexamers maximally differentiated in the first case, and symmetrically reduced in the second and third cases.
The families with additional symmetries are likely to possess some evolutionary kinship.
The two schemes, the less symmetric and the maximally symmetric one, may have their own advantages and shortcomings. The first one displays the same number of copies of each particular coat protein, i.e. 60, which supposedly simplifies their chain production. In the second case the number of different coat proteins is reduced, but they must be produced in different amounts, some in 60, some in 120 copies. It would be interesting to know why certain types of icosahedral capsid viruses adopt one of these two schemes rather than the other.