Introduction

The ability of the visual system to detect and interpret information from light is an evolutionary adaptation that followed an ancient divergence from a primitive light-sensing structure; this divergence has produced species-specific systems of varying sophistication yet remarkably similar configuration1,2,3. The eye lens is a key refractive element that has been tailored to meet the visual demands of each organism1,3. It is composed of lens fibre cells that grow in concentric layers over existing tissue with no concomitant cellular loss; this tissue accrual begins in utero and continues throughout life1,2,3,4. The fibre cells differentiate from epithelial cells located beneath the anterior part of the lens capsule, a semi-elastic basement membrane that contains the lens and transfers the forces needed to change its shape as focussing power is adjusted3. During differentiation, the structural proteins of the lens, the crystallins, are synthesised at concentrations that vary across the tissue, creating the refractive index gradient required to optimise image quality3. Cytosolic concentrations can approach ~400 mg/ml in humans3,5,6,7,8. In the terminal step of differentiation the nucleus becomes pyknotic and the cell enters proteostasis, retaining the requisite concentration and mixture of crystallins3,5,9. The lens therefore retains, within its refractive index gradient, a chronology of ageing: newly synthesised proteins lie in the peripheral fibre cells, while proteins produced during gestation, evolved for longevity and for maintenance of optical function, lie at its centre3,7,8.

The major functions of the lens are to maintain transparency and to provide sufficient refractive power for light to focus on the retina. In terrestrial species, around two-thirds of the refraction occurs at the air/cornea interface; the lens provides the remaining refractive power and, in species with sufficiently malleable lenses, the fine-tuning of lens shape required for different viewing distances2,3. In aquatic species, the comparatively high refractive index of water negates the refractive power of the cornea, which has led to the evolution of lenses with steep index gradients and high peak refractive indices that provide all or most of the refractive power of the aquatic eye3,10,11.

The refractive index is related to protein concentration by the Gladstone-Dale formula3,12,13. This simple linear equation introduces the refractive index increment (dn/dc), which defines how much a given concentration of a protein contributes to the refractive index3,12,13,14,15. Differences in dn/dc have been found across the crystallin isoforms, with the smallest crystallin class, the γ-crystallins, having the highest dn/dc13,14. This is also the crystallin class found in the core of the lens3,16,17, where the refractive index is highest3,18,19.
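The linear relation can be illustrated numerically. The sketch below is a minimal example, assuming the common form n = n_s + (dn/dc)·c, with c in g/ml and dn/dc in ml/g, and assuming a solvent index n_s of 1.333 (approximately that of water at 589 nm). The function name `solution_refractive_index` is hypothetical. It combines the ~400 mg/ml cytosolic concentration quoted above with the mean βγ-crystallin dn/dc reported in the Results.

```python
def solution_refractive_index(dn_dc_ml_per_g: float, conc_g_per_ml: float,
                              n_solvent: float = 1.333) -> float:
    """Linear estimate n = n_solvent + (dn/dc) * c.

    n_solvent defaults to an assumed, approximately water-like value at 589 nm.
    """
    if conc_g_per_ml < 0:
        raise ValueError("concentration must be non-negative")
    return n_solvent + dn_dc_ml_per_g * conc_g_per_ml


# 400 mg/ml = 0.4 g/ml; 0.1999 ml/g is the mean beta-gamma-crystallin dn/dc
n_core = solution_refractive_index(0.1999, 0.4)
print(f"{n_core:.3f}")  # 1.413
```

Because the relation is linear, at 0.4 g/ml a difference of 0.01 ml/g in dn/dc (for example between 0.1899 and 0.1999 ml/g) corresponds to a difference of 0.004 in refractive index.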

It is thought that the requirement for longevity, which is particularly important in the core of the lens where the oldest cells reside, has been met by the Greek key structure, which is known for its thermodynamic stability8,20,21,22,23; this structure is a feature of the γ-crystallins and is found throughout the larger βγ-crystallin family8,24,25. Both higher-order structure and the dn/dc of a given protein are influenced by its primary amino acid sequence. McMeekin et al. determined the dn/dc of each amino acid at 589 nm and 25°C26, taking into account individual amino acid refractivities and partial specific volumes. Zhao et al.14,15 recently used these values to compute dn/dc for a wide array of proteins and reported an enrichment of aromatic residues, which are associated with high dn/dc, in lenticular crystallins, particularly in the βγ isoforms.
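The additive approach can be sketched as follows. This is a minimal illustration, assuming that a protein's dn/dc is the mean of per-residue increments weighted by residue mass (our reading of the McMeekin et al. approach, not the code used by Zhao et al.). The per-residue increment table is supplied by the caller; only the phenylalanine and tyrosine values quoted in the Discussion are filled in here, and the residue masses are standard average residue masses. The names `RESIDUE_MASS` and `predicted_dn_dc` are hypothetical.

```python
from collections import Counter

# Standard average masses (Da) of amino acid residues within a polypeptide chain.
RESIDUE_MASS = {
    "A": 71.08, "R": 156.19, "N": 114.10, "D": 115.09, "C": 103.14,
    "Q": 128.13, "E": 129.12, "G": 57.05, "H": 137.14, "I": 113.16,
    "L": 113.16, "K": 128.17, "M": 131.19, "F": 147.18, "P": 97.12,
    "S": 87.08, "T": 101.11, "W": 186.21, "Y": 163.18, "V": 99.13,
}


def predicted_dn_dc(sequence: str, increments: dict) -> float:
    """Mass-weighted mean of per-residue increments (ml/g).

    `increments` maps one-letter codes to residue dn/dc values; every residue
    present in `sequence` must have both a mass and an increment.
    """
    counts = Counter(sequence.upper())
    if not counts:
        raise ValueError("empty sequence")
    missing = sorted(aa for aa in counts
                     if aa not in increments or aa not in RESIDUE_MASS)
    if missing:
        raise KeyError(f"no mass or increment for residues: {missing}")
    total_mass = sum(RESIDUE_MASS[aa] * n for aa, n in counts.items())
    weighted = sum(RESIDUE_MASS[aa] * n * increments[aa]
                   for aa, n in counts.items())
    return weighted / total_mass


# Only the Phe and Tyr increments quoted in the text; a full table is needed
# for real sequences.
quoted = {"F": 0.244, "Y": 0.240}
print(round(predicted_dn_dc("FY", quoted), 4))  # 0.2419
```

Weighting by mass means that heavy residues such as tryptophan or arginine influence the protein value more than their residue count alone would suggest.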

It was not known whether the Greek key structure correlates with primary sequences rich in high-refractivity amino acid residues in all proteins, or whether Greek key motifs and high-refractivity amino acids occur together only in certain crystallin isoforms. This work suggests that βγ-crystallin isoforms have high dn/dc values compared with other proteins containing Greek key structures.

Results

Specific refractive increments of Greek key domains

Figure 1 shows the dn/dc distribution of the βγ-crystallin dataset. The mean dn/dc of the βγ-crystallin dataset is 0.1999 ml/g (SD: 0.0036 ml/g), and the mean dn/dc of the crystallin double Greek key motif is 0.2003 ml/g (SD: 0.0035 ml/g). Plotting predicted dn/dc against sequence length for the 292 entries of the βγ-crystallin dataset shows a general trend towards lower dn/dc for longer sequences (Figure 2A). These observations may suggest a link between the double Greek key motif and high dn/dc. Figure 2B, which highlights sequence lengths between 50 and 300 residues, shows a predominance of native proteins 170–180 residues long, which corresponds to the double Greek key motif.
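The predominance of sequences in the 170–180 residue range amounts to finding the most populated length bin. A minimal sketch, assuming 10-residue bins restricted to 50–300 residues; the function name `length_bins` and the toy lengths are hypothetical and are not data from this study.

```python
from collections import Counter


def length_bins(lengths, width=10, lo=50, hi=300):
    """Count sequences per bin [start, start + width) for lengths in [lo, hi]."""
    bins = Counter()
    for n in lengths:
        if lo <= n <= hi:
            bins[(n // width) * width] += 1
    return bins


toy_lengths = [175, 178, 172, 180, 176, 210, 252, 88, 174, 40, 412]  # hypothetical
bins = length_bins(toy_lengths)
start, count = bins.most_common(1)[0]
print(f"{start}-{start + 9}: {count} sequences")  # 170-179: 5 sequences
```

With half-open bins, a length of exactly 180 falls into the next bin, whereas the 170–180 range in the text is inclusive, so the bin edges need to be chosen explicitly.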

Figure 1

Frequency distribution of predicted refractive index increments of βγ crystallins.

Green, red and yellow colours indicate the overall average value, the average value of double Greek key motif sequences and the average value of other human proteins, respectively.

Figure 2

Predicted refractive index increment plotted against sequence length of βγ-crystallins: (A) all crystallins in the dataset; (B) crystallins with sequence lengths between 50 and 300 amino acids.

Green, red and yellow colours indicate the overall average value, the average value of double Greek key motif sequences and the average value of other human proteins, respectively.

To establish whether high dn/dc is a consequence of the motif structure, the dn/dc values of representative double Greek key domains from a wide range of proteins were analysed. The distribution of dn/dc for representatives of 52 different double Greek key domains27 shows that the Greek keys from βγ-crystallin proteins have the highest dn/dc values and appear as an outlier (Figure 3A). The mean dn/dc for double Greek key motifs is 0.1908 ml/g, which is close to the mean for human proteins (0.1899 ml/g) but substantially lower than that of the βγ-crystallins (0.2009 ml/g). No correlation is seen between either the sequence length or the number of strands of the double Greek key domain and the predicted dn/dc (Figures 3B and 3C).
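One simple way to quantify how far the βγ-crystallin value sits from the other domain representatives is a z-score against their mean and sample standard deviation. This is a sketch, not the analysis behind Figure 3A: the `z_score` helper is hypothetical, the comparison values are invented for illustration, and only 0.2009 ml/g comes from the text.

```python
import statistics


def z_score(value: float, others: list) -> float:
    """Standardised distance of `value` from the mean of `others` (sample SD)."""
    if len(others) < 2:
        raise ValueError("need at least two comparison values")
    return (value - statistics.mean(others)) / statistics.stdev(others)


toy_domains = [0.188, 0.190, 0.192, 0.189, 0.191]  # hypothetical dn/dc values (ml/g)
z = z_score(0.2009, toy_domains)
print(f"z = {z:.1f}")  # z = 6.9
```

A common rule of thumb treats |z| > 3 as an outlier, although with few comparison values such a threshold is only indicative.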

Figure 3

(A) Frequency distribution of refractive index increments of double Greek key domains; (B) sequence length and (C) number of strands plotted against refractive index increment.

The red colour represents βγ crystallins.

Amino acid compositions and refractive increments in crystallin isoforms

To investigate the specificity of the βγ-crystallin double Greek key domain, a subset of the initial dataset was created containing only proteins 170–180 residues long. This subset comprises 116 βγ-crystallin isoforms with minimum and maximum dn/dc values of 0.1965 and 0.2084 ml/g, respectively, and a mean of 0.2024 ml/g. Their amino acid compositions show that the numbers of lysine and arginine residues are inversely correlated (r = −0.85, p < 0.0005) and vary with dn/dc: the sum of arginine and lysine residues is roughly constant at ~21, but as dn/dc increases, the number of lysine residues decreases and that of arginine increases concomitantly (Figure 4A). A similar inverse correlation is seen with glutamic acid being replaced by aspartic acid (r = −0.70, p < 0.0005) (Figure 4B). In both cases, the increase in charged amino acids with high dn/dc values comes at the expense of those with lower dn/dc values, leading to a higher overall dn/dc for the protein. A similar analysis was conducted on the S-crystallin family, the major protein class in the eye lenses of cephalopods (e.g. octopus, squid and cuttlefish). As illustrated in Figures 4C and 4D, inverse correlations between the numbers of arginine and lysine residues, and between those of glutamic acid and aspartic acid, are also observed in this family (r = −0.92, p < 0.0005 and r = −0.81, p < 0.0005, respectively). Notably, S-crystallins from the octopus, marked by open circles (Figure 4C), differ in composition from the other S-crystallin sequences. This causes the deviations between the curves representing K and R residues but does not affect the strength of the correlation between them.

Figure 4

Number of residues in each protein from the selected βγ- and S-crystallin datasets: (A, C) lysine (K) and arginine (R) residues; (B, D) glutamic acid (E) and aspartic acid (D) residues.

Correlation coefficients: βγ-crystallin K-R, −0.85 (p < 0.0005); βγ-crystallin E-D, −0.70 (p < 0.0005); S-crystallin K-R, −0.92 (p < 0.0005); S-crystallin E-D, −0.81 (p < 0.0005). Open circles correspond to S-crystallins from octopus.

Interspecies comparison

Amino acid sequences of crystallins with a range of dn/dc values were compared in five species to examine whether the specific correlations (more lysine and glutamic acid in crystallins with relatively low dn/dc values, and more arginine and aspartic acid in those with relatively high dn/dc values) were consistent across species. The comparison included two aquatic and three terrestrial species: ranine (Xenopus laevis), piscine (Danio rerio), murine (Mus musculus), bovine (Bos taurus) and human (Homo sapiens) (Table 1). The greatest variation in dn/dc values is seen in the zebrafish (Danio rerio) and the least in the human. The results show that a number of lysine residues in sequences with lower dn/dc values are substituted by arginine in sequences with higher dn/dc values (Table 2). There is no such consistent relationship between glutamic acid and aspartic acid residues. Instead, Table 2 shows that, between the protein with the lowest dn/dc and those with the highest dn/dc in any given species, there is a consistent substitution of phenylalanine by tyrosine. In contrast with the other set of substitutions, this produces a very slight decrease in dn/dc.
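Tallying lysine-to-arginine and phenylalanine-to-tyrosine substitutions between a lower and a higher dn/dc sequence requires the two sequences to be aligned first. The sketch below assumes a pre-computed pairwise alignment with '-' as the gap character and simply counts position-wise substitutions; the function name `substitutions` and the aligned fragments are hypothetical.

```python
from collections import Counter


def substitutions(low_seq: str, high_seq: str) -> Counter:
    """Count (from_residue, to_residue) substitutions between two pre-aligned
    sequences of equal length; gapped positions are skipped."""
    if len(low_seq) != len(high_seq):
        raise ValueError("sequences must be aligned to the same length")
    return Counter(
        (a, b) for a, b in zip(low_seq, high_seq)
        if a != b and a != "-" and b != "-"
    )


low = "MKTFEK-LFK"   # hypothetical lower dn/dc fragment (aligned)
high = "MRTYEKALYR"  # hypothetical higher dn/dc fragment (aligned)
subs = substitutions(low, high)
print(subs[("K", "R")], subs[("F", "Y")])  # 2 2
```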

Table 1 Refractive index increments of selected sequences
Table 2 Interspecies comparison of amino acid substitutions

Salt bridge analysis

Analysis of representative βγ-crystallin structures indicates that mammalian βγ-crystallins have around 5 salt bridges per domain (of around 90 residues), which is not the case for the non-mammalian proteins. The most frequent separation between bridged residues is considerably greater than 10 residues (73% of cases); among bridges whose residues are separated by fewer than 10 positions, a separation of 2 residues is the most common (36% of cases), followed by a separation of 7 (21% of cases). Although most bridges are formed between amino acids that are more than 10 residues apart, they are largely within the same domain (84% of cases). Around a third of these cases are interstrand bridges. Figure 5A highlights salt bridge conservation across five βγ-crystallin isoforms, i.e. γB, βB2, βB3, βB1 and βA4. Conserved salt bridges connect the two Greek keys within a domain as well as the two domains (Figure 5B and C). The isoforms βB2, βB3 and βB1 display two cross-domain salt bridges; βA4 has a single cross-domain salt bridge, with a disulphide bond within 4 residues of it. The γ-crystallin (γB) has no cross-domain salt bridges; these are prevented from forming by the negatively charged residue 29 in one domain and the corresponding residue 147 in the other. All βγ-crystallins have a salt bridge between the two Greek keys of the second domain, and βA4 also displays a salt bridge between the two Greek keys of the first domain.
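The separation statistics can be derived from a list of bridged residue pairs. A minimal sketch, assuming residues are identified by their sequence numbers within one chain and that a separation of exactly 10 is counted as short (the text does not specify how this boundary is treated); the function name `separation_summary` and the residue pairs are hypothetical.

```python
from collections import Counter


def separation_summary(pairs, long_range=10):
    """Given (acidic_resid, basic_resid) pairs, return the fraction of bridges whose
    sequence separation exceeds `long_range` and a Counter of shorter separations."""
    if not pairs:
        raise ValueError("no salt bridges supplied")
    separations = [abs(a - b) for a, b in pairs]
    long_fraction = sum(s > long_range for s in separations) / len(separations)
    short = Counter(s for s in separations if s <= long_range)
    return long_fraction, short


toy_pairs = [(5, 40), (12, 14), (20, 27), (50, 52), (60, 95)]  # hypothetical
frac, short = separation_summary(toy_pairs)
print(frac, short.most_common())  # 0.4 [(2, 2), (7, 1)]
```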

Figure 5

Salt bridge conservation in βγ-crystallins.

(A) Multiple alignment of βγ-crystallin sequences, including alignment of the two domains. Only residues relevant to conserved salt bridges and the disulphide bond are displayed. Green and red numbers represent negatively and positively charged residues, respectively. Yellow dashed lines show disulphide bonds. Pale blue shaded residues are involved in cross-domain salt bridges. Navy shaded residues are involved in completely conserved salt bridges51; (B) inter-protein disulphide bridges observed in a homodimer of crystallin βB2, with monomers shown in different colours51; (C) interdomain salt bridges formed within a single crystallin βB1 monomer, with salt bridge interactions shown as red dotted lines52.

A fundamental feature of the Greek key structure is the conserved β-hairpin motif24. Calculating dn/dc for the sequences that make up the four β-hairpins in each of the isoforms shown in Figure 5A indicates a slight increase in dn/dc for these sequences compared with the whole sequence of each isoform. The increase ranges from 0.36% for βB3 (dn/dc = 0.1977 ml/g for the β-hairpins compared with 0.1970 ml/g for the whole sequence) to 3.37% for βB2 (0.2004 ml/g for the β-hairpins compared with 0.1939 ml/g for the whole sequence).
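Comparing β-hairpin residues with the whole sequence amounts to pooling the residues of each hairpin and predicting dn/dc for the pooled set. A minimal sketch, assuming hairpins are given as 1-based inclusive residue ranges (for example from a structure-annotation tool) and that a composition-based predictor is supplied by the caller; the names `hairpin_vs_whole` and `toy_predict` are hypothetical. The percentage is (hairpin − whole) / whole × 100, which for the βB3 values quoted above gives 0.36%.

```python
from typing import Callable, Sequence, Tuple


def hairpin_vs_whole(sequence: str,
                     hairpins: Sequence[Tuple[int, int]],
                     predict: Callable[[str], float]) -> Tuple[float, float, float]:
    """Return (hairpin dn/dc, whole-sequence dn/dc, percentage change).

    `hairpins` are 1-based inclusive residue ranges; their residues are pooled
    before prediction.
    """
    pooled = "".join(sequence[start - 1:end] for start, end in hairpins)
    if not pooled:
        raise ValueError("no hairpin residues selected")
    hp, whole = predict(pooled), predict(sequence)
    return hp, whole, 100.0 * (hp - whole) / whole


def toy_predict(seq: str) -> float:
    # Hypothetical stand-in: baseline 0.18 ml/g raised by the tyrosine fraction.
    return 0.18 + 0.1 * seq.count("Y") / len(seq)


hp, whole, change = hairpin_vs_whole("AAYYAAAA", [(3, 4)], toy_predict)
print(f"{hp:.3f} {whole:.3f} {change:.1f}%")  # 0.280 0.205 36.6%
```

In practice `predict` would be a full composition-based dn/dc function rather than the toy stand-in.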

Discussion

The concept of a refractive increment, and the property of contributing to refraction, have obvious relevance to the crystallins given the predominant function of the eye lens. It should be remembered that dn/dc is not an immutable property but depends on the solvent28, the wavelength29 and, to a much lesser extent, temperature29. Because experimental crystallin samples prepared for dn/dc measurement need to be constituted in a solvent that replicates the fluid found in the eye lens, the greatest variability in dn/dc values between studies comes from the wavelength used for measurement. Most experimental studies on crystallin dn/dc have used proteins from the bovine lens13,30,31,32, and some have concentrated on α-crystallins31,32 and/or γ-crystallins30. Where all three broad classes of crystallins have been measured, dn/dc was found to be highest for γ-crystallins13. Importantly, experimental studies did not isolate particular isoforms within the crystallin classes. Theoretical studies that have calculated the molar refractivities of individual amino acids26, or used these to calculate dn/dc, were able to compile values for many different proteins14,15 and provide deeper insight into why a protein may have a relatively low or high dn/dc. Aromatic (tyrosine, tryptophan and phenylalanine) and sulphur-containing (methionine, cysteine) amino acids have relatively high dn/dc values, whilst alanine, proline and serine have the lowest values14,15. The high content of aromatic amino acids in the crystallins33,34, coupled with the relatively high cysteine content of γ-crystallins24,35,36,37, provides some explanation for the high dn/dc of this protein class. The γ-crystallin γM, a protein found in certain aquatic species, has the highest dn/dc (0.209 ml/g) thus far found in a crystallin, resulting partly from its high methionine content14,15. This high methionine content may also facilitate denser packing35, which would be advantageous for a high refractive index (reviewed in3).

The mean dn/dc of the βγ-crystallin dataset studied (0.1999 ml/g; SD: 0.0036 ml/g) is significantly higher than the mean dn/dc for other human proteins (0.1899 ml/g; SD: 0.0030 ml/g)14,15, and sequence length was found to be inversely correlated with dn/dc. Comparison of proteins with double Greek key motifs shows a similar trend: the double Greek key motif in crystallins has a higher dn/dc than those in other proteins. A Greek key motif is therefore not necessarily indicative of a high dn/dc value. Sequence length in Greek key domains was not found to be correlated with dn/dc. The dn/dc values of the β-hairpins in the Greek keys of isoforms γB, βB2, βB3, βB1 and βA4 are slightly higher than those of the whole sequences. Whether the amino acids in these structural regions have a dual role, contributing both to refractive index and to structural stability, requires further investigation.

Tighter packing of proteins will increase the refractive index, as proteins have a higher refractive index than water. The substitutions of arginine for lysine and of aspartic acid for glutamic acid not only increase the dn/dc value but could also result in a more compact protein structure. The guanidinium group on the side chain of arginine has a geometry and charge distribution that enables it to form multiple hydrogen bonds, and the shorter, less flexible side chain of aspartic acid compared with glutamic acid may also facilitate compaction.

In both the βγ-crystallins and the S-crystallins, the higher the dn/dc value of a protein, the fewer lysine and glutamic acid residues and the more arginine and aspartic acid residues it contains. Since lysine/arginine and glutamic/aspartic acid residues are associated with the formation of salt bridges38, the observed substitutions may be constrained by the need to maintain existing salt bridges. Analysis of representative βγ-crystallin structures revealed that mammalian βγ-crystallins have many more salt bridges per domain than would be expected for a domain length of about 90 residues (5 salt bridges, compared with fewer than 2 typically found in a domain of that length38). This pattern is not found among the three non-mammalian crystallins investigated in this study, which display only one salt bridge per domain on average. It is notable that salt bridges are more frequent in α-helical structures38, whereas βγ-crystallins have a relatively high proportion of β-pleated sheet.

A cross-species comparison showed that arginine consistently replaced lysine in the progression from lower to higher dn/dc proteins within a species. The glutamic acid/aspartic acid correlation was not borne out in this comparison. Instead, another trend was observed: a decrease in phenylalanine and a concomitant increase in tyrosine with increasing dn/dc. As phenylalanine and tyrosine both have relatively high dn/dc values (0.244 and 0.240 ml/g, respectively), which are very close in magnitude, such a substitution makes little difference to the refractive index.

The findings of this study show relatively higher numbers of salt bridges in mammalian βγ-crystallins than in other proteins38, with a predominance of salt bridges formed between amino acids separated by more than 10 residues. Additionally, the relatively high proportion of interstrand bridges in the crystallins compared with other proteins38,39 may be indicative of long-range structural stability. This is of particular importance to the crystallins, which remain in the cytoplasm of lens fibre cells from their synthesis until the death of the organism. In cells from the central regions of the lens, protein synthesis took place during gestation and optical quality must be maintained for decades. The highest content of γ-crystallins is found in the central regions of mammalian lenses, where the refractive index reaches its maximum3,16,17; in cephalopods, the predominant proteins contributing to the high refractive index are S-crystallins3,14,15. Both protein classes have relatively high refractive increments.

Whilst the optical function of the lens is to provide refractive power, the quality of the optics relies on transparency. Cataract results in a loss of transparency, and it has recently been shown that congenital mutations in human γD-crystallin can cause cataract with or without disruption of the Greek key structure40. Mutations such as R77S, in which arginine at position 77 is replaced by serine, do not destabilise the tertiary structure yet result in cataract in the cortical regions of the lens40. Other mutations, such as A36P, in which alanine at position 36 is replaced by proline, disrupt the Greek key structure and cause nuclear cataract40. Single substitutions such as these do not produce any substantive change in dn/dc.

The lenticular crystallins are organised to ensure that the lens meets the refractive demands of the eye. The βγ-crystallins and the S-crystallins contain residues that contribute to a high dn/dc compared with non-lenticular proteins, and the Greek key motif, which exists in many proteins, is linked with a higher dn/dc only when it is found in crystallins. Salt bridge interactions that stabilise protein structure and provide interactive potential are relevant to the structural longevity of the crystallins and are necessary for maintenance of transparency over decades. The crystallins have not only evolved a primary sequence that optimises their contribution to refraction; they also have higher-order arrangements that are conducive to its preservation.

Methods

Analysis of βγ-crystallin refractive index increment

Refractive index increments (for 589 nm at 25°C) were predicted for all available sequences belonging to the βγ-crystallin family. These sequences were retrieved using relevant seed sequences as PSI-BLAST queries41. To ensure the widest coverage while avoiding the introduction of unrelated proteins, seeds were defined as all sequences whose annotations, based on experimental data, specify the molecular function “the action of a molecule that contributes to the structural integrity of the lens of an eye” (GO:0005212)42 and identify them as belonging to the βγ-crystallin family. As a result, 13 seed sequences were used: 7 from mouse (β-crystallin A1/A2/B2/S and γ-crystallin B/C/D/E) and 6 from rat (β-crystallin A4/B1/B3 and γ-crystallin C/D/E).

All sequences returned by PSI-BLAST with an e-value below 1 were mapped to UniRef100 clusters43 to remove duplicates and fragments. Since the βγ-crystallin superfamily contains a few non-crystallin members, such as absent in melanoma 1 (AIM1)44, their sequences were removed from the initial list. This was done by generating a phylogenetic tree with FastTree45 from a multiple alignment46 and eliminating sequences belonging to non-crystallin branches. The diversity of the βγ-crystallin family was thus represented by 292 entries. Their dn/dc values were predicted following the computational method outlined by McMeekin et al.26 and described in previous studies14,15.
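The hit-filtering steps can be sketched as follows. This is a simplified stand-in, not the UniRef100 mapping itself: it assumes hits arrive as (accession, e-value, sequence) tuples, keeps those with an e-value below 1, collapses identical sequences, and treats any sequence contained in a longer retained sequence as a fragment, which is only loosely analogous to UniRef100's merging of identical sequences and sub-fragments. The function name `filter_hits` and the toy hits are hypothetical, and the phylogenetic filtering step is not shown.

```python
def filter_hits(hits, max_evalue=1.0):
    """Return {accession: sequence} after e-value filtering, exact-duplicate
    removal and removal of sequences contained in a longer retained one."""
    first_accession = {}
    for accession, evalue, sequence in hits:
        if evalue < max_evalue:
            first_accession.setdefault(sequence, accession)  # first hit wins
    kept = []
    for sequence in sorted(first_accession, key=len, reverse=True):
        if not any(sequence in longer for longer in kept):
            kept.append(sequence)
    return {first_accession[s]: s for s in kept}


toy_hits = [  # hypothetical PSI-BLAST hits
    ("P1", 1e-30, "MGKITFYEDR"),
    ("P2", 0.5, "MGKITFYEDR"),    # identical to P1
    ("P3", 2.0, "MSKAGTKITF"),    # e-value too high
    ("P4", 1e-5, "KITFYE"),       # contained in P1
    ("P5", 1e-8, "MDRQLFYEGK"),
]
print(sorted(filter_hits(toy_hits)))  # ['P1', 'P5']
```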

Analysis of S-crystallin refractive index increment

Refractive index increments (for 589 nm at 25°C) were predicted for relevant sequences belonging to the S-crystallin family. These sequences were retrieved using a seed sequence, squid S-crystallin (P18426), as a PSI-BLAST query40. All sequences returned with an e-value below 1 were candidates for further filtering. Sequences not belonging to the S-crystallin family, such as the homologous glutathione S-transferases, were discarded, as were fragments and predicted or hypothetical sequences. Finally, among the 38 remaining S-crystallin sequences, only those from cephalopod species were selected for this study, i.e. 24 from squid and 8 from octopus; 6 oyster sequences were discarded.

Analysis of Greek key sequences

Most βγ-crystallins contain a double Greek key motif that has a length of 170–180 residues. To study the constraint this motif confers on protein evolution, sequences outside that length range were removed from the dataset. Hence, 116 βγ-crystallin sequences were used for analysis. Predicted dn/dc values of βγ-crystallins were compared to those of a set of 52 double Greek key domain representatives27, whose sequences were retrieved from the Protein Data Bank47.

Salt bridge analysis

Salt bridge analysis was performed on a set of 3D protein structures representative of βγ-crystallins. These structures were extracted from the Protein Data Bank using a 90% sequence similarity filter to eliminate duplicates and non-wild-type copies. Non-crystallin members, including two AIM1 proteins, and a crystallin whose domains had been artificially permuted48 were removed from the list. A total of 19 structures were studied: 7 human, 5 cow, 3 mouse, 1 rat, 1 sea squirt, 1 bacterium and 1 archaeon. Salt bridges were identified with the Salt Bridges plugin of the molecular graphics program VMD49; their classification as interdomain or interstrand, and the identification of β-hairpins, were based on descriptions produced by PROMOTIF50.
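The distance criterion behind salt bridge detection can be sketched in plain Python. This is a simplified sketch, not the VMD plugin: it assumes atoms are given as (chain, residue number, residue name, atom name, coordinates) tuples, considers only Asp/Glu carboxylate oxygens and Lys/Arg side-chain nitrogens, and uses an oxygen-nitrogen cutoff of 3.2 Å, which we understand to be the plugin's default but which should be checked. The function name `find_salt_bridges` and the coordinates are hypothetical.

```python
import math
from itertools import product

# Side-chain atoms considered (residue name, atom name); histidine is ignored here.
ACIDIC_ATOMS = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC_ATOMS = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}


def find_salt_bridges(atoms, cutoff=3.2):
    """Return sorted ((chain, resid) acidic, (chain, resid) basic) pairs in which
    any acidic oxygen lies within `cutoff` angstroms of a basic nitrogen."""
    atoms = list(atoms)
    acidic = [a for a in atoms if (a[2], a[3]) in ACIDIC_ATOMS]
    basic = [a for a in atoms if (a[2], a[3]) in BASIC_ATOMS]
    bridges = set()
    for o, n in product(acidic, basic):
        if math.dist(o[4], n[4]) <= cutoff:
            bridges.add(((o[0], o[1]), (n[0], n[1])))
    return sorted(bridges)


toy_atoms = [  # hypothetical coordinates in angstroms
    ("A", 12, "GLU", "OE1", (0.0, 0.0, 0.0)),
    ("A", 12, "GLU", "OE2", (1.0, 0.0, 0.0)),
    ("A", 60, "ARG", "NH1", (0.0, 0.0, 2.9)),
    ("A", 75, "LYS", "NZ", (10.0, 0.0, 0.0)),
]
print(find_salt_bridges(toy_atoms))  # [(('A', 12), ('A', 60))]
```

Each residue pair is reported once even when several oxygen-nitrogen contacts fall within the cutoff, so the result counts bridges per residue pair rather than per atom pair.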