Universality of Form: The Case of Retinal Cone Photoreceptor Mosaics

Cone photoreceptor cells are wavelength-sensitive neurons in the retinas of vertebrate eyes and are responsible for color vision. The spatial distribution of these nerve cells is commonly referred to as the cone photoreceptor mosaic. By applying the principle of maximum entropy, we demonstrate the universality of retinal cone mosaics in vertebrate eyes by examining various species, namely, rodent, dog, monkey, human, fish, and bird. We introduce a parameter called retinal temperature, which is conserved across the retinas of vertebrates. The virial equation of state for two-dimensional cellular networks, known as Lemaître’s law, is also obtained as a special case of our formalism. We investigate the behavior of several artificially generated networks and the natural one of the retina concerning this universal, topological law.


Introduction
The principle of maximum entropy provides an estimation for the underlying probability distribution of the observed data that corresponds best to the currently available information about the system [1,2]. It has been applied to fields as diverse as physics [3,4], biology [5], ecology [4,6], and natural language [7]. The philosophy behind the maximum entropy inference approach is to explain and predict experimental observations by making the fewest number of assumptions (i.e., constraints) while assuming no explicit underlying mechanisms.
One of the prime and dreadfully arduous challenges in applying the principle of maximum entropy to a given system is to find out relevant constraints that should be imposed on the system [8]. The authors of [9] have suggested that, in situations where the experiments are repeatable, the expected value of the entropy of the likelihood function is relevant information that should be considered a constraint. However, for a given system, its value is largely unknown. Solving the corresponding Lagrange problem leads to the so-called entropic probability distribution [10,11]. Entropic distributions have been exploited mainly within the context of data classification and theoretical physics [12]. Yet, the consequences of such an approach are not fully explored in biology and life sciences. In the context of biology, due to the rigid structure of DNA, most experiments must be repeatable.
In the present paper, we adopt the approach of [9] and apply it to a complex multicellular biological system, namely, cone photoreceptor cells in the retina; for an earlier attempt in this direction, see [13]. Cone cells are wavelength-sensitive receptors in the retinas of vertebrate eyes, and their different sensitivities and responses to light of different wavelengths mediate color vision. The spatial distribution of these cells, so-called cone mosaic [14], varies among different species, which, in each case, may reflect the evolutionary pressures that give rise to various adaptations to the lifestyle of a particular species and its specific visual needs. However, in most cases, the adaptive value of a particular cone mosaic is unknown [15]. From the perspective of gene regulatory mechanisms, the

Entropy Maximization
In statistical mechanics, to obtain the Boltzmann distribution from the principle of maximum entropy, one has to assume a constraint on the mean energy value as, in the context of physics, the expected value of energy is crucial information about the system. This approach leads to a formalism in which thermodynamic temperature emerges as a free parameter and should be determined later from the experiment [24]. In a general setting, the challenge is to find out the relevant constraints that should be imposed on the system. A. Caticha and R. Preuss in [9] have assumed a set of data generated by some experiment, where the only requirement is the experiment to be repeatable. If, for example, the experiment is performed twice, with the corresponding outcomes of z 1 and z 2 , in the case that we discard the value of z 2 , the resulting situation should be indistinguishable from if we had done the experiment only once. They have argued that a constraint on the expected value of the entropy of the likelihood codifies this information. Inspired by this idea and since biological experiments must be repeatable because of the robustness of DNA, we adopt this specific approach to entropy maximization and apply it to multicellular biological systems; see also [13] and the references cited therein. Generally, an experiment does not need to be repeatable; for instance, this may be the case at the atomic scale [25]. For non-repeatable experiments, the Einstein fluctuation formula is applicable [9,26]. Note that, in cellular biology, each cell is composed of a large number of atoms, and the experiments are robust and repeatable.
We denote sensory neurons S and their local environment, which consists of other cells Y. We assume the following information about the system: where S denotes the support set of S. Equation (1) is a normalization condition of the probability mass function (in this paper, the frequency of the appearance) of neurons, Equation (2) assumes the knowledge of the numerical value H of H(Y | S), and where Y denotes the support set of Y, which is defined in terms of the probabilities f (Y | S = s). By the method of Lagrange multipliers, we maximize the Shannon entropy of neurons, H(S) = − ∑ s∈S p(s) ln p(s), while taking (1) and (2) into account. The corresponding Lagrangian reads where λ and β are Lagrange multipliers. By solving ∂L/∂p(s) = 0, we obtain the so-called entropic probability [9][10][11]: where Assuming β > 0, Equation (4) implies that neurons with lower entropy H(Y | S = s) have a higher probability or frequency of appearance, which is confirmed in the case of cone photoreceptors in Section 3. The probability distribution in (4) is the most likely and the least-biased one, where the only assumed knowledge about the system is the repeatability nature of the experiments. Other available information about the system can be incorporated as additional constraints in (3); an example of such a scenario is given in Section 4. A couple of remarks are in order. The application of the principle of maximum entropy strongly depends on how we specify the system configuration, which by itself depends on the nature of the problem at hand. Different ways of describing the configuration of the same system may lead to different outcomes; for a detailed discussion of this issue, see [27]. The second remark deals with (2). Although we have assumed the knowledge of H, we do not know its value in most cases, but rather, it is a quantity whose value should be known; thus, we have formulated our problem as if we had this information. For a detailed discussion of this matter, see [9]. By calculating the free parameter, β, from the experimental data, one can infer the value of H. In analogy with statistical mechanics where thermodynamic temperature emerges as the inverse of the Lagrange multiplier in the derivation of the Boltzmann distribution, we interpret β as the biological coldness (the reciprocal of temperature) of neurons. As in thermodynamics, where energy is an extensive quantity, here entropy is also extensive. Note that thermodynamic temperature is a statistical property of matter in bulk, and thus β can be viewed as an emergent quantity at a tissue level.

Spatial Distributions of Cone Photoreceptors in the Retinas of Vertebrates
In this section, we apply the principle of maximum entropy, culminated in Equation (4), to retinal cone mosaics of various vertebrates. Besides predicting the frequency of the appearance of cones across the retina, we demonstrate that the application of the maximum entropy inference leads to the introduction of a new parameter, which we call retinal coldness, that is conserved in divergent species of rodent, dog, monkey, human, fish, and bird. In Section 3.1, we elaborate on the details of our calculations in the case of human cones; other species are summarized in Section 3.2.

Spatial Distributions of Human Cone Photoreceptors
Human color vision is mediated by three types of cones, which are sensitive to (blue) short-, (green) medium-, and (red) long-wavelength light. The spatial distributions of these cells in a living human eye are shown in Figure 1. The image in the top-left corner is the first image of the spatial arrangement of living human cones, reported in [28]. In the following, we show how Equation (4) can be used to predict the frequency of the appearance of blue, green, and red cones in a retinal field of a human eye given in Figure 1. From (4), we have: where p s is the probability of the occurrence or the frequency of the appearance of cone subtypes: blue (b), green (g), and red (r). We consider the local environment of blue cones that consists of other blues and exclude green and red cones; the local environment of green cones comprises only greens and the same for the local environment of red cones. This is justified as it is suggested that most cone cells form independent mosaics, and there are no spatial interactions between two mosaics [29]. Moreover, cone mosaics are explicitly shown to be spatially independent in the case of avian cones [30]. To calculate H b , H g , and H r , we need to consider some kind of probability distribution or density function. Our choice is to construct the nearest-neighbor-distance (NND) distribution for each cone subtype and calculate its corresponding entropy. The rationale behind choosing this specific distribution is as follows. (I) The choice of probability distribution should reflect the frequency with which each cone subtype appears in the retina, which is related to the mean distance between cones of the same type. The scattering of the NND distribution, which is quantified by its entropy, decreases with decreasing the average value of the NND distribution [31] and implies a higher frequency of appearance of cones, based on (5).
(II) In general, the methods based on the concept of the nearest-neighbor distance have been extensively used to quantify cone mosaics, see for example [14], which turns out to be a simple but powerful concept to analyze spatial patterns. As an illustration, we have shown searching for the nearest neighbors in the case of blue cones in Figure 2. The NND distribution for each cone subtype is presented in Figure 3. The nearestneighbor distances follow a peaked distribution in each case. Note that, to obtain the optimal bin widths of the histograms, we have used a data-based procedure proposed by M. P. Wand [32], to its first-order approximation, which is called one-stage rule (the zeroth-order approximation, i.e., the zero-stage rule, of the method reproduces Scott's rule of binning [33]).   Nearest-neighbor distances (µm) To calculate entropies of distributions in Figure 3, we use the notion of differential entropy, which is defined as h s = − dx f s (x) ln f s (x), where f s (x) is a probability density function, with the property that dx f s (x) = 1. We use the notation H to designate the Shannon entropy and h for the case of differential entropy. Note that, as f s (x) has units, it cannot be used as the argument of logarithm; however, by the transformation x → x/x reference , where x reference = 1 µm, we make the distance and subsequently f s (x) dimensionless. For each histogram in Figure 3, the density function can be constructed, and the differential entropy can be calculated accordingly; we obtain: From the image in the top-left corner of Figure 1, we can determine frequencies of appearance of blue, green, and red cones, which are the ratios of these cells in the retinal field. We set the value of β in (5) to reproduce the observed values of frequencies. To obtain β, we employ the Kullback-Leibler divergence, D KL = ∑ s q s ln(q s /p s ), where q s corresponds to the observed cone subtype frequency of appearance and p s corresponds to the prediction of the theory in (5). The left panel of Figure 4 illustrates the Kullback-Leibler divergence as a function of β, with the global minimum of 0.001 at β = 1.284. The observed cone ratios are compared to the predictions of the theory for β = 1.284 in the right panel of Figure 4. Incidentally, these calculations demonstrate that neurons with a lower entropy have a higher probability of occurrence.

Spatial Distributions of Vertebrate Cone Photoreceptors: From Rodent to Bird
We apply the procedure explained in Section 3.1 to various vertebrates, namely, rodent, dog, monkey, human, fish, and bird. Rodent and dog are dichromats; monkey, like human, is trichromat; and fish and bird are tetrachromats, which, in the case of bird, there is also a significant number of double cones. Our results are summarized in Figures 5-10. Although cone mosaics of these diverse species are significantly different from each other, the values of β in all species are in the same order, where 1 < β < 2. In Section 3.3, by using statistical analyses and the fact that the NND distributions of cone subtypes in vertebrate retinas are peaked-and thus can be approximated by Gaussians-we estimate the value of β in a general case. Nearest-neighbor distances (µm) Figure 5. The image in the top-left corner (scale bar = 50 µm) illustrates the spatial distribution of cone photoreceptors in the dorsal mid-peripheral retina of a diurnal rodent called the agouti. It is adapted with permission from [34]. Copyright 2009, Cambridge University Press. In this image, the shortwavelength-sensitive-cone opsin is represented as green and the long-wavelength-sensitive-cone opsin as violet; next to it, in the digitized image, we have reversed the colors. Nearest-neighbordistance distributions in the third row have the entropies of h v = 3.310 and h g = 1.787; next to them, we have shown a comparison between the experimental observation of cone ratios in the agouti retina and the predictions of the theory (5) evaluated at the global minimum of the Kullback-Leibler divergence, that is, β = 1.310. The colors of the histograms and the bar chart correspond to their respective cone subtypes. Nearest-neighbor distances (µm) Figure 6. The image in the top-left corner, adapted from [35], shows the spatial distribution of cone photoreceptors in the inferior peripheral retina of a dog; the short-wavelength-sensitive-cone opsin is represented as green and the long-/medium-wavelength-sensitive-cone opsin as red. The entropies of the NND distributions in the third row read h g = 3.933 and h r = 2.440. The colors of the histograms correspond to their respective cone subtypes. Next to the NND distributions, we have shown a comparison between the experimental observation of cones' frequencies of appearance and the predictions of the theory for β = 1.127.

Bounds on Retinal Coldness
We are in a position to address the issue raised at the end of Section 2: although we lack the knowledge of the numerical value H of H(Y | S), we have considered it as crucial information about the system and have represented it in terms of the Lagrange multiplier β (i.e., retinal coldness). In this subsection, we study the bounds on the value of H, which lead to the estimation of β. To this end, we use the fact that, for a given cone subtype, the distribution of the nearest-neighbor distances is peaked-see figures in Section 3.2-and thus, it can be approximated by a Gaussian.
We consider the nearest-neighbor distances as random variables, X s , where X s ∼N µ s , σ 2 s and s denotes cone subtypes. Note that, in the case of a normal distribution, where the differential entropy is h s = (1/2) ln 2πeσ 2 s , Equation (4) becomes: where Z = ∑ j σ −β j . To estimate the bounds on the value of entropy, first, we define the random variable W as W = ∑ s π s X s , where π s is the weight of the contribution of each cone subtype (i.e., the frequency of the appearance) and ∑ s π s = 1. Since W has a normal distribution, its entropy is related to its variance, σ 2 , as (1/2) ln σ 2 + const. Thus, we can obtain the lower and upper bounds of the entropy by minimizing and maximizing the variance, respectively. In the following, we study these two extreme cases.
Variance of W reads σ 2 = ∑ s π 2 s σ 2 s . By the method of Lagrange multipliers, we minimize σ 2 , subjected to the constraint ∑ s π s = 1. It turns out that [38], for π s ∝ σ −2 s , σ 2 is minimized, which implies the minimization of the entropy of W. By comparing this π s with (7), we establish the upper bound of β as 2. To obtain the lower bound of β, we maximize the entropy of W, which implies the maximization of σ 2 , subjected to ∑ s π s = 1. This happens by letting π s corresponding to the largest σ s be 1 and all the other π s s vanish. This scenario is not desirable, as the contributions of various colors vanish. To obtain an acceptable maximum value for the entropy of W, we consider the uncertainties associated with random variables π s X s to be equal, that is, we equalize variances of π s X s by considering π s ∝ σ −1 s , which results in σ 2 = C 2 ∑ s = C 2 N. C is a proportionality constant, i.e., π s = Cσ −1 s , and N is the number of cone subtypes. By comparing this π s with (7), we establish the lower bound of β as 1.
Among the species studied in Section 3.2, fish and bird have more ordered retinal cone mosaics, where, in the former, it is highly regular, and in the latter is semi-random [16]. These two species' corresponding βs are closer to 2 than the other species. Thus, more ordered patterns correspond to a lower retinal temperature or a higher coldness, as expected in thermodynamics. More irregular mosaics-like in rodent, dog, monkey, or human-have higher retinal temperatures. Overall, for vertebrate retinas, under the assumption that the NND distributions are peaked, we always have: 1 < β < 2.

Lemaître's Law
Lemaître's law is the virial equation of state for two-dimensional cellular networks, which relates two measures of disorder (i.e., thermodynamic variables), namely, the fraction of hexagons to the width of the polygon distribution [11,19,20,[39][40][41]. Although at first proposed for two-dimensional foams, it has been shown that a wide range of planar cellular networks in nature obey Lemaître's law, ranging from biology such as avian cones [30], epithelial cells [42], and mammalian corneal endothelium [43], to physics such as amorphous graphene [41], the Ising model [44], Bénard-Marangoni convection [45], silicon nanofoams [46], and silica bilayers [47]. It can be obtained by maximizing the entropy, H = − ∑ n≥3 p n ln p n , where p n is the probability, or the frequency of the appearance, of an n-sided polygon, while considering the following information: ∑ n≥3 p n = 1, ∑ n≥3 np n = 6, ∑ n≥3 r n p n = const.
The first relation is the normalization condition, and the second one is a consequence of Euler's relation concerning the topology of the structure, which assumes only three lines meet at a vertex. Networks with higher vertices can be transformed into trivalent vertices by appropriate transformations [48]. The function r n in the last relation depends on the geometry or the underlying dynamics of cells (polygons). Lemaître and colleagues assumed r n = 1/n as an empirical observation made by measuring the areas of cells in a two-dimensional mosaic produced by hard discs moving on an air table [19,20]. At first glance, the choice of r n = 1/n seems not applicable in a general setting. Indeed, it was already mentioned in [20] that this particular form of r n cannot be valid for all cellular mosaics, as, for instance, it is incompatible with the well-known Lewis' law [49], which assumes that the average area of polygons is linear in n. However, the authors of [20] speculated that the remarkable universality of Lemaître's law suggests that the particular choice of r n = 1/n has probably a deeper meaning than expected. Without considering any ad hoc constraint, we derive Lemaître's law as a special case of our formalism explained in Section 2. To this end, first, we generalize the Lagrangian introduced in Equation (3) as [13], where we have assumed the following additional information: is the average number of cells in local environment and µ is a Lagrange multiplier. By solving ∂L/∂p(s) = 0, we obtain: where We simplify the notations in (10) and write: where we have replaced s by n. p n is the probability of having an n-sided polygon, or its frequency of appearance, and Z = ∑ n≥3 exp[−βH n − µn]. To calculate H n , we consider a general standardized discrete distribution, which its density can be expanded as [50], with constants and H k (·) is the kth Hermite polynomial. Note that, as n → ∞, g n (x) approaches the standard normal distribution. Now that we have g n (x) at our disposal, we can calculate its differential entropy, h n = − ∞ −∞ dx g n (x) ln g n (x). Since H 3 (x) = 8x 3 − 12x is an odd function of x, its integral vanishes, and thus, the first nonzero correction term is of the order 1/n. We obtain: where the first term is the entropy of the standard normal distribution. By plugging (13) into (11), we arrive at where Z = ∑ n≥3 exp[−β/n − µn] and we have absorbed the constants included in O(1/n) in β. Equation (14) sheds light on the origin of r n = 1/n, which Lemaître and colleagues had obtained for a specific two-dimensional mosaic [19,20]. Since the calculations leading to (14) only assume a general discrete distribution, the universality of Lemaître's law becomes evident.
The variance, µ 2 , of the distribution p n in (14) reads where we have used Euler's relation, n = 6. The second moment of p n , µ 2 , demonstrates a deviation from the hexagonal configuration and can be interpreted as a measure of topological disorder. By exploiting (14) and (15), Lemaître's law, as a relation between two measures of disorder, µ 2 and p 6 , has been obtained as [11,19,20,41,46], We present a simple and intuitive derivation of (16) and (17), which is inspired and developed in discussion with C. Beenakker and I. Pinelis [51]. For (16) to hold, p 6 should be large and thus p n in (14) should peak at n = 6. This allows us to approximate p n near n = 6 by a normal distribution, P n , centered at n = 6 while ignoring the discreteness of n. We can let n vary from −∞ to ∞ since only those ns close to the peak have notable contributions, provided that p 6 is not too small. Thus, we have: P n = 1/ 2πµ 2 exp −(n − 6) 2 /2µ 2 , which results in µ 2 P 2 6 = 1/2π. For (17) to hold, the probabilities p n s for n / ∈ {5, 6, 7} should be negligible compared to p n s for n ∈ {5, 6, 7}; as a result, the discreteness of n cannot be neglected in this case, since only three ns contribute. The constraint n = 6 implies that p n should sharply peak at n = 6, leading to µ 2 → 0 as p 6 → 1, and thus: µ 2 + p 6 → 1. Note that, although in (9), we have assumed information about seemingly unrelated quantities H(Y | S) and N(Y | S) represented in terms of their corresponding Lagrange multipliers β and µ, the peakedness of p n and thus h(n) ≡ −β/n − µn at n = 6 gives us a relation between β and µ. Since h (6) = β/6 2 − µ = 0, we have: β = 36µ.
To obtain regions of validity of µ 2 p 2 6 = 1/2π and µ 2 + p 6 = 1, numerical analyses are performed and the results are shown in Figure 11. The left panel illustrates µ 2 as a function of p 6 , where the red points are obtained from (14), subjected to the constraint n = 6, and the dashed blue and yellow curves correspond to µ 2 p 2 6 = 1/2π and µ 2 + p 6 = 1, respectively. Simulations suggest that the known lower bound of (16) can be relaxed to 0.27, that is, µ 2 p 2 6 = 1/2π, 0.27 ≤ p 6 < 0.66.
In the right panel of Figure 11, we have shown β as a function of µ, where the dashed brown curve represents β = 36µ and the green points depict the values of (µ, β) obtained from (14), subjected to the constraint n = 6.  Figure 11. In the (left panel), the dashed blue and yellow curves correspond to µ 2 p 2 6 = 1/2π and µ 2 + p 6 = 1, respectively. The red points are obtained from (14), subjected to the constraint n = 6. This plot suggests that the known lower bound of (16) can be relaxed to 0.27. The (right panel) presents a comparison between the analytical result of β = 36µ, shown as a dashed brown curve, and the values of (µ, β), shown as green points, obtained from (14).
As p 6 decreases, going from 0.27 to 0.25, the peak of p n shifts from n = 6 to n = 5 and remains so up to p 6 = 0.16, see the left panel of Figure 12. Again, we can approximate p n by a Gaussian, which this time peaks at n = 5. In the right panel of Figure 12, we have shown the values of (p 5 , µ 2 ), obtained from (14) and subjected to the constraint n = 6, in red points, and the Gaussian as a dashed blue curve.  (14) from n = 6 to n = 5 as p 6 decreases from 0.27 to 0.25. In the (right panel), the red points depict (p 5 , µ 2 ) obtained from p n and subjected to the constraint n = 6, and the Gaussian is shown as a dashed blue curve.
By decreasing p 6 further, the peak shifts from n = 5 to n = 4, and eventually, p n becomes monotonically decreasing, see the left panel of Figure 13. For small values of p 6 , going from 0.09 to 0.07, p n becomes a U-shaped distribution, as is shown in the right panel of Figure 13.  Figure 13. (Left panel) shows that as p 6 decreases, the peak of p n shifts from n = 5 to n = 4, and eventually, p n becomes a monotonically decreasing distribution. The (right panel) depicts a change in the shape of p n from a monotonically decreasing to a U-shaped distribution for small values of p 6 . In both panels, it is assumed that n = 6.
Most two-dimensional cellular networks in nature have an abundance of hexagons, and they likely obey (17) and (18). Low values of p 6 may correspond to amorphous or artificially generated networks. In the following, we examine several cases of mosaics that are artificially generated: random fragmentation, Feynman diagrams, the Poisson network, and semi-regular Archimedean tiling. We demonstrate that all these networks still obey p n in (14) with the constraint n = 6.
In [52], specific artificial, two-dimensional cellular structures are generated by a fragmentation process. One way to construct these networks is by a random selection of a cell among all cells, and then this cell is to be fragmented into two cells by adding an edge randomly. The side number distribution of cells in this system is obtained by a mean-field model as [52], P Fragmentation, c (n) = P Fragmentation, c (n − 1) where α = 0.356 and P Fragmentation, c (6) = 0.125. Equation (19) can be solved as P Fragmentation, c (n) = 108063 |Γ(n − 0.691 + 2.35i)| 2 Γ(n + 6.236)Γ(n − 2) . (20) In the top-left corner of Figure 14, we have shown P Fragmentation, c (n) as a blue curve and p n in (14) as a dashed orange curve.  (14) with the constraint n = 6. The blue curves in P Fragmentation, c (n), P Fragmentation, e (n), P Feynman (n), and P Poisson (n) correspond to (20), (21), (22), and (23), respectively.
Another way to construct such networks is by a random selection of an edge among all cell edges followed by selecting one of the cells which shares this edge, and then this cell is to be fragmented into two cells as in the previous case [52]. The probability distribution of the number of cell sides reads [52], with P Fragmentation, e (4) = 0.196 and P Fragmentation, e (6) = 0.134. In the top-right corner of Figure 14, a comparison between P Fragmentation, e (n) and p n is shown. The ensemble of planar Feynman diagrams with a cubic interaction (i.e., planar φ 3 diagrams with a fixed number of vertices) is equivalent to the ensemble of polygons with trivalent vertices [53,54]. The probability distribution of the number of cell edges is obtained as [53,54], P Feynman (n) = 16 (n − 2)Γ(2n − 1) Γ(n)Γ(n + 1) See the bottom-left corner of Figure 14 for a comparison between P Feynman (n) and p n .
The two-dimensional Poisson network studied in [55] can be obtained from a tessellation of a surface based on Poisson point distribution. The distribution of the number of cell sides reads [55], A comparison between P Poisson (n) and p n is shown in the bottom-right corner of Figure 14.
The Archimedean tilings, obtained by Kepler, are the analogs of the Archimedean solids. Eight of them are semi-regular and consist of regular polygons at each vertex [56].
In the left panel of Figure 15, we have shown one of these semi-regular tilings, known as truncated hexagonal tiling, consisting of two dodecagons and one triangle at each vertex. The right panel of Figure 15 shows p n in (14) as p 6 → 0 and n = 6. This plot corresponds to a pattern that comprises an abundance of triangles with dodecagons amongst them and is in agreement with truncated hexagonal tiling.

Human Cone Mosaics
In this subsection, we examine Lemaître's law in the case of the human retina, which can be viewed as a natural, two-dimensional cellular network. To partition the retinal field of Figure 1 into polygons, we construct the corresponding Voronoi tessellation. Each Voronoi polygon is generated by a cone cell in a way that all points in a given polygon are closer to its creating cone cell than to any other [57]. In the top row of Figure 16, we have shown Voronoi tessellations of the spatial arrangements of blue, green, and red cones in a living human retina. At the bottom, the Voronoi tessellation of the whole pattern of cones is presented. The fractions of n-sided bounded polygons are reported in the figure caption.
If we assume a high value of p 6 indicates the regularity of the corresponding cone mosaic, Figure 16 demonstrates that the spatial arrangement of blue cones is more random than those of green and red cones, where for blue cones we have: p b 6 = 0.143 while p g 6 = 0.360 and p r 6 = 0.378 for greens and reds, respectively. This finding is in agreement with [28]. Note that, as is shown at the bottom of Figure 16, in contrast to the cone subtypes, the whole spatial arrangement of human cones is highly ordered, with p 6 = 0.718.
We have shown Lemaître's law as applied to human cone mosaics in Figure 17. In the left panel-the case of blue cone mosaic-the experimental value of (p b 5 , µ 2 ) is depicted as a blue point, and the dashed dark-gray curve corresponds to µ 2 p 2 5 = 1/2π and the dashed light-gray curve to µ 2 + p 5 = 1. The cases of greens, reds, and the entire pattern of cones (in black) are shown in the right panel.
As another illustration, the behavior of cones in a different subject is shown in Figure 18. The image in the left panel, adapted from [58], shows human cone mosaics at six different retinal locations: two, four, six, eight, ten, and twelve degrees of retinal eccentricities, temporal to the fovea. The right panel shows the agreement between human cone mosaics and Lemaître's law.  Figure 17. Blue, green, red, and black points depict the experimental values of (p b,g,r n , µ 2 ), n = 5, 6, for human cone mosaics (the black point represents the whole pattern of cones). Lemaître's law is shown as dashed gray curves.

Vertebrate Cone Mosaics: From Rodent to Bird
Here, we apply the approach of Section 4.1 to rodent, dog, monkey, human, fish, and bird. The results are summarized in Figures 19-24. In each case, the experimental value of (p color n , µ 2 ) is depicted in the color of its respective cone subtype, and the black point represents the whole pattern of cones in a given retinal field.  , µ 2 ) as points with their respective colors. The black point corresponds to the whole retinal field.

Concluding Remarks
In this work, we have applied the principle of maximum entropy to explain various forms of retinal cone mosaics in vertebrate eyes and established a parameter called retinal temperature or coldness, which is conserved throughout different species as diverse as rodent, dog, monkey, human, fish, and bird, regardless of the details of the underlying mechanisms, or physical and biological forces. This approach has enabled us to predict the frequency of the appearance of cone cells only by tuning a single parameter. The only constraint of the Lagrange problem stems from the repeatable nature of the experiments in biology.
Lemaître's law, which relates the fraction of hexagons to the width of the polygon distribution in numerous two-dimensional cellular networks in nature and is usually obtained by assuming an ad hoc constraint, here is derived as a special case of our formalism. We have shown that various networks, whether artificially generated or natural, obey this universal law.
Since we have considered a completely general constraint in the entropy maximization procedure, the approach of the current paper can be exploited to explain other patterns or processes in nature. In the case of failure, it implies that either additional information, which stems from the knowledge of the underlying mechanisms, needs to be considered, or that the assumed information is incorrect. Indeed, this is one of the pitfalls of the maximum entropy approach as it is not falsifiable, and there are no criteria for its validity within itself [8,59].
Although in many cases, as in this paper, we can explain and predict the phenomena without knowing the details of the underlying dynamics, the principle of maximum entropy can still lead us to a better understanding of the involved mechanisms by validating the assumed information about the system.

Data Availability Statement:
We made secondary use of published data that can be found in the corresponding cited publications.