Maxima of log-correlated fields: some recent developments

We review recent progress relating to the extreme value statistics of the characteristic polynomials of random matrices associated with the classical compact groups, and of the Riemann zeta-function and other $L$-functions, in the context of the general theory of logarithmically-correlated Gaussian fields. In particular, we focus on developments related to the conjectures of Fyodorov \&Keating concerning the extreme value statistics, moments of moments, connections to Gaussian Multiplicative Chaos, and explicit formulae derived from the theory of symmetric functions.

Our aim in this review is to survey some recent lines of research motivated by conjectures of Fyodorov, Hiary, and Keating [80] and Fyodorov and Keating [81]. These conjectures relate to 1 the extreme value statistics of the characteristic polynomials of random matrices associated with the classical compact groups, and to the extreme value statistics of the Riemann zeta-function and other L-functions. The past decade has seen considerable progress towards proving these conjectures, although many aspects remain to be understood. Much of this recent progress has been stimulated by establishing connections with ideas and techniques that sit at the interface between statistical mechanics, mathematics physics, probability, combinatorics, and number theory. For example, links with Gaussian Multiplicative Chaos, the general theory of logarithmically-correlated Gaussian fields, the theory of symmetric functions, and methods from analytic number theory have played an important role. We have endeavoured to make this overview as accessible as possible and so have attempted to introduce the wide range of ideas and techniques at the level of a colloquium, rather than focusing on the technical details.
This section introduces the relevant theory so that this review can be as self-contained as possible. Section 2 presents the three main conjectures and discusses progress towards resolving the first two of them. Different approaches have been taken to understand the third conjecture, and these are the subject of section 3. where Ω is the skew-symmetric block matrix (1) Ω := 0 I N −I N 0 .
The special orthogonal subgroup SO(N) is the set of orthogonal matrices with unit determinant. The coset 1 O − (N) contains the orthogonal matrices with determinant −1. We will focus mainly on random unitary matrices as a canonical example, denoting by A * ≡ A T the conjugate transpose of A. For a more general account of these topics, and many more, see [1,116,117]. A key property of matrices from the classical compact groups is that their eigenvalues lie on the unit circle in the complex plane, as illustrated in figure 1. We introduce the following general notation. Let A ∈ G(N) be a matrix from one of groups G(N) ∈ {U(N), Sp(2N), SO(2N)}. We write for its characteristic polynomial (2) P G(N ) (A, θ) := det I − Ae −iθ , Since the eigenvalues of A lie on the unit circle, we write the polynomial variable as e −iθ and consider real θ. Furthermore, if A ∈ Sp(2N) or A ∈ SO(2N) then, whilst the matrix size is 2N, the eigenvalues of A come in N complex conjugate pairs, hence the N in the subscript rather 1 Clearly, O − (N ) is not a group. than 2N. Whenever we focus on one particular group, for simplicity we will often omit the label, writing P N ≡ P G(N ) . Since U(N), Sp(2N), and SO(2N) are compact Lie groups, one can endow each with a translationinvariant measure, namely a Haar measure (for numerous constructions of the measure see [116]). Taking U(N) together with its Haar measure corresponds to the Circular Unitary Ensemble, or CUE. This ensemble was first introduced by Dyson [71], together with the Circular Orthogonal and Circular Symplectic Ensembles, COE and CSE respectively 2 . There is also a more general notion of the Circular β ensemble, cf. (154).
One way to express the Haar measure µ Haar on a particular compact group is via the explicit formulae of Weyl [144]. We use the unitary group as an example. Firstly, we define a class function where ∆(z 1 , . . . , z N ) is the Vandermonde determinant 3 1≤i<j≤N (z j − z i ). The proof relies on the fact that translation-invariance of the Haar measure also implies invariance under similarity transformations, i.e. (4) dµ Haar (A) = dµ Haar (UAU * ) for U ∈ U(N). As unitary matrices are diagonalizable by such a transformation, integrals with respect to dµ Haar (A) can be expressed as integrals over the eigenphases of A. Similar formulae hold for symplectic and orthogonal matrices, see Weyl [144]. Henceforth, for ease of notation, we write (5) dA := dµ Haar (A) whenever the context allows. 2 It is important to emphasize that, unlike with the CUE, the COE and CSE are not simply O(N ) or Sp(2N ) together with their respective Haar measures. See, for example, Mehta [117]. 3 By convention, ∆(z 1 ) = 1.
This integral can be evaluated [18,110], giving the following. where Γ(z) is the usual extension of the factorial function.
By (11), one observes that for β ∈ N, M N (β) is a polynomial in N of degree β 2 .
The key tool in the proof of theorem 1.1 is the celebrated Selberg integral, see for example [117].
Hence, on average log |P N (A, θ)| ∼ (1/2) log N. Both the Gaussian nature of log |P N (A, θ)| and the scaling in theorem 1.3 will be important for the remainder of this review, particularly in section 1.3. Large deviations are a focal point of section 2.3. See also (207) and the surrounding discussion for results concerning large deviations of (16).
1.2. The Riemann zeta function. One of the main applications of random matrix theory over the past twenty years has been to number theory. We now review the relevant number theoretic concepts, beginning with the definition of one of the central functions of number theory. In general, expressions of the type (17) ∞ n=1 a n n s for s, a n ∈ C are known as Dirichlet series. Hence, ζ(s) can be defined by a Dirichlet series with a n = (1, 1, 1, . . . ). Equivalently, ζ(s) can also be expressed as a product over primes, known as an Euler product, (18) ζ(s) = The moment generating function for the imaginary part exp(it Im(log P N (A, θ))) can be similarly constructed, and one can write down the generating function of the joint moments, see [110]. also for Re(s) > 1. Products of the form appearing in (18) are over primes p, unless otherwise explicitly stated. The equality between the Dirichlet series and the Euler product formulation follows from the fundamental theorem of arithmetic. One can analytically continue ζ(s) to all of the complex plane, with the exception of a simple pole at s = 1 (many proofs of this fact can be found, for example, in [72]). A consequence of the meromorphic continuation is the functional equation, (19) ζ(s) = 2 s π s−1 sin πs 2 Γ(1 − s)ζ(1 − s). Using (19), it is easy to see that there are zeros at the negative even integers coming from the sine function. These are known as trivial zeros. The functional equation also implies symmetries for the remaining non-trivial zeros. Denoting by ρ n any zero of ζ(s) other than those at the negative even integers (i.e. ρ n are the non-trivial zeros), then ρ n must lie in the critical strip, 0 < Re(s) < 1. Additionally, if ρ n is a non-trivial zero of ζ(s), then so are 1 − ρ n , ρ n , and 1 − ρ n .
The Riemann hypothesis states that the non-trivial zeros lie in the centre of the critical strip, on the critical line as illustrated in figure 2. Conjecture 1.1 (Riemann hypothesis). Re(ρ n ) = 1 2 ∀ n. The proportion of non-trivial Riemann zeros that lie on the critical line is currently known to be at least 0.41729 ('more than 5/12') [126]. This improves a result of Conrey [49], which gave 'more than 2/5' of zeros on the critical line, and the subsequent improvement of Bui et al. [36] to 'more than 41%'.
Moving beyond the non-trivial zeros of ζ(s), a related question is the size of ζ(1/2 + it) for t ∈ R, either on average or the exceptionally large values. Selberg proved the following central limit theorem for ζ(s) [134]. Theorem 1.5 (Selberg's Central Limit theorem [134]). For any rectangle B ⊂ C, Thus, both the real part and the imaginary part of the logarithm of the zeta function independently tend to a Gaussian random variable. Note that this means that the typical size of log |ζ(1/2 + it)| is O( √ log log t); the Lindelöf hypothesis states that ζ(1/2 + it) = o(t ε ), for any ε > 0. The Riemann hypothesis implies the Lindelöf hypothesis. The similarities between (16) and theorem 1.5 are noteworthy.
More generally, one is interested in determining both the size of the moments of ζ(s) over long stretches of the critical line 6 , (20) 1 T There are natural generalizations of the Riemann zeta function known as L-functions. These will feature in section 3.3. Most broadly, an L-function is a Dirichlet series with an Euler product and a functional equation (cf. (17), (18), and (19)). In order to keep this review self-contained we focus on two canonical extensions. For a comprehensive account, we refer to [96].
Clearly, the Riemann zeta function is an L-function. The simplest extension to ζ(s) is the Dirichlet L-function for the non-trivial character of conductor 3, defined as follows.
One can generalize definition 1.6 to moduli other than 3. Take an integer d such that if k ≡ 1 mod 4, k square-free, or, 4m if m ≡ 2 or 3 mod 4, and m is square-free. 6 Notice that the Lindelöf hypothesis is equivalent to the statement that the moments in (20) are o(T ε ) for all ε > 0 and β ∈ N. A more precise form of such a moment conjecture is discussed in section 1.3, see in particular conjecture 1.2. 7 The reason for using −3 rather than 3 in the notation will become apparent shortly.
Define for such d where d n is the Kronecker symbol, the generalization of the Legendre symbol. Explicitly, for an integer n with prime decomposition n = u · p e 1 1 · · · p e k k , with u = ±1 and p j prime, then In the right hand side of (25) a p is the Legendre symbol, which take the values for p = 2, and Finally, a 1 := 1, a −1 := −1 if a < 0 and 1 otherwise, and a 0 := 1 if a = ±1 and 0 otherwise. Then χ d is called a real Dirichlet character. When d = 1, χ d is the trivial character (taking the value 1 for all n), and for other fundamental discriminants d, χ d is a real, primitive, quadratic Dirichlet character of modulus d. Notice that for d = −3 (the first negative fundamental discriminant), χ −3 (n) = −3 n which matches (21). Given χ d , a real, quadratic Dirichlet character modulo a fundamental discriminant d, the associated Dirichlet L-function is Such L-functions again have an Euler product, a functional equation, and a meromorphic continuation to the full complex plane (see [96]). Further, they have an associated Riemann hypothesis which conjectures that the non-trivial zeros of L(s, χ d ) are also on the critical line Re(s) = 1/2. The second example of an L-function that we give is associated to elliptic curves. Consider an elliptic curve E defined over Q, (29) E : for a, b ∈ Z such that the discriminant ∆ = −16(4a 3 + 27b 2 ) = 0 (which ensures that E has distinct roots, or equivalently, is non-singular). If the pair x, y ∈ C form a solution to the equation 8 The name is due to the fact that such d are the discriminants of quadratic number fields, with d = 1 being the 'degenerate' quadratic field Q.
defining the curve E, then we say they lie on E and sometimes write (x, y) ∈ E. Given a prime p ∤ ∆, one can consider the number of points on E modulo p. This leads to the following definition (30) a p := p + 1 − |{(x, y) ∈ E : x, y ∈ Z/pZ}|.
These coefficients a p are used when constructing the L-function for the curve E, which is defined by its Euler product 9 where ½ p∤∆ is 1 for the 'good primes' not dividing the discriminant, and 0 otherwise. Just as with ζ(s), from the Euler product one can derive the appropriate Dirichlet series, and in turn the meromorphic continuation and the functional equation [97].
1.3. Random matrix theory and number theory. The origins of the connection between random matrix theory and number theory can be traced back to a conversation in 1971 between Hugh Montgomery and Freeman Dyson, introduced at tea at the Institute for Advanced Study by Sarvadaman Chowla. The conversation turned to the study of pair correlations of eigenvalues and of Riemann zeta zeros, which turn out to have an identical form. We shall focus in this section on the unitary case, noting that there are more general connections between more general number theoretic functions -L-functions -and the other classical compact matrix groups (see [17,52,103,104,109,110,118,132]), the discussion of which we postpone until later. For brevity, we temporarily write P N ≡ P U(N ) for the characteristic polynomial of a matrix in the CUE.
The two-point correlation function of the eigenphases of A ∈ U(N) has an expression due to Dyson (see [71,117]). Denote by e iθ 1 , . . . , e iθ N the eigenvalues of A, and rescale the eigenphases θ j so that on average they have unit spacing, (32) φ j := θ j N 2π .
The two-point correlation function for A is Dyson established the following result.
Theorem 1.7 (Dyson's Pair Correlation [71]). Take A ∈ U(N) and let φ 1 , . . . , φ N be the normalized eigenphases of A, see (32). For test functions f such that f (x) → 0 as |x| → ∞, Hence, taking the test function to be f (x) = 1 for x ∈ [α, β] and 0 otherwise one finds (35) lim The equivalent calculation for the non-trivial zeros of the Riemann zeta function was performed by Montgomery [119].
The non-trivial zeros of ζ(s) are ρ n = 1/2 + it n , with Re(t n ) > 0 and where the ordering on the zeros is by height 10 . Define to be the number of non-trivial zeros up to height T . Then it can be shown, see for example [142], that as T → ∞. This proves that there are infinitely many non-trivial zeros. Further, the mean density increases logarithmically with height T . We now assume the Riemann Hypothesis to be true, so that Im(t n ) = 0 ∀ n. Following the construction for the pair correlation of eigenvalues, we rescale the zeros so that they have unit mean spacing, (38) w n := t n 2π log t n 2π .
Montgomery's pair correlation conjecture is that, This conjecture is motivated by a theorem of Montgomery [119], which can be stated as follows for a test function f with Fourier transform supported on (−1, 1), and such that the left and right sides of (40) converge. The pair correlation conjecture corresponds to choosing f as the indicator function on [α, β) (whose Fourier transform does not vanish outside (−1, 1)).
The similarity between (39) and (35) is apparent. Important numerical evidence for the conjecture has come from the work of Odlyzko [123], and heuristics and generalizations of Montgomery's theorem have been developed for the general k-point correlation function [25,26,92,132]. A visual comparison can be found in figure 3. There we have compared plots of 50 points taken from a uniform distribution on the unit circle, with the eigenvalues of a random matrix drawn from U(50), and 50 consecutive non-trivial zeros of ζ(s), scaled to wrap around the unit circle. The figure shows that the zeta zeros display similar 'repulsion' to that seen for the eigenvalues (see (3)), and also that their distribution appears far from uniform on the scale of the mean spacing. 10 If ρ n is a non-trivial zero of ζ(s), then so is ρ n . Thus, it suffices to only consider zeros with positive imaginary part.
(A) 50 points drawn from the uniform distribution on the unit circle.
(C) 50 consecutive non-trivial zeros of ζ(s), from 201 st to 250 th , data from [141], scaled to lie on the unit circle. Further evidence for a connection can be found by comparing theorem 1.3 with Selberg's central limit theorem 1.5 for the Riemann zeta function. In both cases, the real and imaginary parts of the respective logarithms tend independently to Gaussian random variables. Additionally, if one sets then the scalings in both theorems agree. The same identification in the unit mean scaling (38) reveals that the average density of zeros matches the average density of eigenvalues in (32). In light of this apparent connection, one may try to model, in a statistical sense, the Riemann zeta function by unitary characteristic polynomials. To this end, we focus on a long-standing number theoretic conjecture.
Let us now compare Using theorem 1.1, Keating and Snaith [110] conjectured that which matches with the all known and conjectural cases (45)- (49). In (43), the leading order asymptotics of the moments involves the product of c ζ (β), which is conjecturally given by a random matrix calculation, and the arithmetic factor a ζ (β). That it should split in such a way is motivated by a formulae due to Gonek, Hughes and Keating [83] according to which ζ(s) can be written a truncated product over the primes multiplied by a product over zeros close to s.
As commented above, Keating and Snaith's work implies, for integer β, that the 2βth moment of ζ(1/2 + it) is of the order of (log T 2π ) β 2 . Conrey et al. [52] extended this conjecture to This polynomial has a multiple contour integral representation of a kind that we shall discuss in more detail in section 3. We note in passing that the conjectures of Keating and Snaith and of Conrey et al. extend to other L-functions [52,109]. The classical number theoretic approach to understanding moments of the Riemann zeta function uses Dirichlet polynomial approximations of ζ(s) and higher powers. Analysis in this direction, as discussed above, gives the second and fourth moments and strong conjectural forms for the sixth and the eight moments. However, this method fails for β ≥ 5 since it predicts negative values.
Recently, Conrey and Keating [44][45][46][47][48] have demonstrated heuristically why the Dirichlet polynomial method in its traditional form fails: for higher β it is essential that one uses much longer Dirichlet polynomials than is conventional to avoid missing important terms. When one does so, the resulting formulae match exactly the polynomials Q β (x) conjectured in [52] for all β ∈ N.
The recipe that they derive is based on divisor sums and can be viewed as a type of 'multidimensional Hardy-Littlewood circle method', or 'Manin-type stratification'. Consider where A, B are sets of size β of 'shifts' and ψ is a smooth function of compact support. Note that by letting all α, β → 0 we recover the desired 2βth moment. Conrey and Keating demonstrate that (53) can be evaluated heuristically either by multiple contour integrals of the type appearing in (52), or by examining the Dirichlet series where the arithmetic divisor function τ A (n) is defined by the Euler product expansion of the left hand side of (54).
1.4. Extreme value theory. We will be interested here in the extreme values of various functions (for example, characteristic polynomials, or ζ(s)). The introduction provided within this section is tailored to support the ideas of section 2; for a general introduction to the topic see Leadbetter, Lindgren, and Rootz [114], and de Haan and Ferreira [57]. A central result within the field is the Fisher-Tippett-Gnedenko Theorem.
If there exists a n > 0 for all n, and b n such that where F is non-degenerate, then F is an extreme value cumulative distribution function and belongs to one of three classes: Type (I), Type (II), and Type (III) are known as Gumbel, Fréchet, and Weibull distributions respectively. Of particular interest within the context of this exposition is the following result (see for example Leadbetter et al. [114], Theorem 1.5.3). Theorem 1.9. Let {Z 1 , Z 2 , . . . } be independent and identically distributed standard Gaussian random variables. As in theorem 1.8, let M n := max{Z 1 , . . . , Z n }. Then define (55) a n := 1 √ 2 log n , and (56) b n := 2 log n − log log n + log 4π 2 √ 2 log n . Then Thus, after rescaling, the maximum of a collection of standard Gaussian random variables has a Gumbel (Type (I)) distribution.
Theorem 1.9 shows that for standard Gaussian random variables the approximate size of the maximum is where M is a Gumbel random variable. It will soon be useful to consider a dyadic number of variables with non-unit variance. Take Y 1 , . . . , Y 2 n independent Gaussian random variables, centred and with variance σ 2 n and call their maximum M 2 n . By rescaling the resulting a 2 n from theorem 1.9, one has where c = 2σ 2 log 2 and M has a Gumbel distribution. For comparative purposes, we remark that the leading order is linear in n, and the subleading term is logarithmic in n with a coefficient of 1/2. Notice that both theorems 1.8 and 1.9, and the subsequent discussion regarding the form of the maxima, importantly assume independence of the random variables. The central question of this review, however, concerns variables that instead are logarithmically correlated.
1.5. Log-correlated fields. Here we introduce key results and ideas that will underpin many of the areas covered in this review. For a more thorough overview of the wider research area see Duplantier et al. [70], and Arguin [2] for an excellent survey with similar aims to this exposition.
One can define a stochastic process X n = {X n (l) : l ∈ L n } on a metric space L n with a distance | · |, so that the dimension of the space dim L n depends on n. Then, the defining feature of log-correlated fields is the form of the covariance 11 , for l, l ′ ∈ L n . 11 Here we write '≈' to encompass any covariance structure which has a logarithmic singularity at l = l ′ .
As in section 1.4, choosing L n to be a discrete field with 2 n points will prove to be a particularly instructive example. Take a log-correlated field X n = {X n (l), l ∈ L n } with |L n | = 2 n , so that X n has covariance as defined by the right hand side of (60), and such that the X n (l) are centred with variance E[X n (l)] 2 = σ 2 n.
In general in this case, the maximum M 2 n = max l∈Ln X n (l) has the following form where c = 2σ 2 log 2. The distribution of M is expected no longer to conform to the Gumbel distribution 12 .
We here highlight the similarities and differences between the maximum of log-correlated processes (61) and the maximum of independent Gaussian random variables, see (59). For both, the leading order of the maximum is linear in n, and the subleading term is of the order log n, but the subleading coefficient differs between the cases: −1/2 versus −3/2. If the random variables are all independent, then the maximum isn't 'pulled down' as much as when the variables are log-correlated. Such behaviour is expected to be universal within each class.
Important examples of log-correlated processes are branching random walks, the logarithm of the characteristic polynomial of a random unitary matrix, and the Gaussian free field in two dimensions.
We now discuss in more detail two models exhibiting such logarithmic correlations: the generalized Random Energy Model (GREM) and branching random walks. These will lay the groundwork for the branching model that will play a central role in section 2.
The Random Energy Model. The 'Random Energy Model' (REM) was introduced by Derrida [62] in the study of spin glasses. The REM is a stochastic process on the hypercube {−1, 1} n . For each point in the state space, one associates the independent random variable X n (l) ∈ N (0, n).
Translating in to the language of log-correlated fields, we have the process X n = {X n (l) : l ∈ {−1, 1} n }. The partition function for the REM is defined as (62) Z n (β) := l∈{−1,1} n exp(−βX n (l)), where β > 0 represents the inverse temperature of the system and X n (l) plays the role of the energy. Note that the maximum of the REM will follow (59) with σ 2 = 1 due to the assumption of independence of the X n (l). Additionally, the free energy associated to Z n (β) is The free energy is related to the extreme values of the energy, M 2 n := max l∈{−1,1} n X n (l), via One can generalize the REM so that the random variables X n (l), l ∈ {−1, 1} n are no longer independent, but instead depend on the distance |l − l ′ | for l, l ′ ∈ {−1, 1} n . This is a generalized random energy model (GREM). Adaptations in particular which introduce logarithmic correlations of the form (60) have inspired considerable research interest; see for example [38,63,77,81]. Once again, a similar process to that described above yields a connection between the free energy for the GREM and its maximum, which should instead now follow (61).
Branching Random Walks. Perhaps the simplest example where the '3/2' coefficient can be proven to appear is the case of a Gaussian random walk on a binary tree. This is also sometimes referred to as the hierarchical Gaussian field. Take a rooted binary tree of depth n and, in the language introduced at the start this section, let L n be the leaves of such a tree. For a fixed leaf l ∈ {1, . . . , 2 n }, define the random variable X n (l) by where the Y m (l) ∼ N (0, σ 2 ) are independent and identically distributed. Thus, X n (l) is a random walk from root to leaf l. Clearly, the distribution of Y m (l) does not depend on the level m, nor the leaf l, but we retain the notation to make the connection with the binary tree clear. Note also for a comparison with the (G)REM that X n (l) ∼ N (0, nσ 2 ) and, whilst the Y m (l) are independent, two walks X n (l) and X n (l ′ ) generally will not be independent for l = l ′ . Figure 4 illustrates the process.
. An example of random walks on a binary tree of depth n = 4, from root to leaves l and l ′ . Some weightings Y j (l) are highlighted, where Y j (l) ∼ N (0, σ 2 ). The last common ancestor of leaves l, l ′ is illustrated by the 'hollow' (red) node and occurs at level 1.
We shall see that the binary tree model is central to the progress detailed within sections 2 and 3 To demonstrate that X n is log-correlated, one determines the covariance structure. Firstly, one has To determine the covariance, one requires the notion of the last common ancestor of two leaves l, l ′ . We also henceforth consider the tree embedded in the interval [0, 1] with the 2 n leaves equally spaced, cf. figure 4. Definition 1.10 (Last common ancestor). The last common ancestor of two leaves l and l ′ of a binary tree is the first point at which the paths from root to l and from root to l ′ diverge. This is illustrated in figure 4. The level of the least common ancestor of l and l ′ is denoted by L(l, l ′ ).
Then, one calculates that Thus, the covariance of X n (l) and X n (l ′ ) depends on the last common ancestor of l, l ′ : One concludes that X n (l) displays the properties of a log-correlated process. Further, take some 0 ≤ r ≤ 1 such that rn ∈ N. Then consider, for a fixed leaf l ∈ {1, . . . , 2 n }, the proportion of neighbours l ′ whose covariance with l is at least σ 2 nr. It is precisely those l ′ whose last common ancestor is at level rn: Such a property is indicative of a log-correlated system, see for example [2].
Given that the branching random walk has a log-correlated structure, a natural question is to investigate its maximum. Bramson [34] was the first to identify to subleading order the maximum of the above field X n with, notably, the subleading constant 3/2. Theorem 1.11 (Bramson [34]). Let X n (l) be a branching random walk from root to leaf l on a binary tree of depth n, with independent, centred, Gaussian increments of variance σ 2 . Set M 2 n := max l∈{1,...,2 n } X n (l). Then where c = 2σ 2 log 2 and x is a bounded fluctuating term.
1.6. Gaussian multiplicative chaos. Within this section, we give an overview of the ideas and results from the theory of Gaussian multiplicative chaos (GMC) that will be relevant in the present context. For an excellent general review of the topic, we direct the reader to the paper of Rhodes and Vargas [131].
The origins of GMC trace back to the work of Kahane [102], who introduced the theory for understanding the exponential of a Gaussian field whose covariance has a logarithmic singularity. To this end, take D ⊂ R d a subdomain, and X = {X(v) : v ∈ D} a Gaussian field so that as v → w, and for g some bounded function over D × D. Clearly the covariance (75) implies a connection to the log-correlated fields discussed previously.
The log-singularity present in (75) is precisely the cause of the difficulty when constructing the measure associated with the exponential of the field, for some γ ∈ R. A natural solution is to 'regularize' the field X: introduce a smooth cut-off X n (v) so that in the large n limit, X n (v) → X(v). For such a cut-off, if one is able evaluate for some limiting measure µ γ (dx), then (76) is defined to be said measure. Kahane constructed such an X n and showed that the limiting measure µ γ is non-trivial for a certain range of γ.
Theorem 1.12 (Kahane [102]). Let v, w ∈ D and assume that there exists a continuous and bounded function g : D × D → R such that and further that the covariance has a decomposition for K i continuous and positive definite covariance kernels. Let Y i be the Gaussian field with mean 0 and covariance given by Then for γ ∈ R, the measures converge almost surely in the space of Radon measure (for topology given by weak convergence) to some random measure µ γ (dx). This convergence is independent of the regularization of X.
The measure µ γ is the zero measure for γ 2 ≥ 2d (where recall our field is defined with respect to D ⊂ R d ), and non-trivial for γ 2 < 2d.
Frequently, the range [0, 2d) (where the measure is non-trivial) is broken up in to two sections, known as the L 1 − and L 2 −phase: For the cases considered in this review, d = 1. When demonstrating convergence to the GMC measure for a particular field X, it is often the case that the two ranges require different techniques (see section 2.3). Additionally we remark that, as formulated above, the measure µ γ (dx) is trivial for γ 2 ≥ 2d. It is possible to construct a GMC measure for γ 2 ≥ 2d. The case of γ 2 = 2d yields a phase transition and is known as critical chaos. Conjecturally, see for example [131], all constructions for the critical chaos measure are the same. In the super-critical regime of γ 2 > 2d, one can define a new class of chaos known as atomic multiplicative chaos. We refer the reader to [131], and the references therein, for further details. 1.7. Symmetric function theory. A function f in n variables is symmetric if it is invariant under permutations of its arguments. That is, if σ ∈ S n (the group of permutations on n symbols) and if (83) f then f is a symmetric function.
It is sometimes useful to extend partitions with finitely many zeros.
In this case, we identify all partitions which share the same non-zero portion, has m i = m i (λ) elements equal to i, then we can write λ in multiplicative notation, Given a partition λ, one can pictorially represent λ using a Young diagram (sometimes called a Ferrers diagram).
i=1 be a partition. The Young diagram of λ is a collection of |λ| boxes arranged in l = l(λ) left-justified rows. The first row has λ 1 boxes, the second has λ 2 , and so on until the lth row which has λ l boxes. Figure 5a shows an example of a Young diagram. Definition 1.15 (Semistandard Young tableau). Given λ, a semistandard Young tableau (SSYT) is a Young tableau of shape λ with the additional requirement that entries must strictly increase down columns and weakly increase across rows. This means that the alphabet must have at least max{λ 1 , l(λ)} symbols. An example is given by figure 5b. We denote by SSYT n (λ) the set of semistandard Young tableaux of shape λ with entries in {1, . . . , n}.
where t j = t j (T ) is the number of times j appears in T . Note that if n > l(λ) then we extend λ with zeros until it has length n.

THE CONJECTURES OF FYODOROV & KEATING
We will now turn our attention to a series of conjectures due to Fyodorov, Hiary, and Keating [80], and, in more detail, Fyodorov and Keating [81]. These tie together the statistical mechanics of log-correlated fields (introduced in section 1), the characteristic polynomials of random unitary matrices, and properties of the Riemann zeta function. They form the platform for much of the material we shall review here.
We first state the conjectures in section 2.1, and then we explain their motivation in section 2.2. The remaining part of the section is dedicated to reviewing recent progress towards proving them.
This section only concerns unitary characteristic polynomials. Thus, until we explicitly state otherwise, P N (A, θ) ≡ P U(N ) (A, θ) will represent the characteristic polynomial of A ∈ U(N).

Statement of the conjectures.
All of the following are, directly or indirectly, related to maxima of log-correlated fields. For context, we recall (61) here specialized with σ 2 = 1 2 log 2. Specifically, we take V n to be a metric space and X n = {X n (v), v ∈ V n } to be a log-correlated field with mean zero and E[X n (v) 2 ] = σ 2 n = n 2 log 2. We further write N = 2 n . The maxima of the field is where M is an O(1) random variable. With these choices of σ 2 and N, the 'log-correlated constant' is renormalized to 3/4, versus the 'independent constant' of 1/4 (compared to the 3/2 vs. 1/2 between (61) and (59)). The first, most general, conjecture is the following [81].
, and a matrix A sampled uniformly from U(N), and where N L := NL/2π is the average number of eigenvalues of the associated The similarity between conjecture 2.1 and (89) is related to the fact that log |P N (A, θ)| exhibits structure similar to the log-correlated fields described above with respect to θ. Similar links have been discussed for the imaginary part of log P N (A, θ), see for example Fyodorov & Le Doussal [78], but we shall concentrate here on the real part.
The 'full circle' case (i.e. L = 2π) is arguably the most interesting, and is where the most progress has been made. Conjecture 2.1 then takes the following form.
is a sequence of random variables which converge in distribution 13 .
Fyodorov and Keating further conjecture that the random variable y A,N should converge in distribution to y = G 1 + G 2 , a sum of two independent Gumbel random variables. Thus p(y), the probability density for y, decays like y exp(−y) as y → ∞: where K 0 is a modified Bessel function of the second kind. In light of section 1.3, a link between conjecture 2.2 and the maximum of the Riemann zeta function might be expected.
Here the random variable x t is expected to have a limiting value distribution as t → ∞ similar 14 to (93). In section 2.2, we review the heuristic calculation that is used to justify conjecture 2.2 (and hence also conjecture 2.3). The final conjecture of Fyodorov and Keating we shall focus on in this review is also based on the same heuristic calculation (and, if established in full generality, would provide a method for proving conjectures 2.2 and 2.3). We define first the moment, 13 The reason for introducing a factor of −2 in (92) is explained in section 2.2.
Note, this differs from the moments (6) in that the average is over the spectrum for a fixed matrix A. Taking A now to be a random matrix, g N (β; A) becomes a random variable. Thus, we may additionally define the moments of g N with respect to the Haar measure on U(N). Performing this double average is called taking the moments of moments: The final conjecture of Fyodorov and Keating [81] that we consider gives the asymptotic behaviour of MoM U(N ) (k, β).
and at the transition point k = β −2 , the moments of moments grow like N log N, see section 3. Conjectures 2.2 and 2.3 will be the focus of the rest of this section. Justification for them is given in section 2.2. Significant progress has been made on both fronts and is covered in section 2.3. Conjecture 2.4 is the main motivation behind the work of section 3.

2.2.
Justification for conjectures 2.2, 2.3, and 2.4. Since conjecture 2.2 concerns the maximum of the real part of the logarithm of characteristic polynomials, it will be convenient to define (98) V The reasons for the −2 coefficient will become clear in what follows. Theorem 1.3 implies that V N (A, θ) satisfies a central limit theorem, Furthermore, Hughes et al. [94] show that The following result 15 of Diaconis and Shahshahani [66] concerning powers of traces of A (see as well [85]) is also useful.
Theorem 2.1 (Diaconis and Shahshahani [66]). Take a sequence (X j ) ∞ j=1 of independent and identically distributed complex random variables whose real and imaginary parts are centred Gaussians with variance 1/2. Then, for any fixed k and A ∈ U(N), as N → ∞,

15
Their result can also be deduced from the Strong Szegö theorem [138,140].
Hence, the coefficients Tr A n / √ n in (100) tend to independent and identically distributed complex Gaussian variables. By (100) together with theorem 2.1, see for example [81], one can show that as N → ∞, Hence V N (A, θ) (and so P N (A, θ)) behaves like a log-correlated Gaussian random function, and so one might expect its maximum to fall into the corresponding universality class. Similarly, for the Riemann zeta function, one defines Moreover, the correlation for two points h 1 , h 2 can be calculated in the following way (see [30] or the appendix of [81] for details). Firstly, using the Euler product for zeta to expand each V ζ (t, h j ), where the expectation is the average over [t−h/2, t+h/2] for some h satisfying 1/ log t ≪ h ≪ t. Note that in the large t limit this interval will contain an increasing number of zeros. Then by expanding the product of cosines in (105) and using that the main term comes from the diagonal contribution (p 1 = p 2 , n 1 = n 2 ) with standard prime estimates, one finds The similarity to the covariance of V N (A, θ) is evident. Once again there is a dependence on the distance between the points h 1 , h 2 . If h 1 is very close to h 2 on the scale of mean zero spacing, then essentially V ζ (A, h 1 ) and V ζ (t, h 2 ) are perfectly correlated. However, if they are separated on the same scale as θ 1 and θ 2 must be in (102) (i.e. making the usual identification N ∼ log t), then instead one sees the logarithmic correlation. The fact that V ζ and V N are both log-correlated has important ramifications. Recall that Selberg's central limit theorem reveals that on average log |ζ(1/2 + it)| is on the order of (1/2) log log t. The Lindelöf Hypothesis states that (107) |ζ( 1 2 + it)| = o(t ε ) for any ε > 0. Under the Riemann hypothesis, one has that (108) |ζ for some constant c 1 (see for example [142]). However, it is also known (unconditionally) that 16 Hence, the extreme values of the zeta function must lie between (108) and (109).
)| were independent of each other, then the Fisher-Tippett-Gnedenko theorem (see section 1, section 1.4, and theorem 1.9) could be applied. Such an assumption was examined by Montgomery in the context of estimating the typical size of log |ζ(1/2 + it)|. If one sets X 1 , . . . , X n to be n local maxima of log |ζ(1/2 + it)|, assumed to be independent, then theorem 1.9 would imply that (110) max The log of the right-hand side here is considerably larger than (1/2) log log t, the typical value of log |ζ(1/2 + it)|, and is closer to (109) than (108). However, as demonstrated, the assumption of independence is incorrect.
Bondarenko and Seip [27,28] have shown that This has recently been improved to (1)) log T log log log T log log T by de la Bretèche and Tenenbaum [35]. Both results are unconditional, and more generally cover intervals of length [T α , T ] for α ∈ [0, 1). Using techniques from random matrix theory, Farmer, Gonek and Hughes [74] have conjectured that (113) max Comparatively, the maximum in short intervals (ranges of length O(1)) is more tractable, both theoretically and numerically. In particular, since the ranges of (92) and (94) are considerably shorter than the O(T ) lengths considered in (110)-(113), experimental calculations are feasible [3,79].
We now focus our attention on the techniques used by Fyodorov and Keating to construct the precise form of conjectures 2.2, 2.3, and 2.4. This technique is inspired by a class of problems from statistical physics.
In a statistical mechanics context, Fyodorov and Bouchaud [77] studied a related circularlogarithmic model. Rather than defining V N (A, θ) or V ζ (t, h) (cf. (98) and (103)), one takes a sequence {V 1 , . . . , V M } of random, centred Gaussian variances sampled from the two-dimensional Gaussian Free Field equidistantly along the unit circle. Once again, this is a log-correlated process. 16 In (109), f (x) = Ω(g(x)) means f (x) takes the value g(x) infinitely often.
In [77], Fyodorov and Bouchard showed that in the large M limit, the (positive, integer) moments 17 of the partition function can be calculated using the Dyson-Morris version of the Selberg integral. Explicitly, they show that leading order coefficient α k,β . Fyodorov and Bouchard were also able to use the moments (115) to construct the probability density of the partition function in the 'high-temperature' range |β| < 1. It is instructive to compare (97) to (115).
where β > 0. In light of the above discussion, g N can be viewed as a partition function. Then, V N (A, θ) is the 'energy' for the system and β is the inverse temperature. The reason for including −2 in the definition of V N (A, θ) and V ζ (t, h) is now also apparent. A related, important function (cf. section 1.5) is the corresponding free energy for the system: The maximum of log |P N (A, θ)| can be recovered as the large-β limit A similar construction can be made for log |ζ(1/2 + it)|.
Fyodorov and Keating demonstrate that the free energy (117) has a 'freezing' property (related to the temperature parameter β). Define the normalized free energy to be By considering the average of F (β) with respect to the Haar measure, they argue (see [81]) that for β small (i.e. high temperature) the average of F (β) is governed by the typical values of P N (A, θ). By theorem 1.1, the moments behave like N β 2 . Hence, for small β one expects However, as β grows large (i.e. as temperature decreases), the free energy will instead be governed by the extreme values (cf. (118)), which by conjecture 2.2 scale as log N to leading order. Hence in the large N limit for large β, instead The expectation in (114) is with respect to the Gaussian random variables. This is the meaning of freezing for this system: as the temperature moves from small to large β (i.e. temperature decreases), the free energy reaches a critical temperature and thereafter remains constant. Here, the critical temperature is β = 1: The moments of the random variable g N (β; A) with respect to the Haar measure on U(N) are, by (118), related to the maximum of log |P N (A, θ)|. The exact quantity of interest is . The values z = e iθ j correspond to so-called 'Fisher-Hartwig' singularities. Using Widom's result [145] on the Fisher-Hartwig asymptotic formula, the integrand 18 of (123) can be seen to be as N → ∞. Thus, provided that the Fisher-Hartwig singularities at e iθ 1 , . . . , e iθ k can be treated as distinct, one can apply Selberg's integral and find that In order to ensure that any coalescences between the Fisher-Hartwig singularities do not influence the leading order asymptotics, one requires the restriction k < 1/β 2 , see for example [81]. This is the justification (and indeed, an outline of the proof) for this regime of conjecture 2.4. Outside of this range, i.e. k > 1/β 2 , the Fisher-Hartwig singularities coalesce, and so computing E[g N (β; A) k ] proves to be much more difficult. In fact, one either requires a uniform Fisher-Hartwig asymptotic formula valid for when the singularities coalesce, or an alternative approach. It is precisely the contributions from the coalescing singularities that leads to the different leading order behaviour in conjecture 2.4. See section 3 for a more detailed discussion of the moments of g N (β; A).
This concludes the review of the heuristics behind conjectures 2.2, 2.3, and 2.4. We now summarize recent advances towards proving the two 'maxima' Fyodorov-Keating conjectures. The final conjecture (conjecture 2.4) concerning the leading order of the moments of moments is the subject of section 3.

2.3.
Progress towards conjectures 2.2 and 2.3. There has been a concerted effort to prove the conjectures outlined in the previous section, resulting in considerable progress in both the random matrix and number theoretic cases. 18 For k ∈ N, and with the order of integration switched.
Typically problems in random matrix theory are more tractable than their equivalent formulations in number theory. Thus, one often uses random matrix results to inform the number theoretic calculation. However, in this case, a model of the Riemann zeta function, introduced by Arguin, Belius, and Harper [6], was the catalyst for much of the subsequent progress. We begin by presenting this model and the main results relating to it. We sketch how these results are proved, focusing in particular on the identification of the approximate branching structure. These techniques are fundamental not only to the result of Arguin, Belius, and Harper [6], but also to the proofs of the ensuing random matrix and number theoretic results, a discussion of which forms the remaining part of this section.
Identification of a branching structure in a model of ζ(1/2 + it). The primes behave in many respects like a random set of integers. To try to understand the Riemann zeta function, and in particular the behaviour of its maximum in short intervals, one might try to exploit this pseudorandom structure. Using work of Soundararajan [136], Harper [87] constructed a randomized model of the zeta function which captures many of its main statistical features, especially those relating to extreme values. Subsequently, Arguin et al. [6] developed this model further and established to subleading order an adaptation of the conjecture of Fyodorov and Keating. A crucial part of this work is the identification of an approximate tree structure. As will be discussed, it is also possible to demonstrate such structure within the Riemann zeta-function itself, and characteristic polynomials.
In [87], under the Riemann hypothesis, Harper proves that there exists a set H of measure at least 0.99 with H ⊆ [T, T + 1] such that (126) log for all η ∈ H. The set of small measure [T, T + 1]\H where the result fails essentially covers those points close to the zeros of zeta (where the logarithm is not well-defined). In order to capture the 'quasi-randomness' of the primes, Harper introduces the following random variables. Let (U p , p prime) be a sequence of independent random variables distributed uniformly on the unit circle 19 . Heuristically speaking 20 , U p models p −iτ , see also figure 6. It is then easy to justify, see for example [6], that the random variable Arguin et al. [6] prove the following. 19 In the literature, these are sometimes referred to as Steinhaus random variables. 20 One can make the argument more rigorous, see [87] 21 In fact in (127), one could for example replace U p with G p , a Gaussian random variable; see the discussion near (208).  Theorem 2.2 (Arguin, Belius, Harper [6]). Let (Ω, F , P) be a probability space and (U p , p prime) be independent random variables distributed uniformly on the unit circle. Then The error term o P (log log log T ) converges to zero in probability when divided by log log log T .
We now outline the key technique that enabled Arguin et al. to prove the above result, concentrating in particular on the identification of the approximate branching structure. We then discuss the general method of proof that would be employed if one instead had an exact branching structure. It is this recipe that proves key to the success of the proof of theorem 2.2, and to the progress towards conjectures 2.2 and 2.3 detailed later within this section.
In order to make the comparison with the branching random walk clear, set without loss of generality T = e 2 n for some large n ∈ N, where T is the height at which the interval is situated up the critical line. Now set so M 2 n is the expected maximum of a log-correlated process with 2 n ≡ log T points.
Define the random process Thus, theorem 2.2 follows if one can show To unveil the branching structure within X n (h), denote the summands in (130) by Straightforwardly, The first calculation (133) follows by symmetry. For (134), one first rewrites Re(U p j p −ih j j ) for j = 1, 2 using U p and U p and uses the fact that all off-diagonal terms are zero in expectation. The final equality follows immediately from (134) and (130).
Thus the covariance of X n (h) depends on the distance between the points h 1 and h 2 in a logarithmic way: That is, if |h 1 − h 2 | ≥ 2 −n (the scale of separation on average) then one can estimate 1 2 log |h 1 − h 2 | −1 from (135). However, if the two points lie closer together than 2 −n , i.e. they lie 'unusually' close, then effectively they are almost perfectly correlated The above calculation will be compared to the situation with exact branching shortly.
To exhibit the source of the branching structure, break the sum in (130) in to dyadic-like partitions, defining Hence, by (134), Thus the model of the Riemann zeta function can be written in the form of a branching random walk with increments Y m , The following lemma of Arguin et al. gives the distance at which two walks X n (h 1 ) and X n (h 2 ) become essentially uncorrelated. For ease of notation write cf. also definition 1.10.
Lemma 2.3 (Arguin et al. [6]). For h 1 , h 2 ∈ R, m ≥ 1, and some constant c, The proof follows from a strong form of the prime number theorem and integration by parts. Often, one has to handle the case of m = 0 (i.e. the contribution from small primes) separately, which is the cause of the requirement m ≥ 1 in the statement of lemma 2.3. Lemma 2.3 shows that the Y m (h), which act as the increments of the branching random walk, are essentially perfectly correlated when h 1 and h 2 lie close, relative to m (which is the analogue of depth in the binary tree). Otherwise, effectively, they are perfectly uncorrelated.
The proof of theorem 2.2 is inspired by the techniques that one would use if X n (h) were an exact branching random walk. Thus, we now outline the method in this precise setting, and comment how Arguin et al. are able to adapt this proof to the approximate situation for the model of the Riemann zeta function. As mentioned above, this method is also used in a number of other key results relating to conjectures 2.2 and 2.3.
General method of proof. Take a binary tree of depth n so that its 2 n leaves are equally spaced within the interval [0, 1]. To each branch attach an independent, centred Gaussian random variable with variance σ 2 = 1 2 log 2. The m levels, m = 1, . . . , n correspond to the 'dyadic' decomposition of the primes for the Riemann zeta model (138) (again, the contribution from small primes, m = 0, is handled separately).
Usually we write L(l 1 , l 2 ) for the level of the lowest common ancestor of two leaves l 1 , l 2 (cf. definition 1.10), but to emphasize the analogy with the model of ζ(s), we here write l 1 ∧ l 2 (cf. (142)).
The general method used to prove theorem 2.2 is based on the following recipe, applied to the exact branching random walk (see also [2] and the introduction of [6]).
One first identifies a branching structure (either exact, or approximate). For the branching random walk, this is trivially where Y m (l) ∼ N (0, 1 2 log 2). Consider now the number of exceedances, The relationship between maxima and Z(t) is clearly the following For questions relating to maxima over a continuous range, it is often necessary at this stage to additionally show that one can discretize the interval. The discretization will depend on the (approximate) branching structure. In order to identify the correct size of the maximum of X n (h), one finds t so that P(Z(t) ≥ 1) = o(1) (cf. theorem 1.11). Thus, one proceeds to bound P(Z(t) ≥ 1). An upper bound is attained through a union bound, where X n ∼ N (0, 1 2 log 2 n ). Standard Gaussian tail estimates give Recall from section 1.4, that this is approximately the correct order of the maximum for independent Gaussian random variables. Instead here the random variables are log-correlated and hence the maximum should be lower. Shortly, we discuss an adaptation to this upper bound that delivers the correct size of the maximum. The Paley-Zygmund inequality delivers a lower bound, One can show for the branching random walk that the second moment is exponentially larger than the first moment squared (see [4,6]); it is inflated by those exceeding walks 'pulling up' neighbours.
Altogether, this implies that Z(t) is not the right quantity to consider. Instead, one alters the definition of Z(t) to take into account the additional structure underpinning the model. In [6], it is shown that with high probability a walk X m (l) up to level 1 ≤ m ≤ n lies below a linear barrier log 2 m + B for some B growing slowly with n. Hence, if instead one works with (151)Z(t) := |{1 ≤ l ≤ 2 n : X n (l) ≥ t, and X m (l) ≤ log 2 m + B, ∀m ≤ n}| then (with some modifications) calculating the upper and lower bounds (148) and (150) withZ(t) replacing Z(t) is precisely the right approach.
Making such a method rigorous for functions with approximate branching structure, such as the model of the Riemann zeta function described above, is where the technicalities lie.
Progress Towards Conjecture 2.2. Conjecture 2.2 states that, for A ∈ U(N) sampled uniformly with respect to the Haar measure, we expect where (x A,N , N ∈ N) is a sequence of random variables which converge in distribution. The first step towards a proof of this conjecture was made by Arguin, Belius, and Bourgade [4]. The following is their main theorem (theorem 1.2 in [4]), establishing the conjecture to leading order. Soon after, the work of Paquette and Zeitouni [124] verified that the subleading term in the conjecture is correct. Precisely, their main result (theorem 1.2 in [124]) is as follows.
Theorem 2.5 (Paquette and Zeitouni [124]). For N ∈ N, let A ∈ U(N) be sampled uniformly. Then Importantly, their work establishes the constant −3/4, the characteristic coefficient in the subleading term for processes with logarithmic correlations. Finally, the best result currently is due to Chhaibi, Madaule, and Najnudel [39] who prove the conjecture up to tightness (theorem 1.2 in [39]). Theorem 2.6 (Chhaibi, Madaule, Najnudel [39]). If A ∈ U(N) is chosen uniformly with respect to the Haar measure, and {θ 1 , . . . , θ N } is the set of eigenphases of A, then the family of random variables is tight.
Chhaibi et al. in fact prove a stronger statement concerning the CβE, the Circular β Ensemble, the probability distribution on N points on the unit circle with density, Theorem 2.6 is a specialization to β = 2 (the CUE) of a more general result that holds for all β 22 .
In [39], the imaginary counterpart to (153) is also handled. Hence only the identification of the distribution of the fluctuating term of conjecture 2.2 remains. Recall that this is conjectured to be a sum of two independent Gumbel random variables. Such a statement has been proved [130] for yet another related model, which we discuss shortly, but not for characteristic polynomials. At time of writing, the state of the art for the number-theoretic maxima conjecture is stronger than theorem 2.6 with the correct form of the right-tail being identified (cf. theorem 2.12 and the subsequent discussion).
Branching structure for log |P N (A, θ)|. Following the progress made by identifying the approximate branching structure in a model of the Riemann zeta function by Arguin et al. [6], a crucial part of proving the various results towards conjecture 2.2 is the identification of a quasi-tree structure within the logarithm of the characteristic polynomial.
The proof of theorem 2.4 follows a similar procedure to that outlined in (138). The authors show that the tree-like structure emerges from a multiscale decomposition. Define f : R → R by f (θ) = log |1 − e iθ |, and observe that the Fourier series of f is Re(e −ijθ ) j . 22 Associated matrix models for general β have been constructed [112].
This means that where 23 θ ∈ [0, 2π). Compare this to (100), where such a decomposition was used to justify the log-correlated nature of log |P N (A, θ)|. We make two remarks on traces of powers of unitary matrices. Firstly, due to orthogonality of characters of the unitary group (equivalently, the rotational invariance of the Haar measure), traces are uncorrelated (see for example [66]): provided that k ≤ N. Secondly, theorem 2.1 (see as well [85]) described the convergence of powers of traces, Further, the speed of the convergence is superexponential [98]. Hence, one may truncate the sum in (156)  for n ∈ {0, . . . , log N}. X n (θ) should be compared to (141) under the usual dictionary N ≡ log T 2π . The increments W m (θ) are, by the above discussion, uncorrelated (for different m) and have variance approximately 1 2 . If one takes two points θ 1 , θ 2 , then the covariance of the increments for a fixed 'level' is calculated (see [4]) to be (160) (cf. (142)). The decoupling occurs at level θ 1 ∧θ 2 , the logarithm of the inverse distance between the points on the circle. Therefore, X n (θ) is approximately a branching random walk with N leaves and one can restrict one's attention to θ in the discrete set By (160), if θ 1 ∧ θ 2 ≤ e −m , then the walks X m (θ 1 ) and X m (θ 2 ) differ only by an order-one term. Thus, the appropriate analogous branching structure is a tree with e m particles at level m; i.e. not a binary tree but a tree with branching rate e. 23 When θ is an eigenphase of A, define both sides of (156) to be −∞.
It turns out that it is first easier to work with a shorter walk for some large integer K. By creating room at the top of the sum, Arguin et al. prove sharp large deviation estimates. The upper bound on X (1− 1 K ) log N follows from a union bound and the expected number of exceedances, Since the result establishes leading order, a modification of Z as in (151) is not necessary. This is not the case for the lower bound. In particular, via a Chebyshev inequality, they show that a Gaussian-type bound holds for the truncated walk where C is some constant, and σ 2 = 1 2 Such a bound is proved using a Riemann-Hilbert analysis. The top of the sum is handled separately (again using Riemann-Hilbert techniques), and the upper bound follows by taking the large N limit.
The lower bound requires more work and in particular a truncated second moment argument is required (a modification of the Paley-Zygmund inequality described previously). This controls the second moment, which as discussed following (150), would otherwise dominate the square of the first moment. Once again, Riemann-Hilbert techniques are used to determine the second moment of the appropriate counting function. Combining the upper and lower bound delivers the claimed leading order.
Swiftly following the result of Arguin, Belius, and Bourgade, conjecture 2.2 was verified to subleading order by Paquette and Zeitouni. The key new techniques are a careful comparison between the field log | det(I − zA)| and a centred Gaussian field inside the unit circle on optimal scales, and an adaptation of the barrier akin to (151). Here, the lower bound is determined by studying the field log | det(I − zA)| inside the unit circle, i.e. |z| < 1. It turns out that this is particularly convenient since there log | det(I − zA)| is harmonic, so almost surely One is then motivated to understand the behaviour of the field log | det(I − zA)| at N equally spaced points (similar to the discrete set T N , (162)), shifted to lie just inside the unit circle: The estimates involved in the proof of the upper and lower bounds are delicate. The route that Paquette and Zeitouni take is to prove that log | det(I − zA)| is very close to a real-valued Gaussian field. Hence, one can work directly with the Gaussian process and reap the associated benefits.
Determining the upper bound follows from a first moment argument. In particular they show that the probability that the field lies below an adapted barrier for most of the 'walk' and yet achieves a large value at the end of the walk is small. In fact for this argument, a full barrier is not needed. In comparison, more control is required for the lower bound (including the full barrier). Concealing many of the technical details, the argument broadly follows the usual lines: applying a truncated second moment argument.
Finally, the most recent improvement towards fully establishing conjecture 2.2 is due to Chhaibi, Madaule, and Najnudel [39] (though see theorem 2.12 for a comparatively stronger result in the number theoretic case). The result shows that the conjectured family of random variables is tight.
Let D be the unit circle in C, and µ a probability measure on D. By applying the Gram-Schmidt procedure to {z n : n = 0, 1, . . . }, one may create a sequence of monic polynomials {Φ k (z), k = 0, 1, . . . } orthogonal with respect to µ. These polynomials can be generated using the Szegö recurrence relation, and Φ 0 (z) = 1. The * operator reverses the order of the polynomial coefficients. The numbers α k are known as Verblunsky coefficients. OPUC can be used to understand the behaviour of unitary characteristic polynomials.
Lemma 2.7 (Chhaibi, Madaule, Najnudel [39]). The following family of random variables is tight, Hence, theorem 2.6 is equivalent to the tightness of the family One can express log Φ * N (e iθ ) as a sum of logarithms of a function of Verblunsky coefficients and continuous real functions called Prüfer phases. This introduces a martingale structure which is particularly useful when determining the extreme values of polynomials (Φ * k ) k≥0 . With some careful construction, one can then define a new field for θ ∈ R, where X C j is a complex Gaussian of variance 1, and ψ j (θ) are so-called Prüfer phases. The problem is thus reduced to showing that the family is tight. Finally, the authors show that where O(1) is a tight family of random variables, and hence conclude. As with the previous two results, in essence the proof follows from applying a discretization result, an asymptotic upper bound (using a first moment argument) and an asymptotic lower bound (by Paley-Zygmund). For the upper bound, after showing that the usual discretization holds, the authors use a (full, increasing, non-constant) barrier. The lower bound, as usual, requires more careful handling and the authors move from (172) to a new process in order to achieve more independence to make the first and second moment estimates more tractable.
At time of writing, it remains to identify the distribution of the fluctuating term (conjectured to be the sum of two independent Gumbel random variables). Remy [130] considered a related model, discussed below, and there proved the conjectured fluctuations. Alternatively, recently Arguin, Bourgade and Radziwiłł [7] have established an upper bound on the right tail for the Riemann zeta function of the predicted size, see theorem 2.12 and the discussion thereafter. It is possible that their techniques could be adapted to the random matrix setting.
Gaussian Multiplicative Chaos and Gumbel random variables. In [130], Remy establishes the sum of two independent Gumbel random variables in an analogous problem. Before discussing the result, we briefly survey some relevant background. Gaussian multiplicative chaos measures were introduced in section 1.5.
In two successive works [122,143], it was determined that for α ∈ (−1/2, 2) and as N → ∞, Thus, if one considers the left hand side as a sequence of measures on the unit circle, by (175) the sequence converges in law to the GMC measure found on the right hand side of the statement. Recall that we write X(θ) for a centred and logarithmically correlated Gaussian field, where explicitly 24 It was first shown by Webb [143] that the convergence (175) is true in the L 2 −phase α ∈ − 1 2 , √ 2 (cf. (81)). In fact, Webb proves a more general version allowing for certain twists of the characteristic polynomial. In a subsequent work due to Nikula, Saksman, and Webb [122], the convergence (175) was extended to include the L 1 −phase α ∈ [ √ 2, 2) (cf. (82)). Motivated by the theory of multiplicative chaos, it is conjectured that the limiting object for α > 2 will be zero, thus 24 Whilst the normalization constant of 1/2 appearing here is not standard within the log-correlated literature, its role is to mimic the random matrix setting. the interesting behaviour has now been categorized. Similar results have been established for the Riemann zeta function [133].
In [130], Remy also works with the field X(θ), though with a different normalization in the covariance. For consistency, we have translated his results to the notation of this paper. Define for α ∈ (0, 2). As usual, Y α is rigorously defined as a limit using an appropriate cut-off X ε (θ) of X(θ), (see [130] or section 1.5). It is instructive to compare this to (116). The main theorem of Remy proves a conjecture of Fyodorov and Bouchaud [77].
Theorem 2.8. Take α ∈ (0, 2). For all ρ ∈ R such that ρα 2 < 4, one has Setting α = 2β and ρ = k, one may combine theorem 2.8 with the convergence to the GMC measure. Hence, one expects that for kβ 2 < 1, and 2β ∈ (0, 2), and as N → ∞, Thus, by theorem 1.1, (assuming that the convergence (181) holds), which is precisely the first regime of conjecture 2.4. Additionally, Remy considers the 'critical' GMC at α = 2 (i.e. β = 1). In this case, the measure is denoted −X(θ)e X(θ) dθ and is found via again for a suitable cut-off X ε . Such a construction gives a non-trivial random positive measure, see [68,69,125]. Now define It was shown by Aru, Powell, and Sepúlveda [10] that Y ′ is related to the limit as α → 2 of Y α , in probability. One can hence deduce (see [130]) the density f Y ′ of Y ′ , so log Y ′ has a standard Gumbel law. Finally, recent results (see [23,67]), have shown that for suitable cut-offs X ε , there exists a constant C such that where G is a Gumbel random variable independent from Y ′ . Hence, combining (187), (186), and matching N with 1/ε and log |P N (A, θ)| with X ε (θ), this analogy would precisely imply conjecture 2.2. This approach also suggests that it may be easier to instead establish  25 . For ζ, the range for the maximum need not be constrained to [0, 2π) (since unlike with unitary matrices, one has no periodicity). The conjecture (189) should hold for any interval that is O(1) as T grows. For intervals that grow or shrink with T , see theorems 2.13 and 2.14. Najnudel [120] established the conjecture to leading order under the Riemann hypothesis (and also proves an analogous result for the imaginary part of log ζ(1/2 + it)). Theorem 2.9 (Najnudel [120]). Take ε > 0. Assuming the Riemann hypothesis and as T → ∞ The random walk structure originates in the Euler product of zeta. In particular, for Re(s) > 1, for t ∈ R, and where l(n) = 1/k if n = p k and 0 otherwise. Establishing an upper bound, i.e. max |t−h|≤1 log |ζ(1/2 + ih)| < (1 + ε) log log T is straightforward and doesn't require the Riemann hypothesis. Najnudel uses a method inspired by [39] followed by an application of Markov's inequality and classical estimates on the second moment 25 For more detail on the distribution x, see theorem 2.12 and the subsequent discussion. of ζ. An alternative (simpler) proof can be found in section 2 of [5]. For both and subsequent approaches, one first discretizes the interval. An important observation is that the zeta function fluctuates with approximate frequency 1/ log T (seen for example via its Fourier series). This permits one to discretize the order-one interval into approximately log T points 26 .
For the lower bound, Najnudel works with a Dirichlet series related to (191): where ψ is chosen to be the Fourier transform of a function φ : R → R satisfying, for Re(s) > 1, In order to conclude information from (193) about the values of log ζ inside the critical strip, Najnudel assumes the Riemann hypothesis in order to define an analytic continuation. Hence, under the Riemann hypothesis and control on the regularity of φ, Najnudel shows for τ ∈ R, Najnudel then proves (propositions 5.1, 5.2 in [120]) that, without too big an error term, and by making careful choices for τ, H in (192), the maximum of Re(Λ ψ (τ, H)) provides a lower bound on the maximum of log |ζ(1/2 + iτ )| in the relevant short intervals.
Working with Λ ψ allows one to apply the branching techniques discussed previously. Indeed, after applying a multiscale-type decomposition one sees that Λ ψ (τ, H) has an approximate branching structure. By comparing Λ ψ to a randomized version (à la Harper's model (127)) one can, via many technical lemmas, apply the usual Paley-Zygmund approach.
Soon after Najnudel's result, Arguin, Belius, Bourgade, Radziwiłł and Soundararajan were able to remove the assumption of the Riemann hypothesis. Theorem 2.10 (Arguin et al. [5]). For any ε > 0, as T → ∞, we have By examining ζ off-axis, Arguin et al. avoid assuming the Riemann hypothesis. As mentioned above, the upper bound is much simpler to prove. For the lower bound, the authors establish that large values just off the critical line imply large values lying on the critical line, thus permitting them to work slightly to the right of the line 1/2 + it. They then construct a 'mollifier' (i.e. a function M(s) so that, just to the right of the critical line, M(s)ζ(s) ≈ 1). This allows them to show that for almost all t ∈ [T, 2T ], one instead may work with a significantly shorter Dirichlet polynomial for an X much smaller than T . In particular they show that a large value of the maximum of (196) (over |t − h| ≤ 1) implies a large value of max |t−h|≤1 log |ζ(1/2 + ih)|. The sum (196) provides the approximate branching structure. By breaking up the length of the 'walk' X via the usual style of multiscale decomposition 27 , one applies the Paley-Zygmund inequality (which requires precise large deviation estimates on the Dirichlet polynomials in question) to prove the lower bound.
Improving further on the leading order, Harper provided a nearly sharp upper bound.
The analytic proof still follows the same road-map from the probabilistic approach. Rather than an explicit discretization, Harper shows that the maximum question can be replaced using Cauchy's Integral Formula by an average over a small rectangle of height 1/ log T ) which will eventually be summed over all shifts (the analogue of a union bound). This is essentially a 'moments of moments' style result (see section 3). Harper's proof also differs in how ζ(s) is approximated by Dirichlet polynomials. Now working with an integral, the zeta function is approximated in mean square by the product of two Dirichlet polynomials: one over smooth numbers and one over rough numbers. Typically, one discards the latter in favour of short Dirichlet series, see for example (196). However, in order to achieve the desired subleading term, one should work with the full range. Additionally, Harper requires estimates on the fourth moment of Dirichlet polynomials to achieve the subleading bound. Compare this to the leading order upper bound which only required the second moment. This approach delivers the claimed bound.
Finally, the strongest result to date is that of Arguin, Bourgade and Radziwiłł [7]. There they determine the upper bound of conjecture 2.3, including the predicted tail of the random variable (93). Recall from (93) that the right tail of the fluctuating term should be like xe −2x (using the notation of (189)). This is stronger than what is currently known for the random matrix case. Theorem 2.12 (Arguin, Bourgade, Radziwiłł [7]). There exists C > 0 such that for any T ≥ 3 and x ≥ 1 one has Clearly, the expected sharpness of theorem 2.12 depends on the size of x. For x = O( √ log log T ), the bound in theorem 2.12 is expected to be sharp. For x = O(log log T ) however (i.e. x can grow at the same rate as the leading order term), it is expected that instead the sharp decay rate should also feature the Gaussian behaviour: 1 log log T . Notice that (198) can be rephrased in terms of a particular Lebesgue measure of points. 27 The contribution from the very small and very large primes has to be handled separately.
As is now routine, the first step is to discretize the interval, focusing on log T discrete points. The authors then make a comparison between the logarithm of the Euler product and a random walk , for k ≤ log log T . In (200), one has essentially the first two terms in the Taylor series of the log Euler product. Taking the length of the walk close to k = log log T is a source of most of the difficulty of the proof. The reason is that one requires precise large deviation estimates.
To achieve these, one needs high moments of long Dirichlet sums. Current number theoretic techniques produce the estimate For this situation, one needs α ≈ log log T . Notice that if k > log log T − c log log log T (the level of the maximum), then the error becomes unmanageable. Therefore Arguin et al. instead reduce the number of hs to be considered (i.e. rule out many paths). They achieve this by introducing both an upper barrier (à la Bramson, see also the discussion prior to (151)) and a new lower barrier. Call the space in between the barriers the corridor which, by construction, narrows as one approaches k = log log T . Arguin et al. iteratively (but with a finite number of steps) build this corridor, progressively discarding escaping paths. By reducing the number of walks, they are able to circumvent the aforementioned large deviation problem. In their estimates, like Harper, they require estimates on (twisted) fourth moments of Dirichlet polynomials. A natural extension to conjecture 2.3 would be to consider maxima over intervals which grow or shrink with T . It turns out that the branching analogy extends to the case with intervals of length O(log θ (T )) for some real θ > −1.
Notice that for θ = 0, even at leading order, the asymptotic behaviour of the maximum has changed compared to conjecture 2.3.
The reason that the theorem requires the Riemann hypothesis for θ > 3 is that 2 √ 1 + θ corresponds (via an application of Markov's inequality) to the βth moment of the Riemann zeta function. Since precise unconditional moment asymptotics are only available for β ≤ 4 (i.e. θ ≤ 3), one must assume the Riemann hypothesis for higher moments, see for example [136].
In the same work, Arguin, Ouimet and Radziwiłł also consider moments of ζ over intervals of length O(log θ T ), (akin to (116) for θ = 0). They demonstrate that above a certain temperature (i.e. some critical moment), the moments exhibit freezing (cf. (122)).
The interpretation of theorem 2.13 is as follows. For θ ≤ 0, one is in the situation discussed at length within this section already: the branching tree picture and relevant theory still holds for a tree with (1 + θ) log log T increments and log 1+θ T leaves on an interval of width 2 log θ T , with branching rate e. Using the usual heuristic, this would imply that the maximum at leading order grows like the logarithm of the number leaves, i.e. (1 + θ) log log T . This matches theorem 2.13 for θ ≤ 0 (and clearly theorems 2.9, 2.10, 2.11, 2.12 corresponding to θ = 0).
However, for θ > 0 (i.e. intervals larger than those considered in conjecture 2.3), one has to alter the branching picture. Instead of considering just one tree, the correct analogy is to consider log θ T independent copies of the branching trees on intervals of order one. This leads to a leading order like √ 1 + θ log log T . Similar models, known as Continuous Random Energy Models, studied by Bovier and Kurkova [33] led to the following generalization.
For θ > 0, the sequence of random variables (y θ,T , T ≥ 1) converges in distribution to a Gumbel In [3], the authors provide numerical evidence in support of (203). The parameter m in (206) contains two constants of interest. Firstly, 4C is the Meissel-Mertens constant. Secondly, f k is the leading order coefficient of the 2kth moment of ζ (43). The appearance of the moment coefficient is related to large deviations of Selberg's central limit theorem [3,75,127]. Indeed, one can show 28 that the equivalent moment parameter appears in the large deviations of the Keating and Snaith central limit theorem 1.3. For β ≥ 0, any θ ∈ [0, 2π), and V ∼ k √ 2 log N , . Notice there is a correction to (16) -which holds for fixed V ∈ R -for V growing at the same rate as the standard deviation of log |P N (A, θ)|.
Conjecture 2.5 also reveals how the subleading term varies according to the branching structure present. At θ = 0, the process is log-correlated. As the interval grows beyond O(1), instead one sees the IID statistics emerging. One can interpolate between the ranges on the scale θ ∼ (log log T ) γ . Resolving this discontinuity as θ → 0 is work of Arguin, Dubach, and Hartung. 28 See [75] for the original statement, and [3] for an alternative proof.
In [8] they study the associated question for a random model of ζ(1/2 + it) over intervals of length log θ T , where θ is either fixed or can tend to zero at controlled speed.
For this model, Arguin et al. are also able to show a precise upper bound for the right tail in an interval of order one of the shape given by (199).
This concludes the review of the recent progress towards proving both conjectures 2.2 and conjecture 2.3. As has been demonstrated, the identification throughout of an approximate branching structure is essential.

CONJECTURE 2.4 AND GENERALIZED MOMENTS
Within this section, we focus on the third conjecture of Fyodorov and Keating, conjecture 2.4. Recall from section 2 that the heuristic calculations described in [81] supporting conjecture 2.2 are based on an analysis of the random variable The moments of moments were hence defined as the moments of g N with respect the an average over A ∈ U(N), Recall from section 2.2, that one justification for conjecture 2.4 follows from a heuristic calculation of the moments of moments when k is an integer [80,81,105], in which case one has Before we present the recent developments towards a full understanding of (97), we first investigate a related construction for branching random walks. As demonstrated in section 2, exploiting the connection between branching processes and log-correlated processes was instrumental to the results towards the maxima conjectures 2.2 and 2.3. 29 One can show, see [8], that (X T (h), h ∈ [0, 1]) is a Gaussian log-correlated process, just as when the summands instead involve U p , uniform random variables on the unit circle.

3.1.
Branching moments of moments. Take a binary tree of depth n, and a choice of leaf l. Load to each branch in the tree an independent centred Gaussian random variable with variance 1 2 log 2. We write for the branching random walk from root to l where Y m (l) ∼ N (0, 1 2 log 2) are independent and identically distributed. Then X n (l) ∼ N 0, n 2 log 2 and the distribution has no dependence on the leaf l. Such labels (as for Y m (l)) will play an important role later, however. The last common ancestor of two leaves l, l ′ is written L(l, l ′ ), and is the furthest node from the root having both l and l ′ as descendants. We also require the level of the last common ancestor. Hence, the last common level lcl(l 1 , . . . , l k ) is the level of L(l 1 , . . . , l k ).
Conjecture 2.2 was motivated by studying the partition function (210) for log |P N (A, θ)|. As previously shown, X n (l) is a log-correlated process. It is natural here, therefore, to investigate the associated partition function (or moment generating function) for X n (l): From a statistical mechanical perspective, the parameter 2β acts as inverse temperature.
In particular, we are interested in the moments of the partition function (214), where the expectation in (215) is with respect to the Gaussian random variables. These are the moments of moments for the branching random walk. They are the analogues of (96). Such moments of partition functions were also considered for more general branch weightings by Derrida and Spohn [64].
In [16], the following results are established.
For small values of k, one can calculate exact and explicit formulae for the moments of moments; such examples can be found in the appendix of [16].
Furthermore, for integer values of the moment parameters, the branching moments of moments are polynomials.  Thus, the branching moments of moments exhibit asymptotic behaviour identical to that predicted by conjecture 2.4 for the random matrix moments of moments, if the identification N = 2 n is made. However, the leading-order coefficients in each case are (unsurprisingly) provably different.
The proof of theorem 3.1 is based on an inductive argument. The simplest case k = 1 follows from a moment generating function calculation, When k = 2, one has to handle the (typical) case of when the paths are not independent. To do so, one splits the moments of moments at the level of the last common ancestor. Using the branching structure, one is able explicitly to determine that The statement of theorem 3.1 for k = 2 hence follows by analysing (218) as β varies. Inductively, one can handle general k ∈ N. A similar argument reveals the polynomial structure of corollary 3.1.1. It is also interesting to examine the leading order coefficients ρ k,β , σ k,β , and τ k,β . Through the explicit computations of the branching moments of moments in [16], in all cases computed 30 , the leading order coefficient fails to be analytic in β in the limit n → ∞. This property is shown in figure 7 for k = 2, 3. 30 With the exception of the simplest case k = 1.

3.2.
Unitary moments of moments. Common to all approaches described hereafter, one assumes that the moment parameter k in (211) is an integer. Such an assumption allows one to exploit the rich structure of the resulting average over the Haar measure (after an application of Fubini), Here and henceforth, E[·] represents the average over U(N). It would be interesting to understand (211) for general k. Indeed, conjecture 2.2 implicitly assumes that k ≥ 1 (but not that k is integral). If the analogy with number theory holds, then one should expect a correction to conjecture 2.2 for k ∈ (0, 1) and β = 1 after the result of Harper [89], cf. theorem 2.11.
We now present the progress towards a proof of conjecture 2.4. At the time of writing, the conjecture has been established to leading order for k ∈ N and positive β. The leading order coefficient in the critical (kβ 2 = 1) and supercritical (kβ 2 > 1) regimes in general are unknown, but there are results for k = 2, β > 0 as well as for k, β ∈ N.
Results towards conjecture 2.4. The case k = 1, β ∈ N in conjecture 2.4 follows immediately from the moment formula of Keating and Snaith [110] (cf. theorem 1.1, also [18]). Specifically, which additionally is clearly a polynomial in N of degree β 2 . Bump and Gamburd [37] later gave an alternative proof using symmetric function theory. In this second approach, the leading order coefficient of MoM U(N ) (1, β) was obtained by counting certain semistandard Young tableaux. These parallel stories of symmetric function theory and complex analysis continue for higher values of k.
A proof of conjecture 2.4 when k = 2, β ∈ N follows directly from formulae given by Keating et al. [107]. It differs from the proof given by Claeys and Krasovsky [42] (discussed below), which establishes the conjecture for k = 2 and all β, but without identifying a polynomial structure when β ∈ N.
For A ∈ U(N), the secular coefficients of A, written 31 Sc n (N), are the coefficients of its characteristic polynomial The following theorem is proved Keating et al. [107].  31 Clearly, Sc n also depends on the specific matrix, A. where with p η,l (c − l) being polynomials in (c − l).
Immediately, theorem 3.2 shows that MoM U(N ) (2, β) is a polynomial in N for β ∈ N, and asymptotically is Theorem 3.2 was proved by two methods: symmetric function theory and complex analysis. The former determines an equivalent structure for α η (c) to that given in (224) coming from a standard lattice point count, which proves that I η (m; N) is a polynomial in N and makes it clear that α 2β (β) = 0. By using complex analysis the result regarding the leading order in N can be established and the form for α η (c) given in (224) is found. Claeys and Krasovsky instead use Toeplitz determinants to understand MoM U(N ) (2, β). The N × N Toeplitz determinant for the symbol f is defined by where f is a real-valued, 2π-periodic, integrable function with Fourier coefficients The connection between Toeplitz determinants and random matrix averages is the following Heine-Szegö identity [139] (228) Thus, one can write (219) as D N (f ) for the particular symbol f (z) = k j=1 |z − e iθ j | 2β . The symbol f has k Fisher-Hartwig singularities at e iθ 1 , . . . , e iθ k .
In the simplest case, k = 1, there is a single Fisher-Hartwig singularity. Without loss of generality, due to the rotational invariance of the Haar measure, one writes the symbol as f * (z) = |z −1| 2β in this case. Then, one can show (see for example (1.13) in [42]), provided that Re(β) > −1/2. One sees that this agrees with the asymptotic form for MoM U(N ) (1, β) calculated by Keating and Snaith. For MoM U(N ) (2, β), Claeys and Krasovsky must handle two Fisher-Hartwig singularities. Their results are uniform in that they describe the transition between the two singularities being distinct and when they are permitted to merge. There is also a connection to a solution of a certain nonlinear second-order ordinary differential equation, known as Painlevé V. [42]). Take β > −1/4. Let

Theorem 3.3 (Claeys and Krasovsky
Then for 0 < t 1 < π and as N → ∞, The constants c 1 , c 2 , c 3 are explicitly given in [42], and are additionally related to a solution to the Painlevé V differential equation. Thus, (231) agrees 32 with conjecture 2.4 when k = 2. Theorem 3.3 is proved using Riemann-Hilbert techniques.
After the publication of [15], Fahs extended the results of Claeys and Krasovsky from two merging singularities to k ∈ N merging singularities. Fahs' result 33 captures the leading order behaviour in each regime, as well as at the transition point k = 1/β 2 , though is not precise enough to capture information about the leading order coefficients beyond k = 2.
This verifies conjecture 2.4 for k ∈ N and β ≥ 0. To prove theorem 3.3 and theorem 3.4, the authors first relate the Toeplitz determinants to a system of polynomials orthogonal on the unit circle. Then, both characterize these polynomials as a Riemann-Hilbert problem and use the associated techniques in order to determine the respective results. The topic of Riemann-Hilbert problems is outside the scope of this review, but the interested reader may seek out [29,59,61] for further details, and [24] for an overview of connections to random matrix theory.
There has also been recent progress on understanding the behaviour of MoM U(N ) (k, β) at the critical point kβ 2 = 1. In [111], Wong and Keating study so-called 'critical-subcritical' moments of GMCs, in dimensions d ≤ 2. For the random matrix models, d = 1. 'Critical-subcritical' here refers to the ordered parameter pair (k, β). 'Critical' kth moments occur at k = 1/β 2 . The GMC measure is defined in the 'subcritical' regime for β 2 < 1 by (178). Wong and Keating determine the asymptotic leading order, including coefficient, for MoM U(N ) (k, 1/ √ k) provided that k ∈ N. Recall that the result of Remy shows that the kth moment of GMC exists in the (subcritical) regime k < 1/β 2 , cf. (179). Wong and Keating can extend to the (critical) case where k = 1/β 2 , 32 Within their proof it is clear how to deal with the constraint on t 1 , see the comments after (1.47) in [42]. 33 Adapted from [73] for notational consistency.
i.e. beyond where GMC moments exist. By performing an argument akin to (180)-(182), they apply their GMC result to moments of moments.
By theorems 3.3 and 3.4, the leading behaviour in N of MoM U(N ) (k, 1/ √ k) is known for k ∈ N, but the leading coefficient was only identified for k = 2 (cf. (231), also [42]). Through analysing the critical moments of GMC measures as one approaches kβ 2 = 1 from the subcritical GMC regime (kβ 2 < 1), Wong and Keating identify the asymptotic leading order coefficient of Theorem 3.5 (Keating and Wong [111]). Take k ∈ N, k ≥ 2. Then as N → ∞, Although the proof of theorem 3.5 requires k to be integral (so that one can perform the usual switching of order of integration in (211)), one expects that (234) should hold for general k > 1. Wong and Keating also give precise conjectures for the leading order coefficient at the critical point for the symplectic, orthogonal, and CβE cases 34 , with appropriate parameter restrictions.
Integer moments of moments. Prior to the extension of the result of Claeys and Krasovsky by Fahs, the present authors employed formulae developed in [37,51,106,107,110] to verify conjecture (2.4) for integer moment parameters. We separately discuss these results and extensions here. As shown, the results of Claeys and Krasovsky, Fahs, and Keating and Wong [42,73,111] have now established 35 conjecture 2.4 for k ∈ N. The techniques of these section require a further restriction on β. However, this results in more access to the leading order coefficient, potentially lower order terms, and allows one to appeal to new methods.
In particular, the two themes of complex function theory and symmetry function theory play an important role. Conjecture (2.4) can be reformulated in terms of particular symmetric polynomials and a lattice point count function. This gives a polynomial bound on MoM U(N ) (k, β) at integer values of k, β, and N. Additionally, a representation in terms of multiple contour integrals is employed; this furnishes an expression for MoM U(N ) (k, β) as an entire function of N and yields the following theorem. Theorem 3.6 (Bailey and Keating [15]). Let k, β ∈ N. Then where γ k,β can be written explicitly in the form of an integral.
Using a combinatorial sum equivalent to the multiple contour integrals due to [51], one can deduce the following result.  35 Up to an explicit leading order coefficient in the supercritical regime.
It had been previously established [37,107] that in the cases k = 1, 2 and β ∈ N, one can rephrase (219) in terms of symmetric function theory.
Bump and Gamburd rederived theorem 1.1 provided 36 β ∈ N. For ease of presentation we restate this as a separate theorem in the form in which they proved it, even though it coincides precisely with theorem 1.1. [110]; Bump & Gamburd [37]). Let β ∈ N. Then In order to prove theorem 3.8, Bump and Gamburd re-express the moments of characteristic polynomials as an average over Schur polynomials. They derive the following proposition 37 . Proposition 3.9 (Bump and Gamburd [37]). Take K, L, N ∈ N and α 1 , . . . , α K+L ∈ C, then In (237), the Schur polynomial s λ (x) was defined in definition 1.16. Theorem 3.8 then follows from proposition 3.9 by setting L = K = β and α 1 = · · · = α 2β = 1 (using the rotational invariance of the Haar measure). Thus Using (88), the Schur polynomial s λ (1 n ) is equal to the number of semistandard Young tableaux of shape λ with entries in {1, . . . , n}. By the following lemma, the 'Hook-content formula' (see [137]), one has the statement of the theorem. Lemma 3.10. Take a partition λ and n ∈ N. Then which manifestly is a polynomial in λ i − λ j .
The following result (proposition 2.1 in [15]) is an extension of proposition 3.9 to handle (219).
Proposition 3.11 (Bailey and Keating [15]). For N, k, β ∈ N, we have 36 Whilst their proof relies on β ∈ N, one can analytically extend to Re(2β) > −1. 37 The statement of proposition 3.9 is more generally stated in [37], giving a further equality for (237) as a permutation sum. The representation of the moments as a permutation sum had also already been proved by Conrey et al. [51], though again using a different method. Therefore, by proposition 3.11 and (88), one can rewrite MoM U(N ) (k, β) at k, β ∈ N in terms of Schur functions, where (243) τ j = t 2(j−1)β+1 + · · · + t 2jβ for j = 1, . . . , k, and t n = t n (T ) is the number of entries in the tableau T equal to n. By computing the θ integrals in (242), one finds that where the sum is now over T ∈ SSYT 2kβ ( N kβ ) subject to the additional restriction (243). When  e N αq · · · e −N n l=m+1 z l ∆(z 1 , . . . , z n ) 2 dz 1 · · · dz n where the contours enclose the poles at α 1 , . . . , α n and ∆(z 1 , . . . , This multiple contour integral is nearly identical to the conjectural form for shifted moments of the zeta function, as discussed in section 3.3. Choosing the α j correctly and analysing the multiple contour integral for large N, one verifies the statement of theorem 3.6. Theorem 3.7 requires yet another representation of (219) (again due to Conrey et al. [51]), involving a permutation sum.
Following the publication of [15], Assiotis and Keating [13] extended the symmetric function theoretic analysis using Gelfand-Tsetlin patterns, still with the restriction that k, β ∈ N. They are able to recover theorem 3.6 using this combinatorial approach, yielding an alternative interpretation of the leading order coefficient γ k,β . This permits a further connection to a particular Painlevé differential equation. Such an approach is well-adaptable to the case of symplectic and orthogonal moments of moments, which is discussed further in the next section.
It is also interesting to remark that Barhoumi-Andréani [19] has recently proposed a general framework rooted in symmetric function theory for understanding various averages of unitary characteristic polynomials. Through this lens, one can rephrase and rederive many classical results in unitary random matrix theory. This includes moments of moments, but also encompasses other interesting averages. We refer to section 5.4 of [19] for a summary table of applicable problems. Symplectic and Orthogonal groups. Assiotis and Keating [13] rederived the asymptotic result of [15], theorem 3.6, via Gelfand-Tsetlin patterns. These ideas are extended 39 by Assiotis et al. [12] to compute the asymptotic behaviour of (245).
where the leading order term coefficient γ k,β is the volume of a convex region and is strictly positive.
Theorem 3.14 (Assiotis, Bailey, Keating [12]). Let G(N) = SO(2N). Let k, β ∈ N. Then, MoM SO(2N ) (k, β) is a polynomial function in N. Moreover, where the leading order term coefficient γ k,β is given as a sum of volumes of convex regions and is strictly positive.
(For a precise description of the convex regions involved in the definition of γ  [12].) By comparing theorems 3.13 and 3.14 to theorem 3.6, one sees that the asymptotic behaviour differs between all three matrix groups, though the polynomial property is retained. 38 SU(N ) is the set of all N × N unitary matrices with unit determinant. 39 Since there is a generalization of lemma 3.12 to averages over Sp(2N ) or SO(2N ), the results described in this section could also be arrived at via the method of proof of theorem 3.6.
Examples of interlacing of signatures. Figure 8a shows signatures whose length differs by 1, and figure 8b shows two signatures of the same length. Note that in both cases, the numbering is from right to left, as to be in-keeping with future definitions. The inequalities are explicitly shown, though it is not standard to do so.
In order to describe the key aspects of the proof of theorems 3.13 and 3.14, we first introduce some new notions. The first generalizes the definition of a partition, to allow for general integer entries.
The next definition describes how two signatures interact. Definition 3.16 (Interlacing). Signatures λ ∈ S n and ν ∈ S n+1 interlace, written λ ≺ ν, if: Similarly, λ ∈ S n and ν ∈ S n interlace (retaining the notation λ ≺ ν) if: It is common to draw interlacing signatures, so to emphasize their interaction, see figure 8. In this review, our convention is to draw signatures from right to left.
The next definition introduces a 'full' Gelfand-Tsetlin pattern. These are required for unitary moments of moments, see [13]. Hereafter we will instead focus on 'half' Gelfand-Tsetlin patterns.
Definition 3.17 (Gelfand-Tsetlin pattern). A non-negative Gelfand-Tsetlin pattern of length/depth n is a sequence of signatures λ (i) n i=1 such that λ (i) ∈ S + i and (251) GT + n is the set of all such patterns. Given a signature ν ∈ S + n , it is often useful to additionally consider those Gelfand-Tsetlin pattern with fixed top row ν. The set of such patterns is written GT + n (ν). It is common to draw Gelfand-Tsetlin patterns as a triangular array, essentially a generalization of interlacing diagrams, see figure 9.
For a signature ν ∈ S + n , there is a well-known bijection between semistandard Young tableaux of shape ν with entries in {1, . . . , n} and non-negative Gelfand-Tsetlin patterns of length n with 40 Note that this is distinct from the definition of a partition since trailing zeros are recorded, whereas partitions ν are identified with any other partition which has the same non-zero entries.   fixed top row ν, see for example [13,84]. It is this bijection that was exploited by Assiotis and Keating [13] when rederiving the asymptotic result of theorem 3.6. When handling symplectic and orthogonal moments of moments, we instead require half Gelfand-Tsetlin patterns.
Definition 3.18 (Half (Gelfand-Tsetlin) pattern). Let n be a positive integer. A half (Gelfand-Tsetlin) pattern of length n is given by a sequence of interlacing signatures λ (i) n i=1 such that λ (2i−1) , λ (2i) ∈ S i with the interlacing condition: The first entries on the odd rows, namely λ (2i−1) i , are the odd starters.
We now arrive at the definition of a symplectic (Gelfand-Tsetlin) pattern, see figure 10a for an illustration. i=1 is a half pattern of length 2n all of whose entries are non-negative integers. For fixed complex numbers (x 1 , . . . , x n ) we associate to the pattern P a weight w sp (P ) (dependence on x 1 , . . . , x n is suppressed from the notation and will be clear from context in what follows) given by: with λ (0) ≡ 0. Given ν ∈ S + n , SP ν is the set of all (2n)-symplectic Gelfand-Tsetlin patterns with top row λ (2n) = ν.
We now give the combinatorial definition of the symplectic Schur polynomial as a sum of weights over symplectic patterns. This should be seen in the context of the definition of a (unitary) Schur polynomial, see definition 1.16.
Orthogonal patterns are similarly defined, though slightly more involved than the symplectic case since some of the elements are now permitted to be negative. We write sgn(x) = 1 if x ≥ 0 and sgn(x) = −1 if x < 0.    For fixed complex numbers (x 1 , . . . , x n ) we associate to the pattern P a weight w o (P ) given by: with λ (0) , λ (−1) ≡ 0. Given ν ∈ S n , OP ν is the set of all (2n − 1)-orthogonal Gelfand-Tsetlin patterns with top row λ (2n−1) = ν.
See figure 10b for an example of an orthogonal Gelfand-Tsetlin pattern. As in the symplectic case, we have the following combinatorial definition of the orthogonal Schur polynomial as a sum of weights over orthogonal patterns.  41 It transpires that for the problems at hand, the entries of (2n − 1)-orthogonal Gelfand-Tsetlin patterns are always all integers.
To prove the polynomial part of theorems 3.13 and 3.14, one uses another (equivalent) representation of averages of characteristic polynomials, due to Conrey et al. [51]. This is the same technique employed in the unitary case.
To prove the asymptotic part of theorems 3.13 and 3.14, one translates (245) (via a generalization of proposition 3.11) to an average over the respective Schur polynomials. By applying the bijection to symplectic or orthogonal half Gelfand-Tsetlin patterns, one has hence determined that (245) is equal to a count of (symplectic or orthogonal) half Gelfand-Tsetlin patterns with constraints. By finally translating these discrete structures to a continuous setting and applying lattice point count asymptotics, one arrives at the stated results.
Naturally, one might be interested in Fyodorov-Keating type conjectures for the symplectic and orthogonal cases. The outline given in sections 2.3 suggests that, to leading order at least, the maxima conjecture would still hold 42 .
Notice that in passing, theorems 3.13 and 3.14 prove the asymptotic growth and polynomial structure of certain restricted combinatorial counts, as theorem 3.6 did in the unitary case. For example, one could read the argument of [12] after the statement of proposition 4.2 agnostic to the random matrix motivation. We defer to the next section a discussion on the moments of moments conjecture.
Related results. Recently, Assiotis extended the results of [12] to handle moments of moments of characteristic polynomials of the Circular β ensemble (CβE) (cf. (154)) for β > 0. There is a natural generalization of (245) to the entire CβE, where, temporarily, we have changed the notation for the moment parameters. The expectation in (254) is with respect to the β-dependent CβE measure (154), and P N is the characteristic polynomial for the appropriate ensemble 43 .
Theorem 3.23 (Assiotis [11]). Let β > 0 and k, q ∈ N. If moreover β is such that Further, the leading order coefficient is strictly positive and finite. As in [12,13], it can be described as an integral of a weight over constrained, continuous lacing arrays 44 .
way -the description of γ k,β found in [15]). Theorem 3.23 is proved using generalizations of the ideas in [12]; in particular a result of Matsumoto [115] generalizes proposition 3.9 to the CβE using Jack polynomials. It is expected that the result should hold for β < 2kq 2 , for (possible noninteger) k > 1. As suggested by [111] (see the discussion after theorem 3.5), at β = 2kq 2 there should be a transition point and the leading order in N of the moments of moments should be like N log N, as it is in the unitary β = 2 case.
Recall that the asymptotic statement of theorem 3.6 may be rederived by using asymptotics of Toeplitz determinants 45 , see section 3.2. Therefore, it would be interesting to extend this approach developed by Claeys and Krasovsky in [42] and Fahs in [73] to the more general moments of moments. For the orthogonal and symplectic cases, this would require uniform asymptotics for determinants of the form Toeplitz ± Hankel as the singularities merge. Such an approach would almost certainly have the benefit of not requiring the integrality conditions of theorems 3.13 and 3.14. It would also be interesting to describe the full moments of moments behaviour, akin to (97).
Recently, Claeys et al. [41] proved such uniform asymptotics, using a Riemann-Hilbert analysis. Their work allows for the associated Fisher-Hartwig singularities to vary with the matrix size, and extends previous work on Toeplitz ± Hankel determinants [14,[20][21][22]60]. They first write averages over the orthogonal and symplectic ensembles 46 in terms of unitary group averages and certain orthogonal polynomials on the unit circle.
By analysing these asymptotically (provided the singularities do not approach the symmetry points faster than the scale of mean eigenvalue spacing), they derive the relevant uniform asymptotics. In the case of two merging singularities, the leading order coefficient can be determined explicitly using theorem 3.3. More generally, the asymptotics are only known up to a multiplicative constant. To prove stronger results, one would need to prove a stronger version of theorem 3.4.
Finally, we discuss recent work of Najnudel, Paquette, and Simm [121]. It is now wellestablished that characteristic polynomials can be normalized to converge to GMC measures. Let 45 As noted, these techniques additionally permit the β parameter to be a positive real number. 46 Their analysis includes SO(2N ), SO(2N + 1), Sp(2N ) as well as O − (2N + 1) and O − (2N + 2), the complementary cosets of the orthogonal group of matrices with determinant −1. It is useful to record that the joint probability density for Sp(2N ) is the same as for O − (2N + 2) -see for example [116]. 47 The result extends to the orthogonal and symplectic groups results of Hughes et al. [93] showing that the real and imaginary part of the logarithm of unitary characteristic polynomials jointly converge to a pair of Gaussian fields on the unit circle, cf. (100). See appendix A of [76] for a statement of the results. us briefly set the notation for the CβE characteristic polynomial where e iθ 1 , . . . , e iθ N distributed according to the CβE measure (154). If β = 2 then P 2 N coincides with P U (N ) . Then we recall that the secular coefficients Sc n (N) (cf. (221)) of P β N are exactly the polynomial coefficients Sc n (N)z n .
Clearly Sc n (N) also has a β-dependence, but we suppress it in favour of a cleaner notation. In [43,107], unitary secular coefficients are related to various number theoretic averages. Again for β = 2, moments of secular coefficients have been related to certain combinatorial objects: namely magic squares [65]. Haake et al. [85] determined that E[Sc n (N)] = 0 and E[| Sc n (N)| 2 ] = 1 (where the averages are taken over U(N)). Diaconis and Gamburd extended the analysis to higher moments using the combinatorial approach. Their result gives the limiting distribution for the nth secular coefficient as N grows: it is a polynomial of degree n in independent Gaussian random variables. However, such as statement cannot handle (for example) the middle secular coefficient Sc ⌊ N 2 ⌋ (N). In [121], the authors use Holomorphic multiplicative chaos to show that in the CUE the central secular coefficient tends to zero as N → ∞.
are both tight families of random variables. If n N → ∞ such that 2n N ≤ N then Sc n N (N) The relationship of theorem 3.24 to this review, beyond being an important result in random matrix theory in its own right, is the introduction of holomorphic multiplicative chaos. Define on the unit disk G C (z) = ∞ k=1 z k √ k N k , for N k independent standard complex normals. On the circle |z| = 1, this is the usual log-correlated Gaussian field. Set θ := 2/β. Then, if φ is a trigonometric polynomial and for z on the unit disk z → φ(z) is the extension to the disk, then is holomorphic multiplicative chaos HMC. Such a random distribution first implicitly appeared in [40]. If φ(x) = e inx for n = 0, 1, 2, . . . , then the limit (260) is simply the nth coefficient in expansion of exp θG C (z) at z = 0. Set c n to be the nth Fourier coefficient of the HMC θ . Using a result of Chhaibi and Najnudel [40], which connects GMC measures to the CβE, the authors show that Sc n (N) almost surely converges to c n , as N → ∞ (for β > 0). This is shown via the help of Verblunsky coefficients, cf. (169). Hence, one can work directly with the HMC θ and its 'secular coefficients' c n (which usefully have an approximate Martingale structure).
Najnudel et al. also prove interesting results regarding the limiting distributions of the secular coefficients in the case β > 4, freezing-style properties associated with the HMC θ , as well showing that the combinatorial magic-square results of Diaconis and Gamburd [65] extend to the β > 0 regime. We direct the interested reader to their paper [121].
Modelling families of L-functions. One reason for specializing to SO(2N) and Sp(2N) is that these groups, along with U(N), are conjectured to model various number theoretic families [50,52,103,104,109,110]. The canonical example, as demonstrated throughout section 2, is using unitary characteristic polynomials to model ζ(1/2 + it). Explicitly, the set (261) {ζ( 1 2 + it)|t > 0} is said to display unitary symmetries in that, for example, the moments {L( 1 2 + it, χ d )| d a fixed, fundamental discriminant, t ≥ 0}, see Rudnick and Sarnak [132]. However, one can instead take the value of L(s, χ d ) at a fixed symmetry point s = 1/2 and average over d. One then should recover symplectic symmetries (see for example [50,109]). To this end, define the family (265) {L 1 2 , χ d | d a fundamental discriminant, χ d (n) = d n }. This is an example of a symplectic family. To demonstrate this, one can turn to moments. It is conjectured that (see for example [50]) as D → ∞. The sum is only over fundamental discriminants d; D * is the length of the sum; and a L D (β) has a similar form to (44) (see Conrey et al. [52] for example for the full formulation). The conjecture is based on work of Jutila [101] and Soundararajan [135], and so the values of c L D (β) are known for β = 1, 2, 3 and conjectured for β = 4. Keating and Snaith [109] determined the asymptotic behaviour of the 2βth moments of symplectic characteristic polynomials at the symmetry point, Unlike U(N), one no longer has rotational invariance of the measure. The 'symplectic' behaviour is captured at the symmetry points: ±1. See [106] for an asymptotic description of the moments (also for the orthogonal case) for a fixed point away from ±1, where there is a transition back to unitary statistics on the scale of mean eigenvalue spacing. As with the comparison between ζ(s) and unitary polynomials, one associates N, the matrix size, with the logarithm of the 'height' of the family: N ∼ log D. With this dictionary in place, the symplectic form of (266) is evident and suggests that c L D (β) = c Sp (β). All known values of c L D (β) indeed satisfy such a relation.
Finally we give an example of an orthogonal family. In this case, one considers two categories: even and odd 48 . The even families are related to the matrices from SO(2N), and the odd to those from SO(2N +1). However, since the 'odd' L-functions take the value 0 at their symmetry point 49 , we only consider 'even' families. In general, the L-functions displaying orthogonal symmetries are more complicated than the Dirichlet L-functions, see (28). Arguably the simplest example is derived from L-functions attached to elliptic curves. Recall these were defined by (31).
In fact, we are interested in certain twists by a Dirichlet character χ d (n). Define for Re(s) > 1 and by analytic continuation otherwise. The coefficients a n are the weightings from the L-function associated to the elliptic curve E, cf. (31). Hence, fix E an elliptic curve over Q such that the sign of its functional equation is +1 (i.e. provided the L-function is even). Then, one can define the family (270) {L E (1, χ d )| d a fundamental discriminant, χ d (n) = d n }. This is an example of an orthogonal family. The corresponding Riemann hypothesis for L E (s, χ d ) places the critical line at Re(s) = 1, rather than the conventional 1/2. This is why the L-function is evaluated at 1 in (270). However, this difference is merely due to the conventional normalization one chooses when defining elliptic curve L-functions (cf. (31)); it is simple to redefine L E so that its critical line is shifted to Re(s) = 1/2.
It is conjectured that (see for example [50]) as D → ∞. The sum is only over fundamental discriminants d; D * is the length of the sum; and a L E (β) again has a similar form to (44) (see Conrey et al. [53]). Once again, small moments, and hence small values of c L E , have been computed, see [50]. 48 The parity comes from the sign of the functional equation for the L-function, see for example [52]. 49 This is a consequence of their functional equation, see for example [52]. Keating and Snaith showed that the asymptotic behaviour of the 2βth moments of special orthogonal characteristic polynomials at the symmetry point is asymptotically where c SO (β) is the leading order moment coefficient depending on β, As with the symplectic case, one no longer has rotational invariance of the orthogonal measure compared to unitary moments. Like in the case of Dirichlet L-functions, by comparing N with log D, the orthogonal symmetry is evident. Additionally, this furnishes the conjecture that c L E (β) is equal to c SO (β) [109].
Given these families, one can define the associated moments of moments. For (265) where the sum is only over fundamental discriminants d, and D * denotes the number of terms in the sum. We would expect that the asymptotic behaviour of (274) should be given by theorem 3.13. For (270), the moments of moments are where again the sum is only over fundamental discriminants d, D * denotes the number of terms in the sum, and 1 is the symmetry point for L E . Theorem 3.14 should determine the asymptotic behaviour of (275). We now turn to moments of moments of ζ(1/2 + it). For T > 0 and Re(β) > −1/2 Choosing to integrate over intervals of length 1 in the integrand of (276) may be easily extended to any interval of O(1), with the appropriate normalization. Harper [89] determined an asymptotic upper bound on (276) for a particular parameter range.
Harper argues that this upper bound is likely to be best possible. Notice that for k ∈ (0, 1), the growth is slower than one would conjecture using (97). It would be interesting to prove if the equivalent result holds on the random matrix side.
Conjecture 3.1 (Conrey et al. [52]). Take k, β ∈ N, and h = (h 1 , . . . , h k ) with h j ∈ R. Then Under conjecture 3.1, MoM P k,β (T ) should approximate MoM ζ T (k, β) up to a power saving in T . One may note that the structure of (280) is very similar to the multiple contour integral appearing in lemma 3.12, used to prove theorem 3.6. The main result of [17] is the following. where the term α k,β = A kβ (0, . . . , 0) contains the arithmetic information and A kβ is as defined in (282), and γ k,β denotes the remaining, non-arithmetic, contribution (for further details see [17]).
The proof of theorem 3.26 follows by shifting the contours of (280) and then applying complex analytic techniques to determine the correct asymptotic behaviour in log T . It is also evident from the proof that the result will still hold (albeit with a simple modification to the leading order coefficient) if the ranges of the inner integrals of (283) are over any O(1) interval. Additionally, since conjecture 3.1 generalizes to other L-functions, the proof techniques of [16] would additionally apply more generally.
Finally, the theory of moments of moments is not confined to just studying the classical compact matrix groups. There are natural extensions to other matrix ensembles and here we mention some related results. 50 The conjecture in [52] is more general in that it permits more general families of L-functions, non-uniform shifts, as well as integrating against different weight functions.
For Hermitian ensembles, Jonnadula, Keating, and Mezzadri [99,100] proved a version of proposition 3.9 using multivariate orthogonal polynomials. They subsequently calculated the asymptotics of the moments of the associated characteristic polynomials, and , in the GUE case, identified a subtle dependence on the parity of the matrix size. Bourgade, Mody, and Pain [32] have recently established a local law and central limit theorem for general β-ensembles using loop-equations. These are akin to the Gaussian β-ensembles, but with a generic potential. There a log-correlated field also naturally arises. Similarly, if one considers Wigner matrices 51 , then in [31] Bourgade and Mody show that in the bulk the characteristic polynomial 52 also has logarithmic correlations. Originally this result was conditional on the equivalent result holding for the GOE (resp. GUE for complex Wigner), but following [32] now holds unconditionally.
Number theoretically, one is also not restricted to studying averages of L-functions as evidenced by the recent work of de la Bretèche and Fiorilli [58] on moments of moments of primes in arithmetic progressions.

Explicit examples.
In [15], explicit examples of the polynomials MoM U(N ) (k, β) were computed for small, integer values of k and β. There are numerous methods available for computing moments of moments, as outlined previously in this section. Specifically, one can compute the first couple of moments by hand using either the combinatorial Schur function approach 53 or via the multiple contour integral (3.12). However, as soon as k, β increase (and in particular, the k parameter) this technique becomes infeasible.
Further progress is possible using the ratios theorem of Conrey, Farmer and Zirnbauer [53,54] with the aid of computer algebra software. Keating and Scott [108] produced initial code based on the ratios conjecture which produced the first four novel 54 unitary moments of moments. However, this technique also proves to be inefficient, especially as k increases.
The method used to compute the polynomials presented here was instead the Toeplitz determinant representation 55 . By (228), one can write (219) as a Toeplitz determinant D N (f ) for a particular symbol f , depending on θ 1 , . . . , θ k . After computing the circle integrals of D N (f ), we recover the moments of moments polynomials. A selection are given here, for a more complete list see [15].

51
(Real) Wigner matrices generalize the GOE; they are N × N random Hermitian matrices such that the entries above the diagonal are independent centred random variables some fixed common variance. 52 This result holds for both the real and imaginary part. 53 Equivalently, the Gelfand-Tsetlin pattern approach of [12,13]. 54 Recall that the Keating-Snaith result, theorem 1.1, gives a full description of MoM U(N ) (1, β). 55 We thank Chris Hughes for first highlighting this approach to us. Similar techniques can be adapted to calculate moments of moments for the symplectic and special orthogonal groups (see (246) and (248)), and such computations can be found in [12].

ACKNOWLEDGEMENTS
ECB is grateful to the Heilbronn Institute for Mathematical Research for support. JPK is pleased to acknowledge support from ERC Advanced Grant 740900 (LogCorRM). We would like to thank the anonymous reviewers for their helpful suggestions and questions.