Worm Algorithm for CP(N-1) Model

The CP(N-1) model in 2D is an interesting toy model for 4D QCD as it possesses confinement, asymptotic freedom and a non-trivial vacuum structure. Due to the lower dimensionality and the absence of fermions, the computational cost for simulating 2D CP(N-1) on the lattice is much lower than that for simulating 4D QCD. However, to our knowledge, no efficient algorithm for simulating the lattice CP(N-1) model that also works at finite density has been tested so far. To this end we propose a new type of worm algorithm, suited to simulating the lattice CP(N-1) model in a dual, flux-variable based representation, in which the introduction of a chemical potential does not give rise to any complications. In addition to the usual worm moves, where a defect is simply moved from one lattice site to the next, our algorithm also allows for worm-type moves in the internal variable space of single links, which accelerates the Monte Carlo evolution. We use our algorithm to compare the two popular CP(N-1) lattice actions and exhibit marked differences in their approach to the continuum limit.


Introduction
The classical CP(N-1) model was introduced in 1978 in different contexts [1,2,3]. Shortly afterwards, the two-dimensional quantum theory was discussed independently in [4] and [5], where it was shown that (among other interesting properties) the model possesses a non-trivial vacuum structure with stable instanton solutions and that it incorporates the phenomena of confinement and asymptotic freedom, which are also key features of four-dimensional Yang-Mills theories. The two-dimensional CP(N-1) model is presumably the simplest model which possesses all of these properties and is therefore an ideal toy model to study their interrelations. After the model was studied perturbatively [4,5] and by means of a 1/N expansion in the continuum [4,5,7] and on the lattice [6,7], as well as by means of a strong coupling expansion [10], a crosscheck of some predictions by direct Monte Carlo simulations was attempted in [8]. Although a sophisticated over-heat bath algorithm was used in the latter work, it was found that the simulations suffer from an exponential critical slowing down of topological modes. Around the same time, a cluster algorithm for the CP(N-1) model was proposed in [9] and tested for N = 4, 5. But in contrast to the Ising or O(N) models, where cluster algorithms solve the critical slowing-down problem almost completely, it was found that for CP(N-1) the cluster algorithm does not help in overcoming critical slowing down, which would be necessary in order to perform further non-perturbative checks of the large-N and continuum predictions. After the introduction of the worm algorithm [14] in 2001 as an alternative to cluster algorithms for overcoming critical slowing down, a reformulation of the lattice CP(N-1) partition function in terms of O(N^2) dual, integer-valued flux variables per link was proposed in [12], which at first sight seemed suitable for updating with a worm algorithm. However, it was later found in [11] that an ordinary worm algorithm gives rise to an ergodicity problem when applied naively to this dual flux-variable formulation as soon as N > 2. The reason for this problem has been identified only recently, and one of the main purposes of this paper is to present and test our solution to it, which consists of an extension of the ordinary worm algorithm, so that the worm does not just move from site to site but also moves in internal space (in a particular way) at intermediate steps.
Meanwhile two alternative flux-variable representations [13,15] of the CP(N-1) partition function have been proposed: the version in [13] uses just a single flux variable per link, and the system is updated with a so-called loop algorithm (more precisely, [13] describes two formulations and algorithms: one for positive integer N and one for real N). Unfortunately, just like the cluster algorithm, this loop algorithm was found to perform no better than the over-heat bath algorithm. The other dual formulation [15], which describes configurations in terms of O(2N) flux variables per link, has not been implemented and tested so far, which is why we will also briefly discuss and test an algorithm to simulate this dual formulation and compare it to our new algorithm for the version with O(N^2) d.o.f. per link. The reason why these dual formulations are interesting is the following: they do not give rise to a sign problem when introducing one or more chemical potentials and therefore allow numerical studies of finite density effects [15]. But there is also another reason: as already mentioned, it has been found in [8] that in standard simulations of the CP(N-1) model, topological modes suffer from exponential critical slowing down, which is most likely caused by quickly decreasing tunneling rates between different topological sectors.

The continuum CP(N-1) model is defined in terms of the Lagrangian (1.1), built from the covariant derivative of the z-field, where z ∈ C^N is an N-component complex scalar field subject to the constraint z†z = 1, D_µ = ∂_µ + i A_µ is the covariant derivative with respect to an auxiliary U(1) gauge field A_µ, and g is the corresponding coupling strength. Using the Euler-Lagrange equations for A_µ, the classical solution is found to be A_µ = (i/2)(z†·∂_µ z − (∂_µ z†)·z), which can be substituted back into (1.1) to obtain a Lagrangian that contains a term quartic in the fields z [3,5,7]. There is no kinetic term for the U(1) gauge field A_µ, so classically it is just a dummy field. In the quantum theory, however, quantum fluctuations generate a kinetic term for A_µ [5] and turn it into a dynamical field.
After the second equality sign, we made use of the well-known identity for modified Bessel functions of the first kind,

e^{a(t + 1/t)} = \sum_{k=-\infty}^{\infty} I_k(2a)\, t^k ,   (1.6)

as was already done in [6], from which, setting t = e^{iθ}, it follows that

\int_0^{2\pi} \frac{dθ}{2\pi}\, e^{2a\cos(θ)} = I_0(2a) .   (1.7)

By discretizing (1.3), we find the quartic action (1.8) and the corresponding partition function (1.9). The two partition functions Z_A and Z_Q are obviously not identical but closely related: by expanding the link weights appearing on the last lines of (1.5) and (1.9) in power series, we find for the one in (1.5)

\sum_{n_{x,\mu}=0}^{\infty} \frac{\big(β\, |z^†(x) · z(x+\mu)|^2\big)^{n_{x,\mu}}}{n_{x,\mu}!}\; w_{n_{x,\mu}}(β) ,   (1.10)

with

w_n(β) = \frac{β^n\, e^{-2β}}{n!} ,   (1.11)

and for the one in (1.9), respectively,

\sum_{n_{x,\mu}=0}^{\infty} \frac{\big(β\, |z^†(x) · z(x+\mu)|^2\big)^{n_{x,\mu}}}{n_{x,\mu}!} .   (1.12)

So, (1.5) differs from (1.9) only by the presence of an extra weight factor (1.11) for each term of the power-series expansion of the link weights.
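As a quick cross-check of this relation, the following short script (illustrative only; all names and parameter values are ours, not from the paper) verifies numerically that multiplying each term of the expansion (1.12) by the weight w_n(β) of (1.11) resums to e^{-2β} I_0(2β |z†(x)·z(x+μ)|), i.e. to the link weight obtained via (1.7):

```python
import math
from scipy.special import i0

beta, r = 1.3, 0.7          # hypothetical coupling and |z^dag(x) . z(x+mu)|

def w(n, beta):
    # per-term weight w_n(beta) = beta^n e^{-2 beta} / n!, cf. (1.11)
    return beta**n * math.exp(-2.0 * beta) / math.factorial(n)

# terms of the Z_Q link-weight expansion of exp(beta |z^dag . z|^2), cf. (1.12)
zq_terms = [beta**n * r**(2 * n) / math.factorial(n) for n in range(40)]

# multiplying each term by w_n(beta) and resumming gives e^{-2 beta} I_0(2 beta r)
za_resummed = sum(t * w(n, beta) for n, t in enumerate(zq_terms))
za_direct = math.exp(-2.0 * beta) * i0(2.0 * beta * r)

print(za_resummed, za_direct)   # the two numbers agree to machine precision
```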
As expected from universality, and as argued in [7] on the basis of a large-N study, in the continuum limit (β → ∞) in (1 + 1) dimensions the two partition functions (1.5) and (1.9) seem to give rise to the same physics. In the strong coupling regime, however, the two lattice formulations can behave differently: the system described by (1.9) (still in (1 + 1) dimensions) develops a first-order transition in the limit N → ∞, which separates the strong coupling phase (β/N ≪ 1) from the weak coupling one (β/N ≫ 1) [7]. For the partition function (1.5), on the other hand, the transition is absent in the limit N → ∞ and only the weak coupling phase survives in the infinite-N limit [7]. We will verify these statements in our numerical simulations in Sec. 4.

Dual Formulation
After having introduced the two most widely used lattice formulations of the CP(N-1) model, (1.5) and (1.9), we continue in Sec. 2.1 by giving a quick review of how the flux-variable representation of (1.9), as introduced in [12], can be obtained, and how this representation can also be used for the partition function (1.5). In Sec. 2.2 we briefly discuss how chemical potentials can be incorporated into the flux-variable representations of (1.5) and (1.9) without giving rise to a sign problem. Finally, in Sec. 2.3 we define some observables that will be used later on.

Flux-Variable Representation of the Partition Function
For simplicity, we describe the dualization procedure for the quartic action version (1.9) of the lattice CP(N-1) partition function, which is also the one that was used in [12]. We follow the derivations in [12,11] and start by explicitly writing out all sums in the exponential of (1.9), eq. (2.1). Next we write the exponential of the summed terms in (2.1) as a product of exponentials of the individual terms and then use the power-series representation for each of these exponentials to find

\prod_{x,\mu} \prod_{a,b} \sum_{n^{ab}_{x,\mu}=0}^{\infty} \frac{β^{n^{ab}_{x,\mu}}}{n^{ab}_{x,\mu}!} \big(\bar z_a(x)\, z_b(x)\, \bar z_b(x+\mu)\, z_a(x+\mu)\big)^{n^{ab}_{x,\mu}} ,   (2.2)

which can be written in the form (2.3), with N an unimportant normalization constant and F(q, p) being the result of the integration over the z_a(x) = r^a_x e^{iφ^a_x} on each site, i.e. (2.5). The path integral has now turned into an infinite sum of terms which are labeled by the different values of the d · V · N^2 non-negative, integer-valued variables n^{ab}_{x,μ}. The n^{ab}_{x,μ} are called flux variables, as they live on the links of the lattice and, according to (2.2), can be interpreted as enumerating the number of hoppings of pairs (mesons) \bar z_a z_b between the sites x and x + μ.
The partition function (2.2) is the one that was used in [11]. In the given form, all the n^{ab}_{x,μ} variables in (2.2) are subject to the discrete delta-function constraints in (2.4) (via (2.5)): each of the n^{ab}_{x,μ} appears in two such constraints on each of the sites that it touches. This, in combination with the large number of d · V · N^2 distinct n^{ab}_{x,μ} variables, makes it difficult to recognize the true structure of the constraints and, correspondingly, what freedom is left in the choice of values for the n^{ab}_{x,μ}.
To reduce the number of constrained variables, we decompose the N × N matrices n^{ab}_{x,μ} into symmetric and anti-symmetric pieces, parametrized by new variables k^{ab}_{x,μ} ∈ Z and l^{ab}_{x,μ} ∈ N_0, such that the relation (2.7) holds. After carrying out the substitutions, we find (2.8), where now only the N(N − 1)/2 independent components of the anti-symmetric k^{ab}_{x,μ} are still subject to the delta-function constraints, while the N(N + 1)/2 independent components of the symmetric l^{ab}_{x,μ} are free. By using the symmetry properties of the k^{ab}_{x,μ} and l^{ab}_{x,μ}, (2.8) can also be written as (2.9), in which form the dependency of the weights on the flux variables, as well as the constraints that are imposed on the k-variables, are most easily recognized (the constraints will be discussed further in Sec. 3.1). Note that (2.9) possesses global Z_2 and Z_N symmetries, as all weights are invariant both under a collective sign-flip (k → −k) of the k-variables and under collective cyclic shifts of the internal space indices (a → a + 1) of all the k- and l-variables.
To obtain also Z_A from (1.4) in terms of k- and l-variables, we use the multinomial expansion: since, according to (2.7), the n^{ab}_{x,μ} are fixed by the k- and l-variables, a configuration of k- and l-variables represents on each link the contribution of a monomial that is part of a multinomial (2.10), with n_{x,μ} as in (2.11), and picks up the corresponding link weight w_{n_{x,μ}}(β), given by (1.11), when changing from the quartic action (1.3) to the auxiliary U(1) action (1.1). We therefore find for Z_A the expression (2.12), which is also invariant under a collective sign-flip (k → −k) and under collective cyclic shifts of the internal space indices (a → a + 1) of all the k- and l-variables.
Before we continue with the discussion of the properties of the flux representations of Z_Q and Z_A, we introduce the following naming convention for the different indices of the flux variables k^{ab}_{x,μ} and l^{ab}_{x,μ}: as usual, x refers to a site on the lattice and μ to a direction. The indices a, b we call internal space indices, but it should be kept in mind that "internal space" does not mean "flavor space": a, b are not flavor indices, i.e. the matrix k_{x,μ} does not transform like U k_{x,μ} U† under a global flavor symmetry transformation U, as should be clear from the fact that k_{x,μ} is always a matrix of integers, which cannot undergo smooth changes. Flavor space has already been integrated out completely in the flux representations (2.9), (2.12) of the two partition functions Z_Q and Z_A, and the fact that a particular combination of values for the k^{ab}_{x,μ} variables gives rise to a non-vanishing weight for the corresponding configuration in Z_A or Z_Q just means that the product of all the z_a and \bar z_b in the lattice, with the multiplicities given by the values of all the k^{ab}_{x,μ} matrices, contains a "singlet" which survives the integration over the flavors.

Conserved Currents and Chemical Potentials
The classical CP(N-1) model possesses (N − 1) conserved currents, generated by the (N − 1) diagonal generalized Gell-Mann matrices, and a chemical potential can be coupled to each of the corresponding conserved charges, as has already been shown in [15]. To see how these chemical potentials enter in our flux-variable representation (which is based on O(N^2) variables per link), we will first give an alternative derivation of the flux-variable formulation given in [15] (which is based on O(2N) variables per link), from which we can then deduce how our k^{ab}_{x,ν} variables should couple to these chemical potentials.
The starting point is again the auxiliary U(1) lattice partition function (1.5), to which we add in the usual way the coupling to the chemical potentials, leading to (2.14), where, after the second equality sign, we have again used the identity (1.6), as well as the parametrizations z_a(x) = r^a_x e^{iφ^a_x} and U_ν(x) = e^{iθ_{x,ν}}, with corresponding measure (2.15). Furthermore, we introduced the N "single flavor" chemical potentials \tilde{μ}_a = Σ_i μ_i \tilde{λ}_{i,aa}. By using that

I_k(2x) = \sum_{l=0}^{\infty} \frac{x^{|k|+2l}}{(|k|+l)!\, l!} ,   (2.16)

and by integrating out the angles θ_{x,ν} and φ^a_x, the partition function (2.14) becomes (2.17) (we add a tilde to Z_A in order to distinguish it from (2.12)), which is our desired expression.
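The series used in (2.16) is the standard power series of the modified Bessel function of the first kind; as a small sanity check (a sketch with hypothetical parameter values, not part of the paper), one can compare it term by term against a library implementation:

```python
import math
from scipy.special import iv

def bessel_series(k, x, n_terms=60):
    # I_k(2x) = sum_{l>=0} x^{|k|+2l} / ((|k|+l)! l!), the series used in (2.16)
    k = abs(k)
    return sum(x ** (k + 2 * l) / (math.factorial(k + l) * math.factorial(l))
               for l in range(n_terms))

for k in (-3, 0, 2):
    print(bessel_series(k, 0.9), iv(k, 2 * 0.9))   # the two columns agree
```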
As in the previous section, we can use the relation between (1.10) and (1.12) to write down an expression for Z̃_Q (a version of Z_Q with 2N instead of N^2 degrees of freedom per link): we simply have to divide each link weight in (2.17) by the factor w_{n_{x,ν}}(β), where this time n_{x,ν} is expressed in terms of the k^a_{x,ν} and l^a_{x,ν} as in (2.19).

² We interpret factorials in terms of Gamma functions, i.e. n! = Γ(1 + n), so that 1/n! = 1/Γ(1 + n) = 0 for integers n < 0.

The resulting expression for Z̃_Q is given in (2.20). In contrast to (2.8) and (2.12), which contain only one type of delta-function constraint, namely for each site a product of N site constraints, the formulations (2.17) and (2.20) in addition contain on-link constraints (2.23) for the variables k^a_{x,ν}. Identifying k^a_{x,ν} with the sum Σ_b k^{ab}_{x,ν}, the on-link constraints (2.23) are then automatically satisfied due to the antisymmetry of the k^{ab}_{x,ν} in the internal-space indices (a, b), which is the reason why this constraint is absent in the formulations (2.8) and (2.12). The k^{ab}_{x,ν} in (2.12) should therefore couple to \tilde{μ}_a through a corresponding extra weight factor. Note that for \tilde{μ}_a = 0, the partition functions (2.17) and (2.20) are also invariant under a collective flip of the signs of all k-variables and under a collective, cyclic rotation of the internal space indices, just as (2.8) and (2.12). But a non-zero value of one of the chemical potentials in general breaks these global Z_2 and Z_N symmetries explicitly in all four versions.

Observables
When working with a dual, flux-variable representation of a partition function, the definition of meaningful observables that can be measured during a Monte Carlo simulation is much less intuitive than when working with the original configuration variables. The only safe way to get correct expressions for physical observables in terms of flux-variables is either to define and insert the observable before the dualization process and dualize the resulting expression directly (see the derivation of the two-point function (3.6) in Sec. 3.2), or to define the observable in terms of derivatives of the logarithm of the partition function.
The only observables we are interested in which cannot be defined directly in terms of derivatives of log(Z) are related to the two-point function³ (3.6) defined below in Sec. 3.2, namely the magnetic susceptibility χ_m, defined in (2.25), and the so-called second moment correlation length ξ_G, defined in (2.26) (see e.g. [8]). It should be noted that in this form, (2.25) and (2.26) are only valid as long as no non-trivial condensate develops, as otherwise the corresponding disconnected pieces would first have to be subtracted from the two-point function. In any case, as we are interested in comparing our results with those from the literature [6,7,16], where the magnetic susceptibility χ_m and the second moment correlation length ξ_G have been defined as in (2.25) and (2.26) respectively, we will do so as well.
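For reference, the following sketch shows how χ_m and ξ_G can be extracted from a measured, connected two-point function on a periodic lattice, using the standard second-moment definition (as in [8]); the exact normalization conventions of (2.25) and (2.26) are not reproduced here and may differ:

```python
import numpy as np

def chi_and_xi_G(G):
    """Magnetic susceptibility and second-moment correlation length from a
    measured, connected two-point function G(x) on a periodic L x L lattice
    (standard second-moment definition, cf. [8]; normalizations may differ
    from (2.25)/(2.26))."""
    L = G.shape[0]
    chi_m = G.sum()                                   # zero-momentum mode of G
    x = np.arange(L)
    # Fourier mode of G at the smallest non-zero momentum p = (2*pi/L, 0)
    F = (G * np.cos(2.0 * np.pi * x / L)[:, None]).sum()
    xi_G = np.sqrt((chi_m / F - 1.0) / (4.0 * np.sin(np.pi / L) ** 2))
    return chi_m, xi_G

# example with synthetic, exponentially decaying toy data on a 64 x 64 lattice
L = 64
d = np.minimum(np.arange(L), L - np.arange(L))
G = np.exp(-np.sqrt(np.add.outer(d**2, d**2)) / 5.0)  # correlation length ~ 5
print(chi_and_xi_G(G))
```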
The remaining observables that will be considered are the average energy per site, (2.27), and the specific heat, (2.28).

³ The two-point function (3.6) and the magnetic susceptibility could of course be defined in terms of derivatives of Z or log(Z) if we added appropriate source terms ∼ J^{ab}(x) \bar z_a(x) z_b(x) to the action, or alternatively source terms involving the adjoint generators. In the latter case the magnetic susceptibility reads χ_m = (1/(2V)) Σ_{x,y,i} (…), which, according to the Fierz identities for the generators of SU(N) and because z†(x) · z(x) = 1, is the same as (2.25), provided that φ^{ab} vanishes.

Simulation Methods
Because of the constraints imposed on the k-variables in the dual versions (2.8), (2.12), (2.17) and (2.20) of the CP(N-1) partition functions (1.5) and (1.9), a worm algorithm has to be used to generate the configurations required for Monte Carlo estimates of expectation values of observables. In this section, we first give an explanation for the apparent ergodicity problem [11] that occurs when using a "naive" worm algorithm to simulate the CP(N-1) model with N > 2 in the flux-variable representation from [12] (i.e. our Z_Q from eq. (2.8)), and then introduce our "internal space sub-worm algorithm", which solves the problem.

Constraints
The structure of the constraints imposed on the k-variables in our flux-variable formulation (2.9) of the CP(N-1) model is more involved than what one encounters, for example, in the flux-variable formulation of the O(N) [15] or the principal chiral SU(2) [22] model. The k-variables in (2.9) are on each site subject to the discrete delta-function constraints (3.1). As the k-variables enter (3.1) always in the form of a sum over μ and b, it seems at first that the constraint is in fact not that restrictive and that a defect in one of the delta functions in (3.1), coming from a change in, say, k^{56}_{x−μ,μ}, could be compensated not only by changing k^{56}_{x,μ} (i.e. by propagating the defect from site x to site x + μ) but also by e.g. changing k^{52}_{x−μ,μ} instead. However, as the k^{ab}_{x,μ} are anti-symmetric in the indices (a, b), each k-variable in fact appears in two of the delta-function constraints at each of the two sites that it touches. This means that a change, for example an increase of k^{56}_{x−μ,μ}, introduces in (3.1) not just a defect in the constraint for a = 5 but also another one in the constraint for a = 6. A decrease of, e.g., k^{52}_{x−μ,μ} would therefore just remove the defect in the constraint for a = 5 but not the one in the constraint for a = 6, and instead would introduce another defect in the constraint for a = 2. The net effect would just be to move the defect from the constraint for a = 5 to the constraint for a = 2. As illustrated in Fig. 1, the defects in the delta-functions for a = 2 and a = 6 are however of such a form that we can remove them simultaneously by updating a third k-variable, namely k^{26}_{x−μ,μ}! This trivially extends to arbitrarily long chains of k-variables that live on the same link. So, although it is not possible to freely update pairs of k-variables that live on the same link, one can update arbitrary cycles consisting of at least three such k-variables, which naturally gives rise to what we will call an internal space worm.
In Sec. 3.3 we will describe an algorithm which combines a conventional worm, which propagates defects from site to site, with an internal space sub-worm which makes it possible for the same defect to be propagated by different types of flux-variables.

Figure 1: Illustration of an internal space update cycle on the matrix k_{x,ν} that points from the site x in ν-direction. If we increase for example the component k^{56}_{x,ν} (indicated by a filled red circle), this implies, due to the anti-symmetry of the k^{ab}_{x,ν}-variables in the (a, b)-indices, that we decrease at the same time the component k^{65}_{x,ν} = −k^{56}_{x,ν} (indicated by an empty red circle). In order to satisfy all the delta-function constraints in (2.9) (without changing a k-variable that points in another space-time direction), we have to compensate the changes in k^{56}_{x,ν} and k^{65}_{x,ν} so that the total sums of changes done to all components with the same index a (i.e. along the horizontal lines) are zero: we can therefore decrease for example k^{52}_{x,ν} = −k^{25}_{x,ν} to compensate for the increase in k^{56}_{x,ν}. This however also increases k^{25}_{x,ν}, which has again to be compensated by decreasing one of the k^{2b}_{x,ν}. For simplicity, we choose to decrease k^{26}_{x,ν}, such that the corresponding increase in k^{62}_{x,ν} = −k^{26}_{x,ν} compensates not only the increase in k^{25}_{x,ν} but also the decrease in k^{65}_{x,ν} = −k^{56}_{x,ν} from the initial update, rendering all delta-functions non-zero again. The minimal length for such an internal space update cycle is three, i.e. one has to update at least three different components of the k_{x,ν} matrix, but there is no upper bound on the length of such update cycles.
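To make the bookkeeping behind Fig. 1 explicit, the following small script (our own illustration, not the simulation code of the paper) applies the minimal three-component internal-space cycle to a random antisymmetric k-matrix on a single link and checks that all row sums, and hence the contribution of this link to the delta-function constraints of (2.9), remain unchanged:

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)

# random antisymmetric integer matrix k^{ab} on one link
k = rng.integers(-3, 4, size=(N, N))
k = k - k.T

row_sums_before = k.sum(axis=1).copy()

def update(k, a, b, delta):
    # change k^{ab} by delta while keeping antisymmetry, k^{ba} = -k^{ab}
    k[a, b] += delta
    k[b, a] -= delta

# the internal-space cycle of Fig. 1 (0-based indices standing for flavors 5, 6, 2)
a, b, c = 4, 5, 1
update(k, a, b, +1)      # increase k^{56}
update(k, a, c, -1)      # decrease k^{52}  (compensates row a)
update(k, c, b, -1)      # decrease k^{26}  (compensates rows b and c)

assert np.array_equal(k, -k.T)                         # still antisymmetric
assert np.array_equal(k.sum(axis=1), row_sums_before)  # constraints untouched
print("all row sums unchanged")
```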
In the naive implementation of [11], an ordinary worm was used (see the next section), where the internal-space index pair (a, b) of the flux-variables that are updated is kept fixed while the worm propagates the corresponding defect on the lattice; only when the worm closes can the internal space index pair of the flux-variables to be updated change. This means that if the worm length increases as a function of β, the algorithm updates for increasingly long times only one type of configuration variable, i.e. the k-variables which all have the same internal space index pair (a, b). When the worm finally closes, the algorithm can be faced with an energy barrier that has to be overcome in order to change this index pair. This becomes particularly clear when we consider the situation where the simulation has just started (we start with k^{ab}_{x,ν} = 0 ∀ a, b, x, ν) and the first worm has just finished updating the, say, (a, b) = (2, 3) components of the n-variables in (2.3) (related to the k-variables through (2.11)). This means that the factors for a = 2, 3 in the numerators of the site-weights (2.4) are already increased, while the factors for the remaining values of a are still zero; it therefore costs some energy (which can become quite large) to move the defect from (a, b) = (2, 3) to another internal-space index pair, and it could take a very long time for the system to thermalize. By extending the standard worm algorithm with the internal space worm as described below in Sec. 3.3, not just the (a, b) = (2, 3) components of the k-variables are updated while the defect for (a, b) = (2, 3) is propagated on the lattice, but also the other components of the k-variables. The values of the different factors in the site-weights (2.4) therefore grow more homogeneously, and the energy barrier for a change of the internal space indices of the defect after the worm has closed does not form. There is also another difficulty: how can an algorithm that uses an ordinary worm to update the k-variables achieve that the (a, b) = (2, 3) component of a single k-variable is non-zero while the (a, b) = (2, 3) components of all its neighboring k-variables are zero? With the standard worm algorithm, just one type of k-variable can be updated during a single worm update, and because worms always produce connected strings of updated variables, it would require not just one but several worm moves along special trajectories to produce such a situation.
With the internal space sub-worm algorithm (see Sec. 3.3) on the other hand, such situations are produced all the time during individual worm steps! The internal space sub-worm algorithm is therefore able to take shortcuts in configuration space between configurations, which would be hard to connect with the ordinary worm algorithm.
Note that if N ≤ 2, the internal space sub-worm algorithm is identical to the ordinary worm, as non-trivial internal space update cycles are possible only if N ≥ 3. This explains why in [11] the ergodicity problem has only been observed for N > 2.

Ordinary Worm Algorithm
The general idea behind a worm algorithm [14] is to update configuration variables of a partition function Z which are subject to a so-called closed loop constraint by generating configurations that contribute to partition functions Z_2(x, y) instead of Z. Each of the Z_2(x, y) is a partition function for the same system that is described by Z itself, but in the presence of an external source at x and an external sink at y, so that at these two sites the closed loop constraint can be violated. The constrained configuration variables are then updated by moving around either the source or the sink, and whenever source and sink meet again, a new configuration that contributes to Z can be obtained by dropping the source/sink pair.
As already mentioned, a naive application of this concept to the flux-variable representations (2.8) and (2.12) of the CP N −1 partition function leads to wrong results for N > 2 [11]. However, for the flux-variable representations (2.17) and (2.20), this naive, standard worm algorithm works. Therefore, and because our internal space sub-worm algorithm is a generalization of the standard worm, it makes sense to describe first the standard worm here on the example of (2.17).
Before we can explain how the worm works, we first need to know how an external source or sink appears in our flux-variable representation (2.17). Since in the CP(N-1) model the fields z and z† cannot appear on their own due to the local U(1) gauge symmetry, we have to consider U(1) gauge-invariant pairs (mesons) \bar z_a(x) z_b(x) as sources and sinks, i.e. we define a source at a point x as φ^{ab}(x) = \bar z_a(x) z_b(x), eq. (3.2), and the corresponding sink as φ^{ba}, eq. (3.3). We are therefore interested in the flux representation of the partition function (3.4), which contains such a source/sink pair. By going through the same steps that led us from (2.14) to (2.17), the partition function (3.4) can be turned into (3.5), where we marked in red the changes caused in (2.17) by the insertion of the external source/sink pair (3.2) and (3.3). Equation (3.5) is of course closely related to the two-point function of the field φ^{ab}(x) from (3.2), which reads

⟨φ^{ab}(x)\, φ^{ba}(y)⟩ = Z̃^{ab}_{A,2}(x, y) / Z̃_A ,   (3.6)

and can therefore be measured [14] by recording how often the worm's tail is located at site x and its head at site y, such that the corresponding configuration contributes to Z̃^{ab}_{A,2}(x, y), as well as how often the external source/sink pair is removed, such that a configuration that contributes to Z̃_A is produced.
The basic working principle of the ordinary worm algorithm is the following (see Fig. 2): a worm update starts in a configuration that contributes to the partition sum Z̃_A and proposes to insert at some site x = x_0 an external source/sink pair φ^{ab} φ^{ba}. If this insertion is accepted by a Metropolis acceptance test [20], one has a configuration that contributes to Z̃^{ab}_{A,2}(x_0, x_0). Now one can propose to move the sink (which can be thought of as the head of a worm) in a randomly chosen direction ν from site x to the neighboring site y = x + ν, compensating for the charge displacement by updating appropriate flux-variables. Due to the on-link constraint δ(Σ_c k^c_{x,ν}) in (3.5), and because source and sink are mesons, one always has to update two k-variables simultaneously; in our case, if ν is a positive direction, the displacement of the sink requires the updates k^a_{x,ν} → k^a_{x,ν} + 1 and k^b_{x,ν} → k^b_{x,ν} − 1, and vice versa if ν is a negative direction. If the proposed move is accepted, one obtains a configuration that contributes to the partition function Z̃^{ab}_{A,2}(x_0, y). One can then set x = y, update y = x + ν for a new randomly chosen direction ν and again propose to move the head of the worm to this new site y. In this manner the worm's head continues to move to new sites y → y + ν (where ν is always chosen randomly) until x = x_0, so that the external sink φ^{ba} hits again the site x_0 where the source φ^{ab} is located. If this happens, it can be proposed to remove the source/sink pair φ^{ab} φ^{ba} again, and if this proposal is accepted, the worm update ends and one ends up in a new configuration that contributes to the original partition function Z̃_A. One can then take measurements of observables that depend on configurations of Z̃_A, if necessary, and then pick a new random location x_0 to start the next worm update.
Figure 2: Illustration of the ordinary worm algorithm for Z̃_A from (2.17): in the upper-left drawing, the worm update starts in a configuration that contributes to the partition sum Z̃_A and proposes to insert at some site x = x_0 an external source/sink pair φ^{ab} φ^{ba}. If this insertion is accepted by a Metropolis acceptance test, one has a configuration that contributes to Z̃^{ab}_{A,2}(x_0, x_0) (see (3.5)). Now, as depicted in the next picture, one can propose to move the sink φ^{ba} (which can be thought of as the head of a worm) in a randomly chosen direction ν from site x to the neighboring site y = x + ν, compensating for the charge displacement by changing k^a_{x,ν} → k^a_{x,ν} + 1 and k^b_{x,ν} → k^b_{x,ν} − 1 if ν is a positive direction, and vice versa if ν is a negative direction. If the proposed move is accepted, one obtains a configuration that contributes to the partition function Z̃^{ab}_{A,2}(x_0, y). One can then set x = y, update y = x + ν for a new randomly chosen direction ν and again propose to move the head of the worm to this new site y. In this manner the worm's head continues to move to new sites y → y + ν (where ν is always chosen randomly) until x = x_0, so that, as depicted in the bottom-left drawing, the external sink φ^{ba} sits again on the same site x_0 as the source φ^{ab}. If this happens, it can be proposed to remove the source/sink pair φ^{ab}, φ^{ba} again, and if this proposal is accepted, one ends up in a new configuration that contributes to the original partition function Z̃_A. One can then take measurements, pick a new location x_0 and continue with a new worm update at x = x_0.
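The move structure of the ordinary worm just described can be summarized in a few lines. The sketch below is a simplification under our own conventions (not the paper's code): the flux variables are stored as an array k of shape (L, L, 2, N), and accept(...) stands for the Metropolis test with the actual weights of (3.5), which we do not spell out here.

```python
import numpy as np
rng = np.random.default_rng(0)

def ordinary_worm_update(k, accept, N, L, p_t=0.5):
    """One ordinary worm update (sketch) for flux variables k of shape (L, L, 2, N),
    i.e. one N-component integer vector per link, as in the Z~_A formulation (2.17).
    `accept(move)` stands for the Metropolis test with the true weights of (3.5)."""
    x0 = (rng.integers(L), rng.integers(L))
    a, b = rng.choice(N, size=2, replace=False)   # source/sink pair phi^{ab}, phi^{ba}
    if not accept(("insert", x0, a, b)):
        return
    x = x0
    while True:
        nu = rng.integers(2)                      # random space-time direction
        sign = 1 if rng.random() < 0.5 else -1    # forward or backward
        step = [0, 0]; step[nu] = sign
        y = ((x[0] + step[0]) % L, (x[1] + step[1]) % L)
        link = x if sign > 0 else y               # link attached to the lower site
        if accept(("shift", link, nu, a, b, sign)):
            k[link[0], link[1], nu, a] += sign    # move the sink: k^a_{x,nu} +- 1
            k[link[0], link[1], nu, b] -= sign    #                k^b_{x,nu} -+ 1
            x = y
        if x == x0 and rng.random() < p_t:
            if accept(("remove", x0, a, b)):
                return                            # worm closed; measure, then restart
```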

Internal Space Sub-Worm Algorithm
As explained in the previous section, for our choice of flux variables there is not just one but an infinite number of ways in which a defect in the delta-function constraints (as introduced, for example, by the displacement of an external source or sink) can be compensated by an update of a sequence of appropriate flux variables. Our internal space sub-worm algorithm (ISSW algorithm) takes this into account and thereby ensures ergodicity, and it also ensures that the entropy contributed by the different compensating configurations is correctly taken into account.
We discuss the ISSW algorithm using the example of the quartic partition function Z_Q from (2.8). So this time we are interested in the flux representation of the corresponding two-point partition function (3.7). By going through the same steps that turned (2.1) into (2.8), the partition function (3.7) yields (3.8), where we marked again in red the changes caused in (3.8) by the insertion of the external source/sink pair (3.2) and (3.3) into (2.8).
The internal space sub-worm algorithm starts exactly like the ordinary worm described in the previous section: one starts in a configuration that contributes to the partition sum Z_Q and proposes to insert at some site x = x_0 an external source/sink pair φ^{ab} φ^{ba}. If this insertion is accepted by a Metropolis acceptance test, the system is in a configuration that contributes to the partition function Z^{ab}_{Q,2}(x, x). One then proposes to move the sink in a randomly chosen direction ν from site x to the neighboring site y = x + ν and compensates for the charge displacement by updating appropriate flux-variables. But instead of just updating k^{ab}_{x,ν}, i.e. the k-variable with the same internal space indices as the source and sink (which would be the analog of the simultaneous update of k^a_{x,ν} and k^b_{x,ν} in the description of the ordinary worm above in Sec. 3.2), one now runs an internal space sub-worm cycle which explores all possibilities by which the defects introduced by the displacement of the external sink could be compensated; this includes the possibility of just updating k^{ab}_{x,ν}, but also the possibility of updating a sequence of k-variables of length n > 1 instead, e.g. k^{a c_1}_{x,ν}, k^{c_1 c_2}_{x,ν}, ..., k^{c_n b}_{x,ν}. Such a sequence is set up element by element in a worm-like manner: a random internal space index c_1 is chosen and one proposes to update the variable k^{a c_1}_{x,ν} and to temporarily replace the original sink φ^{ba}(y) by a source/sink pair φ^{b c_1}(x) and φ^{c_1 a}(y) in order to compensate for the temporary new defects. If this update is accepted, one continues by choosing a new random internal space index c_2 and proposes to update k^{c_1 c_2}_{x,ν} while replacing φ^{b c_1}(x) and φ^{c_1 a}(y) by φ^{b c_2}(x) and φ^{c_2 a}(y) respectively, and so on, until the newly chosen internal space index coincides either with a or with b. If it coincides with a, the internal space sub-worm cycle ends by bringing the original external sink φ^{ba}(y) back to site x, having updated a closed cycle of k-variables, k^{a c_1}_{x,ν}, k^{c_1 c_2}_{x,ν}, ..., k^{c_n a}_{x,ν}, which effectively propagates no defect; whereas if it coincides with b, the sub-worm cycle ends by restoring the original external sink φ^{ba}(y) on site y, having updated a sequence of k-variables, k^{a c_1}_{x,ν}, k^{c_1 c_2}_{x,ν}, ..., k^{c_n b}_{x,ν}, that compensates for the defects introduced by moving φ^{ba} from x to y. In the latter case, we now have a configuration that contributes to the partition sum Z^{ab}_{Q,2}(x_0, y). One can now set x = y, update y = x + ν for a new randomly chosen direction ν and start a new sub-worm cycle for the corresponding k-variables. In this manner the worm's head continues to move to new sites y → y + ν (where ν is always chosen randomly after every completed sub-worm cycle) until y = x_0 and the external sink φ^{ba} hits again the site x_0. If this happens, i.e. if the worm closes, it can be proposed to remove the source/sink pair φ^{ab}, φ^{ba} from the system again, and if this proposal is accepted, one ends up in a new configuration that contributes to the original partition function Z_Q.
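The following sketch isolates the internal-space part of one such worm step. It is our own simplified illustration: the detailed rejection bookkeeping and the acceptance probabilities of Sec. 3.5 are hidden behind the placeholder accept(a, c). It updates a chain of k-variables on a single link until the chain closes on a or on b:

```python
import numpy as np
rng = np.random.default_rng(0)

def sub_worm_cycle(k_link, a0, b0, accept, N):
    """One internal-space sub-worm cycle (sketch) on a single link, for the
    antisymmetric matrix k_link of shape (N, N).  It tries to move the sink
    phi^{b0 a0} across the link by updating a chain k^{a0 c1}, k^{c1 c2}, ...
    Returns True if the chain closed on b0 (sink moved), False if it closed
    on a0 (sink stays put).  `accept(a, c)` stands for the Metropolis test."""
    a = a0
    while True:
        c = rng.integers(N)            # propose a random internal space index
        if c == a:
            continue
        if not accept(a, c):
            continue                   # simplified: just try another index
        k_link[a, c] += 1              # update k^{a c} ...
        k_link[c, a] -= 1              # ... keeping antisymmetry
        if c == b0:
            return True                # chain compensates the defect: sink moves
        if c == a0:
            return False               # closed cycle: sink stays on its site
        a = c                          # continue the chain from c
```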
Note that whenever the worm chooses a negative direction ν, the roles of the internal space indices a and b have to be interchanged for the corresponding sub-worm cycle. This is necessary in order to satisfy detailed balance for the start and end moves of the sub-worm cycles 4 .
If one would include appropriate non-zero meson-source terms (external background fields) in the CP N −1 action, the system would be allowed to produce dynamical meson sources and sinks which could replace some of the external, non-dynamical φ a b that appear in the above description of our algorithm. For example, the worm could then change the internal space indices (b, a) of its head at any time by replacing the third external source/sink φ b cn that appears temporarily during the sub-worm cycle, by a dynamical one, after which the worm's head would have internal space indices (c n , a) instead of (b, a). Furthermore, as is also the case in other models (see e.g. [22]), the worm can become disconnected as at any time, the external sink at the worm's head can be replaced by a dynamical sink and a new head can be inserted at a new location, together with an appropriate dynamical source. Also here, the internal space indices of the head could of course be changed during this process.
A more detailed description of the ISSW algorithm (without external background fields) is given by the illustration in Fig. 3 and the following step-by-step guide (the acceptance probabilities for the various moves will be discussed in Sec. 3.5):

• if ν > 0: set a = a_0 and continue with (d),
• if accepted: set a = b and continue with (e), else: continue with (c),
• if accepted: set a = b and continue with (e), else: continue with (e),
• else if b = a_0, propose to: if accepted, set a = b and continue with (g), else continue with (c),
• if accepted: set a = b and continue with (g), else: continue with (g),
• if accepted: update x → x + ν, continue with (b), else: continue with (g).

Figure 3: The figure illustrates (from left to right) how a sub-worm cycle of the algorithm described in the main text works, assuming that at step (c) a positive direction ν is selected. The small grids represent the sum of all the k_{x,μ} matrices and sources/sinks that enter the delta-function constraints for x and x + ν on the second line of (3.8). Different rows/columns correspond to different a/b coordinates of k^{ab}_{x,μ}, respectively. Due to the anti-symmetry of k^{ab}_{x,μ}, there is also an anti-symmetry between the rows and columns in the grids, and we can choose to focus just on the rows: each row in a grid represents one of the N constraints in the product on the second line of (3.8), and the net change done to each row therefore has to be zero. Individual changes can be caused by insertions of external source/sink pairs or by updates of flux-variables k^{ab}_{x,ν}. Source/sink pairs are represented by symbol pairs joined by a dotted line, where a symbol in row a joined to a symbol in row b in the grid for site x represents a pair \bar z_a(x) z_b(x) ≡ φ^{ab}(x). An update of a flux-variable k^{ab}_{x,ν} → k^{ab}_{x,ν} + 1 is represented by a joined symbol pair in the grid for site x, with one symbol in row a and one in row b, and by a corresponding pair in the grid for site x + ν, again with one symbol in row a and one in row b. In the last column, we have depicted the two possible ways in which the sub-worm cycle can end: either with b = b_0, in which case the head of the worm can move from site x to site x + ν before a new direction ν is chosen, or with b = a_0, in which case the head of the worm remains on site x and the worm just chooses a new direction ν. Note that in the former case, the net effect of all the symbols in the grid for the site x + ν is the same as that of a pair representing \bar z_b(x + ν) z_a(x + ν), such that we are again in the situation that we had at the beginning for the site x, but now on site x + ν. So the next worm step can proceed in a completely analogous way. Note that if at step (c) in the algorithm description in the main text a negative direction ν is chosen, the sub-worm cycle has to start with a = b_0 instead of a = a_0 in the second column of the figure, and also for the two ways in which the sub-worm cycle can end, as depicted in the last column, the roles of a_0 and b_0 are interchanged. This is necessary in order to satisfy detailed balance between the start and the end of the sub-worm cycles.

So far we have not mentioned how the unconstrained l-variables are updated: alternating between worm updates and update sweeps for the l-variables is inefficient if the simulation parameters are such that the average worm length is large, as the worm then evolves for a long time in a quasi-fixed l-background. Furthermore, such an update strategy would, strictly speaking, break detailed balance. A better alternative is therefore to incorporate the update of the l-variables into the worm update by inserting into the above algorithm, whenever an attempt is made to update a k-variable, a random choice of whether the worm should really try to update a k-variable or whether a Metropolis update of a randomly chosen l-variable should be attempted instead.

Detailed Balance
In order for a Markov process to generate a sequence of configurations C_1, C_2, ... such that for any observable O the ensemble average can be obtained as the average of O over this sequence, cf. (3.9), (3.10), a sufficient condition is that transitions between neighboring configurations (configurations which can be turned into each other by single updates), say C and C', occur with probabilities P(C → C') and P(C' → C) which are in accordance with the detailed balance equation

w(C)\, P(C → C') = w(C')\, P(C' → C) ,   (3.11)

where w(C) and w(C') are the weights of the configurations C and C', respectively. This means that if in (3.9) we have C_n = C and a transition to C' is proposed, then C_{n+1} = C' only if this proposal is accepted, which happens with probability P(C → C'); otherwise C_{n+1} = C. If the set of possible candidate configurations is always the same throughout the simulation, appropriate transition probabilities P(C → C') and P(C' → C) can be obtained [20] by setting

P(C → C') = min\big(1, w(C')/w(C)\big)   and   P(C' → C) = min\big(1, w(C)/w(C')\big) ,   (3.12)

which obviously solves (3.11). However, for the algorithms described in the previous two sections it often happens that the move C → C' is chosen with a different probability than its inverse move C' → C, in which case one has to use a generalization of the Metropolis algorithm known as the Metropolis-Hastings algorithm [21]. To do so, one thinks of each transition probability as consisting of two factors: a move-choice or selection probability p_s, and a so-called reduced transition probability P_r:

P(C → C') = p_s(C → C')\, P_r(C → C') ,   (3.13)

and similarly

P(C' → C) = p_s(C' → C)\, P_r(C' → C) .   (3.14)

Now, the move-choice probability p_s(C → C') is, as already mentioned, the probability that, if the system is currently in the configuration C, the algorithm chooses to propose the update that leads it into the configuration C'. This factor of the full transition probability P(C → C') has therefore already been taken into account at the moment when the final acceptance test for the transition is made. The final acceptance test should therefore rely only on the reduced transition probability P_r(C → C'), which, in order to satisfy the detailed balance equation (3.11), can be defined as

P_r(C → C') = min\Big(1, \frac{p_s(C' → C)\, w(C')}{p_s(C → C')\, w(C)}\Big) .   (3.15)

Similarly, the corresponding inverse transition from C' to C would be carried out with probability

P_r(C' → C) = min\Big(1, \frac{p_s(C → C')\, w(C)}{p_s(C' → C)\, w(C')}\Big) .   (3.16)

If the move-choice probabilities p_s for a move and its inverse are the same, i.e. p_s(C → C') = p_s(C' → C), they simply cancel in (3.15) and (3.16) and one recovers the ordinary Metropolis acceptance probabilities (3.12).
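In code, the acceptance step (3.15) amounts to a one-line generalization of the usual Metropolis test; the helper below is a generic sketch (not taken from the paper) that takes the configuration weights and the selection probabilities of a move and of its inverse:

```python
import numpy as np
rng = np.random.default_rng(0)

def metropolis_hastings_accept(w_old, w_new, p_sel_forward, p_sel_backward):
    """Accept/reject a proposed move C -> C' with reduced transition probability
    P_r(C -> C') = min(1, p_s(C' -> C) * w(C') / (p_s(C -> C') * w(C))),
    cf. (3.15).  For symmetric selection probabilities this reduces to the
    ordinary Metropolis rule min(1, w(C')/w(C))."""
    r = (p_sel_backward * w_new) / (p_sel_forward * w_old)
    return rng.random() < min(1.0, r)

# example: an asymmetric proposal that is chosen twice as often as its inverse
accepted = metropolis_hastings_accept(w_old=1.0, w_new=0.8,
                                      p_sel_forward=0.5, p_sel_backward=0.25)
```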

Metropolis Acceptance Probabilities for the ISSW Algorithm
We are now in the position to discuss the acceptance probabilities (given by the reduced transition probabilities) for the different types of moves that occur in the internal space sub-worm algorithm described above in Sec. 3.3. While the move-choice probabilities p_s depend only on the algorithm, the transition probabilities are model-dependent and will therefore be different for simulations of Z_Q and Z_A from (2.8) and (2.12), respectively. We will only discuss the transition probabilities for the Z_Q case, but those for Z_A can be obtained in a completely analogous way. To simplify notation, we introduce some abbreviating definitions, with A = (A_1, ..., A_N). To indicate a shift in the a-component of A, we will write A + â, with â = (δ_{1,a}, ..., δ_{N,a}) (3.20). In the following description, C always refers to the configuration that is currently realized in the system and C' to the one that is obtained if the update under consideration is carried out.

Start of an ISSW Update
The internal space sub-worm algorithm starts in a configuration that contributes to the partition function Z_Q. A particular move of type (a) from the step-by-step description in Sec. 3.3 is selected with a move-choice probability that contains a free parameter p_t, which we set to p_t = 1/2, and its inverse is selected with a corresponding probability. According to (3.15), the reduced transition probability for the move C → C' can therefore be set to:

(3.23)
If the move is accepted, we set C = C' and define y = x = x_0. The system is now in a configuration that contributes to the two-point partition function Z^{a_0 b_0}_{Q,2}(x, y), defined above in (3.8).

Start of Sub-Worm Cycle
Whenever x = y, a new sub-worm cycle can be started by choosing a random direction ν ∈ {±1, ..., ±d} (step (c) from above), setting y = x + ν, and finally setting a = ã_0 (where ã_0 and b̃_0 denote the source/sink indices a_0 and b_0, interchanged if ν is a negative direction, cf. Sec. 3.3). Next (step (d) or (f)), we choose a random internal space index b ∈ {1, ..., N} \ {a} and call C' the configuration to which the corresponding move would lead. However, if x = x_0, the sub-worm cycle is proposed only with probability (1 − p_t), as in this case, with probability p_t, it can also be proposed instead to remove the source/sink pair from the system and terminate the ISSW update. The total move-choice probability p_s(C → C') for starting the sub-worm cycle is therefore in general given by (3.25). The move-choice probability p_s(C' → C) for the inverse move depends on whether b = b̃_0 holds or not:
• if b ≠ b̃_0, then C' is an intermediate configuration from within a sub-worm cycle, which contains three external fields (sources/sinks), in which case we have (see Sec. 3.5.3 below)
• whereas if b = b̃_0, then the sub-worm cycle starts and ends at the same time (i.e. the move corresponds to an ordinary worm move), so that C' is not an intermediate configuration from within a sub-worm cycle, and the move that brings the system back from C' to the configuration C has to be selected in a completely analogous way as the move from C to C', so that

(3.27)
In total, p_s(C' → C) is given by the combination of these two cases, which is however not very useful, as the configuration weights will also be qualitatively different for the two cases b = b̃_0 and b ≠ b̃_0, so that we will explain the corresponding reduced transition probabilities separately. Let us abbreviate by k̃ and l̃ the variables k^{ab}_{x,ν}, l^{ab}_{x,ν} if ν > 0 and k^{ab}_{y,|ν|}, l^{ab}_{y,|ν|} if ν < 0, cf. (3.29), and let ∆ = |k̃ + 1| − |k̃|.

(3.30)
Then we find for the reduced transition probability P_r(C → C') = min(1, r), where
• if b ≠ b̃_0, r is given by (3.31), in which case, if the transition is accepted, we set a = b and C = C',
• and if b = b̃_0, r is given by (3.32), and if the transition is accepted, we set x = y and C = C'.

Intermediate and Final Sub-Worm Cycle Moves
If a sub-worm cycle has successfully started, then C is an intermediate configuration which contains three external fields, and the next move will therefore be either of type (e) or of type (g) from the guide in Sec. 3.3. In any case, the only possible moves correspond to the (N − 1) different choices for the internal space index b ∈ {1, ..., N} \ {a}, so that the move-choice probability is given by (3.33). As before, C' refers to the configuration that is obtained if the selected move is carried out. The move-choice probability for the inverse move contains additional factors, which come from taking into account that if b = b̃_0 or b = ã_0, then C' will no longer be an intermediate sub-worm cycle configuration like C, and therefore a new sub-worm cycle would have to be started, either from x or from y, in order to get back to C. Defining again k̃ and ∆ for the link variable under consideration, we find for the corresponding reduced transition probabilities:
• if b ≠ b̃_0 and b ≠ ã_0: (3.38), in which case, if the transition is accepted, we set a = b and C = C',
• if b = b̃_0: if the transition is accepted, we set x = y and C = C',
• and finally, if b = ã_0: (3.40), where, if the transition is accepted, we set y = x and C = C'.

End of an ISSW Update
If C is a configuration that contains a source/sink-pair where both external fields are located on the same site, so that x = x 0 , it is proposed with probability p t that the pair is removed from the system and the ISSW update therefore terminates (step (h). from the step-by-step description in Sec. 3.3). This is just the inverse move from the one described in Sec. 3.5.1, and the reduced transition probability is therefore given by

(3.41)
If the move is accepted, the system is again in a configuration that contributes to the ordinary partition function Z Q , defined above in (2.8).

Results
Here we present the results for some tests that we applied to our new algorithm, as well as a discussion of its efficiency. Although our algorithm works in arbitrary dimensions, all results presented here correspond to the (1 + 1) dimensional case.

Crosscheck of Code
In order to test the correctness of our internal space sub-worm algorithm, we applied it to the dual formulation (2.12) of the auxiliary U(1) version of the CP(N-1) model for N = 10 and reproduced some of the results for E, ξ_G and χ_m (see (2.27), (2.26), (2.25) respectively) presented in [16], where an over-heat bath algorithm was used to simulate (1.4) in terms of the original configuration variables z(x). The data is shown in Table 1. We also include corresponding results for the ordinary worm algorithm, applied to the alternative dual formulation (2.17) that was first proposed in [15]. As can be seen, the results from all three algorithms agree within (one-sigma) error bounds. In Figures 4 and 5 we show the average energy (2.27) and the corresponding specific heat (2.28) as functions of β/N for different values of N in a (1 + 1)-dimensional system of volume V = 12^2, together with the analytic strong and weak coupling expansions provided in [6]. Figure 4 shows the results for the quartic version Z_Q from (2.8) and Figure 5 shows the ones for the auxiliary U(1) version Z_A from (2.12). In both cases, the numerical data nicely interpolates between the strong and weak coupling predictions (and matches them perfectly in the corresponding regions). Note the pronounced peak in the specific heat for the quartic version Z_Q.

Efficiency
A measure of the efficiency of an algorithm is given by its dynamical critical exponent, usually called z. This z is observable-dependent and tells one how the integrated auto-correlation time τ_int of the observable scales as a function of the correlation length ξ_G, where one assumes a dependency of the form τ_int ∝ ξ_G^z. For local, Metropolis-type algorithms one usually finds z ≈ 2. A value of z > 0 means that the algorithm suffers from critical slowing down, i.e. if one changes the lattice size while keeping the physical size of the system fixed, the computational cost for achieving a predefined accuracy for the measurements will not just depend linearly on the lattice volume V = L^d (∝ number of degrees of freedom), but grow faster, like ∝ L^{d+z}. However, as long as z ≲ 1, the slowing down is usually considered to be weak.
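As an illustration of how such exponents are extracted in practice, here is a minimal sketch (our own, using a simple self-consistent summation window; more refined estimators exist) that estimates τ_int from a Monte Carlo time series and fits z from measurements at several correlation lengths:

```python
import numpy as np

def tau_int(series, window_factor=6):
    """Integrated auto-correlation time of a Monte Carlo time series,
    summed up to a self-consistent window W ~ window_factor * tau_int."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    var = np.mean(x * x)
    tau, W = 0.5, 1
    while W < len(x) // 2:
        rho = np.mean(x[:-W] * x[W:]) / var   # normalized auto-correlation at lag W
        tau += rho
        if W > window_factor * tau:           # stop once the window is large enough
            break
        W += 1
    return tau

def fit_z(xi_values, tau_values):
    # dynamical critical exponent from tau_int ~ xi^z at fixed L/xi (log-log fit)
    z, _ = np.polyfit(np.log(xi_values), np.log(tau_values), 1)
    return z
```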
In Fig. 6 we compare the dynamical critical exponents z for the three observables E, ξ_G and χ_m for the internal space sub-worm algorithm, the ordinary worm algorithm and the over-heat bath algorithm from [16], for the CP(N-1) model with N = 10.
[Figure caption fragment:] The dashed and dotted lines correspond to the analytic strong and weak coupling results respectively [6], which show again excellent agreement with the numerical data generated with our internal space sub-worm algorithm (also here, the partition function (2.17) leads to exactly the same results with the algorithm from Sec. 3.2, applied to the CP(N-1) partition function of [15]).
The quantities ∝ τ_int shown in Fig. 6 were obtained from the data in Table 1, using the relation (4.1) between the squared statistical error σ²_O of an observable O, its integrated auto-correlation time and the number of measurements. If one divides both sides of (4.1) by ⟨O⟩² in order to get rid of the explicit dependency of O on the lattice volume, and assumes that for fixed L/ξ_G the resulting relative variance is independent of the lattice size for sufficiently large lattices, one finds (4.2), where Ñ_m is the effective number of measurements, which for non-local observables (like ξ_G and χ_m) is equal to N_m, but for local observables (like E) picks up an extra volume factor, i.e. Ñ_m = V · N_m. As can be seen, the three algorithms give rise to very similar critical exponents for the three observables under consideration.

Figure 6: Log-log plots of quantities ∼ τ_int vs. ξ_G at fixed L/ξ_G ≈ 15 for the three observables E, ξ_G and χ_m and the three different algorithms from Table 1 for the CP(N-1) model with N = 10. The τ_int values for the different observables are re-scaled by arbitrary constants to fit in a common figure. The straight lines correspond to fits of the form τ_int ∼ ξ_G^z, where z is the dynamical critical exponent. The left-hand figure shows the data for our ISSW algorithm applied to Z_A from (2.12), together with the data for the over-heat bath algorithm from [16]. The right-hand figure shows the data for the ordinary worm algorithm applied to Z̃_A from [15], again together with the over-heat bath results from [16]. All three algorithms show similar behavior.
We also determined the integrated auto-correlation time τ_int directly for the average energy (2.27), the magnetic susceptibility (2.25) and the charge densities (2.29) (as long as all μ_i are set to zero, all charge densities are equivalent) for the two systems described by (2.8) and (2.12) when updated with the ISSW algorithm. The results are shown in Fig. 7 and Fig. 8, respectively, as functions of ξ_G/L, where L is the linear system size (here L = 72), for N = 4 and N = 10. The auto-correlation times are given in units of sweeps, where we define a sweep to consist of a fixed number of worms, chosen such that the average number of local updates processed during these worms equals the number of degrees of freedom in the system (in our case #d.o.f. = d N^2 V, where d is the number of space-time dimensions, N is the number of flavors and V the system volume). As can be seen, for both discretizations of the CP(N-1) model, the maximal integrated auto-correlation time τ_int for both the average energy and the average charge density increases with increasing N, while for χ_m it decreases. However, for all three quantities, the value of ξ_G/L at which this maximum in τ_int occurs decreases with increasing N. The next figures show τ_int again as functions of ξ_G/L, but this time for fixed N = 4 and three different volumes V = 36^2, 72^2, 144^2. As the Monte Carlo dynamics of the two systems described by Z_Q and Z_A from (2.8) and (2.12), respectively, are very similar when using the ISSW algorithm, we show in Fig. 9 just the results for Z_Q, and in Fig. 10, for comparison, the corresponding results for Z̃_Q from (2.20), simulated with the ordinary worm algorithm. As was already the case for Z_A and Z̃_A in Fig. 6, the ISSW and the ordinary worm algorithm also perform very similarly when applied to Z_Q and Z̃_Q, respectively. The figures also illustrate that the value of z can depend strongly on the value of ξ_G/L at which z is determined. Figure 11 is similar to Fig. 10 but also contains data for N = 10, 20, showing that for Z̃_Q the dependency of τ_int on N is similar to what is shown in Fig. 7 for the case of Z_Q.
For one quantity the behavior of the integrated auto-correlation time, as shown in Figs. 8-11, is a bit puzzling: for the magnetic susceptibility χ_m, τ_int initially grows with increasing ξ_G/L, but then decreases again rather quickly, well below ξ_G/L ∼ 0.25, where finite size effects should start to become dominant. The same would happen with the integrated auto-correlation time for the correlation length ξ_G. The reason is that these two quantities are defined in terms of the two-point function (3.6). In contrast to the average energy or the charge density, which are simply averages of properties of individual configurations that contribute to the partition function Z, an element φ^a_b(x) φ^b_a(y) of the two-point function is not the average of a property of configurations that contribute to Z, but rather a kind of reweighting factor between configurations that contribute to Z and configurations that contribute to Z^{ab}_2(x, y) (the partition function for the system in the presence of an external source φ^a_b at x and an external sink φ^b_a at y): it is related to how "expensive" the changes are that a configuration contributing to Z requires in order to incorporate the external source-sink pair at x and y. The problem is now that within our dual formalism, due to the strictly imposed conservation laws (see the last part of Sec. 2.2), we cannot simply measure these reweighting factors on individual configurations that contribute to Z, but instead have to determine them stochastically by measuring the average frequency with which the worm algorithm reaches a configuration that contributes to Z_2(x, y) when starting from a configuration that contributes to Z. This leads to additional statistical noise in the measurements of the two-point function which has nothing to do with the true de-correlation rate of the observable. As long as the number of worms per sweep is large (as is the case for small β, where the correlation length on the lattice is small), the effect of this additional noise is small; but as soon as the average worm length increases (together with the lattice correlation length), the number of worms required to complete a sweep (as defined above) decreases, and with it the statistics and therefore the accuracy of individual measurements of the two-point function. It then becomes impossible to determine the true integrated auto-correlation time for observables that depend on the two-point function by summing up the correlations between successive measurements. One can also not simply increase the number of worms per sweep in order to reduce the statistical noise in the individual measurements; as the worms do not just sample the two-point function but also update the system, this would also increase the number of sweeps between measurements and make it impossible to measure auto-correlation times which are shorter than this number of sweeps between measurements.

Figure caption (fragment): ... Fig. 9. Finally, the second moment correlation length ξ_G (bottom-right) is identical to that obtained with Z_Q, as should be the case, since ξ_G is a physical, algorithm-independent quantity.
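To illustrate the stochastic nature of this estimator, the following minimal sketch shows how a worm-based measurement of the two-point function accumulates information only through the relative position of the worm's head and tail; all hooks (`start_worm`, `propose_move`, `is_closed`) are hypothetical placeholders and do not refer to our actual implementation. Fewer (and longer) worms per sweep directly translate into fewer head-tail samples per measurement, which is the source of the additional noise discussed above.

```python
import numpy as np

def sweep_with_twopoint_estimator(config, L, n_dof, rng,
                                  start_worm, propose_move, is_closed):
    """Schematic: run worms until roughly n_dof local updates have been
    performed (one 'sweep') and histogram the head-tail separation after
    every move; head and tail are 2d lattice coordinates (integer numpy
    arrays).  The hooks stand in for the actual update rules."""
    G = np.zeros((L, L))   # histogram estimating G(x - y)
    visits = 0             # number of head-tail samples in this sweep
    updates = 0
    while updates < n_dof:
        head, tail = start_worm(config, rng)
        while True:
            head = propose_move(config, head, rng)
            updates += 1
            # every open-worm configuration contributes one sample to the
            # source-sink sector Z_2(x, y) relative to Z
            dx = (head - tail) % L
            G[tuple(dx)] += 1.0
            visits += 1
            if is_closed(head, tail):
                break
    # fewer worms per sweep  ->  fewer samples 'visits' per measurement
    return G / max(visits, 1), updates
```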
Another observable that would be of great interest is the topological susceptibility. In Monte Carlo simulations of the CP^{N−1} model in terms of the standard configuration variables z(x), the system tunnels only slowly between different topological sectors, which causes very long auto-correlation times for the topological charge and the topological susceptibility [16]. In the dualization process that led us to the flux-variable partition functions (2.12), (2.8), (2.17) and (2.20), the z-fields are integrated out analytically in order to obtain the weights for flux-variable configurations; this integration runs over all possible field configurations and therefore covers all topological sectors, so that every single configuration in terms of flux-variables already contains contributions from all topological sectors. Tunneling between different sectors is therefore no longer necessary, which is why we think that, if one could incorporate the measurement of the topological charge and the topological susceptibility into our dual simulations, critical slowing down should also be absent for these observables. Unfortunately, such measurements are quite involved in our dual framework: as shown in appendix A, in order to measure the topological charge and its susceptibility, one has to introduce new plaquette degrees of freedom which then also enter the constraints imposed on the k-variables. This would give rise to interesting possibilities, but as the weights for the plaquette variables can be negative [15], one has to deal with a sign problem if one wants to sample the plaquette variables by Monte Carlo.
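For reference, the generic way to handle such fluctuating signs is to sample with respect to the absolute value of the weight and to reweight observables accordingly; the identity below is the standard textbook form and is not specific to the setup of appendix A. The approach becomes impractical once the average sign in the denominator becomes small.

```latex
\langle O \rangle
  \;=\; \frac{\sum_{\{m\}} O(\{m\})\, w(\{m\})}{\sum_{\{m\}} w(\{m\})}
  \;=\; \frac{\big\langle O\,\operatorname{sign}(w)\big\rangle_{|w|}}
             {\big\langle \operatorname{sign}(w)\big\rangle_{|w|}}\,,
\qquad
\langle \,\cdot\, \rangle_{|w|}
  \;\equiv\; \frac{\sum_{\{m\}} (\,\cdot\,)\,\big|w(\{m\})\big|}
                  {\sum_{\{m\}} \big|w(\{m\})\big|}\,.
```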
Wilson and Polyakov loops of the local U(1) gauge field also modify the constraints imposed on the k-variables, but they do not require the introduction of additional degrees of freedom. They can be defined as ordinary observables that are measured on closed-worm configurations (cf. [19] for how Wilson loops can be defined in terms of the original configuration variables z(x)).

Large N Limit for Z_Q
As already mentioned at the end of Sec. 1.2, the authors of [7] found that in the limit N → ∞, the quartic-action version of the lattice CP^{N−1} model, i.e. the one that in terms of our dual variables is described by Z_Q from (2.8), develops in (1+1) dimensions a first order transition between the strong coupling phase (β < β_cr) and the weak coupling phase (β > β_cr), where β_cr/N ≈ 0.956 (see [7, Sec. 4]). Figure 13: The figures show the energy density distribution at the "pseudo-critical" point (β/N near 1) for the transition between the strong and the weak coupling phase for a (1+1)-dimensional system described by the partition function (2.8) (quartic action). In the left-hand figure, the number N of flavors is set to N = 64 and the energy density is plotted for four different volumes between V = 4² and V = 12². The dotted black lines correspond to double-Gaussian fits. For V < 12², the energy densities show a clear double-peak structure, while for V = 12², the two peaks start to merge. For even larger system sizes, the peaks would become indistinguishable. The right-hand figure shows the energy density for two systems, both of size V = 12², but for different flavor numbers N = 64 (yellow, same as the yellow curve in the left-hand figure) and N = 128 (gray). As can be seen, keeping the volume fixed but increasing the number of flavors enhances the double-peak structure again.
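Such double-Gaussian fits to an energy density histogram can be performed, for instance, along the following lines; this is a minimal sketch using scipy, and the bin count, initial guess and normalization are arbitrary illustrative choices, not those used for Fig. 13.

```python
import numpy as np
from scipy.optimize import curve_fit

def double_gaussian(e, a1, mu1, s1, a2, mu2, s2):
    """Sum of two Gaussians, modelling a double-peaked energy density
    histogram near a first-order-like transition."""
    return (a1 * np.exp(-0.5 * ((e - mu1) / s1) ** 2)
            + a2 * np.exp(-0.5 * ((e - mu2) / s2) ** 2))

def fit_double_peak(energies, bins=60):
    """Histogram the measured energy densities and fit the double-Gaussian
    shape; the initial guess simply places the two peaks at the lower and
    upper quartile of the data (the fit may of course fail to converge)."""
    energies = np.asarray(energies, dtype=float)
    hist, edges = np.histogram(energies, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    lo, hi = np.percentile(energies, [25, 75])
    p0 = [hist.max(), lo, energies.std(), hist.max(), hi, energies.std()]
    popt, _ = curve_fit(double_gaussian, centers, hist, p0=p0, maxfev=10000)
    return popt
```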
Also in our simulations of the system described by (2.8), the "transition" between the weak and strong coupling regime becomes more pronounced when the number N of flavors is increased, as can be seen in the left-hand part of Fig. 4, where the average energy is shown as a function of β/N for a (1+1)-dimensional system of size V = 12² for different N. However, if one keeps N fixed and instead varies the system size, it turns out that the "transition" becomes smoother as the system size is increased (see Figs. 12 and 13). Thus, the large-N thermodynamic behavior depends on the order in which the two limits are taken. A possible explanation for this behavior can be found by writing the Boltzmann factor in terms of the moduli and phases of the field components, as in (4.3) (a sketch of the presumed structure is given below, after the discussion of the two ordering options). In the strong coupling phase, β is small and the entropy of the configuration variables dominates over the Boltzmann factor, no matter what the value of (4.3) is. But with increasing β, the variation of the Boltzmann factor as a function of the configuration variables becomes more and more relevant, and configurations which minimize the action become favored as soon as β is large enough so that the Boltzmann factor can dominate over the entropy for these configurations. In contrast to simple spin models, where the value of β at which this change of dominance between entropy and Boltzmann weight happens marks the pseudo-critical point at which the system develops long range order (because the spatially ordered configurations are the ones that minimize the Euclidean lattice action), for the CP^{N−1} model (with quartic action) this is not necessarily the case. This can be seen by noting that there are two different (naive) ways to maximize the cosines in (4.3): i) the first option is the usual one, where for each link (x, ν) and for each internal space index a, the angles φ^a_x and φ^a_{x+ν} are "in phase", so that all cosines in (4.3) assume the value 1. In this case, the local U(1) gauge-invariance gets "broken" and the system could develop true long-range order, provided the r-variables get ordered as well.
ii) The second possibility is to have, on each site x, all the angles φ^1_x, φ^2_x, φ^3_x, ... in phase, so that again all the cosines in (4.3) assume the value 1. This time, however, the local U(1) gauge-invariance is preserved, as φ-angles that correspond to different sites remain unrelated, and there will be no true long-range order even if the r-variables get ordered. Depending on the system size and on the number of flavors N, it is either the "local internal space ordering" described in ii) or the "real-space ordering" from i) which is cheaper, i.e. which requires a smaller change in entropy (coming from the reduction of the effective configuration space due to the ordering): for a small system with large N, option i) will be cheaper, in which the U(1) gauge-invariance is broken (which is consistent with the finding in [7]), while for a large system with smaller N, the system will choose option ii), which preserves the U(1) gauge-invariance. As for any fixed N the transition between the strong and weak coupling region becomes smoother with increasing system size, it will not become a first order phase transition in the thermodynamic limit, so the infinite volume and infinite N limits do not commute.
Of course, i) and ii) are two extremes: mixtures of the two orderings are also possible, and in option i), some permutations in the choice of the pairs (a, b) for which φ^a_x is associated with φ^b_{x+ν} are allowed. But this heuristic description motivates our construction of a Monte Carlo algorithm decorrelating both types of order.
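For orientation, the structure of the Boltzmann factor referred to above can be sketched as follows, under the assumption that the site variables are parametrized as z^a(x) = r^a_x e^{i φ^a_x} with Σ_a (r^a_x)² = 1; overall constants and the precise coupling convention of (4.3) are not reproduced here.

```latex
e^{-S[r,\varphi]} \;\propto\;
\exp\!\Bigg(\beta \sum_{x,\nu} \sum_{a,b}
   r^{a}_{x}\, r^{a}_{x+\hat\nu}\, r^{b}_{x}\, r^{b}_{x+\hat\nu}\,
   \cos\!\big(\varphi^{a}_{x} - \varphi^{a}_{x+\hat\nu}
              - \varphi^{b}_{x} + \varphi^{b}_{x+\hat\nu}\big)\Bigg).
```

In this form, both orderings set all cosine arguments to zero: in i) one has φ^a_x − φ^a_{x+ν} = 0 for every a on every link, while in ii) one has φ^a_x − φ^b_x = 0 for every pair (a, b) on each site.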
It should also be noted that the partition function, being a path integral, is always a sum over all possible orientations of the complex vectors z(x) on each site x. By saying that the system puts some of the φ-angles "in phase", we mean that configurations for which these angles are the same give the dominant contribution to the integral over the φ-angles.
Figure 14: The figure shows again the average energy (top-left) and specific heat (top-right) for the CP^{N−1} model described by (2.8) (quartic action) in (1+1) dimensions, with N = 64 and for system sizes V ∈ {4², 6², 8², 12², 24², 48²}, just as in Fig. 12, but this time we also show the magnetic susceptibility χ_m (bottom-left) from (2.25) and the trace of the charge-density covariance matrix (bottom-right) from (2.30). As can be seen, the abrupt changes in the average energy and the magnetic susceptibility, as well as the peak in the specific heat, always occur at β/N < 1 for all system sizes. For the trace of the covariance matrix of the charge densities, this is only true for V ≤ 12², while for V = 24², 48² there is no longer an abrupt change, and it becomes non-zero only for β/N > 1.
Another thing to be mentioned is that in the cases where the double-peak structure can be observed in the energy density, i.e. for small volumes and large N, the trace of the charge-density covariance matrix (2.30) becomes non-zero at the same value of β/N at which the peak in the specific heat occurs, while as soon as the double-peak structure disappears for sufficiently large volumes, the trace of the charge-density covariance matrix becomes non-zero only at a larger value of β/N (see Fig. 14), and the value of β/N at which this happens increases even further with increasing volume. The reason why this is interesting is that the trace of the charge-density covariance matrix can be interpreted as an order parameter for the breaking of the global Z_2 symmetry. As this quantity depends directly only on the k-variables, this indicates that the peak in the specific heat at β/N ≈ 1 is caused purely by a reordering of the l-variables: for example by a breaking of the global Z_N symmetry in the form of a condensation of the l^{aa}-variables for some a (on which the magnetic susceptibility also depends through the diagonal entries of the correlator). But this is so far only speculation and needs to be verified in the future. Also the meaning of these discrete symmetries should be clarified (recall that the internal space indices are related to, but not the same as, flavor-space indices, so the breaking of these symmetries should not be in contradiction with the Mermin-Wagner theorem, which would forbid a spontaneous breaking of a subgroup of the continuous SU(N) flavor-symmetry group). For comparison, Fig. 15 shows the same data for the auxiliary U(1) version of the CP^{N−1} partition function from (2.12), for which no jump in the energy density occurs. Clearly, the auxiliary U(1) discretization shows a smoother approach to the continuum limit and should be preferred over Z_Q for that purpose.
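If one collects the measured charge densities of each configuration into a vector (n_1, ..., n_N), the quantity discussed above is simply the trace of their sample covariance matrix; a minimal sketch of its evaluation is given below (the normalization convention of (2.30) is not reproduced here, and the array name `charges` is a hypothetical placeholder).

```python
import numpy as np

def covariance_trace(charges):
    """charges: array of shape (n_meas, N) holding the measured charge
    density for each of the N internal charges on every configuration.
    Returns sum_a ( <n_a^2> - <n_a>^2 ), i.e. the trace of the sample
    covariance matrix, up to the normalization chosen in (2.30)."""
    charges = np.asarray(charges, dtype=float)
    cov = np.cov(charges, rowvar=False)   # N x N sample covariance matrix
    return float(np.trace(np.atleast_2d(cov)))
```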

Summary & Conclusion
We looked at the two most common lattice formulations of the CP^{N−1} model (referred to as the quartic and the auxiliary U(1) version, respectively) and reviewed two ways in which the corresponding partition functions can be expressed in terms of integer-valued (dual) flux-variables by integrating out the original degrees of freedom. The two ways of dualizing the CP^{N−1} partition functions differ in the number of independent degrees of freedom that are used to parametrize configurations in the dual partition function: the first possibility [12] yields a system whose configurations depend on O(N²) independent flux-variables per link, while configurations of the system obtained by the second possibility depend on just O(2N) flux-variables per link [15]. It turns out that, in terms of both sets of flux-variables, the partition functions for the quartic and the auxiliary U(1) versions of the CP^{N−1} model differ only by an extra weight factor for each link. After discussing the relation between the constraints in the O(N²) and the O(2N) flux-variables per link versions of the CP^{N−1} partition function, which can be associated with conservation laws, we found that not just the version from [15] but all four flux-variable representations (with/without U(1) field, with O(2N) or O(N²) flux-variables per link) allow for the introduction of chemical potentials without giving rise to a sign problem.
It has previously been observed in [11] that a naive application of a worm algorithm to the flux-variable formulation of the CP^{N−1} model from [12], with O(N²) d.o.f. per link, gives rise to an ergodicity problem when N > 2. The problem can be solved by extending the ordinary worm algorithm with additional moves, which allow the worm not just to propagate a defect from site to site, but also to temporarily introduce another defect which can be used to move in internal space. This additional freedom allows the worm to take shortcuts in configuration space and to directly relate configurations which would require many intermediate updates in order to be connected by an ordinary worm. This is the basic idea underlying our internal space sub-worm algorithm. As, to our knowledge, no algorithm has been tested so far for the O(2N) d.o.f. per link formulation of the CP^{N−1} model from [15], we also presented a simulation algorithm for this system. Due to the simpler structure of the constraints imposed on the flux-variables, an ordinary worm algorithm is sufficient in this case. Both the ordinary worm algorithm for the O(2N) and the internal space sub-worm algorithm for the O(N²) d.o.f. per link version of the CP^{N−1} partition function have been tested and yield identical results, which also compare well with predictions from strong and weak coupling expansions as well as with previous results from the literature. It also turns out that both algorithms seem to perform equally well in the sense that they yield equal accuracy for equal statistics (number of sweeps). However, as for N > 2 the number of degrees of freedom that have to be updated during a sweep is larger for the system that is updated by the internal space sub-worm algorithm, the dual formulation of the CP^{N−1} model from [15] may in general be cheaper to simulate. Nevertheless, the internal space sub-worm algorithm could still be of interest, as it might also be applicable to other systems for which no alternative formulation in terms of fewer degrees of freedom exists. Finally, it should be mentioned that the two worm algorithms discussed in this paper, unfortunately, do not seem to perform markedly better than the over-heat bath [8], the cluster [9] or the loop algorithm [13]. Finally, one advantage of the worm algo-
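Purely to summarize the control flow described above (the actual acceptance probabilities and constraint bookkeeping are specified earlier in the paper), an internal space sub-worm update could be organized schematically as follows; every hook in this sketch is a hypothetical placeholder, not our implementation.

```python
def issw_worm(config, rng, insert_defect, move_defect_in_space,
              open_subworm, move_defect_in_internal_space,
              close_subworm, can_close, remove_defect):
    """Schematic internal space sub-worm (ISSW) update: an ordinary worm
    which, at intermediate steps, may temporarily introduce a second defect
    and move in the internal (index) space of a single link before it
    continues to propagate from site to site.  rng is e.g. random.Random()."""
    defect = insert_defect(config, rng)          # open the worm
    while True:
        if rng.random() < 0.5:
            # ordinary worm move: shift the defect to a neighboring site
            defect = move_defect_in_space(config, defect, rng)
        else:
            # sub-worm move: temporarily add a second defect and update the
            # internal-space indices on the current link until it closes
            aux = open_subworm(config, defect, rng)
            while not close_subworm(config, defect, aux, rng):
                aux = move_defect_in_internal_space(config, defect, aux, rng)
        if can_close(config, defect, rng):
            remove_defect(config, defect)        # worm closes -> valid config
            return config
```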
m_{x,µν}, m_{y,µν} or on both. So far we have only tried to do this summation numerically, which is computationally rather expensive.

In the strong coupling limit (β = 0), however, we have k^a_{x,ν} = 0 ∀ x, ν, a in (A.11), and therefore all m_{x,νµ} have to be equal, which for Θ̄ = 0 in fact means that m_{x,νµ} = 0 ∀ x, ν, µ. From (A.13) and (A.17) one can then directly read off the correct strong coupling result.
Figure 16: Due to the delta-function on the first line of (A.11), a change in a plaquette variable m_{x,νµ} requires that either its neighboring plaquettes also get changed, or that, for an unchanged neighboring plaquette, the k-variable that lives on the boundary to that plaquette gets updated. As, for vanishing Θ̄, the m_{x,νµ} can be non-zero only if the corresponding plaquette has been hit by a derivative in (A.13), one always has to update k-variables as well. The four cases depicted in the figure give rise to different dependencies of the k-variables on changes of the values of the plaquette variables.
For the dual formulations (2.8) and (2.12) it is more involved to incorporate such a topological term: we have seen above that the plaquette variables that come from the topological term couple to the k-variables by modifying the on-link constraints. However, for the dual formulations (2.8) and (2.12), these constraints are automatically satisfied due to the antisymmetry of the k^{ab}_{x,ν} variables with respect to their internal space indices (a, b). This makes it impossible to incorporate the topological term (A.2) in a similar way for these partition functions. Instead, one could start with the definition (A.5) of the topological charge density Q_{x,µν}. This will give rise to plaquette variables which also carry internal space indices and which are then compatible with the anti-symmetry of the k-variables.