Affine Symmetries and Neural Network Identifiability

We address the following question of neural network identifiability: Suppose we are given a function $f:\mathbb{R}^m\to\mathbb{R}^n$ and a nonlinearity $\rho$. Can we specify the architecture, weights, and biases of all feed-forward neural networks with respect to $\rho$ giving rise to $f$? Existing literature on the subject suggests that the answer should be yes, provided we are only concerned with finding networks that satisfy certain "genericity conditions". Moreover, the identified networks are mutually related by symmetries of the nonlinearity. For instance, the $\tanh$ function is odd, and so flipping the signs of the incoming and outgoing weights of a neuron does not change the output map of the network. The results known hitherto, however, apply either to single-layer networks, or to networks satisfying specific structural assumptions (such as full connectivity), as well as to specific nonlinearities. In an effort to answer the identifiability question in greater generality, we consider arbitrary nonlinearities with potentially complicated affine symmetries, and we show that the symmetries can be used to find a rich set of networks giving rise to the same function $f$. The set obtained in this manner is, in fact, exhaustive (i.e., it contains all networks giving rise to $f$) unless there exists a network $\mathcal{A}$ "with no internal symmetries" giving rise to the identically zero function. This result can thus be interpreted as an analog of the rank-nullity theorem for linear operators. We furthermore exhibit a class of "$\tanh$-type" nonlinearities (including the $\tanh$ function itself) for which such a network $\mathcal{A}$ does not exist, thereby solving the identifiability question for these nonlinearities in full generality. Finally, we show that this class contains nonlinearities with arbitrarily complicated symmetries.


A. Background and previous work
Deep neural network learning has become a highly successful machine learning method employed in a wide range of applications such as optical character recognition [1], image classification [2], speech recognition [3], and generative models [4]. Neural networks are typically defined as concatenations of affine maps between finite dimensional spaces and nonlinearities applied elementwise, and are often studied as mathematical objects in their own right, for instance in approximation theory [5], [6], [7], [8] and in control theory [9], [10].
In data-driven applications [11], [12] the parameters of a neural network (i.e., the coefficients of the network's affine maps) need to be learned based on training data. In many cases, however, there exist multiple networks with different parameters, or even different architectures, giving rise to the same input-output map on the training set. These networks might differ, however, in terms of their generalization performance. In fact, even if several networks with differing architectures realize the same map on the entire domain, some of them might be easier to arrive at through training than others.
arXiv:2006.11727v1 [cs.IT] 21 Jun 2020
It is therefore of interest to understand the ways in which a given function can be parametrized as a neural network. Specifically, we ask the following question of identifiability: Suppose that we are given a function $f : \mathbb{R}^m \to \mathbb{R}^n$ and a nonlinearity $\rho$. Can we specify the network architecture, weights, and biases of all feed-forward neural networks with respect to $\rho$ realizing $f$? For the special case of the $\tanh$ nonlinearity, this question was first addressed in [13] for single-layer networks, and in [14] for multi-layer networks satisfying certain "genericity conditions" on the architecture, weights, and biases. The identifiability question for single-layer networks with nonlinearities satisfying the so-called "independence property" (this property, formalized in Definition 1, corresponds to the absence of non-trivial affine symmetries) was solved in [15], whereas the recent paper [16] reports the first known identifiability result for multi-layer networks with minimal conditions on architecture, weights, and biases, albeit with artificial nonlinearities designed to be "highly asymmetric". We also remark that the identifiability of recurrent single-layer networks was considered in [9] and [10].
It is important to note that all aforementioned results, as well as the results in the present paper, are concerned with the identifiability of networks given knowledge of the function $f$ on its entire domain. This corresponds to characterizing the fundamental limit on nonuniqueness in neural network representation of functions. Specifically, the nonuniqueness can only be richer if we are interested in networks that realize $f$ on a proper subset of $\mathbb{R}^m$, such as a finite (training) sample $\{x_1, \dots, x_m\} \subset \mathbb{R}^m$. Moreover, we do not address neural network reconstruction, i.e., we do not provide a procedure for constructing an instance of a network realizing a given function $f$, but rather focus on building a theory that systematically describes how the neural networks realizing $f$ relate to one another. We do this in full generality for networks with "$\tanh$-type" nonlinearities (including the $\tanh$ function itself), settling an open problem posed by Fefferman in [14].
Recent results on neural network reconstruction on samples can be found in [17], [18] for shallow networks and in [19] for ReLU networks of arbitrary depth.

B. Affine symmetries as a template for neural network nonuniqueness
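In order to develop intuition on the identifiability of general neural networks, we follow [13] and [15] and begin by considering single-layer networks. To this end, let $\rho : \mathbb{R} \to \mathbb{R}$ be a nonlinearity, and let
$$\langle \mathcal{N} \rangle^{\rho} := \sum_{p=1}^{n} \lambda_p\, \rho(\omega_p \,\cdot\, + \theta_p) + \lambda \quad \text{and} \quad \langle \mathcal{N}' \rangle^{\rho} := \sum_{p=1}^{n'} \lambda'_p\, \rho(\omega'_p \,\cdot\, + \theta'_p) + \lambda' \qquad (1)$$
be the maps realized by the single-layer networks $\mathcal{N}$ and $\mathcal{N}'$, both with nonlinearity $\rho$. Suppose that these networks realize the same function, i.e.,
$$\sum_{p=1}^{n} \lambda_p\, \rho(\omega_p t + \theta_p) + \lambda = \sum_{p=1}^{n'} \lambda'_p\, \rho(\omega'_p t + \theta'_p) + \lambda', \quad \text{for all } t \in \mathbb{R}.$$
This is equivalent to the following linear dependency relation between the constant function $\mathbf{1} : \mathbb{R} \to \mathbb{R}$ taking on the value 1 and affinely transformed copies of $\rho$:
$$\sum_{p=1}^{n} \lambda_p\, \rho(\omega_p t + \theta_p) - \sum_{p=1}^{n'} \lambda'_p\, \rho(\omega'_p t + \theta'_p) = (\lambda' - \lambda)\,\mathbf{1}(t), \quad \text{for all } t \in \mathbb{R}.$$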
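For a simple example, recall that $\tanh$ is odd, so that $\tanh(t) + \tanh(-t) = 0$, for all $t \in \mathbb{R}$, which corresponds to a single-layer network with two neurons mapping every input to output 0. For a more intricate example, consider the clipped rectified linear unit (CReLU) nonlinearity given by $\rho_c(t) = \min\{1, \max\{0, t\}\}$, and note that
$$\rho_c(t) - \tfrac{1}{2}\rho_c(2t) - \tfrac{1}{2}\rho_c(2t - 1) = 0, \quad \text{for all } t \in \mathbb{R}, \qquad (3)$$
corresponds to a single-layer network with three neurons mapping every input to output 0. This can be rewritten as $\rho_c(t) = \tfrac{1}{2}\rho_c(2t) + \tfrac{1}{2}\rho_c(2t - 1)$ and applied recursively to yield
$$\langle \mathcal{N}_n \rangle^{\rho_c} := \sum_{p=1}^{n} 2^{-p}\rho_c(2^{p}\,\cdot\, - 1) + 2^{-n}\rho_c(2^{n}\,\cdot\,) = \rho_c, \quad \text{for all } n \in \mathbb{N}.$$
In other words, we have effectively used the three-neuron network (3) to repeatedly replace single nodes with pairs of nodes without changing the function realized by the network, thereby constructing an infinite collection of different networks, all satisfying $\langle \mathcal{N}_n \rangle^{\rho_c} = \rho_c$.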
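The CReLU identity and its recursive refinement can be checked numerically. The following is a minimal sketch using only the standard library; the function names (`crelu`, `refined_network`, `max_deviation`) and the evaluation grid are illustrative choices, not part of the paper.

```python
def crelu(t: float) -> float:
    """Clipped ReLU: min{1, max{0, t}}."""
    return min(1.0, max(0.0, t))


def refined_network(n: int, t: float) -> float:
    """Map of the single-layer network with n + 1 neurons obtained after
    n substitutions of rho_c(s) = rho_c(2s)/2 + rho_c(2s - 1)/2."""
    total = sum(2.0 ** -p * crelu(2.0 ** p * t - 1.0) for p in range(1, n + 1))
    return total + 2.0 ** -n * crelu(2.0 ** n * t)


def max_deviation(n: int, grid_size: int = 2001) -> float:
    """Largest |refined_network(n, t) - crelu(t)| on a uniform grid over [-1, 2]."""
    grid = [-1.0 + 3.0 * k / (grid_size - 1) for k in range(grid_size)]
    return max(abs(refined_network(n, t) - crelu(t)) for t in grid)
```

For every depth of refinement tried, `max_deviation(n)` stays at the level of floating-point rounding, in line with the claim that all these networks realize the same function.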
In summary, we see that, at least for single-layer networks, non-uniqueness in the realization of a function arises from affine symmetries of the nonlinearity, where the symmetries are none other than single-layer networks mapping every input to output 0. Namely, these "zero networks" can be used as templates for modifying the structure of (more complex) networks without affecting the function they realize. This motivates the following definition.
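Definition 1 (Nonlinearity and affine symmetry). A nonlinearity is a continuous function $\rho : \mathbb{R} \to \mathbb{R}$ such that $\rho \neq (t \mapsto at + b)$, for all $a, b \in \mathbb{R}$, i.e., $\rho$ is not an affine function. Let $\rho : \mathbb{R} \to \mathbb{R}$ be a nonlinearity and $I$ a finite index set. An affine symmetry of $\rho$ is a collection of real numbers of the form $(\zeta, \{(\alpha_s, \beta_s, \gamma_s)\}_{s \in I})$ such that (i)
$$\sum_{s \in I} \alpha_s\, \rho(\beta_s t + \gamma_s) = \zeta\,\mathbf{1}(t), \quad \text{for all } t \in \mathbb{R}, \qquad (4)$$
and (ii) there does not exist a proper subset $I'$ of $I$ such that $\{\rho(\beta_s \,\cdot\, + \gamma_s) : s \in I'\} \cup \{\mathbf{1}\}$ is a linearly dependent set of functions from $\mathbb{R}$ to $\mathbb{R}$.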
Note that every nonlinearity $\rho$ satisfies $\rho + (-\rho) = 0$, and hence possesses at least the "trivial affine symmetries" $(0, \{(\alpha, \beta, \gamma), (-\alpha, \beta, \gamma)\})$, for $\alpha, \beta \in \mathbb{R} \setminus \{0\}$ and $\gamma \in \mathbb{R}$. We remark that Definition 1 is more general than what is needed to cover our examples above, as $\zeta$ in (4) is allowed to be an arbitrary real number, whereas we had $\zeta = 0$ in both of our examples. One can, of course, seek to build a theory encompassing even more general symmetries, e.g., those for which the right-hand side of (4) is an affine function $t \mapsto \zeta_1 + \zeta_2 t$ (which, in the context of $\rho$-modification introduced later, could then be absorbed into the next layer of the network). This is, however, outside the scope of the present paper.
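To make the notion of an affine symmetry concrete, the sketch below evaluates the residual of a candidate symmetry on a grid. It assumes that the defining relation (4) has the form $\sum_{s} \alpha_s \rho(\beta_s t + \gamma_s) = \zeta$, which is how the surrounding discussion uses it, and it checks only this relation numerically, not the minimality requirement (ii). The helper `symmetry_residual`, the grid, and the logistic example with $\zeta = 1$ are illustrative additions and do not appear in the paper.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

Triple = Tuple[float, float, float]  # (alpha_s, beta_s, gamma_s)


def symmetry_residual(
    rho: Callable[[float], float],
    zeta: float,
    triples: Sequence[Triple],
    grid: Iterable[float],
) -> float:
    """Largest |sum_s alpha_s * rho(beta_s * t + gamma_s) - zeta| over the grid."""
    return max(
        abs(sum(a * rho(b * t + c) for a, b, c in triples) - zeta) for t in grid
    )


def crelu(t: float) -> float:
    return min(1.0, max(0.0, t))


def logistic(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


GRID = [-5.0 + 0.01 * k for k in range(1001)]

# Odd symmetry of tanh: tanh(t) + tanh(-t) = 0.
TANH_ODD = (0.0, [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0)])
# CReLU symmetry: rho_c(t) - rho_c(2t)/2 - rho_c(2t - 1)/2 = 0.
CRELU_SPLIT = (0.0, [(1.0, 1.0, 0.0), (-0.5, 2.0, 0.0), (-0.5, 2.0, -1.0)])
# Illustration with zeta != 0 (not from the text): logistic(t) + logistic(-t) = 1.
LOGISTIC_REFLECT = (1.0, [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0)])
# A candidate that is NOT a symmetry of tanh: tanh(t) + tanh(2t) is not constant.
TANH_NON_SYMMETRY = (0.0, [(1.0, 1.0, 0.0), (1.0, 2.0, 0.0)])
```

A residual near zero is only numerical evidence on the chosen grid; it does not replace a proof that the relation holds on all of $\mathbb{R}$.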

C. Formalizing the identifiability question
Our aim is to generalize the aforementioned correspondence between the non-uniqueness in the neural network realization of functions and the affine symmetries of the underlying nonlinearity $\rho$ to multi-layer networks of arbitrary architecture in a canonical fashion, i.e., without regard to the "fine properties" of $\rho$ beyond its affine symmetries. Specifically, we derive conditions under which the set of networks giving rise to a fixed $f$ and derived from the affine symmetries of $\rho$ through "symmetry modification" is exhaustive (i.e., it contains all networks giving rise to $f$). These conditions are formally characterized by our null-net theorems (Theorem 1 and Theorem 2). The concept of symmetry modification will be formalized in the following sections. In the case $\rho = \tanh$, it corresponds to using $\tanh(t) = -\tanh(-t)$ to flip the signs of weights and biases in the network, and in the case $\rho = \rho_c$, to using $\rho_c(t) = \tfrac{1}{2}\rho_c(2t) + \tfrac{1}{2}\rho_c(2t - 1)$ to replace single nodes with pairs of nodes in the network.
In order to streamline the extension of the discussion above to multi-layer networks and to facilitate the comparison of our results with previous work, it will be opportune to immediately introduce neural networks in their full generality, i.e., as "computational graphs". To this end, we need to recall the definition of a directed acyclic graph, as well as several associated concepts that will be needed later.
Definition 2 (Directed acyclic graph, parent and ancestor set, input nodes, and node level).
-A directed graph is an ordered pair $G = (V, E)$ with $V$ a nonempty finite set of nodes and $E \subset (V \times V) \setminus \{(v, v) : v \in V\}$ a set of directed edges. We interpret an edge $(v, \tilde{v})$ as an arrow connecting the nodes $v$ and $\tilde{v}$ and pointing at $\tilde{v}$.
-A directed graph G is said to be a directed acyclic graph (DAG) if it has no directed cycles.
Let G = (V, E) be a DAG.
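-We define the parent set of a node $\tilde{v}$ by $\mathrm{par}(\tilde{v}) = \{v : (v, \tilde{v}) \in E\}$.
-For a set $W \subset V$ we define $\mathrm{par}^0(W) = W$ and $\mathrm{par}^r(W) = \bigcup_{s \in W} \mathrm{par}^{r-1}(\mathrm{par}(s))$, for $r \geq 1$. The ancestor set of $W$ is then $\mathrm{anc}(W) = \bigcup_{r \geq 0} \mathrm{par}^r(W)$.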
-We say that v ∈ V is an input node if par(v) = ∅, and we write In(G) for the set of input nodes.
As the graph G in Definition 2 is assumed to be acyclic, the level is well-defined for all nodes of G.
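The graph-theoretic notions of Definition 2 translate directly into code. In the sketch below, edges are pairs `(v, v_tilde)` pointing at `v_tilde`. Since the formula defining the node level is not reproduced in the text above, the function `levels` assumes that the level of a node is the length of the longest directed path from an input node to it (0 for input nodes); this is an assumption, chosen because it is well-defined exactly when the graph is acyclic. All names and the example graph are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]  # (v, v_tilde): arrow from v pointing at v_tilde


def par(edges: Set[Edge], v_tilde: Node) -> Set[Node]:
    """Parent set of v_tilde."""
    return {v for (v, target) in edges if target == v_tilde}


def anc(edges: Set[Edge], subset: Iterable[Node]) -> Set[Node]:
    """Ancestor set: union of par^r(W) over r >= 0 (includes W itself)."""
    result = set(subset)
    frontier = set(subset)
    while frontier:
        frontier = {p for s in frontier for p in par(edges, s)} - result
        result |= frontier
    return result


def input_nodes(nodes: Set[Node], edges: Set[Edge]) -> Set[Node]:
    """Nodes with empty parent set."""
    return {v for v in nodes if not par(edges, v)}


def levels(nodes: Set[Node], edges: Set[Edge]) -> Dict[Node, int]:
    """Assumed level: longest path length from an input node (DAG required)."""
    memo: Dict[Node, int] = {}

    def lv(v: Node) -> int:
        if v not in memo:
            parents = par(edges, v)
            memo[v] = 1 + max(lv(p) for p in parents) if parents else 0
        return memo[v]

    return {v: lv(v) for v in nodes}


# Hypothetical example: v1 -> u -> w, plus a "skip" edge v2 -> w.
NODES = {"v1", "v2", "u", "w"}
EDGES = {("v1", "u"), ("u", "w"), ("v2", "w")}
```

In the example graph, `w` has level 2 although its parent `v2` has level 0, which is the kind of configuration that prevents a network from being layered.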
We are now ready to introduce our general definition of a neural network. The role of the output scalars is to form D affine combinations of the functions realized by the output nodes, which are then designated as the coordinates of the D-dimensional output function of the network. Note that this renders the definition of the function realized by a network more general than directly taking the functions realized by the output nodes to be the output of the network.
Formally, we have the following.
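Definition 4 (Output maps). Let $\mathcal{N} = (V, E, V_{\mathrm{in}}, V_{\mathrm{out}}, \Omega, \Theta, \Lambda)$ be a GFNN with $D$-dimensional output, and let $\rho : \mathbb{R} \to \mathbb{R}$ be a nonlinearity. The map realized by a node $u \in V$ under $\rho$ is the function $\langle u \rangle^{\rho} : \mathbb{R}^{V_{\mathrm{in}}} \to \mathbb{R}$ defined recursively as follows:
-If $u \in V_{\mathrm{in}}$, set $\langle u \rangle^{\rho}(t) = t_u$, for all $t = (t_u)_{u \in V_{\mathrm{in}}} \in \mathbb{R}^{V_{\mathrm{in}}}$.
-Otherwise, set $\langle u \rangle^{\rho}(t) = \rho\big(\sum_{v \in \mathrm{par}(u)} \omega_{uv}\, \langle v \rangle^{\rho}(t) + \theta_u\big)$, for all $t \in \mathbb{R}^{V_{\mathrm{in}}}$.
The map realized by $\mathcal{N}$ under $\rho$ is the function $\langle \mathcal{N} \rangle^{\rho} : \mathbb{R}^{V_{\mathrm{in}}} \to \mathbb{R}^{D}$ given by
$$\langle \mathcal{N} \rangle^{\rho}(t) = \Big(\lambda^{(r)} + \sum_{w \in V_{\mathrm{out}}} \lambda^{(r)}_{w}\, \langle w \rangle^{\rho}(t)\Big)_{r = 1}^{D}, \quad \text{for all } t \in \mathbb{R}^{V_{\mathrm{in}}}.$$
When dealing with several networks $\mathcal{N}_i$ we will write $\langle u \rangle^{\rho, \mathcal{N}_i}$ for the map realized by $u$ in $\mathcal{N}_i$, to avoid ambiguity.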
We will treat nodes $u \in V$ only as "handles", and never as variables or functions. This is relevant when dealing with multiple networks that have shared nodes, as in the example depicted in Figure 1. On the other hand, the map realized by $u$ under $\rho$ is a function. We remark that Definitions 3 and 4 are largely analogous to [16, Defs. 8, 11], save for the output scalars $\Lambda$ that do not feature in [16].
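The recursive definition of the node maps can be mirrored by a memoized evaluator. This is a minimal sketch under two assumptions: weights are stored as a dictionary keyed by `(u, v)` for the edge from `v` to `u`, and each output coordinate is an affine combination (offset plus weighted sum) of the maps of the output nodes, as described informally in the text. The small example network, including its skip connection, is hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Weights = Dict[Tuple[str, str], float]  # (u, v) -> omega_uv for the edge v -> u


def node_map(
    u: str,
    t: Dict[str, float],
    weights: Weights,
    biases: Dict[str, float],
    rho: Callable[[float], float],
    cache: Optional[Dict[str, float]] = None,
) -> float:
    """Value at input t of the map realized by node u under rho."""
    if cache is None:
        cache = {}
    if u in t:  # input node: return the coordinate t_u
        return t[u]
    if u not in cache:
        pre_activation = sum(
            w * node_map(v, t, weights, biases, rho, cache)
            for (target, v), w in weights.items()
            if target == u
        )
        cache[u] = rho(pre_activation + biases[u])
    return cache[u]


def network_map(
    t: Dict[str, float],
    weights: Weights,
    biases: Dict[str, float],
    out_scalars: List[Dict[str, float]],
    out_offsets: List[float],
    rho: Callable[[float], float],
) -> List[float]:
    """Assumed output map: coordinate r is offset_r + sum_w lambda_w^(r) * <w>(t)."""
    cache: Dict[str, float] = {}
    return [
        offset
        + sum(
            lam * node_map(w, t, weights, biases, rho, cache)
            for w, lam in scalars.items()
        )
        for scalars, offset in zip(out_scalars, out_offsets)
    ]


# Hypothetical network: input v, nodes u1 and u2, skip connection v -> u2.
WEIGHTS: Weights = {("u1", "v"): 2.0, ("u2", "v"): -1.0, ("u2", "u1"): 0.5}
BIASES = {"u1": 1.0, "u2": 0.0}
OUT_SCALARS = [{"u2": 3.0}]
OUT_OFFSETS = [1.0]
```

Note that a non-input node without incoming edges evaluates to `rho(bias)`, i.e., to a constant, which matches the treatment of such nodes in the discussion of reductions.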
Note that LFNNs are similar to feed-forward neural networks as widely studied in the literature, namely as concatenations of affine maps between finite dimensional spaces and elementwise application of nonlinearities. Our definition of LFNNs is, however, somewhat more general, in the sense of the map of the network being allowed to depend directly on "non-final" nodes. An example of such an LFNN is N 1 in Figure 1. Further still, GFNNs are more general than LFNNs, and allow for "skip connections" within the network itself. For an example of a GFNN that is not layered, see Figure 2.
In order to meaningfully discuss the identifiability of GFNNs from their output maps, it is necessary that the networks under consideration have no spurious nodes, i.e., nodes that are "invisible" to the map of the network. Formally, we will require that GFNNs satisfy the following non-degeneracy property:
Fig. 1: Note that $\mathcal{N}_1$ and $\mathcal{N}_2$ share the nodes $v_1$, $v_2$, and $u$, even though the maps realized by $u$ in $\mathcal{N}_1$ and in $\mathcal{N}_2$ may be "completely unrelated".
Fig. 2: Note that $\mathcal{N}$ is not layered as $\mathrm{lv}(u_1) = 2 \neq 1 = \mathrm{lv}(v_2) + 1$. This network is also degenerate due to the presence of the node $u_2$, which does not contribute to the map realized by $\mathcal{N}$.
Networks that are not non-degenerate are referred to as degenerate.
Informally, a network is non-degenerate if every one of its non-input nodes "leads up" to at least one output node, and each output node contributes to at least one of the $D$ coordinates of the map realized by $\mathcal{N}$. For example, the network $\mathcal{N}$ in Figure 2 is degenerate. Note that non-degenerate networks are allowed to have input nodes without any outgoing edges. This is useful as we want our theory to encompass networks whose maps are constant relative to some (or all) of the inputs. An extreme but important case is that of the so-called trivial networks implementing the constant zero function from $\mathbb{R}^{V_{\mathrm{in}}}$ to $\mathbb{R}^{D}$.
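The informal description of non-degeneracy can be turned into a simple graph check. Because the formal definition is not reproduced above, the sketch below encodes only the two informal requirements, reading "contributes to a coordinate" as "has a nonzero output scalar in that coordinate"; this reading, the function name, and the data layout are assumptions.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (v, v_tilde): arrow from v pointing at v_tilde


def is_non_degenerate(
    nodes: Set[str],
    edges: Set[Edge],
    input_set: Set[str],
    output_set: Set[str],
    out_scalars: List[Dict[str, float]],
) -> bool:
    """Check the informal non-degeneracy conditions.

    out_scalars[r][w] is the scalar multiplying output node w in coordinate r.
    """
    # (a) Each output node has a nonzero scalar in at least one coordinate.
    for w in output_set:
        if all(scalars.get(w, 0.0) == 0.0 for scalars in out_scalars):
            return False
    # (b) Every non-input node leads up to at least one output node.
    reaches_output = set(output_set)
    frontier = set(output_set)
    while frontier:
        frontier = {v for (v, target) in edges if target in frontier}
        frontier -= reaches_output
        reaches_output |= frontier
    return all(v in reaches_output for v in nodes - input_set)
```

Under this check, an input node without outgoing edges does not make a network degenerate, and a network consisting of input nodes only passes vacuously.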
Note that T Vin, D is the only network with input set V in and D-dimensional output of depth 0.
We are now ready to formalize our notion of neural network identifiability.
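Definition 7 (Identifiability). For given $V_{\mathrm{in}}$ and $D \in \mathbb{N}$, let $\mathscr{N}$ be a set of non-degenerate GFNNs with $D$-dimensional output and input set $V_{\mathrm{in}}$. Let $\rho$ be a nonlinearity, and suppose that $\sim$ is an equivalence relation on $\mathscr{N}$ such that
$$\mathcal{N}_1 \sim \mathcal{N}_2 \implies \langle \mathcal{N}_1 \rangle^{\rho} = \langle \mathcal{N}_2 \rangle^{\rho}, \quad \text{for all } \mathcal{N}_1, \mathcal{N}_2 \in \mathscr{N}.$$
We say that $(\mathscr{N}, \rho)$ is identifiable up to $\sim$ if
$$\langle \mathcal{N}_1 \rangle^{\rho} = \langle \mathcal{N}_2 \rangle^{\rho} \implies \mathcal{N}_1 \sim \mathcal{N}_2, \quad \text{for all } \mathcal{N}_1, \mathcal{N}_2 \in \mathscr{N}.$$
The equivalence relation $\sim$ thus models the "degree of nonuniqueness" of networks with nonlinearity $\rho$, in the sense that the relation $\sim$ partitions $\mathscr{N}$ into equivalence classes containing networks realizing the same map. Conversely, by saying that $(\mathscr{N}, \rho)$ is identifiable up to $\sim$, we mean that the equivalence class of networks realizing a given function can be inferred from the function itself. A trivial example of such a relation is the equality relation, i.e., $\mathcal{N}_1 \sim \mathcal{N}_2$ if and only if $\mathcal{N}_1 = \mathcal{N}_2$. We saw in the introduction, however, that networks realizing a given function are not unique in the presence of non-trivial affine symmetries of $\rho$, and therefore in such cases $\mathscr{N}$ is not identifiable up to equality.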
On the other hand, we could define an equivalence relation $\sim$ on $\mathscr{N}$ by setting $\mathcal{N}_1 \sim \mathcal{N}_2$ if and only if $\langle \mathcal{N}_1 \rangle^{\rho} = \langle \mathcal{N}_2 \rangle^{\rho}$. Then $\mathscr{N}$ is, of course, identifiable up to $\sim$, but the relation $\sim$ defined in this way is not at all informative about the relationship between the structures of the networks realizing a given map. We are therefore interested in specifying the relation $\sim$ in Definition 7 in terms of the architecture, weights, and biases of the networks in $\mathscr{N}$ in an explicit fashion, and we would ideally like to do so for as large a class $\mathscr{N}$ of networks as possible.
To make further headway in our understanding of how ∼ could manifest itself for concrete nonlinearities and multi-layer networks, we again consider the case ρ = tanh. Let N = (V, E, V in , V out , Ω, Θ, Λ) and N = (V , E , V in , V out , Ω , Θ , Λ ) be non-degenerate GFNNs with D-dimensional output and the same input set V in . Suppose that there exist a bijection π : We will then say that N and N are isomorphic up to sign changes, and write N ∼ ± N . Owing to tanh(t) = − tanh(−t), we have N tanh = N tanh whenever N and N are isomorphic up to sign changes. The following question is thus natural: For which classes N is (N , tanh) identifiable up to ∼ ± ? This question was treated in the seminal paper by Fefferman [14], who showed that (N Vin,D Feff , tanh) is identifiable up to ∼ ± , where N Vin,D Feff is the set of non-degenerate LFNNs N = (V, E, V in , V out , Ω, Θ, Λ) with D-dimensional output and input set V in satisfying the following structural conditions: . . , D}, and V out can be enumerated as V out = {w 1 , . . . , w D } so that λ (r) wj = δ jr , for j, r ∈ {1, . . . , D}, where δ jr denotes the Kronecker delta, as well as the following genericity conditions on the weights and biases: (F4) θ u = 0 and θ u = θ u , for all u, u ∈ V \ V in such that u = u and lv(u) = lv( u), and (F5) for all ∈ {1, . . . , L(N )} and all u, u, v ∈ V such that lv(v) = − 1, lv(u) = lv( u) = , and u = u, where D = #{u ∈ V : lv(u ) = } is the number of nodes in the -th layer.
Fefferman's proof of the identifiability of (N Vin,D Feff , tanh) up to ∼ ± is significant as it is the first known identification result for multi-layer networks. The proof is effected by the insight that the architecture, the weights, and the biases of a network N ∈ N Vin,D with minimal hypotheses for layered networks. In the present paper, we address this issue and fully resolve the question of identifiability up to ∼ ± for GFNNs (and thus, in particular, for LFNNs) with the tanh-nonlinearity.

A. Canonical symmetry-induced isomorphisms and the null-net theorems
We saw in the introduction how the symmetry tanh( · ) + tanh(− · ) = 0 of tanh leads to the equivalence relation ∼ ± . By the same token, we will next show how the affine symmetries of a general nonlinearity ρ lead to a canonical equivalence relation ∼ among GFNNs. We begin by reconsidering the CReLU nonlinearity ρ c (t) = min{1, max{0, t}}, both for the sake of concreteness, and because this nonlinearity, whilst of simple structure, exhibits all the phenomena we wish to address. We have already seen that the affine symmetry (3) of ρ c leads to infinitely many distinct networks of depth 1 realizing the same map. The same symmetry can lead to structurally different multi-layer networks realizing the same map, as illustrated by the following example. Let N 1 , N 2 , N 3 , and N 4 be GFNNs as given schematically in Figure 3. We then have We now observe that N j+1 ρc = N j ρc , for every j ∈ {1, 2, 3}, and moreover, each of these equalities can be established by performing substitutions of the affine symmetry (3) of ρ c in the formal expressions (6) of the maps N j ρc , j ∈ {1, 2, 3, 4}.
This motivates the concept of ρ-modification (to be formally introduced in Definition 18) of a GFNN N . Suppose that an affine symmetry of ρ can be used to manipulate the formal expression of N ρ as in the example above. Performing this manipulation can then be interpreted as a "structural operation" on N involving three distinct sets of nodes (all with a common parent set): -A, the set of nodes of N to be removed, -B, the set of nodes of N whose outgoing weights and output scalars are to be altered, -C, a set of newly-created nodes to be adjoined to the network.
The resulting GFNN N is called a ρ-modification of N . We note that some of the sets A, B, and C may be empty.
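We can thus define an equivalence relation on GFNNs, referred to as $\rho$-isomorphism, by declaring two networks equivalent whenever one can be obtained from the other through a finite sequence of $\rho$-modifications. Thus, the networks $\mathcal{N}_1$ and $\mathcal{N}_4$ in the example above, although structurally rather different, are $\rho_c$-isomorphic.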
A special case of ρ-modification arises if the incoming weights of several neurons U = {u 1 , . . . , u m } of a GFNN N "line up" with an affine symmetry of ρ, allowing for a ρ-modification with strictly fewer nodes than N . More precisely, suppose that the nodes U have the same parent set P , and that there exist nonzero reals {β u } u∈U and is an affine symmetry of ρ. Then, setting Therefore, the set {1, u 1 ρ , . . . , u m ρ } is linearly dependent, and so N admits a ρ-modification hence yielding a network with strictly fewer nodes than N . We call such a ρ-modification a ρ-reduction. A simple example of a ρ-reduction is the tanh-reduction of the single-layer network with the map tanh( · ) + tanh(− · ) to the trivial network. For a more involved example of a ρ-reduction, see Figure 4. A ρ-reduction can, in fact, yield neurons with no incoming edges. In that case, the maps of such neurons are constant, determined only by their biases, and so their values can be "propagated through the network" in the form of bias alteration, and the corresponding "constant" parts of the network can subsequently be deleted. For an example of such a ρ-reduction, see Figure 5.   have u 1 tanh + u 2 tanh = 0 (corresponding to (7)), and thus N tanh = tanh u 1 tanh +2 u 2 tanh + 3 = tanh u 2 tanh + 3 = N tanh , as claimed. . This value can be propagated as a bias alteration, and so the bias 7 in the node u is replaced with 7 + tanh(3). This example also illustrates that it is necessary to allow for input nodes without outgoing edges (the node v 1 in this example) in order for every network N to be ρ-isomorphic to a regular N when reduced to "lowest terms".
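The two tanh-reductions described in the figure captions can be reproduced numerically. The figures themselves are not available here, so the sketch below makes explicit assumptions: in the first example the two neurons are taken to be $\tanh(t)$ and $\tanh(-t)$ (any pair with $\langle u_1 \rangle + \langle u_2 \rangle = 0$ would do), and in the second example the constant neuron with bias 3 is assumed to feed into $u$ with weight 1, while the remaining incoming weight `w` is a hypothetical parameter.

```python
import math


def before_reduction(t: float) -> float:
    """tanh(<u1> + 2 <u2> + 3) with the assumed neurons u1 = tanh(t), u2 = tanh(-t)."""
    u1 = math.tanh(t)
    u2 = math.tanh(-t)
    return math.tanh(u1 + 2.0 * u2 + 3.0)


def after_reduction(t: float) -> float:
    """tanh(<u2> + 3): u1 has been removed using <u1> + <u2> = 0."""
    u2 = math.tanh(-t)
    return math.tanh(u2 + 3.0)


def pre_activation_before(t: float, w: float = 0.5) -> float:
    """Pre-activation of u when it still receives a constant neuron (bias 3, weight 1)."""
    constant_neuron = math.tanh(3.0)  # neuron with no incoming edges
    return w * t + 1.0 * constant_neuron + 7.0


def pre_activation_after(t: float, w: float = 0.5) -> float:
    """Pre-activation of u after the constant is absorbed into the bias 7 + tanh(3)."""
    return w * t + (7.0 + math.tanh(3.0))
```

Comparing pre-activations rather than outputs in the second example avoids the saturation of $\tanh$ near arguments of size 8, which would hide any discrepancy.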
Definition 8. We will say that a GFNN is irreducible if it does not admit a ρ-reduction, and if it is both irreducible and non-degenerate, we will say that it is regular.
We remark that trivial networks are vacuously regular. Note that every GFNN $\mathcal{N}$ can be reduced to "lowest terms" via a sequence of $\rho$-reductions, i.e., there exists a regular $\mathcal{N}'$ that is $\rho$-isomorphic to $\mathcal{N}$.
Definition 9 (Null-net condition). Let $\rho$ be a nonlinearity and $V_{\mathrm{in}}$ a nonempty set of nodes. We say that $\rho$ satisfies the general (respectively layered) null-net condition on $V_{\mathrm{in}}$ if the only network $\mathcal{A} \in \mathscr{N}^{V_{\mathrm{in}}, 1}_{G}$ (respectively $\mathcal{A} \in \mathscr{N}^{V_{\mathrm{in}}, 1}_{L}$) satisfying $\langle \mathcal{A} \rangle^{\rho} = 0$ is the trivial network $\mathcal{T}^{V_{\mathrm{in}}, 1}$.
Definition 9 addresses only networks with one-dimensional output, as one can easily construct identically zero networks with multi-dimensional output from identically zero networks with one-dimensional output, and vice versa.
where $a \in (0, 1)$, $\rho_2 = |\cdot|$, and either $\rho_3(t) = \max\{0, t\}$ or $\rho_3 = \rho_c$. Then the $\rho_j$-regular networks $\mathcal{A}_j$ with one-dimensional output and input set $V_{\mathrm{in}} = \{v_1\}$ as depicted in Figure 6 are non-trivial, and yet satisfy $\langle \mathcal{A}_j \rangle^{\rho_j} = 0$, for $j \in \{1, 2, 3\}$. These examples can easily be extended to input sets $V_{\mathrm{in}}$ of arbitrary cardinality.
Fig. 6: The networks $\mathcal{A}_j$ are $\rho_j$-regular and satisfy $\langle \mathcal{A}_j \rangle^{\rho_j} = 0$, for $j \in \{1, 2, 3\}$.
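One simple way to see how a multi-layer network can realize the zero map for nonlinearities such as $|\cdot|$, the ReLU, and the CReLU is through idempotence: each of these satisfies $\rho(\rho(t)) = \rho(t)$. The sketch below builds the two-layer map $\rho(\rho(t)) - \rho(t)$. This construction is an illustration only; it need not coincide with the networks $\mathcal{A}_j$ of Figure 6, which are not reproduced in the text, and its regularity is not verified here.

```python
import math
from typing import Callable


def relu(t: float) -> float:
    return max(0.0, t)


def crelu(t: float) -> float:
    return min(1.0, max(0.0, t))


def two_layer_difference(rho: Callable[[float], float], t: float) -> float:
    """Map of a depth-2 network: output = <u2> - <u1>, u1 = rho(t), u2 = rho(u1)."""
    u1 = rho(t)
    u2 = rho(u1)
    return u2 - u1


GRID = [-3.0 + 0.01 * k for k in range(601)]


def max_abs_output(rho: Callable[[float], float]) -> float:
    """Largest |two_layer_difference(rho, t)| over the grid."""
    return max(abs(two_layer_difference(rho, t)) for t in GRID)
```

For `abs`, `relu`, and `crelu` the output vanishes identically on the grid, whereas for `math.tanh` the same architecture yields a clearly nonzero map.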
For such nonlinearities there exist non-ρ-isomorphic networks realizing the same function, indicating that the identifiability of networks with such nonlinearities is more involved. In particular, "non-affine" symmetries of the nonlinearity would have to be taken into account when characterizing the equivalence relation ρ that is supposed to fully capture the non-uniqueness of networks realizing a given function (where, by analogy with viewing affine symmetries as single-layer zero-output networks, "non-affine" symmetries would correspond to multi-layer zero-output networks such as A 1 , A 2 , and A 3 ).

III. IDENTIFIABILITY FOR THE tanh AND OTHER MEROMORPHIC NONLINEARITIES
A. Single-layer networks with the tanh-nonlinearity and the simple alignment condition
Even though both the identifiability of $(\mathscr{N}^{V_{\mathrm{in}}, D}_{G}, \rho)$ and the null-net condition are statements quantified over all regular GFNNs (or LFNNs), and in particular over networks of arbitrarily complicated architecture, Theorems 1 and 2 allow us to shift the original question of identifiability of regular networks to a different realm where the problem will be easier to tackle by leveraging the "fine properties" of the nonlinearity. Therefore, our goal will henceforth be to establish suitable sufficient conditions on nonlinearities guaranteeing that the null-net condition holds on all input sets $V_{\mathrm{in}}$.
In order to motivate our results and techniques, we demonstrate informally how the null-net condition is established for the tanh nonlinearity on a singleton input set {v in }, and indicate in the relevant places how this argument extends to more general meromorphic nonlinearities. As the maps realized by networks with 1-dimensional output and input set {v in } are functions of one variable, and are defined in terms of repeated compositions of the meromorphic function tanh and affine combinations, they can be analytically continued to their natural domains in C and can therefore be studied in the context of complex analysis. This approach was pioneered by Fefferman in [14].
Before continuing, we will need a concrete description of irreducibility for the tanh nonlinearity: Concretely, this says that the only affine symmetries of tanh are the "trivial" and the "odd" symmetries. As a result, tanh-modification of a regular network corresponds to either leaving the network intact (if substituting the trivial symmetry), or flipping the signs of the bias and the incoming and outgoing weights of a single neuron (if substituting the odd symmetry).
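The effect of substituting the odd symmetry of $\tanh$ can be illustrated on a small two-layer network with one neuron per layer. The sketch below is a hypothetical example (the parameter names and values are not from the paper): flipping the signs of the bias and of the incoming and outgoing weights of a single neuron leaves the realized map unchanged.

```python
import math
from typing import Tuple

Params = Tuple[float, float, float, float, float, float]  # (w1, b1, w2, b2, lam, lam0)


def two_layer_tanh(t: float, params: Params) -> float:
    """lam * tanh(w2 * tanh(w1 * t + b1) + b2) + lam0."""
    w1, b1, w2, b2, lam, lam0 = params
    hidden = math.tanh(w1 * t + b1)
    return lam * math.tanh(w2 * hidden + b2) + lam0


def flip_first_neuron(params: Params) -> Params:
    """Negate incoming weight w1, bias b1, and outgoing weight w2 of the first neuron."""
    w1, b1, w2, b2, lam, lam0 = params
    return (-w1, -b1, -w2, b2, lam, lam0)


def flip_second_neuron(params: Params) -> Params:
    """Negate incoming weight w2, bias b2, and output scalar lam of the second neuron."""
    w1, b1, w2, b2, lam, lam0 = params
    return (w1, b1, -w2, -b2, -lam, lam0)


PARAMS: Params = (1.3, -0.4, 0.7, 0.2, 2.0, -1.0)
```

The two flips can also be composed, giving four parameter settings in total that realize the same function in this example.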
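Going back to establishing the null-net condition for $\tanh$ on the input set $\{v_{\mathrm{in}}\}$, we first consider the single-layer case. Concretely, let $\mathcal{N}$ be a regular GFNN with 1-dimensional output, input set $\{v_{\mathrm{in}}\}$, and $L(\mathcal{N}) = 1$. Enumerating the non-input nodes of $\mathcal{N}$ as $\{u_1, \dots, u_{D_1}\}$, we have
$$\langle \mathcal{N} \rangle^{\tanh} = \lambda^{(1)} + \sum_{j=1}^{D_1} \lambda^{(1)}_{u_j} \tanh(\omega_{u_j v_{\mathrm{in}}} \,\cdot\, + \theta_{u_j}),$$
where $\lambda^{(1)}_{u_j} \neq 0$, for all $j \in \{1, \dots, D_1\}$, as $\mathcal{N}$ is non-degenerate. We aim to show that $\langle \mathcal{N} \rangle^{\tanh}$ cannot be identically zero. As $\langle \mathcal{N} \rangle^{\tanh}$ can be analytically continued to a meromorphic function on $\mathbb{C}$, it suffices to show that its set of poles $P \subset \mathbb{C}$ is nonempty. To this end, let $P_j$ be the set of poles of $\tanh(\omega_{u_j v_{\mathrm{in}}} \,\cdot\, + \theta_{u_j})$, for $j \in \{1, \dots, D_1\}$, and consider the set $\mathcal{J}$ of indices $j$ for which the functions $\tanh(\omega_{u_j v_{\mathrm{in}}} \,\cdot\, + \theta_{u_j})$ and $\tanh(\omega_{u_1 v_{\mathrm{in}}} \,\cdot\, + \theta_{u_1})$ have common poles. Now, assume by way of contradiction that $P \cap \bigcup_{j \in \mathcal{J}} P_j = \emptyset$, and set $\beta = \max_{k \in \mathcal{J}} |\omega_{u_k v_{\mathrm{in}}}|$ and $\mathcal{J}_{\max} = \{j \in \mathcal{J} : |\omega_{u_j v_{\mathrm{in}}}| = \beta\}$.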
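This establishes that $\langle \mathcal{N} \rangle^{\tanh}$ has a pole $p \in \bigcup_{j \in \mathcal{J}} P_j$, which suffices to conclude that $\langle \mathcal{N} \rangle^{\tanh}$ cannot be identically zero. Before proceeding to the multi-layer case, it will be opportune to continue the argument above and prove a stronger statement, namely that the set of poles $P$ of $\langle \mathcal{N} \rangle^{\tanh}$ is unbounded. To this end, write $\langle \mathcal{N} \rangle^{\tanh} = \lambda^{(1)} + f_1 + f_2$, where
$$f_1 := \sum_{j \in \mathcal{J}} \lambda^{(1)}_{u_j} \tanh(\omega_{u_j v_{\mathrm{in}}} \,\cdot\, + \theta_{u_j}) \quad \text{and} \quad f_2 := \sum_{j \in \{1, \dots, D_1\} \setminus \mathcal{J}} \lambda^{(1)}_{u_j} \tanh(\omega_{u_j v_{\mathrm{in}}} \,\cdot\, + \theta_{u_j}).$$
Note that the sets of poles of $f_1$ and $f_2$ are disjoint (as they are respectively subsets of $\bigcup_{j \in \mathcal{J}} P_j$ and $\bigcup_{j \in \{1, \dots, D_1\} \setminus \mathcal{J}} P_j$), and hence $p$ must be a pole of $f_1$. What is more, as $\omega_{u_j v_{\mathrm{in}}} / \omega_{u_1 v_{\mathrm{in}}} \in \mathbb{Q}$, for all $j \in \mathcal{J}$, there exists a $T \in \mathbb{R} \setminus \{0\}$ such that $\omega_{u_j v_{\mathrm{in}}} T / \pi \in \mathbb{Z}$, for all $j \in \mathcal{J}$, and so $f_1$ is $iT$-periodic, further implying that $p + iTk$ is a pole of $f_1$, for every $k \in \mathbb{Z}$. Therefore, $P \supset \{p + iTk : k \in \mathbb{Z}\}$, and so $P$ is unbounded. This argument leads to the following alignment condition for the $\tanh$ nonlinearity.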
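The pole structure used in this argument can be explored numerically with complex arithmetic. The sketch below relies on the standard fact that $\tanh(\omega z + \theta)$ has its poles where $\cosh(\omega z + \theta) = 0$, i.e., at $z = (i\pi(k + \tfrac{1}{2}) - \theta)/\omega$, $k \in \mathbb{Z}$. The two-neuron example, with weights 1 and 3 chosen so that the neurons share poles and with $T = \pi$, is hypothetical.

```python
import cmath
import math
from typing import Iterable, List, Sequence, Tuple


def tanh_poles(omega: float, theta: float, ks: Iterable[int]) -> List[complex]:
    """Poles of z -> tanh(omega * z + theta) indexed by the integers in ks."""
    return [(1j * math.pi * (k + 0.5) - theta) / omega for k in ks]


def f1(z: complex, neurons: Sequence[Tuple[float, float, float]]) -> complex:
    """sum_j lam_j * tanh(omega_j * z + theta_j) for neurons (lam_j, omega_j, theta_j)."""
    return sum(lam * cmath.tanh(om * z + th) for lam, om, th in neurons)


# Hypothetical neurons with rational weight ratio 3 and a common pole.
NEURONS = [(1.0, 1.0, 0.2), (-2.0, 3.0, 0.6)]
# Period parameter: omega_j * T / pi is an integer (1 and 3) for both neurons.
T = math.pi
```

With these choices, the pole of the first neuron for $k = 0$ coincides with the pole of the second neuron for $k = 1$, and shifting the argument by $iT$ leaves `f1` unchanged up to rounding.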
Definition 10 (Simple alignment condition). Let $\sigma$ be a meromorphic nonlinearity on $\mathbb{C}$. We say that $\sigma$ satisfies the simple alignment condition (SAC) if the following implication holds for all finite sets
B. Multi-layer networks with the tanh-nonlinearity and the composite alignment condition
We are now ready to proceed to the multi-layer case of our argument establishing the null-net condition for $\tanh$ on $\{v_{\mathrm{in}}\}$. More specifically, we will show how the "nonemptiness of the pole set" property can be extended to multi-layer networks by induction on depth. This will then immediately imply that the maps of these networks cannot be identically zero, establishing the null-net condition for $\tanh$ on the singleton input set $\{v_{\mathrm{in}}\}$. Our discussion will reveal a sufficient condition (the composite alignment condition) for this inductive argument to generalize to arbitrary meromorphic nonlinearities with simple poles only, which, together with the SAC, will allow us to establish the null-net condition for meromorphic nonlinearities more general than $\tanh$.
It will be of interest to consider the maximal domain in C to which the map N tanh of a non-trivial regular GFNN N can be analytically continued. Even though for a general holomorphic function there may not exist a unique maximal set to which it can be analytically continued (consider, for instance, the function z → √ 1 + z 2 ), this is the case for holomorphic functions defined on a domain with countable complement in C (a property the map N tanh will be shown to possess). We thus have the following definition. We aim to show that the set of simple poles of N tanh is nonempty under these assumptions. To this end, first note that we can write where N w , for w ∈ V >1 out := {w ∈ V out : lv(w) > 1}, are non-trivial regular GFNNs with input set {v in } and depth L(N w ) < L(N ), and f : D f → C is a meromorphic function given by One can show that (8) holds for z in an open set with countable complement in C (see Lemma 6), out and a p ∈ P w * , and set V * out = {w ∈ V >1 out : p ∈ P w }. We suppose that the following assumption holds: Next, note that, for w ∈ V * out , as p is a simple pole of N w tanh , we can write for z in an open neighborhood of p, where β w ∈ C \ {0}, γ w ∈ C, and w : D w → C is a function holomorphic on a domain D w with countable complement in C and such that w (0) = 0. Using (10) in (8) and performing the variable substitution z = 1 z−p then yields for all z ∈ C of sufficiently large modulus, where is analytic on a punctured neighborhood of p owing to the assumption (9). Then, according to (11), p will be a cluster point of simple poles of N tanh , unless the set of poles of is bounded. Therefore, if we can guarantee that (i) there exists a p ∈ P w * satisfying (9), and (ii) the set of poles of the function (12) is unbounded, then we will be able to conclude that the set of simple poles of N tanh is nonempty, as desired.
Item (i) can be established by more careful bookkeeping of the clusters of poles already formed in N w tanh , for w ∈ V >1 out , whereas (ii) will be a consequence of the composite alignment condition introduced next.
Definition 13 (Composite alignment condition). Let σ be a meromorphic nonlinearity on C with infinitely many simple poles and no poles of higher order. We say that σ satisfies the composite alignment condition (CAC) if the following implication holds for all finite sets of triples the set of poles of To see why item (ii) above follows from the CAC, assume by way of contradiction that the set of poles of the function (12) is bounded. Then, by the CAC, there exists a nonempty U ⊂ V * out such that β −1 w1 s1 = β −1 w2 w2 , for all w ∈ U , and the set of poles of is bounded. This together with (10) implies that for all w 1 , w 2 ∈ U , it would be possible to find distinct w 1 , w 2 ∈ U and construct a non-trivial tanh is constant, which would contradict the assumption that the set of simple poles of N tanh is nonempty. Therefore, (15) must hold. Moreover, (15) will imply the existence of a ϑ ∈ R and a c ∈ C such that β w e −iϑ ∈ R, for all w ∈ U , and As the set of poles of f U is bounded and β w e −iϑ ∈ R, for all w ∈ U , the SAC for σ now implies that the function (16) must be constant. However, this and (15) together contradict the irreducibility of N , establishing that the set of poles of (12) must be unbounded.
Finally, it remains to justify why tanh satisfies the CAC. To this end, we first need to define and analyze several concepts related to densities of subsets of C. These will be used to characterize the geometric relationship between the poles of the summand functions in (13).
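Definition 14 (Line, arithmetic sequence, and density). (i) A line in $\mathbb{C}$ is a set of the form $\ell = \{x + ty : t \in \mathbb{R}\}$, where $x \in \mathbb{C}$ and $y \in \mathbb{C}\setminus\{0\}$.
(ii) An arithmetic sequence in $\mathbb{C}$ is a set of the form $\Pi = \{x + ky : k \in \mathbb{Z}\}$, where $x \in \mathbb{C}$ and $y \in \mathbb{C}\setminus\{0\}$.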
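(iii) For an arbitrary set $F \subset \mathbb{C}$, a discrete set $P \subset \mathbb{C}$, and $\varepsilon > 0$, we define a quantity $\Delta_\varepsilon(F, P)$ by means of a limit superior, and we define the asymptotic density of $P$ along $F$ by $\Delta(F, P) := \lim_{\varepsilon \to 0} \Delta_\varepsilon(F, P)$.
Note that the limit as $\varepsilon \to 0$ in the previous definition always exists, as $\Delta_\varepsilon(F, P)$ is an increasing function of $\varepsilon$. Furthermore, as the limit superior is subadditive, so is the asymptotic density, specifically, $\Delta(F, P_1 \cup P_2) \le \Delta(F, P_1) + \Delta(F, P_2)$, for $F \subset \mathbb{C}$ and discrete $P_1, P_2 \subset \mathbb{C}$.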
Now, assume that the antecedent of (13) is satisfied with $\sigma = \tanh$, and let $P_s$ denote the set of poles of $z \mapsto \tanh(\beta_s z + \gamma_s + \epsilon_s(1/z))$, for $s \in I$. In order to specify the subset $I' \subset I$ for which we will prove the consequent of (13), we first observe the following:
- There exists an $R > 0$ such that, for every $s_1 \in I$ and every $p \in P_{s_1}$ with $|p| > R$, there exists an $s_2 \in I$ distinct from $s_1$ such that $p \in P_{s_2}$,
- for every $s \in I$, the set $P_s$ is asymptotic to the arithmetic sequence $\Pi_s := \beta_s^{-1}\bigl(-\gamma_s + i\pi(\mathbb{Z} + \tfrac{1}{2})\bigr)$, in the sense that, for every $\varepsilon > 0$, there exists an $A > 0$ such that every $p \in P_s$ with $|p| > A$ is within $\varepsilon$ of $\Pi_s$ and every $p \in \Pi_s$ is within $\varepsilon$ of $P_s$, and
- for every $s \in I$, the density of $P_s$ along the line $\ell = \{\beta_s^{-1}(-\gamma_s + it) : t \in \mathbb{R}\}$ is strictly positive, i.e., we have $\Delta(\ell, P_s) > 0$.
This motivates defining an undirected graph G = (I, E) on I, with E given by Informally, the condition ∆( , P s1 ∩ P s2 ) > 0, for (s 1 , s 2 ) ∈ E, imposes sufficient "geometrical rigidity" on the points of P s1 and P s2 in order for β −1 s1 s1 = β −1 s2 s2 to hold, whereas, for (s 1 , s 2 ) / ∈ E, we have ∆( , P s1 ∩ P s2 ) = 0 for every line in C, and so P s1 and P s2 do not "get in the way" of one another. This reasoning will allow us to show that the consequent of (13) holds for every connected component of G. To this end, we fix an arbitrary connected component I of G and s 1 , On the other hand, one can show which, by a special case of Weyl's equidistribution theorem [14,Cor. 2.A.12], implies that β s1 /β s2 ∈ Q, further implying that Π s1 − Π s2 is uniformly discrete. Therefore, we must have (β −1 s1 s1 − β −1 s2 s2 )(1/p n ) = 0, for all sufficiently large n, and thus, as β −1 s1 s1 − β −1 s2 s2 is analytic on a neighborhood of 0 and 1/p n → 0 as n → ∞, it follows by the identity theorem that β −1 s1 s1 − β −1 s2 s2 = 0. Hence, as s 1 and s 2 were arbitrary and I is connected, we must have β −1 s1 s1 = β −1 s2 s2 , for all s 1 , s 2 ∈ I . It remains to show that the set P I of poles of f I := s∈I α s tanh (β s · + γ s ) is bounded. We will, in fact, prove a stronger statement, namely that P I is empty. To this end, suppose by way of contradiction that the set P I is nonempty. Then, by an argument analogous to the discussion of the single-layer case, there must exist a line in C such that ∆( , P I ) > 0. Next, letting ξ = β −1 s s for an arbitrary s ∈ I , we have s = β s ξ, for all s ∈ I , and thus the asymptotic density of the poles of along is equal to ∆( , P I ), since ξ(1/z) → 0 as |z| → ∞. Now, using the subadditivity property of the asymptotic density, we find that the set P I of poles of s∈I α s tanh (β s · + γ s + s (1/·)) must which contradicts the assumption that P I is bounded. 
This proves that P I = ∅, thereby establishing the CAC for tanh and concluding our informal argument establishing the null-net property for tanh on {v in }.
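Two elementary ingredients of the informal argument above can be checked numerically. The following is a minimal sketch, not taken from the paper: it treats the unperturbed case only (no correction term inside tanh), in which the poles of $z \mapsto \tanh(\beta z + \gamma)$ lie exactly on the arithmetic sequence $\beta^{-1}(-\gamma + i\pi(\mathbb{Z} + \tfrac{1}{2}))$, and it illustrates why a rational ratio of step sizes makes the difference set of two arithmetic sequences uniformly discrete. The function names and all parameter values are hypothetical, and the second part normalizes one step to 1 and ignores offsets.

```python
import cmath
import math
from fractions import Fraction


def tanh_affine_poles(beta, gamma, ks):
    """Solutions z of beta*z + gamma = i*pi*(k + 1/2), i.e. zeros of cosh(beta*z + gamma)."""
    return [(-gamma + 1j * math.pi * (k + 0.5)) / beta for k in ks]


def min_nonzero_gap(ratio, n_max):
    """Smallest nonzero |k - m*ratio| over integers with |k|, |m| <= n_max."""
    best = None
    for k in range(-n_max, n_max + 1):
        for m in range(-n_max, n_max + 1):
            d = abs(k - m * ratio)
            if d != 0 and (best is None or d < best):
                best = d
    return best


# Illustrative (hypothetical) parameters; any beta != 0 would do.
beta, gamma = 2.0 - 1.0j, 0.3 + 0.7j
poles = tanh_affine_poles(beta, gamma, range(-3, 4))

# cosh(beta*p + gamma) vanishes at every computed pole p (up to rounding).
residuals = [abs(cmath.cosh(beta * p + gamma)) for p in poles]

# Consecutive poles differ by the constant step i*pi/beta.
steps = [poles[j + 1] - poles[j] for j in range(len(poles) - 1)]

# Rational step ratio 3/2: all differences k - 3m/2 lie on a grid of mesh 1/2.
gap_rational = min_nonzero_gap(Fraction(3, 2), 100)

# Irrational step ratio sqrt(2): differences approach 0 as n_max grows.
gap_irrational = min_nonzero_gap(math.sqrt(2), 100)

print(max(residuals), gap_rational, gap_irrational)
```

In the rational case the smallest nonzero gap stays at 1/2 however large the search range is, whereas in the irrational case it keeps shrinking as the range grows.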

C. General meromorphic nonlinearities and arbitrary input sets
We will later formalize the discussion in the previous two subsections, proving the following result for meromorphic nonlinearities more general than tanh.
Proposition 1. Let $\sigma$ be a meromorphic nonlinearity on $\mathbb{C}$ with infinitely many simple poles and no poles of higher order. Suppose that $\sigma(\mathbb{R}) \subset \mathbb{R}$, and that $\sigma$ satisfies both the SAC and the CAC. Then, for every non-trivial regular GFNN $\mathcal{N}$ with 1-dimensional output and a singleton input set $\{v_{\mathrm{in}}\}$, the map $\langle\mathcal{N}\rangle_\sigma$ can be analytically continued to a domain with countable complement in $\mathbb{C}$, and its set of poles is nonempty. In particular, $\sigma$ satisfies the general (and therefore also the layered) null-net condition on $\{v_{\mathrm{in}}\}$.

The final step is to establish the null-net property on input sets $V_{\mathrm{in}} = \{v^0_1, \dots, v^0_{D_0}\}$ of arbitrary size $D_0$. As the argument is identical for tanh and for more general meromorphic nonlinearities, we proceed by assuming that $\sigma$ is a meromorphic nonlinearity on $\mathbb{C}$ satisfying the SAC and the CAC, but otherwise arbitrary (we will shortly discuss such nonlinearities that are not tanh). We argue by contradiction, i.e., we assume the existence of a non-trivial regular GFNN $\mathcal{M}$ with input set $V_{\mathrm{in}}$ and a one-dimensional output identically equal to zero. Next, we use the input anchoring procedure, which is a method for constructing a non-trivial network $\mathcal{M}_a$ derived from $\mathcal{M}$ in a manner that preserves the zero-output property while reducing the cardinality of the input set. This is achieved by selecting an input node of $\mathcal{M}$, say $v^0_{D_0}$, and a real number $a \in \mathbb{R}$ that is then assigned to that node as a fixed value and propagated through the network in the form of bias alteration. The parts of $\mathcal{M}$ whose contributions are rendered constant in the process are then deleted. The so-constructed network $\mathcal{M}_a$ has the smaller input set $V_{\mathrm{in}} \setminus \{v^0_{D_0}\}$ and by construction again has output identically equal to zero. We will later show that a value of $a$ can be selected so that the network $\mathcal{M}_a$ is regular. The procedure can now be repeated, successively eliminating the input nodes until only one remains. We are thus left with a non-trivial regular GFNN with a singleton input set and one-dimensional output identically equal to zero.
This constitutes a contradiction to the null-net property for σ on singleton input sets, thereby establishing the null-net property on arbitrary input sets. The input anchoring procedure is illustrated in Figure 7. Formalizing this argument will allow us to prove the following theorem.
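Before stating the theorem, the bias-alteration step of input anchoring can be sketched in code. This is a minimal illustration under simplifying assumptions that are not from the paper: a fully connected tanh network with one hidden layer, hypothetical weights, and hypothetical helper names `net` and `anchor_last_input`. It shows only how fixing the last input to a value `a` is absorbed into the hidden biases. The deletion of parts rendered constant and the choice of `a` that keeps the network regular are not modeled.

```python
import math


def net(x, W1, b1, w2, b2):
    """One-hidden-layer tanh network with scalar output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(c * h for c, h in zip(w2, hidden)) + b2


def anchor_last_input(W1, b1, a):
    """Fix the last input to the value a and fold its contribution into the biases."""
    W1_a = [row[:-1] for row in W1]                      # drop the anchored input column
    b1_a = [b + row[-1] * a for row, b in zip(W1, b1)]   # bias alteration
    return W1_a, b1_a


# Hypothetical weights: 3 inputs, 2 hidden nodes.
W1 = [[0.5, -1.2, 0.3], [1.1, 0.4, -0.7]]
b1 = [0.1, -0.2]
w2 = [2.0, -1.5]
b2 = 0.25
a = 1.7

W1_a, b1_a = anchor_last_input(W1, b1, a)
x_rest = [0.9, -0.4]
full = net(x_rest + [a], W1, b1, w2, b2)       # original network with last input set to a
anchored = net(x_rest, W1_a, b1_a, w2, b2)     # anchored network on the remaining inputs
print(abs(full - anchored))
```

The anchored network agrees with the original one on every input whose last coordinate equals `a`, so if the original map vanishes identically, so does the anchored one.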
Theorem 3. Let σ be a meromorphic nonlinearity on C with infinitely many simple poles and no poles of higher order. Suppose that σ(R) ⊂ R, and that σ satisfies both the SAC and the CAC. Then σ satisfies the general (and therefore also the layered) null-net condition on V in , for every finite set V in . The SAC and the CAC are admittedly rather technical conditions. However, unlike the null-net condition, which is a "recursive" statement about σ (i.e., a statement about repeated compositions of affine functions and σ), the alignment conditions are statements about linear combinations of functions.
The significance of Theorem 3 thus lies in bridging the conceptual gap between the identifiability of single-layer networks and the identifiability of multi-layer networks, at least for meromorphic nonlinearities with simple poles only. In the present paper, we verify the SAC and the CAC for the class Σ a,b of "tanh-type" nonlinearities introduced next.
where $C \in \mathbb{C}$, and $\{c_k\}_{k \in \mathbb{Z}}$ is a sequence of complex numbers such that $\sup_{k \in \mathbb{Z}} |c_k|\, e^{-\pi a' |k| / b} < \infty$, for some $a' \in (0, a)$, and at least one $c_k$ is nonzero.
Theorem 4. Let a, b > 0 and let σ ∈ Σ a,b . Then σ satisfies the SAC and the CAC.
The proof of Theorem 4 is a generalization of the arguments presented above establishing the SAC and the CAC for the tanh nonlinearity. Specifically, it relies on the $ib$-periodicity of the nonlinearities in $\Sigma_{a,b}$ and the lattice geometry of their poles. As the proof involves the application of various "point density" techniques (such as the Kronecker-Weyl equidistribution theorem) to the poles of functions of the form $\sigma(\beta\,\cdot + \gamma + \epsilon(1/\cdot))$ (where $\epsilon$ is an ABC), Theorem 4 can be seen as a far-reaching refinement of the "Deconstruction Lemma" in [14]. We finally remark that our techniques can be adapted to prove the SAC and the CAC for nonlinearities of the form $\sigma(z) = r(e^z)$, where $r$ is a bounded non-constant real rational function with only simple poles.
The implications of Theorems 1, 2, 3, and 4 can now be summarized as follows: In particular, as Lemma 2 implies that tanh-isomorphism is the relation ∼ ± , and tanh ∈ Σ 1,π , Theorem 5 specializes to the following result. are identifiable up to ∼ ± .
We remark that the characterization of irreducibility for the tanh nonlinearity according to Lemma 1 directly generalizes the concept of irreducibility in [13], and is analogous to the no-clones condition introduced in [16].

E. Nonlinearities in Σ a,b with exotic affine symmetries
Note that, given an arbitrary ζ ∈ R and a finite set of real numbers {(α s , β s , γ s )} s∈I , it is not clear whether there exists a nonlinearity with the affine symmetry (ζ, {(α s , β s , γ s )} s∈I ). It is likewise unclear if such a nonlinearity exists that additionally satisfies the null-net condition. Even though the existence of such nonlinearities would be desirable to justify the generality of the theory of ρ-modification and ρ-isomorphism presented in Section II, this is likely a difficult open problem. We are, however, able to offer a partial solution by showing that the class Σ a,b contains nonlinearities with (infinitely many) distinct affine symmetries that are more involved than the trivial and odd symmetries of the tanh function.

F. Organization of the remainder of the paper
We conclude this section by laying out the organization of the remainder of the paper. In Section IV, we formalize the concepts of ρ-modification and ρ-isomorphism and prove Theorems 1 and 2. In Section V, we analyze the pole structure of network maps with a meromorphic nonlinearity satisfying the SAC and the CAC, providing a formal proof of (a strengthened version of) Proposition 1. In Section VI, we introduce the procedure of input anchoring, allowing us to prove Theorem 3, and in Section VII, we analyze the fine properties of Σ a,b -nonlinearities, allowing us, in turn, to prove Theorem 4. Finally, the Appendix contains the proofs of various ancillary results needed throughout the paper.

IV. THE ρ-ISOMORPHISM AND THE NULL-NET THEOREMS
A. Irreducibility, regularity, ρ-modification, and the ρ-isomorphism
We begin this chapter by formalizing the concepts of irreducibility and regularity, already introduced informally in Section II.
output, and let ρ : R → R be a nonlinearity. Let U ⊂ V be a set of nodes, and suppose the following hold: (i) the nodes in U have a common parent set P ⊂ V , i.e., par(u) = P , for all u ∈ U , for all u ∈ U , and (iii) there exist a ζ ∈ R and nonzero real numbers {α u } u∈U such that (ζ, {(α u , β u , θ u )} u∈U ) is an affine symmetry of ρ.
We then say that N is ρ-reducible. Whenever we wish to specify the set U causing the reducibility, we will say that N is (ρ, U )-reducible. Finally, a GFNN that is not reducible will be called irreducible.
Definition 17 (Regularity). We say that a GFNN is regular if it is irreducible and non-degenerate according to Definition 5. The sets of all regular GFNNs and of all regular LFNNs with D-dimensional output and input set V in are denoted by N Vin,D G and N Vin,D L , respectively.
We now formalize symmetry modification, introduced informally in Section II. Before providing the formal definition, we motivate the concept by describing how an affine symmetry can be used to replace a single node in the network by newly-created nodes. Thus, let $\mathcal{N}$ be a GFNN, and let $u^*$ be a non-input node of $\mathcal{N}$ to be replaced. Let $P = \mathrm{par}(u^*)$, and suppose that $B \subset V \setminus \{u^*\}$ is a set of nodes with parent set $P$ and such that there exist nonzero real numbers Suppose furthermore that the nonlinearity $\rho$ has an affine symmetry $\zeta$, for $t \in \mathbb{R}$. Therefore, $\mathcal{N}$ can be modified without changing the map $\langle w \rangle_{\rho, \mathcal{N}}$ by removing the node $u^*$, replacing the weights $\omega_{wu}$ by $\omega_{wu} - \frac{\alpha_u \omega_{wu^*}}{\alpha_{u^*}}$, for $u \in B \cap \mathrm{par}(w)$, creating new edges $(u, w)$ with weights $-\frac{\alpha_u \omega_{wu^*}}{\alpha_{u^*}}$, for $u \in B \setminus \mathrm{par}(w)$, adjoining $n$ new nodes $\{u_1, \dots, u_n\}$ with biases $\gamma_p$, incoming edges $(v, u_p)$ with weights $\beta_p \kappa_v$, for $v \in P$, and outgoing edges $(u_p, w)$ with weights $-\frac{\alpha_p \omega_{wu^*}}{\alpha_{u^*}}$, and finally replacing the bias $\theta_w$ by $\theta_w + \frac{\zeta \omega_{wu^*}}{\alpha_{u^*}}$. In this example only the node $u^*$ was removed. However, multiple nodes (the set $A$ in the next definition) can be removed at once in a similar manner, provided a suitable affine symmetry exists. We thus have the following formal definition:

of non-input nodes with a common parent set $P \subset V$, and let $W = \{w \in V : \mathrm{par}(w) \cap A \neq \emptyset\}$.
Suppose the following are satisfied: , for all w ∈ W , and there exist nonzero real numbers {ν w } w∈W such that -For v ∈ P and u p ∈ C, an edge (v, u p ) is created and assigned weight β p κ v , and the node u p is assigned bias γ p .
-For w ∈ W and u p ∈ C, an edge (u p , w) is created and assigned weight −α p ν w , and the bias θ w is replaced by θ w + ζν w .
-For w ∈ W and u ∈ B -If A ∩ V out = ∅, then set V out = V out and Λ = Λ, completing the construction.
-If A ⊂ V out , then, for every r ∈ {1, . . . , D}, -the output scalar λ (r) is replaced by λ (r) + ζµ r , -for u p ∈ C, new output scalars λ (r) out ∪ C completing the construction.
We say that the so-constructed network N′ is a (ρ ; A, B, C)-modification of N . Whenever it is not necessary to explicitly specify the sets A, B, and C involved in the modification, we will simply say that N′ is a ρ-modification of N . A ρ-modification that is a regular network is called a regular ρ-modification.
Note that the set B in Definition 18 is allowed to be empty, but the sets A and C must be nonempty.
In particular, Definition 18 does not encompass ρ-reduction, in contrast to the informal definition of ρ-modification provided in Section II. This is in order to avoid the scenario described in Figure 5 that necessitates further alteration to obtain a network without "constant parts". Moreover, restricting the number of possibilities in which ρ-modification can be carried out renders the claims of Theorems 1 and 2 stronger.
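The single-node replacement described before Definition 18 can be made concrete for tanh and its odd symmetry $\tanh(s) + \tanh(-s) = 0$. The sketch below is an illustration under assumptions that are not from the paper: a single parent node with value `t`, an empty set B, hypothetical weights, and hypothetical function names. It applies the replacement rules quoted in the text (new node with incoming weight $\beta_p \kappa_v$ and bias $\gamma_p$, outgoing weight $-\alpha_p \omega_{wu^*} / \alpha_{u^*}$, child bias $\theta_w + \zeta \omega_{wu^*} / \alpha_{u^*}$) and checks that the pre-activation of the child node is unchanged.

```python
import math


def child_preactivation_original(t, kappa, beta_star, theta_star, omega, theta_w):
    """Pre-activation of the child w when it is fed by the node u* (t = value of parent v)."""
    return theta_w + omega * math.tanh(beta_star * kappa * t + theta_star)


def child_preactivation_modified(t, kappa, omega, theta_w, alpha_star, zeta, new_nodes):
    """Same quantity after u* is replaced by new nodes given as (alpha_p, beta_p, gamma_p)."""
    pre = theta_w + zeta * omega / alpha_star            # altered bias of w
    for alpha_p, beta_p, gamma_p in new_nodes:
        out_weight = -alpha_p * omega / alpha_star       # outgoing weight of the new node
        pre += out_weight * math.tanh(beta_p * kappa * t + gamma_p)
    return pre


# Hypothetical parameters of the node u* and its child w.
kappa, beta_star, theta_star = 0.8, 1.5, 0.3
omega, theta_w = -2.0, 0.1

# Odd symmetry of tanh written as an affine symmetry with zeta = 0:
# 1*tanh(beta_star*s + theta_star) + 1*tanh(-beta_star*s - theta_star) = 0.
alpha_star, zeta = 1.0, 0.0
new_nodes = [(1.0, -beta_star, -theta_star)]

ts = [-2.0, -0.5, 0.0, 0.7, 3.1]
diffs = [abs(child_preactivation_original(t, kappa, beta_star, theta_star, omega, theta_w)
             - child_preactivation_modified(t, kappa, omega, theta_w, alpha_star, zeta, new_nodes))
         for t in ts]
print(max(diffs))
```

For this symmetry the rules reduce to the familiar sign flip: the new node carries the negated incoming weight and bias of $u^*$, and its outgoing weight is the negative of the original one.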
The following proposition summarizes the properties of GFNNs that are readily seen to be preserved under ρ-modification.
out , Ω 1 , Θ 1 , Λ 1 ) be a GFNN with D-dimensional output, let ρ be a nonlinearity, and let N 2 = (V 2 , E 2 , V in , V 2 out , Ω 2 , Θ 2 , Λ 2 ) be a ρ-modification of N 1 . Then, (i) if N 1 is layered, then N 2 is also layered, These properties naturally lead to the following definition of isomorphism up to ρ-modification. It is readily seen that the networks N 1 , N 2 , N 3 , and N 4 in Figure 3 are ρ c -isomorphic.
Definition 19 (ρ-isomorphism). Let N and M be regular GFNNs with D-dimensional output and the same input set, and let ρ : R → R be a nonlinearity. We say that N is ρ-isomorphic to M, and write as desired.
We note that trivial networks T Vin, D do not admit any ρ-modifications, and therefore the only network that is ρ-isomorphic to T Vin, D is T Vin, D itself.

B. Subnetworks and proofs of the null-net theorems
The following proposition is the cornerstone of the null-net theorems. with one-dimensional output and input set V in such that A ρ = 0.
The proof of Proposition 6 relies crucially on being able to perform ρ-modification in a manner that preserves regularity. Unfortunately, neither irreducibility nor non-degeneracy are generally preserved under ρ-modification. The following proposition, however, tells us that, for every ρ-modification of a regular GFNN, there exists an alternative (but related) ρ-modification that preserves regularity, which will suffice for the purpose of proving Proposition 6.
Proposition 7. Let N be a regular GFNN with D-dimensional output, let ρ : R → R be a nonlinearity, and let A 0 , B 0 be disjoint sets of nodes of N with a common parent set P such that N admits a (ρ ; A 0 , B 0 , C 0 )-modification. Then there exist disjoint sets A ⊃ A 0 and B of nodes with common parent set P , and a C ⊂ C 0 , such that N admits a regular (ρ ; A, B, C)-modification.
The proof of Proposition 7 proceeds via the next two lemmas (proved in the Appendix) that treat the irreducibility and non-degeneracy aspects of regularity separately. To motivate the first lemma, we note that ρ-modification can be seen as a process whereby certain nodes A are removed from a GFNN by replacing their maps with a combination of the maps of nodes B already present in the GFNN, as well as several "nascent" nodes C. However, if we add too many nascent nodes C at once, we might provoke a (ρ, B ∪ C ∪ D)-reducibility in the resulting network, for some set of nodes D.
Our lemma thus shows that irreducibility can be preserved by "modifying frugally", i.e., by adding the least possible number of nodes C that facilitates ρ-modification: To motivate the second lemma, note that the (ρ ; A, B, C)-modification of a non-degenerate network N is degenerate precisely if there exists a node u * ∈ B that both loses all its outgoing edges in the process, and, if u * is an output node of N , all its output scalars are set to zero. Degeneracy can thus be avoided by performing an alternative ρ-modification that, in addition to the nodes in A, removes such problematic nodes as well. Lemma 4. Let N be a non-degenerate GFNN with D-dimensional output, let ρ : R → R be a nonlinearity, and let A, B be disjoint sets of nodes of N with a common parent set such that N admits a (ρ ; A, B, C)-modification. Then there exists a set B * ⊂ B such that N admits a non- We are now ready to prove Proposition 7.
Proof of Proposition 7. Let C ⊂ C 0 be a subset of minimal cardinality such that N admits a (ρ ; A, B, C)-modification, for some disjoint sets A ⊃ A 0 and B of nodes of N with a common parent set. Now, as N is regular and hence non-degenerate, we have by Lemma 4 that there exists a B * ⊂ B such that N admits a non-degenerate (ρ ; A ∪ B * , B \ B * , C)-modification N′. As N is irreducible, it follows by Lemma 3 that N′ is irreducible, and thus N′ is the desired regular ρ-modification of N .
In order to prove Proposition 6, we will also need the following definition of a subnetwork of a GFNN: Whenever we wish to specify explicitly the set S giving rise to N′, we will say that N′ is a subnetwork of N generated by S.
Note that subnetworks generated by a set S are not unique. They become unique, though, if we also specify their input and output sets V′ in and V′ out , and their set of output scalars Λ′.
Proof of Proposition 6. Let Note that both N 1 and N 3 contain M 1 as a subnetwork. In particular, the set of nodes of M 1 is given by V M1 = V 1 ∩ V 3 . Furthermore, as N 2 ρ ∼ N 3 , we have by Proposition 4: for all r ∈ {1, . . . , D}. We now show the following: Claim: there exist an r ∈ {1, . . . , D} and a w ∈ V 1 out ∪ V 3 out such that at least one of the following three statements holds: Proof of Claim. Suppose by way of contradiction that this is not the case, i.e., we have for all r. Then, as N 1 is non-degenerate, Property (ii) in Definition 5 implies that V 1 out \ V 3 out = ∅, i.e., V 1 out ⊂ V 3 out . Similarly, as N 3 is non-degenerate, we have V 3 out \ V 1 out = ∅, and thus V 1 out = V 3 out ⊂ V M1 . But then we have V 1 \V in = V M1 \V in = V 3 \V in , again by non-degeneracy of N 1 and N 3 . Next, as λ Furthermore, for r ∈ {1, . . . , D} and w ∈ S r , let w : w ∈ S r } and Λ = D r=1 Λ (r) . By the Claim we know that there exists an r * such that S r * = ∅. Moreover, as S r . Finally, let A be the subnetwork of A generated by S r * , with input set V in , output set S r * , and output scalars Λ (r * ) . Then A is non-degenerate by construction, A is not the trivial network T Vin, 1 as S r * ∩ V in = ∅, and by (19) we have A ρ = 0. Moreover, if N 1 and N 2 are layered, then N 3 is layered as it is ρ-isomorphic to N 2 , and hence A is layered as well.
It remains to show that A is irreducible. As A is a subnetwork of A, it suffices to show that A is irreducible. Assume by way of contradiction that A is (ρ, U )-reducible for some U ⊂ V 1 ∪ V 3 .
As N 1 and N 3 are both irreducible, we must have U ⊂ V 1 and U ⊂ V 3 . In particular, we have U ∩ V 3 = ∅, U \ V 3 = ∅, and the common parent set P of the nodes U is contained in V M1 . By definition of reducibility, there exist sets of nonzero real numbers {κ v } v∈P and {β u } u∈U such that {ω uv } v∈P = β u {κ v } v∈P , for all u ∈ U , as well as a ζ ∈ R and nonzero real numbers {α u } u∈U such that (ζ, {(α u , β u , θ u ) u∈U }) is an affine symmetry of ρ. Now, by definition of affine symmetry, Fix an arbitrary node u * ∈ U ∩ V 3 and let B 0 : Then (21) can be rearranged to A with one-dimensional output and input set V in such that A ρ = 0. Therefore, ρ fails the general (respectively layered) null-net condition on V in .
Conversely, suppose that ρ does not satisfy the general (respectively layered) null-net condition on V in , and let A be a non-trivial regular GFNN (respectively LFNN) with one-dimensional output such that A ρ = 0. Then the networks T Vin, 1 and A are regular GFNNs (respectively LFNNs) satisfying T Vin, 1 ρ = 0 = A ρ , and are not ρ-isomorphic (simply as the only network that is ρ-isomorphic to T Vin, D is T Vin, D itself). Hence, (N Vin,D Throughout this section we fix a meromorphic nonlinearity σ such that σ(R) ⊂ R, -σ has infinitely many simple poles and no poles of higher order, and σ satisfies the SAC and the CAC.
In this section we formally establish that the map of every non-trivial regular GFNN N with 1-dimensional output and a singleton input set can be analytically continued to a domain with countable complement in C, and that the set of simple poles of N σ is nonempty. We will, in fact, prove a much stronger result about the structure of the singularities of N σ . In order to state this result, we need the concept of clustering depth introduced next.
Definition 23 (Cluster sets and clustering depth). Let E ⊂ C be a set and let z ∈ C be a point.
(i) For a nonnegative integer k we define the k th cluster set C k (E) of E inductively as follows: -We set C 0 (E) = E, and -for k ≥ 1, we let C k (E) be the set of cluster points of C k−1 (E).
(ii) We define the clustering depth L C (E) of E as the least k for which C k (E) = ∅, if such a k exists, and otherwise we set L C (E) = ∞.
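(iii) We define the clustering depth of $E$ at $z$ by $L_C(E, z) := \lim_{\varepsilon \to 0} L_C\bigl(E \cap D^{\bullet}(z, \varepsilon)\bigr)$.
Note that the limit as $\varepsilon \to 0$ in the previous definition always exists, as $L_C(E \cap D^{\bullet}(z, \varepsilon))$ is an increasing function of $\varepsilon$. The following lemma lists some of the properties of cluster sets and clustering depth.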
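Before turning to the lemma, items (i) and (ii) of Definition 23 can be illustrated on two standard sets. These examples are not taken from the paper and are offered only as a sanity check of the definitions.

```latex
% Example 1: a single sequence converging to 0.
E = \{1/n : n \ge 1\}: \qquad
C^0(E) = E, \quad C^1(E) = \{0\}, \quad C^2(E) = \emptyset
\;\;\Longrightarrow\;\; L_C(E) = 2.

% Example 2: a sequence of sequences.
F = \{1/m + 1/n : m, n \ge 1\}: \qquad
C^1(F) = \{1/m : m \ge 1\} \cup \{0\}, \quad C^2(F) = \{0\}, \quad C^3(F) = \emptyset
\;\;\Longrightarrow\;\; L_C(F) = 3.
```

Each additional level of nested accumulation raises the clustering depth by one, which is the mechanism by which the depth of a network is reflected in the clustering depth of its pole set.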
Lemma 5. Let E, F ⊂ C be sets, let z ∈ C be a point, and let k be a nonnegative integer. Then

We are now ready to state the main result of this section, which strengthens Proposition 1.

Note that this result immediately implies Proposition 1 since the depth of a non-trivial GFNN N is at least one, and hence L C (P N ) = L(N ) ≥ 1 implies that P N ≠ ∅. We remark that Statement (ii) of Proposition 8 is equivalent to the assertion that every essential singularity of N σ be the limit of a sequence of its simple poles. The proof of Proposition 8 uses the following auxiliary results, whose proofs can be found in the Appendix.
Lemma 6. Let f : D f → C be a non-constant holomorphic function on its natural domain D f (with countable complement in C), and suppose that C \ D f is countable. Furthermore, let g : D g → C be a meromorphic function on C with a nonempty set of poles P . Then g • f can be analytically continued to D := {z ∈ D f : f (z) ∈ C \ P }, and D has countable complement in C.
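Lemma 7. Let $\rho : \mathbb{R} \to \mathbb{R}$ be a nonlinearity, and let $J$ be a finite index set. Suppose that $\{(\alpha_s, \beta_s, \gamma_s)\}_{s \in J}$ are triples of real numbers such that $\sum_{s \in J} \alpha_s \rho(\beta_s\,\cdot + \gamma_s)$ is constant. Assume $j^* \in J$ is such that $\alpha_{j^*} \neq 0$. Then there exist a set $I \subset J$ such that $j^* \in I$, and real numbers $\{\tilde{\alpha}_s\}_{s \in I}$ such that $\tilde{\alpha}_{j^*} \neq 0$ and $(\zeta, \{(\tilde{\alpha}_s, \beta_s, \gamma_s)\}_{s \in I})$ is an affine symmetry of $\rho$, for some $\zeta \in \mathbb{R}$.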
Proof of Proposition 8. The proof follows the argument outlined in Section III. We proceed by induction on $L(\mathcal{N})$. To establish the base case, we assume that $L(\mathcal{N}) = 1$, and enumerate the nodes $V \setminus \{v_{\mathrm{in}}\}$ as $\{u_1, \dots, u_{D_1}\}$. Now, as $\mathcal{N}$ is non-degenerate, we have $V_{\mathrm{out}} = \{u_1, \dots, u_{D_1}\}$, and so we can write where $\lambda^{(1)}_{u_j} \neq 0$, for all $j \in \{1, \dots, D_1\}$. Therefore, $\langle\mathcal{N}\rangle_\sigma$ is meromorphic on $\mathbb{C}$, and so Statements (i) and (ii) hold immediately. To show Statement (iii), note that $P_{\mathcal{N}}$ is discrete (simply as $\langle\mathcal{N}\rangle_\sigma$ is meromorphic), and so $\overline{P_{\mathcal{N}}} = P_{\mathcal{N}}$ and $L_C(P_{\mathcal{N}}) \le 1$. It therefore suffices to show that $P_{\mathcal{N}}$ is nonempty, as we will then have $L_C(P_{\mathcal{N}}) \ge 1$. Suppose by way of contradiction that $P_{\mathcal{N}}$ is empty. Then, in particular, $P_{\mathcal{N}}$ is bounded, and so the SAC for $\sigma$ implies that $\langle\mathcal{N}\rangle_\sigma$ is constant. Thus, $\sum_{j=1}^{D_1} \lambda^{(1)}_{u_j} \sigma(\omega_{u_j v_{\mathrm{in}}}\,\cdot + \theta_{u_j})$ is constant, and hence, by Lemma 7, there exist a nonempty set $U \subset \{u_1, \dots, u_{D_1}\}$ and real numbers $\zeta$ and $\{\alpha_u\}_{u \in U}$ such that $(\zeta, \{(\alpha_u, \omega_{u v_{\mathrm{in}}}, \theta_u)\}_{u \in U})$ is an affine symmetry of $\sigma$. This implies that $\mathcal{N}$ is $(\sigma, U)$-reducible, which stands in contradiction to the regularity of $\mathcal{N}$, and thus establishes that $P_{\mathcal{N}}$ is nonempty.
We proceed to the induction step. Suppose that $L(\mathcal{N}) \ge 2$ and assume that the claim of the proposition holds for all non-trivial regular GFNNs $\mathcal{N}'$ with 1-dimensional output, input set $\{v_{\mathrm{in}}\}$, and depth $L(\mathcal{N}') < L(\mathcal{N})$. We can now write where $\mathcal{N}_w$, for $w \in V^{>1}_{\mathrm{out}} := \{w \in V_{\mathrm{out}} : \mathrm{lv}(w) > 1\}$, are non-trivial regular GFNNs with input set $\{v_{\mathrm{in}}\}$ and depth $L(\mathcal{N}_w) < L(\mathcal{N})$, and $f : D_f \to \mathbb{C}$, given by is a meromorphic function with simple poles only.
For Statement (i), we first observe that, for w ∈ V >1 out , the induction hypothesis for N w implies that N w σ is non-constant and can be analytically continued to a domain with countable complement in C. Thus, by Lemma 6, we have that σ • N w σ also analytically continues to a domain with countable complement in C, and, in particular, its natural domain D σ• Nw σ is well-defined. Next, note that N σ can be analytically continued to the set Then, as f is meromorphic and C \ D σ• Nw σ is countable, for every w ∈ V >1 out , we have that C \ D is countable, establishing Statement (i) for N . (Note that the natural domain D N σ can be a strict superset of D, e.g., if there is a point in C that is a simple pole of σ • N w1 σ and σ • N w2 σ for distinct w 1 and w 2 , their residues could be such that the pole disappears in the linear combination (22)).
For Statement (ii), we begin by noting that, as C\D N σ is countable, every element of C is a point of analyticity, a pole, or an essential singularity of N σ , and we can thus write C\D N σ = P N ∪E N , where P N is the set of simple poles of N σ and E N is the set of its essential singularities and poles of higher order. Now, as C 1 (P N ) ⊂ E N , in order to complete the proof of Statement (ii) for N , it suffices to establish that E N ⊂ C 1 (P N ). To this end, note that the induction hypothesis for N w implies that we can write C \ D Nw σ = P w ∪ E w , where D Nw σ is the natural domain of N w σ , P w is its set of simple poles, and E w = P w \ P w = C 1 (P w ) is the set of its essential singularities, for w ∈ V >1 out . Then, recalling (22) and the fact that f and σ are meromorphic with simple poles only, we have and thus E N ⊂ w∈V >1 out P w . It will therefore be enough to show that To this end, first note that we immediately have C 1 (P N ) ⊂ E N ⊂ w∈V >1 out P w . For the reverse inclusion, we let p ∈ w∈V >1 out P w , and distinguish between the cases p / ∈ w∈V >1 out E w and p ∈ w∈V >1 out E w .
The case p / ∈ w∈V >1 out E w . Fix an arbitrary w * ∈ V >1 out such that p ∈ P w * and set V * out = {w ∈ V >1 out : p ∈ P w }. Now, for w ∈ V * out , as p is a simple pole of N w σ , we can write for z in an open neighborhood of p, where β w ∈ C \ {0}, γ w ∈ C, and w : D w → C is an ABC.
Then, using (24) in (22) and performing the variable substitution z = 1 z−p yields for all z ∈ C of sufficiently large modulus, where Now, due to the case assumption p / ∈ w∈V >1 out E w , we have that N w σ is analytic at p, for all w ∈ V >1 out \ V * out , and so g is analytic on a punctured neighborhood of p. Thus, according to (25), we will have p ∈ C 1 (P N ), unless the set of poles of is bounded. Suppose by way of contradiction that the set of poles of (26) is bounded. Then, by the CAC for σ, there exists a nonempty U ⊂ V * out such that β −1 w1 w1 = β −1 w2 w2 , for all w 1 , w 2 ∈ U , and the set of poles of is bounded. This and (24) together imply that is constant, for all w 1 , w 2 ∈ U . We next establish the following claim.
Claim 1: Writing Y w = par N (w), for w ∈ U , we have for all w 1 , w 2 ∈ U , and there exists a ϑ ∈ R such that β w e −iϑ ∈ R, for all w ∈ U .
Proof of Claim 1. We argue by contradiction, so suppose that the claim is false. Then there exist and define the sets Z Im 1 , Z Im 2 , Z Im 3 , and S Im analogously. Then, by our assumption, at least one of S Re and S Im must be nonempty. Suppose for now that S Re ≠ ∅. Next, set to be the subnetwork of N with one-dimensional output generated by S. Then N′ is a regular GFNN of depth L(N′) < L(N ), and, as lv N (w) > 1, for w ∈ U ⊂ V >1 out , we have that N′ is non-trivial. It hence follows by the induction hypothesis for N′ that the set P N′ of poles of N′ satisfies L C (P N′ ) = L(N′) ≥ 1. In particular, we have P N′ ≠ ∅. On the other hand, showing that N′ σ is constant, which stands in contradiction to P N′ ≠ ∅. An entirely analogous argument leads to a contradiction in the case S Re = ∅ and S Im ≠ ∅, establishing that (29) must hold. Now, as β w1 /β w2 = ω w1v /ω w2v ∈ R, for all w 1 , w 2 ∈ U and v ∈ Y , the β w all have the same complex argument, and so there must exist a ϑ ∈ R such that β w e −iϑ ∈ R, for all w ∈ U , completing the proof of Claim 1.

Now, (29) implies that
which together with (28) gives for all w 1 , w 2 ∈ U . Therefore, there exists a c ∈ C such that β −1 w (θ w − γ w ) = c, for all w ∈ U . Hence, recalling (27), we have that the set of poles of is bounded, and so, as β w e −iϑ ∈ R, for all w ∈ U , the SAC for σ implies that w∈U λ (1) w σ β w e −iϑ · + θ w must be constant. Now, Lemma 7 establishes the existence of a nonempty U ⊂ U and real numbers ζ and {α w } w∈U such that ζ, {(α w , β w e −iϑ , θ w )} w∈U is a symmetry of σ. On the other hand, Claim 1 implies that the nodes U have a common parent set Y in N and that there exist nonzero real numbers {κ v } v∈Y such that {ω wv } v∈Y = β w e −iϑ {κ v } v∈Y , for all w ∈ U , therefore implying that N is (σ, U )-reducible. This, however, contradicts the assumption that N is regular and thereby establishes that p ∈ C 1 (P N ) in the case p / ∈ w∈V >1 out E w . The case p ∈ w∈V >1 out E w . Define the sets P • u := P u w∈V >1 out E w , for u ∈ V >1 out . Then every element of P • u is a cluster point of P N (by the case already established), for every u ∈ V >1 out , and thus p itself will be a cluster point of P N , provided we can establish the existence of a u * ∈ V >1 out such that p ∈ C 1 (P • u * ). This will be an immediate consequence of the following claim.
Proof of Claim 2. For every u ∈ V >1 out , we have P • u ⊂ P u , and so For the reverse inclusion, we suppose by way of contradiction that there exists a point y ∈ w∈V >1 out E w u∈V >1 by Statement (iii) for N w , and so Let w * ∈ V >1 out be such that y ∈ E w * and L C (E w * , y) = k. Next, as y is not an element of , it is not a cluster point of P • w * , and so there exists an ε > 0 such that P • w * ∩ D • (y, ε) = ∅. Then, by definition of P • w * , we have for every δ ∈ (0, ε), and thus, using item (vi) of Lemma 5, we get On the other hand, as E w * = C 1 (P w * ), we have C 1 (P w * ∩ D • (y, δ)) = E w * ∩ D(y, δ) = ∅, and so Now, (33) and (34) together yield and so there must exist a w ∈ V >1 out such that L C (E w , y) ≥ k + 1. Thus, by item (iii) of Lemma 5 and the fact that E w is closed (which follows from item (i) of the same lemma and E w = C 1 (P w )), we must have y ∈ E w . But now which contradicts (32) and thus concludes the proof of Claim 2.
We have thus established that w∈V >1 out P w ⊂ C 1 (P N ), completing the proof of (23) and thereby proving Statement (ii) for N .
In order to establish Statement (iii), we use (23) together with item (vi) of Lemma 5 and the induction hypothesis to argue as follows: This, in particular, implies that C 1 (P N ) is nonempty (as L(N ) − 1 ≥ 1), and hence where we also used C 1 (P N ) = C 1 (P N ). This concludes the proof of the proposition.

A. Input anchoring
In this section, we introduce the procedure of input anchoring, which will allow us to extend the null-net property for meromorphic symmetries on a singleton input set to input sets of arbitrary size.
This procedure was first introduced in [16] for networks satisfying the so-called no-clones condition, which constitutes a special case of irreducibility for nonlinearities with no affine symmetries other than the trivial ones. We now generalize this method to arbitrary nonlinearities σ satisfying the SAC.
This involves finding a precise "topological description" of the set of affine symmetries of σ (in the sense of Lemma 8 below), as well as applying the Baire category theorem. Before further discussing input anchoring, we address the case of regular GFNNs having input nodes without any outgoing edges (which is allowed by Definition 5). Concretely, suppose that M = (V, E, V in , V out , Ω, Θ, Λ) is a non-trivial regular GFNN with one-dimensional output such that M ρ = 0. Then, writing V 0 in for the set of input nodes of M without any outgoing edges, we have V 0 in ⊊ V in , as M is non-trivial. Therefore, we can define a non-trivial regular GFNN M = (V , E, V in , V out , Ω, Θ, Λ) with one-dimensional output, obtained from M by deleting the nodes V 0 in . This network also satisfies M ρ = 0, as well as V ⊂ anc(V out \ V in ), which can be viewed as a stronger version of Property (i) of Definition 5. Thus, we can henceforth work w.l.o.g. with networks satisfying the following strong regularity condition.
Definition 24 (Strong non-degeneracy and strong regularity). Let M = (V, E, V in , V out , Ω, Θ, Λ) be a GFNN. We say that M is strongly non-degenerate if it is non-degenerate and V = anc(V out \ V in ).
We call M strongly regular if it is strongly non-degenerate and irreducible.
be a strongly regular GFNN with one-dimensional output identically equal to zero. Enumerate the input nodes according to V M in = {v 0 1 , . . . , v 0 D0 }, and suppose that D 0 ≥ 2. Let a ∈ R and let ρ be a nonlinearity. We seek to construct a non- for all (t 1 , t 2 , . . . , t D0−1 ) ∈ R D0−1 (after identifying R Vin with R D0 ).
(IA-2) For all w ∈ V M out \ V Ma out , the function R D0−1 → R given by is constant, and we denote its value by w ρ, M (a). In the following definition, we provide the desired network M a , and we refer the reader to Figure 7 in Section III for an illustration of this construction.
Let a ∈ R, and let ρ be a nonlinearity such that M ρ = 0. The network obtained from M by anchoring the input v 0 D0 to a with respect to ρ is the GFNN given by the following: -For a node v ∈ V M \ V Ma , we define recursively (Note that this is well-defined, as The network M a satisfies (IA-1) and (IA-2) by construction, and therefore M a ρ = 0 by the choice of the output scalars of M a . Moreover, M a is strongly non-degenerate. To see this, take an arbitrary v ∈ V Ma . Then, by strong non-degeneracy of M, there exists a w ∈ V M out \ V M in such that v ∈ anc M (w). As w is connected directly with a node in V Ma , it follows that w ∈ V Ma , and so w ∈ V Ma out \ V Ma in . Therefore, v ∈ anc Ma (w), and, as v was arbitrary, we obtain V Ma ⊂ However, M a is not, in general, guaranteed to be irreducible. Consider, for instance, the network M in Figure 7. As the biases of the nodes u 1 , u 2 , w 2 , w 3 are changed, the network M a may be (ρ, {u 1 , u 2 })-reducible or (ρ, {w 2 , w 3 })-reducible, or both. This is unfortunate, as our program for proving Theorem 3 envisages maintaining regularity when constructing networks with zero output.
However, this nuisance can be circumvented, as the following lemma says that, for real meromorphic nonlinearities satisfying the SAC, either there exists some value of a ∈ R such that the network M a is, indeed, irreducible, or it is possible to select a strongly regular subnetwork N of M with input {v 0 D0 } and identically zero output. This will be sufficient for our purposes.
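To make the anchoring idea concrete, the following is a minimal sketch for the simplest possible setting only: a single-hidden-layer network in which every hidden node sees all inputs. Under that assumption, anchoring the last input to a value a amounts to dropping its weight and absorbing the product of that weight and a into the bias of each hidden node. The sketch does not implement the general construction of Definition 25 (it ignores nodes that depend on the anchored input alone, as well as the constants appearing in (IA-2)), and all function and variable names are hypothetical.

```python
import math

def shallow_net(weights, biases, out_scalars, x, rho=math.tanh):
    """Evaluate sum_w lambda_w * rho(<omega_w, x> + theta_w)."""
    total = 0.0
    for omega, theta, lam in zip(weights, biases, out_scalars):
        pre = sum(o * xi for o, xi in zip(omega, x)) + theta
        total += lam * rho(pre)
    return total

def anchor_last_input(weights, biases, a):
    """Fix the last input to a: drop its weight, absorb omega * a into the bias."""
    new_weights = [omega[:-1] for omega in weights]
    new_biases = [theta + omega[-1] * a for omega, theta in zip(weights, biases)]
    return new_weights, new_biases

# Hypothetical example network with two inputs and three hidden nodes.
W = [[1.0, -2.0], [0.5, 0.3], [-1.5, 0.7]]
B = [0.1, -0.4, 0.2]
LAM = [2.0, -1.0, 0.5]
A = 0.8

W_a, B_a = anchor_last_input(W, B, A)

# The anchored network, as a function of t, should agree with the
# original network evaluated at (t, A).
max_gap = max(
    abs(shallow_net(W, B, LAM, [t, A]) - shallow_net(W_a, B_a, LAM, [t]))
    for t in [-2.0, -0.5, 0.0, 1.0, 3.0]
)
```

In this toy setting the biases of the hidden nodes change with a, which is the mechanism by which an anchored network may become reducible even though the original network is not.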
Let σ be a nonlinearity such that σ(R) ⊂ R, and suppose that σ is meromorphic on C and satisfies the SAC. Finally, suppose that M σ = 0, and let M a denote the network obtained by anchoring the input v 0 D0 to some a ∈ R with respect to σ, according to Definition 25. Then one of the following two statements must be true: (i) There exists an a ∈ R such that M a is strongly regular.
(ii) There exist a strongly regular subnetwork The proof of Proposition 9 requires the following auxiliary result, whose proof can be found in the Appendix. Proof of Proposition 9. For a subset U of nodes of M define Suppose that Statement (i) is false, so that, for every a ∈ R, there exists a U ⊂ V M such that a ∈ E U . We can then write R as a finite union and, as R is a complete metric space and the union over the subsets of V M is finite, it follows by the Baire category theorem [20,Thm. 5.6] that there exists a U ⊂ V M such that E U is not meagre in R, i.e., it is not a countable union of nowhere dense sets. Fix such a set U , let P be the common parent set in M a of the nodes in U , let {κ v } v∈P and {β u } u∈U be such that for all u ∈ U , and set P = u∈U (par M (u) \ P ). Note that, for v ∈ P , the map v σ, M depends on v 0 D0 , but not on the remaining input nodes {v 0 1 , . . . , v 0 D0−1 } of M, so we can write v σ, M (a) for the value of v σ, M at an arbitrary point (t 1 , . . . , t D0−1 , a) ∈ R Vin . Now, the bias of every u ∈ U in M a is given by As σ is a meromorphic function satisfying the SAC and σ(R) ⊂ R, we know by Lemma 8 that the is an affine symmetry of σ for some ζ ∈ R and nonzero real numbers {α u } u∈U is a countable union of parallel lines in R U , i.e., there exists a countable set Γ ⊂ R U such that Note that, by definition of reducibility, we have (ξ u (a)) u∈U ∈ Γ, for all a ∈ E U , and thus we can partition E U according to Now, as E U is not a countable union of nowhere dense sets, and Γ is countable, there must exist a γ ∈ Γ such that E γ U is dense in an open subset of R. Next, consider ϑ ∈ R U such that u∈U β u ϑ u = 0. Then for all a ∈ E γ U . As σ(R) ⊂ R, the functions a → ξ u (a), for u ∈ U , are holomorphic in a neighborhood of R. Hence, as E γ U has a cluster point in R, it follows by the identity theorem [20, Thm. 10.18] that (37) holds for all a ∈ R. 
Now, as ϑ was arbitrary, we see that, for every a ∈ R, there exists a ξ(a) ∈ R such that (ξ u (a) − γ u ) u∈U = ξ(a) · (β u ) u∈U .
for all u ∈ U , and thus, for all u 1 , u 2 ∈ U , we have that is constant as a function of a ∈ R. We now use this identity to construct a subnetwork N of M with one identically zero output and input {v 0 D0 }, thereby establishing Statement (ii) of the proposition. This will be done analogously to the construction of the network N in the proof of Proposition 8.
Concretely, we proceed by showing that there exist u 1 , u 2 ∈ U , u 1 ≠ u 2 , such that either par M (u 1 ) \ P ≠ par M (u 2 ) \ P or P := par M (u 1 ) \ P = par M (u 2 ) \ P and β −1 u1 {ω u1v } v∈ P ≠ β −1 u2 {ω u2v } v∈ P . Suppose by way of contradiction that this is not the case. First note that #(U ) ≥ 2, as σ is nonconstant. Next, recalling that P = ⋃ u∈U (par M (u) \ P ), we have par M (u) \ P = P , for all u ∈ U , and there exists a set of nonzero real numbers But this implies that M is (ρ, U )-reducible, contradicting the assumption that M is irreducible.
We can therefore find u 1 , u 2 ∈ U , u 1 ≠ u 2 , such that either par M (u 1 ) \ P ≠ par M (u 2 ) \ P , or P := par M (u 1 ) \ P = par M (u 2 ) \ P and β −1 u1 {ω u1v } v∈ P ≠ β −1 u2 {ω u2v } v∈ P . It hence follows that there exists a v ∈ P such that one of the following statements holds: Hence S := {v ∈ P : one of (39) holds} is nonempty, and we can set for v ∈ S, We now take N = (V N , E N , {v 0 D0 }, S, Ω N , Θ N , Λ N ) to be the subnetwork of M with one-dimensional output, generated by S, and with Λ N as given in (40). Then N σ = 0 by (38), and N is strongly regular, as M is. This establishes Statement (ii) of the proposition and hence completes its proof.

B. Proof of Theorem 3
We are now ready to combine the results of Sections V and VI to prove Theorem 3.
Proof of Theorem 3. We argue by contradiction, so suppose that the statement is false. Specifically, fix a non-trivial regular GFNN A with one-dimensional identically zero output and input set V in of minimal cardinality. Then, as V in is of minimal cardinality, A must be strongly regular. We further claim that #(V in ) = 1. To see this, suppose by way of contradiction that #(V in ) ≥ 2, and apply Proposition 9 to A. Note that both circumstances of Proposition 9 yield a strongly regular network A′ with one-dimensional identically zero output, and input set V′ in strictly contained in V in . As #(V′ in ) < #(V in ), we have a contradiction to the minimality of #(V in ), and hence must have #(V in ) = 1. Now, as A σ | R = 0, it follows by the identity theorem that A σ continues in a unique fashion to the zero function on its natural domain D A σ = C. On the other hand, as A is non-trivial, Proposition 8 implies that the natural domain D A σ of the analytic continuation of A σ is equal to C \ P A , where P A is the set of poles of A σ satisfying L C (P A ) = L(A) ≥ 1. This, in particular, implies that P A must be nonempty, which stands in contradiction to D A σ = C, completing the proof.

A. Basic properties of Σ a,b -nonlinearities
In this section, we derive various straightforward results about lattices in C and the functions in Σ a,b and use these findings to establish both the SAC and the CAC for Σ a,b -nonlinearities. We begin with a lemma listing several elementary properties of Σ a,b -nonlinearities. In the following we write d(z, F ) = inf{|z − w| : w ∈ F } for the Euclidean distance between the point z ∈ C and the set F ⊂ C.
The proof of Lemma 9 can be found in the Appendix.

B. Asymptotic density and the CAC for Σ a,b -nonlinearities
The first main result of this section is the following proposition that immediately implies the CAC for functions in Σ a,b .
Proposition 10. Let a, b > 0, σ ∈ Σ a,b , and let {(α s , β s , γ s )} s∈I be a nonempty finite set of triples of complex numbers such that α s , β s ∈ C \ {0}, for all s ∈ I. Furthermore, let { s } s∈I be ABCs, and suppose that the function is analytic on C \ D(0, R), for some R > 0. Then the set I can be partitioned into sets I 1 , . . . , I n such that, for every j ∈ {1, . . . , n}, (i) there exists an ABC ξ j so that s = β s ξ j , for all s ∈ I j , and (ii) the function f j := s∈Ij α s σ(β s · + γ s ) is entire.
The proof of Proposition 10 uses several ancillary results about asymptotic densities of arithmetic sequences and lattices in the sense of Definition 14. Concretely, we will need the following three lemmas, whose proofs can be found in the Appendix, as well as a special case of Weyl's equidistribution theorem, which was also employed in the proof of the "Deconstruction Lemma" in [14].
Lemma 12. Let a, b > 0 and let σ ∈ Σ a,b . Furthermore, let {(α s , β s , γ s )} s∈I be a finite set of triples of complex numbers such that α s , β s ≠ 0, for all s ∈ I, and set f := Σ s∈I α s σ(β s · + γ s ). Then, either (i) f is entire, or (ii) f has a nonempty set of poles P f , and there exists a line ℓ in C such that ∆(ℓ, P f ) > 0.
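The equidistribution phenomenon invoked above can be illustrated numerically. The sketch below covers only the classical one-dimensional statement (the fractional parts of nα are equidistributed in [0, 1) when α is irrational), which is assumed here to be representative of the special case used in the proof; the function name and the test interval are hypothetical choices made for illustration.

```python
import math

def fraction_in_interval(alpha, n_terms, lo, hi):
    """Fraction of n in {1, ..., n_terms} with frac(n * alpha) in [lo, hi)."""
    hits = 0
    for n in range(1, n_terms + 1):
        frac = (n * alpha) % 1.0
        if lo <= frac < hi:
            hits += 1
    return hits / n_terms

# Irrational alpha: the share of hits approaches the interval length 0.25.
irrational_share = fraction_in_interval(math.sqrt(2), 20000, 0.0, 0.25)

# Rational alpha = 1/2: the fractional parts alternate between 0.5 and 0.0,
# so exactly half of them land in [0, 0.25), regardless of the interval length.
rational_share = fraction_in_interval(0.5, 20000, 0.0, 0.25)
```

The contrast between the two cases mirrors the case distinction in the density arguments: irrational ratios lead to averaging against Lebesgue measure, whereas rational ratios produce periodic patterns that have to be treated separately.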
We are now ready to prove Proposition 10.
Proof of Proposition 10. Let δ ∈ (0, 1/R) be sufficiently small for the functions s to be analytic on an open neighborhood of D(0, δ), for all s ∈ I. Then, for every s ∈ I, is a meromorphic function on D δ := C \ D(0, 1/δ). Let P s ⊂ D δ denote its set of poles. Next, for s ∈ I, set P s = β −1 s (P σ − γ s ), where P σ is the set of poles of σ. We now show the following: Claim: There exist δ ∈ (0, δ) and A > 1/(2δ ) such that, for all s ∈ I, the function g s : D • (0, δ ) → C given by g s (z) = βsz βs+z s(z) is biholomorphic onto its image Img(g s ) ⊃ D • (0, 1/A), and, for every p ∈ P s \ D(0, 2A), we have p := 1/g s (1/p ) ∈ P s \ D(0, A), and where h s := β −1 s ( s • g −1 s ) : D • (0, 1/A) → C. Proof of Claim. First note that, for every s ∈ I, the function z → g s (z) = βsz βs+z s(z) is holomorphic on a neighborhood of 0. Moreover, we have g s (0) = 0 and g s (0) = 1, and thus by the complex open mapping theorem [20,Thm. 10.32], there exists δ s ∈ (0, δ) such that g s : D • (0, δ s ) → C is biholomorphic onto its image. Let δ = min s∈I δ s and A > 0 be such that for all s ∈ I, for all s ∈ I, and The last of these conditions implies that the image Img(g s ) contains D • (0, 1/A), for every s ∈ I.
We proceed to show (42). To this end, fix an s ∈ I and take a p ∈ P s \ D(0, 2A). Then,|p | > 2A and ψ := β s p + γ s + s (1/p ) ∈ P σ . Now, and, as |1/p | < 1/(2A) < δ , we have This establishes p ∈ P s \ D(0, A). Next, as |1/p | < δ and g s : D • (0, δ ) → C is a bijection with its image containing D • (0, 1/A), we must have 1/p = g −1 s (1/p). Therefore, and let I 1 , . . . , I n be the subsets of I corresponding to different connected components of G. Next, fix a connected component I j of G. We proceed to establish the existence of an ABC ξ j such that s = β s ξ j , for all s ∈ I j . If I j = {s * } is a singleton set, we can then simply set ξ j = β −1 s * s * , so suppose that #(I j ) ≥ 2.
We are now ready to show that β −1 s1 s1 = β −1 s2 s2 . To this end, let {q k } k∈N be a sequence in P s1 ∩ P s2 such that |q k | → ∞ and d(q k , ) → 0 as k → ∞, and let {p k } k∈N ⊂ P s1 and {p k } k∈N ⊂ P s2 be the corresponding sequences such that |p k | → ∞ and |p k | → ∞ as k → ∞, and Then p k , p k ∈ P ,1 for sufficiently large k, and, since and P ,1 is uniformly discrete, we must have p k = p k , for all sufficiently large k. Therefore (h s1 − h s2 )(1/p k ) = 0, for all sufficiently large k. Since h s1 − h s2 is holomorphic in D • (0, 1/A) and 1/p k → 0 as k → ∞, it follows by the identity theorem that h s1 − h s2 = 0 on D • (0, 1/A). Now, choose an arbitrary x ∈ D • (0, δ ) ∩ g −1 s1 D • (0, 1/A) , and set x = g −1 s2 (g s1 (x)). Then and thus Hence, x = x and β −1 s1 s1 (x) = β −1 s2 s2 (x), and, since x was arbitrary, we again deduce by the identity theorem that β −1 s1 s1 = β −1 s2 s2 on D • (0, δ). Now, choose an arbitrary s * ∈ I j and define the ABC ξ j = β −1 s * s * . Then, for every s ∈ I j , as I j is a connected component of G, we can find a finite sequence s 1 = s, s 2 , . . . , s m−1 , s m = s * in I j such that (s k , s k+1 ) ∈ E, for k ∈ {1, . . . , m − 1}. Consequently, and thus s = β s ξ j , for all s ∈ I j . As the connected component I j of G was arbitrary, we have established item (i).
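The partition of I into I 1 , . . . , I n and the chaining argument along a path s 1 , . . . , s m rely only on standard graph connectivity. As a minimal sketch, the following computes the connected components of an undirected graph given as a node list and an edge list. How the edge set E is obtained (from the density of common poles) is outside the scope of the sketch, and the example graph is hypothetical.

```python
def connected_components(nodes, edges):
    """Partition `nodes` into the connected components of an undirected graph."""
    adjacency = {s: set() for s in nodes}
    for s1, s2 in edges:
        adjacency[s1].add(s2)
        adjacency[s2].add(s1)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search collecting everything reachable from `start`.
        stack = [start]
        component = []
        seen.add(start)
        while stack:
            s = stack.pop()
            component.append(s)
            for t in adjacency[s]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        components.append(sorted(component))
    return components

# Hypothetical index set and edge relation.
I = [1, 2, 3, 4, 5]
E = [(1, 2), (2, 3), (4, 5)]
parts = connected_components(I, E)
```

Within one component, any relation that holds across each edge and is transitive (such as the equality of the functions β −1 s s established above) propagates to all pairs of indices in that component.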
It remains to show that the functions f j = s∈Ij α s σ(β s · + γ s ) are entire. To this end, fix a j ∈ {1, . . . , n}, and suppose by way of contradiction that the set P fj of poles of f j is not empty.
Then, by Lemma 12, there must exist a line in C such that ∆( , P fj ) > 0. Next, define the function on D δ , and let P fj denote its set of poles. Now, for every p ∈ P fj with sufficiently large |p|, there exists a unique p ∈ D δ such that and 1/p → 0 as |p| → ∞. Then p ∈ P fj and |p − p | = |ξ j (1/p )| → 0 as |p| → ∞. Performing density estimates analogous to (45), we find that ∆( , P fj ) ≥ ∆( , P fj ) > 0. Finally, we let P f ⊂ D δ be the set of poles of f , and argue where we used that ∆( , P s ∩ P s ) = 0, for s and s in different connected components of I, by definition of the graph G. This in particular implies that P f = ∅, which stands in contradiction to the assumption that f is analytic on D δ , and hence establishes that f j must be entire. Since j ∈ {1, . . . , n} was arbitrary, the proof of the proposition is complete.
C. The SAC for Σ a,b -nonlinearities
The second main result of this section establishes the SAC for Σ a,b -nonlinearities:
Proposition 12. Let a, b > 0 and let σ ∈ Σ a,b . Then σ satisfies the SAC.
The proof of Proposition 12 relies on Carlson's theorem, as well as Lemma 12 that we already used to establish the CAC.
Then f is identically 0.
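Proof of Proposition 12. Let $R > 0$ and a finite set $\{(\alpha_s, \beta_s, \gamma_s)\}_{s\in I} \in \mathbb{C}^I \times \mathbb{R}^I \times \mathbb{R}^I$ be such that $f := \sum_{s\in I} \alpha_s \sigma(\beta_s \cdot + \gamma_s)$ is analytic on $\mathbb{C} \setminus D(0, R)$, and assume w.l.o.g. that $\alpha_s \neq 0$, $\beta_s \neq 0$, for all $s \in I$. We use induction on $\#(I)$ to show that $f$ is constant. If $\#(I) = 0$, i.e., $I = \emptyset$, then $f$ is given by the empty sum, and so $f \equiv 0$ is constant. Suppose now that $\#(I) \geq 1$, and assume that the implication in the definition of the SAC holds for all $\{(\alpha_s, \beta_s, \gamma_s)\}_{s\in I'} \in \mathbb{C}^{I'} \times \mathbb{R}^{I'} \times \mathbb{R}^{I'}$ with $\#(I') < \#(I)$. First, note that, as the set of poles of $f$ is bounded, its density along any line in $\mathbb{C}$ is zero, and so it follows by Lemma 12 that $f$ must be entire. Now, let $\beta_{\max} = \max\{|\beta_s| : s \in I\}$, $\beta_{\min} = \min\{|\beta_s| : s \in I\}$, and set $I_1 = \{s \in I : |\beta_s| = \beta_{\max}\}$. Then, as the functions $\sigma_s := \alpha_s \sigma(\beta_s \cdot + \gamma_s)$ do not have poles along $\frac{ib}{2\beta_{\max}} + \mathbb{R}$, for $s \in I \setminus I_1$, the function
\[ f_1 := \sum_{s\in I_1} \sigma_s = f - \sum_{s\in I\setminus I_1} \sigma_s \]
does not have poles along $\frac{ib}{2\beta_{\max}} + \mathbb{R}$ either. Therefore, as $f_1$ is $\frac{ib}{\beta_{\max}}$-periodic and its poles are contained in $\bigcup_{n\in\mathbb{Z}} \big(\mathbb{R} + \frac{ib}{\beta_{\max}}\big(n + \tfrac{1}{2}\big)\big)$, it follows that $f_1$ is entire. Next, by item (iv) of Lemma 9, there exist $M > 0$ and $\eta \in (0, \pi)$ such that
\[ |\sigma(z)| \leq \frac{M}{1 \wedge d(z, P_\sigma)}\, e^{\eta |z|/b}, \]
for all $z \in D_\sigma := \mathbb{C} \setminus P_\sigma$, where $P_\sigma$ is the set of poles of $\sigma$. Now, let $P_s = \beta_s^{-1}(P_\sigma - \gamma_s)$ be the set of poles of $\sigma_s$, for $s \in I_1$, and set $P = \bigcup_{s\in I_1} P_s$. Then, for $z \in \mathbb{C} \setminus P$,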
Now, as f 1 is analytic on an open neighborhood of the disk D(p, µ/2), it follows by the maximum modulus principle [20,Thm. 10.24] that Hence |f 1 (z)| ≤ M e ηβmax|z|/b , for all z ∈ C. Now, f 1 ib βmax · − f 1 (0) satisfies the assumptions of Proposition 13, and therefore must be identically zero. This establishes that f 1 is constant. But now is entire, and therefore constant, by the induction hypothesis, and so f = f 1 + f 2 is constant. This completes the induction step and concludes the proof of the proposition.

ACKNOWLEDGMENT
The authors would like to thank Charles Fefferman for his insightful comments on an earlier version of the manuscript, which have led to a significantly improved exposition in Section III and a simplification of the proof of Proposition 8.
Proof of Lemma 3. Let N be the (ρ ; A, B, C)-modification of N with respect to the affine symmetry and adopt the remaining notation of Definition 18. By definition of ρ-modification, there exist nonzero Suppose by way of contradiction that N is (ρ, U )-reducible, for some set of nodes U with common parent set P U . Then, as N itself is irreducible, we must have Suppose first that C U := U ∩ C = ∅, and let D = U \ C. It follows by definition of reducibility that the parent set of all nodes in U = C U ∪ D is P U , and, in particular, as U ∩ C = ∅, we have P U = P . Moreover, there exist nonzero real numbers { β u } u∈CU ∪D and { κ v } v∈P such that for all u ∈ C U ∪ D, as well as nonzero real numbers { α u } u∈CU ∪D and ζ ∈ R such that Specifically, we have Fix an arbitrary u p * ∈ C U and let τ = β p * / β u p * . Now, by replacing β u by τ β u , for u ∈ C U ∪ D, Similarly, by replacing the α u with α u / α u p * , for u ∈ C U ∪ D, and ζ by ζ/ α u p * , we may assume w.l.o.g. that α u p * = 1. With this, (52) reads for t ∈ R. Combining (50) and (53) now yields for t ∈ R. Now let C = (C \ C U ) ∪ {u p ∈ C U \ {u p * } : α p − α p * α u p = 0}, and note that C = ∅.
Indeed, suppose by way of contradiction that C = ∅. Then C U = C and α p − α p * α u p = 0, for all p ∈ {1, . . . , n} \ {p * }, and so (54) reduces to Lemma 7 now implies that N is (ρ, D )-reducible, for some D ⊂ A ∪ B ∪ D, which contradicts the assumption that N is irreducible and thus establishes C = ∅. It now follows from (54) that N admits we have C C, contradicting the assumption that C is a subset of C 0 of least possible cardinality such that N admits a corresponding ρ-modification. This establishes that U ∩ C = ∅.
Recalling (51), we deduce that we must have P U ∩ C = ∅, which further implies U ⊂ W . Next, by definition of ρ-modification, there exist nonzero real numbers {ν w } w∈U such that {ω wu } u∈A = ν w {α u } u∈A , for all w ∈ U . We now write : ω wu − ν w α u = 0, for all w ∈ U }, and respect to set inclusion, and let N be the (ρ ; A ∪ B * , B \ B * , C)-modification of N with respect to the affine symmetry of ρ, and let {κ v } v∈P , {ν w } w∈W , and {µ r } D r=1 be as in Definition 18. We now show that N is non-degenerate. Assume by way of contradiction that N is degenerate and let u * ∈ B \ B * be such that -{w ∈ V : (u * , w) ∈ E} = W and ω wu * − α u * ν w = 0, for all w ∈ W , and -either u * − α u * µ r = 0, for all r ∈ {1, . . . , D}. We claim that then N admits a (ρ ; , Condition (i) of Definition 18 is satisfied by the same affine symmetry (56). Moreover, ω wu * = α u * ν w , for all w ∈ W , and, in the circumstance (b) above, λ For the induction step, suppose that k ≥ 1 and C k−1 (E) ⊂ C k−1 (F ). Then, as every cluster point of C k−1 (E) is a cluster point of C k−1 (F ), we obtain C k (E) ⊂ C k (F ), as desired.
(v) We again proceed by induction on k, starting with the base case k = 1 (the case k = 0 is clear).
First, as every cluster point of E is a cluster point of E ∪ F , we have C 1 (E ∪ F ) ⊃ C 1 (E). Similarly, C 1 (E ∪ F ) ⊃ C 1 (F ), and so C 1 (E ∪ F ) ⊃ C 1 (E) ∪ C 1 (F ). For the reverse inclusion, suppose that z is neither a cluster point of E nor F , i.e., there exists an ε > 0 such that Then (D • (z, ε) \ {z}) ∩ (E ∪ F ) = ∅, and so z is not a cluster point of E ∪ F . Therefore, every cluster point of E ∪ F must be a cluster point of at least one of E or F , establishing C 1 (E ∪ F ) = C 1 (E) ∪ C 1 (F ). For the induction step, assume k ≥ 2 and C k−1 (E ∪ F ) = C k−1 (E) ∪ C k−1 (F ). Then, using the identity for the already established base case, we have as desired. (vi) Letting k = L C (E ∪ F ), we have ∅ = C k (E ∪ F ) = C k (E) ∪ C k (F ), and thus both C k (E) and C k (F ) must be empty. Then L C (E) ≤ k and L C (F ) ≤ k, and thus Next, let k = max{L C (E), L C (F )}. Then L C (E) ≤ k and L C (F ) ≤ k , and so both C k (E) and C k (F ) are empty. Thus, C k (E ∪ F ) = C k (E) ∪ C k (F ) = ∅, and so which together with (57) implies the desired identity.
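As a sanity check on items (v) and (vi), consider the following worked example. It assumes, as the argument above suggests, that $C^k(E)$ denotes the $k$-fold iterated set of cluster points of $E$ and that $L_C(E)$ is the smallest $k$ with $C^k(E) = \emptyset$; the sets $E$ and $F$ are chosen purely for illustration, with $\mathbb{N} = \{1, 2, \dots\}$.

```latex
\begin{align*}
E &= \{1/n : n \in \mathbb{N}\}: &
  C^1(E) &= \{0\}, \quad C^2(E) = \emptyset, &
  L_C(E) &= 2, \\
F &= \{1/n + 1/m : n, m \in \mathbb{N}\}: &
  C^1(F) &= \{0\} \cup E, \quad C^2(F) = \{0\}, \quad C^3(F) = \emptyset, &
  L_C(F) &= 3,
\end{align*}
\[
  C^1(E \cup F) = \{0\} \cup E = C^1(E) \cup C^1(F),
  \qquad
  L_C(E \cup F) = 3 = \max\{L_C(E), L_C(F)\}.
\]
```

Here the cluster points of $F$ arise by letting one index tend to infinity (giving the points $1/n$) or both (giving $0$), which is why $F$ requires one more iteration than $E$ before the set of cluster points becomes empty.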
Proof of Lemma 6. The function g • f can clearly be analytically continued to D, so it remains to show that D has countable complement in C. To this end, let E f = C \ D f and E = C \ D. We claim that if z * is a cluster point of E ∩ D f , then z * ∈ E f . Suppose by way of contradiction that this is not the case, and let (z n ) n∈N be a sequence of distinct elements of E ∩ D f such that z n → z * , for some z * ∈ D f . Now, as f is holomorphic, it is, in particular, continuous on D f , and therefore f (z n ) → f (z * ) as n → ∞. On the other hand, we have f (z n ) ∈ P , by definition of E, and as P is discrete, we deduce that there exists a p * ∈ P such that f (z n ) = p * for all sufficiently large n ∈ N.
Now, as E f is closed and countable by assumption, we have that D f is connected, and therefore it follows by the identity theorem that f (z) = p * , for all z ∈ D f . But this contradicts the assumption that f is non-constant, and thus completes the proof that any cluster point of E ∩ D f is contained in E f .

Now define the compact sets E
We see that E N is finite, for each N ∈ N, for otherwise there would exist a sequence (z n ) n∈N of distinct elements of E N converging to a point z * ∈ C. But then, by the claim above, we would have z * ∈ E f , countable set, as desired.
Proof of Lemma 7. Let I be the set of all I ⊂ J such that j * ∈ I , and there exist real numbers { α s } s∈I such that α j * = 0 and s∈I α s ρ(β s · +γ s ) is constant. Note that J ∈ I by assumption. Let I be a minimal element of I with respect to set inclusion. We then have s∈I α s ρ(β s · +γ s ) = ζ 1, for some ζ ∈ R, so in order to show that (ζ, {( α s , β s , γ s )} s∈I ) is an affine symmetry of ρ, it suffices to establish that there does not exist an I I such that {ρ(β s · +γ s ) : s ∈ I } ∪ {1} is linearly dependent. Suppose by way of contradiction that such an I exists. Assume for now that j * ∈ I , and let α s ∈ R, for s ∈ I , be such that s∈I α s ρ(β s · +γ s ) is constant. Then we must have α j * = 0, for otherwise we would have I ∈ I , contradicting the minimality of I. Therefore, s∈I \{j * } α s ρ(β s · +γ s ) is constant, so we may w.l.o.g. assume j * / ∈ I by replacing I with I \ {j * } if necessary. Now, there exist s * ∈ I , ξ ∈ R, and δ s ∈ R, for s ∈ I \ {s * }, such that ρ(β s * · +γ s * ) = ξ 1 + s∈I \{s * } δ s ρ(β s · +γ s ). Thus, and therefore I \{s * } ∈ I , which again contradicts the minimality of I and concludes the proof.
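For orientation, the simplest affine symmetry mentioned in the introduction is the oddness of tanh: with ζ = 0 and the two triples (1, 1, 0) and (1, −1, 0), the sum tanh(t) + tanh(−t) vanishes identically, while neither summand alone is constant. The sketch below checks this numerically at a few sample points. It is an illustration rather than a proof, and the helper name and sample points are hypothetical.

```python
import math

def affine_combination(triples, rho, t):
    """Evaluate sum_s alpha_s * rho(beta_s * t + gamma_s) at the point t."""
    return sum(alpha * rho(beta * t + gamma) for alpha, beta, gamma in triples)

# Oddness of tanh: tanh(t) + tanh(-t) = 0, so zeta = 0 for these two triples.
TANH_TRIPLES = [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0)]
SAMPLE_POINTS = [-3.0, -1.0, -0.25, 0.0, 0.5, 2.0]

values = [affine_combination(TANH_TRIPLES, math.tanh, t) for t in SAMPLE_POINTS]
max_deviation = max(abs(v) for v in values)

# Keeping only the first triple leaves tanh itself, which is visibly
# non-constant on the sample points, so the proper subfamily does not
# sum to a constant.
single = [affine_combination(TANH_TRIPLES[:1], math.tanh, t) for t in SAMPLE_POINTS]
single_spread = max(single) - min(single)
```

The minimality argument in the proof of Lemma 7 plays the same role in general: it extracts from a constant combination a subfamily for which no proper subfamily, together with the constant function, is linearly dependent.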

D. Proof of Lemma 8
Proof. Let {β s } s∈I and Γ be as in the statement of the lemma, and fix a γ = (γ s ) s∈I ∈ Γ. Now, let P σ be the set of poles of σ, and let P s = β −1 s (P σ − γ s ) be the set of poles of σ(β s · +γ s ), for s ∈ I. We define an undirected graph G = (I, E) by setting E = {(s 1 , s 2 ) ∈ I × I : s 1 ≠ s 2 , P s1 ∩ P s2 ≠ ∅} , and claim that G is connected. Suppose by way of contradiction that G is disconnected, and let I = I 1 ∪ I 2 be a partition of I into nonempty subsets that are not mutually connected. By definition of Γ, there exist ζ ∈ R and nonzero real numbers {α s } s∈I such that (ζ, {(α s , β s , γ s )} s∈I ) is an affine symmetry of σ. Now, for j ∈ {1, 2}, let f j = Σ s∈Ij α s σ(β s · +γ s ), and note that f j is meromorphic and its poles are contained in A j := ⋃ s∈Ij P s . Moreover, A 1 ∩ A 2 = ∅ by the choice of I 1 and I 2 .
Thus, as f := f 1 + f 2 = ζ 1 is constant, it follows that f 1 must be entire, for otherwise f would have poles. It hence follows by the SAC for σ that f 1 must, in fact, be constant. But this violates condition (ii) of Definition 1, so we have reached the desired contradiction, establishing that G is connected.
If y 1 /y 2 is irrational, then where µ denotes the Lebesgue measure on R and A stands for the Lebesgue measure on [0, 1)×[0, 1).
We can thus find (n a , n b ) ∈ Z × Z \ {(0, 0)} such that n b y1 a − n a y2 b = 0. Moreover, in the case when one of y 1 or y 2 is zero, we take (n a , n b ) ∈ {(0, 1), (1, 0)}, and if y 1 and y 2 are both nonzero, we assume w.l.o.g. that n a and n b are coprime. Then, letting K = n a a + in b b, we have x + sy + (n a a)Z × (in b b)Z.
Note that the last expression is strictly positive, as Π\ is closed, Y is compact, and (Π\ )∩Y = ∅.
Proof of Lemma 11. Let c > 0 be arbitrary, and let m 1 , m 2 ∈ Z be such that B := m 1 y 1 = m 2 y 2 .
As the infimum in the last quantity is taken over a finite set of positive numbers, we have η(c) > 0.
Therefore, P ,c is uniformly discrete, for all c > 0, as desired.
Let Ψ(P) denote the set of equivalence classes of P with respect to ∼ Q . We proceed with the proof of the lemma by induction on n := #(Ψ(P)). If n = 0, i.e., P = ∅, then f is given by the empty sum, and so f = 0 is trivially entire, as desired. Next, suppose that n ≥ 1, and that the statement of the lemma holds for all functions parametrized by P = {(α s , β s , γ s )} s∈I with #(Ψ(P )) < n.
Let P f be the (possibly empty) set of poles of f , and assume that ∆( , P f ) = 0, for every line in C.
We show that then f must be entire. To this end, fix an equivalence class P 1 := {(α s , β s , γ s )} s∈I1 ∈ Ψ(I), and note that, as β s1 /β s2 ∈ Q, for all s 1 , s 2 ∈ I 1 , there exists a T ∈ C such that β s T ∈ Z, for all s ∈ I 1 . Next, define g = f ( · + ibT ) − f and let P g ⊂ (P f − ibT ) ∪ P f be its set of poles.