Normative and mechanistic model of an adaptive circuit for efficient encoding and feature extraction

Significance
The brain represents information with neural activity patterns. At the periphery, these patterns contain correlations, which are detrimental to stimulus discrimination. We study the peripheral olfactory circuit of the Drosophila larva, which preprocesses neural representations before relaying them downstream. A comprehensive understanding of this preprocessing is, however, lacking. We formulate a principle-driven framework based on similarity-matching and, using neural input activity, derive a circuit model that largely explains the biological circuit’s synaptic organization. It also predicts that inhibitory neurons cluster odors and facilitate decorrelation and normalization of neural representations. If equipped with Hebbian synaptic plasticity, the circuit model autonomously adapts to different environments. Our work provides a comprehensive approach to deciphering the relationship between structure and function in neural circuits.


ORN-LN circuit connectome
We use the synaptic counts from the EM reconstruction obtained by (1). We note here several details regarding our usage of the connectome data. The indices of the Broad Trio (BT) and Broad Duet (BD) neurons are arbitrary, and there is no correspondence between the indices on the left and right sides. Although BT 1 R is of the same type as the other BTs, its connection vector has a correlation of 0 with the other BTs in the connectome data.
There are 2 Keystones (KS) in the Drosophila larva. One has its soma positioned on the right of the larva, and the other one on the left. We call them KS L and KS R, respectively. Each KS establishes bilateral connections, i.e., it connects with neurons both on the left and right sides of the larva. Therefore, in terms of connectivity, there are effectively 4 KS connections with other neurons. For example, there are 4 ORNs → KS synaptic count vectors. In the paper, when referring to a connectivity vector, we call KS L-R the connections of a Keystone with the soma positioned on the left, connecting with the neurons on the right.
The Picky 0 predominantly receives synaptic input on its dendrite (relative to its axon); we thus only consider the connections synapsing onto the dendrite.

ORN activity data (Fig. 2A)
We use the average maximal Ca2+ ∆F/F0 responses across trials for the activity data, as in (2). For ORN 85c in response to 2-heptanone, and for ORN 22c in response to methyl salicylate, we only have responses for dilutions ≤ 10−7. Because the ORN responses are very similar for dilutions 10−7 and 10−8 and are already saturated (for these cells we have responses down to dilutions of 10−11), we set the missing responses for dilutions 10−6, 10−5, and 10−4 to the response for 10−7.
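As a concrete illustration, the imputation rule above can be sketched as follows; the dilution exponents and response values here are made up for illustration, and only the copy-from-10−7 rule comes from the text:

```python
import numpy as np

# Hypothetical responses of one ORN, indexed by dilution exponent
# (values are invented; only the imputation rule is from the text).
resp = {-11: 0.10, -10: 0.30, -9: 0.80, -8: 1.40, -7: 1.45}

# The response is saturated at dilution 10^-7, so copy it to the
# missing higher concentrations 10^-6, 10^-5, and 10^-4.
for d in (-6, -5, -4):
    resp.setdefault(d, resp[-7])

responses = np.array([resp[d] for d in sorted(resp)])  # ordered by dilution
```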

Relationship between ORN activity patterns and ORNs → LN synaptic count vectors in the data (Fig. 2)
A. Results. In this section, we provide further evidence that the ORNs → LN synaptic count vectors contain signatures of the ORN activity patterns. In the main text, we show that the ORNs → LN synaptic count vectors w_LNtype for the Broad Trio and the Picky 0 significantly correlate with a subset of ORN activation patterns. Fig. S4A shows the distribution of p-values for the correlation between each of the ORNs → LN synaptic count vectors w_LNtype and all the ORN activity patterns {x^(t)}_data. If the null hypothesis were true (i.e., the activity patterns are not more correlated with the connectivity vector than expected by chance), the distribution of p-values would be flat. Here, however, we observe that for the Broad Trio and the Picky 0, the distribution of p-values is skewed towards small values, confirming a significant alignment of these connection vectors with this ensemble of ORN activity patterns.
Our next approach to test whether the synaptic count vectors w_LNtype contain signatures of the ORN activity patterns is the following: we investigate how well the ensemble of activity patterns {x^(t)}_data reconstructs the connectivity vector w_LNtype, in comparison to reconstructing randomly shuffled versions of w_LNtype. As the number of activity patterns x^(t) (T = 170) is larger than the dimension of the connectivity vector (21), we add an L1 regularization term on the coefficients of the reconstruction and consider the following lasso linear regression minimization:

min_v ∥ŵ_LNtype − Σ_{t=0}^{T} v_t x̂^(t)∥²_2 + λ ∥v∥_1,

where v = [v_0, ..., v_T] is a vector of coefficients, λ encodes the strength of the regularization, â = a/∥a∥, ∥a∥_2 is the L2 (Euclidean) norm, and ∥a∥_1 = Σ_i abs(a_i) is the L1 norm. Note that we added the constant vector x^(0) = 1 since the connectivity and activity vectors are not centered. We make the hypothesis that the connectivity vectors w_LNtype from the connectome are significantly more accurately reconstructed by the ORN activity vectors than a shuffled version of w_LNtype. We probe the accuracy of the reconstruction by plotting the reconstruction error ∥ŵ_LNtype − Σ_{t=0}^{T} v_t x̂^(t)∥_2 as a function of the norm of the coefficient vector ∥v∥_1. To obtain different values of this relationship, we optimize this objective function for different values of λ, both for the original w_LNtype and for the shuffled one. Figs. S4B to E show that indeed, for the Broad Trio and the Picky 0, the reconstruction is significantly better than expected by chance.
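A minimal numpy sketch of this comparison, using surrogate data and a simple ISTA (iterative soft-thresholding) solver in place of a library lasso implementation; all array contents are illustrative, not the actual connectome or activity data:

```python
import numpy as np

rng = np.random.default_rng(0)
D, T = 21, 170                       # 21 ORNs, 170 activity patterns
X = np.abs(rng.normal(size=(D, T)))  # surrogate activity patterns (hypothetical)
X = np.column_stack([np.ones(D), X]) # prepend the constant vector x^(0) = 1
X /= np.linalg.norm(X, axis=0)       # normalize each pattern (hat notation)
w = X[:, 1:6] @ rng.random(5)        # surrogate connectivity vector
w /= np.linalg.norm(w)

def lasso_ista(A, b, lam, n_iter=2000):
    """Minimize ||b - A v||_2^2 + lam * ||v||_1 by iterative soft-thresholding."""
    v = np.zeros(A.shape[1])
    L = 2 * np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    for _ in range(n_iter):
        g = 2 * A.T @ (A @ v - b)                # gradient of the quadratic term
        v = v - g / L
        v = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)  # soft threshold
    return v

v = lasso_ista(X, w, lam=0.05)
err_true = np.linalg.norm(w - X @ v)             # reconstruction error, true w
v_sh = lasso_ista(X, rng.permutation(w), lam=0.05)
err_sh = np.linalg.norm(rng.permutation(w) - X @ v_sh)
```

Sweeping `lam` and recording the error against the L1 norm of `v`, for the true and shuffled vectors, reproduces the kind of comparison shown in Figs. S4B to E.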
Finally, to test whether the w_LNtype are significantly aligned with the ensemble {x^(t)}_data, we compare the relative cumulative frequency (RCF) of the correlation coefficients r between each w_LNtype and all the {x^(t)}_data with the RCFs of r obtained after randomly shuffling the entries of each w_LNtype (Figs. S4F to I). We use the maximum deviation from the mean RCF of the shuffled connection vectors to measure significance (Figs. S4J to M) and find that w_BT is significantly aligned with {x^(t)}_data, and that w_P0 is at the edge of the 0.05 significance level (Fig. S4N).
All the above evidence corroborates the hypothesis that the ORNs → LN synaptic count vectors are adapted to ORN activity patterns.
B. Methods: RCF distribution of correlation coefficients and significance testing. Given a vector a ∈ R^D, we define the mean ā := (1/D) Σ_{i=1}^{D} a_i, the centered vector a_c := a − ā, and the centered normalized vector â := a_c/∥a_c∥. We call ŵ ∈ R^D the centered and normalized ORNs → LN synaptic count vector w. Similarly, we define X̂ ∈ R^{D×T} as the centered and normalized ORN activity X_data = [x^(1), ..., x^(T)], where each column vector is centered and normalized.
Each row of the matrix of correlation coefficients depicted in Fig. 2B is given by c := ŵ⊤_LNtype X̂. c is used to calculate the true relative cumulative frequency (RCF) of correlation coefficients in Figs. S4F to I. We define the random variables w′, c′, and RCF′_c. w′ is generated by shuffling the entries of a connectivity vector w:

w′_i = w_σ(i),

where σ(i) is a random permutation operator. We define RCF̄′_c(x) (Figs. S4F to I, black line) as the mean of the RCF′_c(x) curves arising from all the shuffled versions of w. Next, we define the maximum negative deviation δ′ (Figs. S4J to M) random variable as:

δ′ := max_x (RCF̄′_c(x) − RCF′_c(x)).

Finally, we define the p-value as Pr(δ′ ≥ δ_true). The p-value is thus the proportion of RCFs generated by randomly shuffling the entries of w that deviate from the mean RCF more than the true RCF does. Numerically, these calculations were done by binning the RCF function into bins of width 0.02 and generating 10,000 instances of shuffled w.
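The shuffle test just described can be sketched in a few lines of numpy; the data here are surrogate, while the bin width, the statistic, and the shuffling follow the description above:

```python
import numpy as np

def unit_center(a):
    """Center a vector and normalize it to unit norm (the hat operation)."""
    a = a - a.mean()
    return a / np.linalg.norm(a)

def max_dev_pvalue(w, X, n_shuffle=2000, seed=0):
    """Shuffle-based significance of the alignment between a connectivity
    vector w (D,) and activity patterns X (D, T), using the maximum
    deviation of the RCF of correlation coefficients from the shuffled mean."""
    rng = np.random.default_rng(seed)
    grid = np.arange(-1.0, 1.0 + 0.02, 0.02)            # bins of width 0.02
    Xh = np.apply_along_axis(unit_center, 0, X)
    rcf = lambda c: (c[:, None] <= grid[None, :]).mean(0)
    rcf_true = rcf(unit_center(w) @ Xh)
    rcfs_sh = np.array([rcf(unit_center(rng.permutation(w)) @ Xh)
                        for _ in range(n_shuffle)])
    mean_rcf = rcfs_sh.mean(0)
    dev_true = np.max(mean_rcf - rcf_true)              # max negative deviation
    dev_sh = np.max(mean_rcf - rcfs_sh, axis=1)
    return (dev_sh >= dev_true).mean()

# Surrogate example: patterns sharing a common motif that w is aligned with.
rng = np.random.default_rng(1)
base = rng.normal(size=21)
X = base[:, None] + 0.3 * rng.normal(size=(21, 170))
p = max_dev_pvalue(base, X)
```

With the strongly aligned surrogate above, the shuffle test returns a very small p-value; for a random w it would be roughly uniform on [0, 1].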

Number of aligned dimensions between the activity and connectivity subspaces (Fig. S7)
A. Results. To examine the alignment between the subspace spanned by the four w_LNtype's and the one spanned by the top five PCA directions of {x^(t)}_data, we define a measure 0 ≤ Γ ≤ 4, which approximately represents the number of aligned dimensions between these 2 subspaces, and find Γ ≈ 2. This value significantly deviates from the Γ expected for subspaces generated by 4 and 5 random Gaussian vectors in 21 dimensions (p < 10−4), and for subspaces generated from the 4 connectivity vectors with shuffled entries and the top 5 PCA directions (p < 0.01) (Fig. S7). Approximately 1 more dimension is significantly aligned between the 2 subspaces than expected by chance, supporting the results of Fig. 2H, but there is no complete alignment between the connectivity subspace {w_LNtype} and the ORN activity principal subspace. Below we describe the rationale behind the measure Γ.

B. Methods.
Nikolai M. Chapochnikov, Cengiz Pehlevan, Dmitri B. Chklovskii
Given a Hilbert space of dimension D, we define Ω, a measure of dissimilarity between 2 subspaces S_A and S_B generated by the matrices A and B of K_A and K_B linearly independent column vectors, respectively:

Ω := ∥P_A − P_B∥²_F = Tr((P_A − P_B)⊤(P_A − P_B)) = K_A + K_B − 2 Tr(P_A P_B),

where P_A, P_B ∈ R^{D×D} are the orthogonal projectors onto the subspaces S_A and S_B, respectively, F stands for the Frobenius norm, Tr is the matrix trace, and K_X = dim(S_X) is the dimensionality of a subspace S_X. In the above equalities, we use the following properties of orthogonal projectors: P²_A = P_A, meaning that they are idempotent, and P⊤_A = P_A, meaning that they are symmetric. Any idempotent matrix has only eigenvalues 1 and 0, and has as many eigenvalues of value 1 as the number of dimensions it projects on. Thus the trace of a projector is its rank, i.e., the dimensionality of the space it projects on. We assume that the columns of A (respectively B) are linearly independent. The projection matrix can then be obtained as

P_A = A (A⊤A)^{−1} A⊤.

Intuitively, for two very similar subspaces, the projection P_A v of an arbitrary vector v onto S_A will be very similar to the projection P_B v of the vector v onto S_B; thus P_A v ≈ P_B v and Ω will be small. Conversely, if the subspaces are very different, the projections P_A v and P_B v will also be different and Ω will be large.
We now define the more intuitive measure

Γ := (K_A + K_B − Ω)/2 = Tr(P_A P_B),

which is a proxy for the number of aligned dimensions in the two subspaces. Indeed, 0 ≤ Γ ≤ min(K_A, K_B). For 2 orthogonal subspaces Γ = 0, and for 2 fully aligned subspaces Γ = min(K_A, K_B).
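The measure can be computed directly from the projectors; a minimal numpy sketch, assuming the definition Γ = (K_A + K_B − Ω)/2 with Ω the squared Frobenius distance between the projectors, which reproduces the stated bounds:

```python
import numpy as np

def projector(A):
    """Orthogonal projector onto the column space of A (full column rank assumed)."""
    return A @ np.linalg.inv(A.T @ A) @ A.T

def n_aligned_dims(A, B):
    """Gamma = (K_A + K_B - ||P_A - P_B||_F^2) / 2, the approximate number
    of aligned dimensions between the subspaces spanned by A's and B's columns."""
    P_A, P_B = projector(A), projector(B)
    omega = np.linalg.norm(P_A - P_B, 'fro') ** 2
    return (A.shape[1] + B.shape[1] - omega) / 2

rng = np.random.default_rng(0)
E = np.eye(21)
A = E[:, :4]                           # a 4-dimensional coordinate subspace
B_orth = E[:, 5:10]                    # an orthogonal 5-dimensional subspace
B_same = A @ rng.normal(size=(4, 4))   # a different basis of the same subspace
```

Here `n_aligned_dims(A, B_orth)` evaluates to 0 and `n_aligned_dims(A, B_same)` to 4, matching the limiting cases in the text.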
In the main text, we refer to the subspaces spanned by the following matrices: A = [w_BT, w_BD, w_KS, w_P0], and B is the matrix with the top 5 PCA loading vectors of {x^(t)} as columns.

Hierarchical clustering for plotting Fig. 3D
To plot Fig. 3D, we ordered the correlation matrix using hierarchical clustering.For that we used the Python function scipy.cluster.hierarchy.linkage with the options method='average', optimal_ordering=True (3).
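A minimal example of this reordering (the correlation matrix here is random, for illustration; `squareform` converts the correlation-derived distance matrix into the condensed form that `linkage` expects):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
C = np.corrcoef(rng.random((30, 8)))           # illustrative correlation matrix
D = 1.0 - C                                    # turn correlation into a distance
np.fill_diagonal(D, 0.0)
Z = linkage(squareform(D, checks=False), method='average', optimal_ordering=True)
order = leaves_list(Z)                         # leaf order from the dendrogram
C_ordered = C[np.ix_(order, order)]            # reordered matrix, as in Fig. 3D
```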

Optimization problem (Eq. (4) in the main text)
A. Description. We postulate the following minimax optimization problem: where ∥•∥²_F is the square of the matrix Frobenius (Euclidean) norm, X, Y ∈ R^{D×T}, Z ∈ R^{K×T}, with D the number of ORNs (21 for this olfactory circuit), K the number of LNs, and T the number of data (sample) points; ρ and γ are positive unitless parameters, and u is a unit with the same physical dimension as X, Y, and Z (e.g., spikes • s−1), dropped for simplicity in the main text. X, Y, and Z represent the activity of ORN somas, ORN axons, and LNs, respectively. We can interpret X as all the discretized activity of ORNs up to a certain point in their lifetime. We set γ = 1 in the main text, as this parameter does not alter the computation and only linearly scales the synaptic weights and Z. We have kept it in all derivations here in the supplement.
The optimization problem Eq. (S12) leads to the Linear Circuit (LC) model.Adding the nonnegativity constraints on Y and Z (Y ≥ 0 and Z ≥ 0) leads to the NonNegative Circuit (NNC) model.
We expand the optimization function in Eq. (S12). Using the property that ∥A∥²_F = Tr(A⊤A), we obtain Eq. (S14), where we dropped the X⊤X term because it does not influence the solution of the optimization problem.
B. Equivalence of scaling X and ρ. Here, we show that scaling X is equivalent to scaling ρ in the optimization. It is easy to see that the transformation X → aX, Y → aY, and ρ → ρ/a (for a ≠ 0) only scales the objective function, which does not affect the optima of the optimization, i.e., this transformation is a symmetry of the optimization. Let us explore the consequences of this symmetry. The solution Y* of the optimization is a function of X and ρ, so we can define a function f such that Y* = f(X, ρ). The symmetry implies f(aX, ρ/a) = a f(X, ρ), and substituting ρ → aρ gives

f(aX, ρ) = a f(X, aρ).

This means that performing the optimization with an input aX is equivalent to performing the optimization with input X and parameter aρ, and finally multiplying the obtained Y* by a.
It is worth noting, though, that for a circuit with fixed W and M, scaling an input x by a factor a simply scales the output y by the same factor a, since the circuit implements a linear transformation, at least without the nonnegativity constraints.
C. Equivalence of scaling Z and γ.We can see that the transformation Z → aZ and γ → γ/a does not change the objective function, i.e., this transformation is a symmetry of the optimization.Indeed:

Offline solution/computation of the optimization problem
A. Solution for the LC (Eq. (3a), Eq. (3b), Eq. (3c) in the main text). Here we describe the solution of the optimization problem Eq. (S14), without any constraints on Y and Z. As we show below, this solution can also be found by a circuit model with the same architecture as the olfactory circuit under study. The situation without constraints on Y and Z corresponds to the Linear Circuit (LC) model. Understanding the solution of the optimization problem allows us to understand the computation performed by such a circuit model. We use the singular value decomposition (SVD) of X, Y, and Z:

X = U_X S_X V_X⊤, Y = U_Y S_Y V_Y⊤, Z = U_Z S_Z V_Z⊤,

where the rectangular matrices S_X, S_Y, and S_Z only have values on the diagonal. For a rectangular matrix S, we call S̄ ∈ R^{T×T} the corresponding diagonal square matrix, padded with zeros. Only the first D columns of V_X and V_Y and the first K columns of V_Z contain relevant information about X, Y, and Z, respectively. The left singular vectors U_X, U_Y, and U_Z are also the principal directions of the uncentered PCA of X, Y, and Z, respectively, whereas the values on the diagonals of S_X, S_Y, and S_Z are, up to a factor √T, the square roots of the variances in the corresponding uncentered PCA directions. For b, a variable of the optimization problem, we call b* an optimal value (solution). In the results section of the main text, we dropped the star symbol *. In the following, we prove that Y* and Z*, the optima in Y and Z of the optimization problem Eq. (S14), are given by:

Y* = U_X S*_Y V_X⊤,  Z* = (ρ/γ) U_Z S*_{Y|K} V_X⊤,

where S*_{Y|K} ∈ R^{K×T} is the matrix with the first K rows of S̄*_Y. These equations lead to relations Eq. (3a), Eq. (3b), Eq. (3c) in the main text, where the relations are written in terms of PCA variances rather than singular values. The relationship between a singular value s and the corresponding PCA variance σ² is s²/T = σ².
In other words, writing Y* = FX, we have that

F = U_X (S*_Y S⁺_X) U_X⊤,

where A⁺ denotes the Moore-Penrose pseudo-inverse of A and S*_Y S⁺_X ∈ R^{D×D} is a diagonal matrix. This signifies that the linear transformation F does not perform any rotation of the input.
In particular, we find that U*_Y = U_X, meaning that the ORN soma input X and the ORN axon output Y* have the same left singular vectors (although the order can be different). The left singular vectors correspond to the directions of uncentered PCA. Also, we find that V*_Y = V*_Z = V_X, meaning that the right singular vectors of X, Y*, and Z* are the same (although, again, their order can be different). The i-th right singular vector corresponds to the neural activity in the i-th singular (or uncentered PCA) direction. The equality between the right singular vectors of X and Y* means that the neural activity in a singular direction u_X,i at the level of the ORN somas and at the level of the ORN axons is the same up to a multiplicative factor. Similarly, the neural activity in the direction u_Z,i at the level of the LNs is proportional to the activity in the direction u_X,i at the level of the ORN somas or axons. Thus, when looking at the neural activity in the direction of a left singular vector i (PCA direction i) at the level of ORN somas, axons, and LNs, the activity is the same up to a multiplicative factor. The multiplicative factors are set by the ratios between the corresponding singular values (or PCA variances).
The explicit expressions for s*_Y and s*_Z are: The behavior of s*_Y is as follows: Note that, because Z only appears as Z⊤Z in the optimization problem Eq. (S14), U*_Z is a degree of freedom of the optimization. Thus, for {Y*, Z*, W*, M*} a solution of the optimization, {Y*, QZ*, W*Q⊤, QM*Q⊤} is a solution as well, where Q ∈ R^{K×K} is an orthogonal matrix. Consequently, there is a manifold of W*, M*, and Z* that satisfies the optimization for the LC.

B. Proof.
For convenience, we copy here the optimization problem Eq. (S14), with objective function OF(Y, Z). We first find the optimum in Y of this objective function by taking the partial derivative of OF(Y, Z) with respect to Y and equating it to 0. We thus obtain the expression for Y:

Y = X (I_T + (γ²/(T u²)) Z⊤Z)⁻¹,   (S33)

where I_T is the identity matrix of dimension T. What is the meaning of this equality? Since it is obtained by finding the optimum in Y, it gives the expression for the axonal output Y for an arbitrary input X and LN activity Z. To understand this expression intuitively, we imagine that Z is of dimension 1 × T, which corresponds to just 1 LN. We use the SVD expansion Z = U_Z S_Z V_Z⊤, where U_Z = 1 because it is an orthogonal matrix of dimension 1, S_Z is of dimension 1 × T, V_Z is of dimension T × T, the first column of V_Z is [z^(1), ..., z^(T)]⊤/s_Z,1, and s_Z,1 is the norm of Z, i.e., s_Z,1 = (Σ_{t=1}^{T} (z^(t))²)^{1/2}. We now put this expression for Z into Eq. (S33):

Y = X V_Z Λ V_Z⊤,

where Λ is a diagonal matrix whose diagonal elements are all 1, apart from the first one, which is (1 + (γ²/(T u²)) s²_Z,1)⁻¹. This implies that the activity in Y is the same as the activity in X, apart from the directions of the K first right singular vectors of Z. In those directions it is diminished by the factors (1 + (γ²/(T u²)) s²_Z,i)⁻¹. In other words, the directions of activity (in terms of right singular vectors) that are the most dampened in Y in comparison to X are those most aligned/correlated with the activity in Z.
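The dampening described by Eq. (S33) can be verified numerically; a small sketch with a single LN (all numeric values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
D, T, u, gamma = 21, 50, 1.0, 1.0
X = rng.normal(size=(D, T))
z = rng.normal(size=(1, T))                     # a single LN (K = 1), as in the text

# Axonal output for fixed LN activity, following Eq. (S33):
Y = X @ np.linalg.inv(np.eye(T) + (gamma**2 / (T * u**2)) * z.T @ z)

s = np.linalg.norm(z)                            # s_Z,1, the norm of Z
v1 = z.ravel() / s                               # first right singular vector of Z
shrink = 1.0 / (1.0 + gamma**2 / (T * u**2) * s**2)

# Along v1 the activity is damped by `shrink`; orthogonal directions are untouched.
assert np.allclose(Y @ v1, shrink * (X @ v1))
w = rng.normal(size=T); w -= (w @ v1) * v1       # a direction orthogonal to v1
assert np.allclose(Y @ w, X @ w)
```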
Next, we replace the solution Eq. (S33) for Y in the original optimization problem Eq. (S14), obtaining the equivalent optimization problem: Next, we replace X and Z by their SVDs and use the property of the trace Tr(AB) = Tr(BA) and the property of orthogonal matrices UU⊤ = U⊤U = I: where, for the last equivalence, we used the fact that multiplying the objective function by a constant (here u⁻²) does not alter the optimization problem. Since U_Z does not appear in the minimization, it is a free parameter, i.e., it can be any orthogonal matrix. For fixed S_Z, only the first term in the trace needs to be minimized. By the von Neumann trace inequality, the bound is attained when the singular values of the two matrices are paired in decreasing order; thus, choosing V_Z = V_X gives us the lower bound of that inequality: where s_X,i and s_Z,i are the values on the diagonals of S_X and S_Z, respectively. Thus, pairing the highest singular values of X with the highest singular values of Z attains the lower bound of the von Neumann inequality. The optimization problem Eq. (S40) can now be simplified to Eq. (S46), in which each s_Z,i can be minimized independently. By construction of the SVD, we already have s_Z,i = 0 for i > K; we thus consider 1 ≤ i ≤ K. We take the derivative of OF({s_Z,i}) in Eq. (S46) with respect to s_Z,i and equate it to 0 (we drop the index i for convenience of notation). Considering that singular values are positive, this leads to Eq. (S50). We can now use the obtained solution for Z to find the solution for Y. We replace X and Z by their SVDs in relation Eq. (S33) and use that V*_Z = V_X:

Y* = U_X S_X (I_T + (γ²/(T u²)) S̄*²_Z)⁻¹ V_X⊤.   (S54)

Although the right-hand side of Eq. (S54) has the form [orthogonal matrix] × [diagonal matrix] × [orthogonal matrix], it is not strictly the standard SVD expression, because the values on the diagonal of S_X (I_T + (γ²/(T u²)) S̄*²_Z)⁻¹ are not necessarily in decreasing order. Equating the terms on the left and right sides, we obtain U*_Y = U_X and V*_Y = V_X. The last equality gives:

s*_Y,i = s_X,i (1 + (γ²/(T u²)) s*²_Z,i)⁻¹.

Thus, for i > K, we have s*_Y,i = s_X,i (since s_Z,i = 0), whereas for i ≤ K, s*_Y,i = (γ/ρ) s*_Z,i (using relation Eq. (S50) to replace s_X). The relation analogous to Eq. (S50) is: Note that the resulting decomposition Y = U_X S*_Y V_X⊤ is equal to the usual SVD decomposition of Y, up to the order of the singular values and singular directions. In summary, for i ≤ K, s*_Y,i and s*_Z,i are given by the expressions above, while for i > K, s*_Y,i = s_X,i and s*_Z,i = 0. This ends the derivation.

C. Computation in LNs and relationship between the NNC and SNMF.
To understand the computation at the level of the LNs, we consider the optimization problem from the perspective of Z, which represents LN activity. We copy here the original optimization problem Eq. (S12), dropping the 1/T² factor in front: We can isolate the maximization over Z: This means that, for a given Y, the optimal Z can be found with the optimization problem: where we dropped the factor ρ²/(4u²), which does not influence the optimization, and changed the maximization into a minimization by changing the sign. This corresponds to the original, simplest similarity-matching optimization problem, which has been extensively studied (4, 5).
If we now add the nonnegativity constraint on Z, the LN activity, one obtains: which is the Symmetric Nonnegative Matrix Factorization (SNMF) optimization problem (6). It has been shown that in this situation the activity Z* corresponds to the soft membership of clusters found in Y, as seen in Figs. 5A, C, E, and 6A. SNMF corresponds to soft K-means clustering (6).
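As an illustration of this clustering interpretation, here is a minimal SNMF sketch via projected gradient descent; this is not the algorithm used in the paper, and the data, step size, and iteration count are arbitrary:

```python
import numpy as np

def snmf(A, K, n_iter=500, eta=1e-3, seed=0):
    """Symmetric NMF: find Z >= 0 (K x T) approximately minimizing
    ||A - Z^T Z||_F^2 by projected gradient descent (a toy solver)."""
    rng = np.random.default_rng(seed)
    Z = 0.1 * np.abs(rng.normal(size=(K, A.shape[0])))
    for _ in range(n_iter):
        G = 4 * Z @ (Z.T @ Z - A)              # gradient of the objective in Z
        Z = np.maximum(Z - eta * G, 0.0)       # projected gradient step
    return Z

# Two well-separated groups of points; columns of Y are samples.
rng = np.random.default_rng(2)
Y = np.column_stack([rng.normal(0, 0.1, (5, 20)) + np.eye(5)[:, [0]],
                     rng.normal(0, 0.1, (5, 20)) + np.eye(5)[:, [1]]])
A = Y.T @ Y                                    # 40 x 40 similarity (Gram) matrix
Z = snmf(A, K=2)
labels = Z.argmax(0)                           # soft memberships -> hard labels
```

Each row of `Z` acts as a soft membership weight of the samples in one cluster, consistent with the soft K-means interpretation above.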

Online algorithm and its implementation by a neural circuit with ORN-LN architecture
Here we show that a neural circuit model with the ORN-LN architecture (Fig. 1A) can solve the optimization problem Eq. (S14). We convey two messages. First, given an input X, specific synaptic weights allow the circuit to output the optimal Y* and Z*. Second, the circuit is capable of finding on its own (i.e., in an unsupervised manner) the optimal synaptic weights to perform the computation of the optimization problem. For that, it is sufficient that synapses follow Hebbian synaptic plasticity rules. We derive Eq. (5), Eq. (6), Eq. (7), and Eq. (8) from the main text.
A. Circuit equations for the LC (Eq. (6) in the main text). We first derive the circuit equations for the LC (Eq. (6) in the main text). For convenience, we copy the optimization problem Eq. (S14) here: We first introduce the unitless variables W ∈ R^{D×K} and M ∈ R^{K×K}:

W = YZ⊤/(T u²), M = ZZ⊤/(T u²),   (S67)

and perform the Hubbard-Stratonovich transform on the optimization problem Eq. (S14) (5): where the objective function is now OF(Y, Z, W, M). One can verify that the solution of the optimization in W of Eq. (S68) is W = YZ⊤/(T u²) (by solving ∂OF(Y, Z, W, M)/∂W = 0) and that the solution of the optimization in M of Eq. (S68) is M = ZZ⊤/(T u²) (by solving ∂OF(Y, Z, W, M)/∂M = 0). Then, putting W = YZ⊤/(T u²) and M = ZZ⊤/(T u²) into the optimization problem in Eq. (S68)-Eq. (S69), we recover the original optimization problem Eq. (S14).
We then rewrite the objective function Eq. (S69) in vector notation, with each sample point written out separately, giving us the optimization problem Eq. (S71). Given the solutions Y* and Z* of the optimization problem Eq. (S14), solutions for W and M are W* = Y*Z*⊤/(T u²) and M* = Z*Z*⊤/(T u²), which can be put into the optimization problem Eq. (S71), giving us the new optimization problem Eq. (S72), with the objective function OF as in Eq. (S73). We can then perform the optimization over each y^(t), z^(t). At a given sample index t, the minimum in y^(t) and the maximum in z^(t) can be found by taking the derivative of the objective function Eq. (S73) with respect to y^(t) and z^(t), respectively. The minimum in y^(t) and the maximum in z^(t) can be reached by gradient descent and ascent, respectively. We can thus write a system of differential equations whose steady states correspond to the optima in y^(t) and z^(t): where τ is the local time evolution variable. We rearranged the parameters so that the equations take the same form as Eq. (6) in the main text, which does not change the final steady state of the equations. Thus, we obtained equations to find the optima ȳ^(t) and z̄^(t) of the objective function. As explained in the main text, these equations can be directly mapped onto the dynamics of the ORN-LN neural circuit. Note that for a given input X there are infinitely many solutions for Z (see Eq. (S26), Eq.
(S28c)), i.e., for any solution Z*, QZ* is also a solution, where Q is an orthogonal matrix. Therefore, changing W* to W*Q⊤ and M* to QM*Q⊤ still gives a circuit that solves the original optimization problem. It is possible to construct more circuits that implement the same computations; however, that would require the feedforward ORNs → LN connectivity W not to be proportional to the feedback LN → ORNs connectivity, or the LN-LN connections (i.e., M) not to be symmetric. Here, we focus our analysis on circuits with ORNs → LN connectivity proportional to the LN → ORNs connectivity and with M symmetric. This is reasonable given the data in the connectome (Fig. 4A, Fig. S2A).
B. Circuit equations for the NNC (Eq. (7) in the main text). Here we derive the circuit equations for the NNC (Eq. (7) in the main text). In the case of the NNC, we start with the same optimization problem Eq. (S14), but add the nonnegativity constraints on Y and Z: Following the same steps as above, we arrive at an optimization problem similar to Eq. (S72) but with nonnegativity constraints, with the objective function OF as in Eq. (S73).
Here too, we perform the optimization for each y^(t), z^(t). However, because of the nonnegativity constraints, the optima for y^(t) and z^(t) are no longer found where the derivatives Eq. (S74) vanish. We can, however, reach the optima by projected gradient descent: where the max is performed component-wise. Here too, W* and M* are found by finding Y* and Z* in the optimization problem Eq. (S76) and setting W* = Y*Z*⊤/(T u²) and M* = Z*Z*⊤/(T u²). Because of the nonnegativity constraints on Y and Z in the NNC, the degree of freedom in Z present in the LC no longer exists.
C. Circuit model with Hebbian synaptic update rules (Eq. (8) in the main text). We now show that the circuit can also reach the optimal synaptic weights (W* and M*) via Hebbian plasticity. We derive Eq. (8) in the main text. The equations are the same for the LC and the NNC; we therefore just show the LC here. We start the derivation from Eq. (S71) and Eq. (S70). Next, we exchange the order of min_Y max_Z with max_W min_M (5), giving us the optimization problem: We now perform the optimization over the 4 variables separately: y^(t), z^(t), W, and M. We alternate the optimization in {y^(t), z^(t)} and in {W, M}, which corresponds to the "online setting" for this optimization problem: as a new sample (i.e., stimulus, input) x^(t) arrives, we find the steady-state values of z^(t) and y^(t) with the current values W^(t) and M^(t), and update W^(t) and M^(t) to W^(t+1) and M^(t+1) before the arrival of the next input sample x^(t+1). Biologically, this can be seen as, first, a convergence of neural spiking rates or neural electrical potentials encoded through the variables y^(t) and z^(t), and second, a synaptic weight update based on those steady-state activity values. The steady-state y^(t) and z^(t) are found in the same way as above, giving us the same equations as Eq. (S75) for the LC and Eq. (S78) for the NNC (Eq. (S80) and Eq. (S81), respectively). We then only need to derive the updates for the variables W and M. By construction, the offline solution for W and M is given by Eq. (S67). Online, we compute a new W^(t) and M^(t) after each sample x^(t) is presented and the steady-state solutions ȳ^(t) and z̄^(t) of Eq. (S80) or Eq. (S81) are found. The gradient descent (respectively ascent) steps on these variables give the following updates (e.g., (5)):

W^(t+1) = W^(t) + η^(t) (ȳ^(t) z̄^(t)⊤/u² − W^(t)),  M^(t+1) = M^(t) + (η^(t)/ν) (z̄^(t) z̄^(t)⊤/u² − M^(t)),

where η^(t) and ν are parameters of the gradient descent/ascent, and where ȳ^(t) and z̄^(t) are the steady-state solutions of Eq. (S80) (or Eq.
(S81)) for given W^(t) and M^(t). This indeed corresponds to local Hebbian synaptic update rules. Choosing η^(t) and ν appropriately leads to Eq. (8) from the main text. These synaptic update equations are the same for the LC and the NNC.
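The online loop can be sketched as follows. This is a generic similarity-matching sketch with all constants (u, γ, ρ, η, ν) set to illustrative values and the fast dynamics written in a simplified form; it is not a verbatim implementation of Eq. (8):

```python
import numpy as np

# Online sketch: for each input x, run the fast y/z dynamics to steady state,
# then apply Hebbian/anti-Hebbian updates of the form dW ~ (y z^T - W),
# dM ~ (z z^T - M). Constants and equation form are simplifying assumptions.
rng = np.random.default_rng(3)
D, K, u = 5, 2, 1.0
W = 0.1 * rng.normal(size=(D, K))
M = np.eye(K)

def steady_state(x, W, M, n_steps=200, dt=0.05):
    """Euler integration of the fast dynamics until approximate convergence."""
    y, z = x.copy(), np.zeros(K)
    for _ in range(n_steps):
        y += dt * (x - y - W @ z)            # ORN axons: input minus inhibition
        z += dt * (W.T @ y - M @ z)          # LNs: feedforward drive minus lateral term
    return y, z

eta, nu = 0.02, 2.0
for t in range(500):
    x = rng.normal(size=D)
    y, z = steady_state(x, W, M)
    W += eta * (np.outer(y, z) / u**2 - W)         # Hebbian feedforward update
    M += (eta / nu) * (np.outer(z, z) / u**2 - M)  # anti-Hebbian lateral update
```

Note that `M` remains symmetric by construction of its update, consistent with the symmetric LN-LN connectivity assumed in the derivation.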

D. Steady-state solution of the circuit dynamical equations for the LC and stability.
We can directly find the steady-state solution of the circuit dynamics equations Eq. (S75) of the LC by setting the derivatives to 0. For M invertible, the steady state is (after dropping the index (t) and the * for simplicity of notation): As mentioned above, the steady state for ȳ does not depend on γ, whereas z̄ does depend on γ. Note that the transformation from x to ȳ is symmetric: indeed, writing ȳ = Fx, we have F = F⊤. This means that the transformation is diagonalizable. We indeed showed in section 7 above that the basis in which the transformation is diagonal is the uncentered PCA basis of X.
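The symmetry of F can be checked numerically. A sketch assuming the simplified steady-state relations ȳ = x − W z̄ and z̄ = M⁻¹ W⊤ ȳ (with γ, ρ, and u absorbed into W and M; not a verbatim copy of Eq. (S75)):

```python
import numpy as np

rng = np.random.default_rng(4)
D, K = 21, 4
W = rng.normal(size=(D, K))
Z = rng.normal(size=(K, 100))
M = Z @ Z.T / 100                        # positive definite autocorrelation matrix

# Steady state: ybar = x - W zbar and zbar = M^{-1} W^T ybar imply ybar = F x with
F = np.linalg.inv(np.eye(D) + W @ np.linalg.inv(M) @ W.T)

x = rng.normal(size=D)
ybar = F @ x
zbar = np.linalg.inv(M) @ W.T @ ybar
assert np.allclose(F, F.T)               # the input-output map is symmetric
assert np.allclose(ybar, x - W @ zbar)   # consistency with the fixed-point equations
```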
Here we show that the fixed point of Eq. (S75) is stable if W has maximal rank and M is positive definite. We first rewrite the dynamical system in matrix form: This system has a unique stable fixed point if and only if the matrix A has only positive eigenvalues. To investigate under which conditions this is the case, we write the eigenvalue equation for A: We consider the case λ ≠ 1, as we are interested in whether λ could potentially be negative.
Assuming that W is full rank, the matrix W⊤W on the left-hand side of the equation has only positive eigenvalues. The above equation does not have any solution z ≠ 0 with λ < 0 if M is positive definite (which is true when M is constructed as the autocorrelator of z). Thus, W full rank and M positive definite are sufficient conditions for the dynamical system to always converge to a stable fixed point.

Effect of ρ and γ on the computation and the circuit
Having the expression for the optimal outputs Y * and Z * (section 7), we can describe the effect of ρ and γ on the computation.
For ρ → 0, based on Eq. (S57), we get that s_Z,i → 0 and thus Z* → 0, leading to Y* = X, which means that the output is equal to the input and no inhibition takes place.
Conversely, for ρ → ∞, according to Eq. (S58), the lowest D − K singular values of Y* remain the same, whereas the top K drop to 0, i.e., the top K singular values are totally suppressed.
According to Eq. (S58), changing γ has no effect on the output Y*. This is because, as shown above in section 6C, scaling γ only scales Z* but does not alter the optimization. There is a drastic difference between setting γ = 0 and taking the limit γ → 0. Taking the limit γ → 0 increases the elements of Z* towards infinity but does not change the value of Y*. On the other hand, setting γ to 0 in the original optimization problem Eq. (S14) removes all the terms in Z, and we get Y* = X because there is no inhibition.
Next, we inspect the scenario where γ → 0 and ρ → 0 such that γ/ρ = C, where C is a constant. To understand this scenario, we make the substitution γ = ρC in Eq. (S57)-Eq. (S61). For i ≤ K: and for i > K nothing changes. Now taking the limit ρ → 0 (which automatically takes γ → 0, since they are related by the constant C), we get: This means that Y* = X, i.e., there is no inhibition, but there is still activity in the LNs (Z). Physiologically, this corresponds to the scenario where there are no feedback connections from LNs to ORNs, only feedforward connections from ORNs to LNs; this is then a purely feedforward circuit. What is the consequence in terms of the synaptic weight matrices W* and M* in this situation? By definition, these matrices are given by the relations Eq. (S67), repeated here:

W* = Y*Z*⊤/(T u²), M* = Z*Z*⊤/(T u²).

For the LC, we show below that the synaptic weight vectors of W* span the same subspace as the first K singular vectors (uncentered PCA directions) of X. This is still the case here. Similarly, there is no difference in terms of the LN-LN connection weights M* in this particular scenario in comparison to the general one. Likewise, for the NNC, there is no difference from the general case.

Circuit dynamics equations contain two effective parameters (ρ and γ)
Here we show that, in its general form, the system of differential equations describing the olfactory circuit has just two effective parameters and can be reduced to Eq. (6) (or Eq. (7)) from the main text. Without loss of generality, the system of differential equations reads: where we imposed that x = y in the case of no LN activity (i.e., z = 0), that a > 0, b > 0, c > 0, d > 0, and that all ORNs have similar response properties (i.e., the same coefficient in front of each x_i and y_i). To extract the effective parameters, we compute the steady-state solution of Eq. (S95) by setting the derivatives to zero. We find the following steady states for y and z, for invertible M: This shows that we only have two degrees of freedom: bd/(ac) and d/c. We define ρ^2 := bd/(ac) and γ^2 := (c/d)ρ^2 = b/a. This gives us: Replacing these definitions in the original Eq. (S95), we get: By setting τ_y := τ_1/a and τ_z := τ_2/c we obtain Eq. (6) from the main text. Thus, scaling x, W_1, W_2, and M is equivalent to controlling just two effective parameters, γ and ρ. Scaling τ_y and τ_z does not influence the steady-state solutions.
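As a numerical sanity check of this parameter reduction, the following sketch solves for the steady states of the schematic dynamics τ_1 ẏ = a(x − y) − bWz, τ_2 ż = −cMz + dW⊤y (our reading of Eq. (S95); W, M, and x here are hypothetical random stand-ins) and verifies that two parameter sets sharing ρ^2 = bd/(ac) and γ^2 = b/a yield the same steady states:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 6, 3

# Hypothetical connectivity and input, for illustration only.
W = rng.uniform(0.1, 1.0, (D, K))
M = np.eye(K) + 0.1 * rng.uniform(size=(K, K))
M = 0.5 * (M + M.T)                     # symmetric, well conditioned
x = rng.uniform(0.5, 1.5, D)

def steady_state(a, b, c, d):
    """Closed-form fixed point of tau1*dy/dt = a*(x - y) - b*W@z,
    tau2*dz/dt = -c*M@z + d*W.T@y (time constants drop out)."""
    # From dz/dt = 0: z = (d/c) M^-1 W^T y; substitute into dy/dt = 0.
    A = a * np.eye(D) + (b * d / c) * W @ np.linalg.solve(M, W.T)
    y = np.linalg.solve(A, a * x)
    z = (d / c) * np.linalg.solve(M, W.T @ y)
    return y, z

# Two parameter sets with identical rho^2 = bd/(ac) = 0.4 and
# gamma^2 = b/a = 0.5 (hence also identical d/c = rho^2/gamma^2).
y1, z1 = steady_state(a=1.0, b=0.5, c=1.0, d=0.8)
y2, z2 = steady_state(a=2.0, b=1.0, c=2.0, d=1.6)
```

Both steady states coincide, illustrating that only the two effective parameters matter.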
Increasing ρ increases the weight of the feedforward connections, making the LN activity and the feedback inhibition stronger. Increasing γ simultaneously increases the feedback connection strength and decreases the feedforward connection strength. Changing γ influences the steady-state solution z̄ but not ȳ. Thus, a whole manifold of circuits leads to the same steady-state output ȳ. In addition, the same differential equations can be implemented by different circuits: for example, multiplying a differential equation by a constant does not alter the final steady state, but yields yet another implementation of the circuit, with rescaled synaptic weights and time constant.

Relationship between W and M (Eq. (2) in the main text)
Here we prove the relationship (ρ^2/γ^2) W^⊤W = M^2 = M^⊤M for the LC. In this section, for simplicity, we drop the * from M*, W*, Y*, Z*, and the related variables.
One way to obtain this relationship is to start from the circuit dynamics (Eq. (S75)). The steady state for z(t) is: Multiplying by z(t)^⊤ on both sides, taking the average over all samples t, and using the definitions of W and M (Eq. (S67)): An alternative way to derive the above relationship is to use the definitions of W and M (Eq. (S67)) and the SVD decompositions of X, Y, and Z. We write out W and M: where we used that V_X = V_Y = V_Z and U_X = U_Y are orthogonal matrices and that s_{Y,i} = (γ/ρ) s_{Z,i} for i ≤ K and s_{Z,i} = 0 for i > K. We call Ŝ_Z ∈ R^{K×K} the square submatrix of the rectangular matrix S_Z ∈ R^{K×N}, and U_{X|K} ∈ R^{D×K} the submatrix with the first K columns of U_X. Thus: Since M is a symmetric matrix, i.e., M = M^⊤, this relationship can also be written as: This ends the derivation.
Taking the unique (positive semidefinite) square root on both sides gives the relationship Eq. (2) in the results section of the main text.
A. Consequence of the matrix relationship. We can inspect the consequence of this relation on an element-by-element basis. We call m_i the i-th column of M, which corresponds to the vector of synaptic weights from LN i onto all the other LNs. We get that: where θ^w_{ij} is the angle between the vectors w_i and w_j, θ^m_{ij} is the angle between m_i and m_j, and where we used the scalar product property.
For the elements on the diagonal (i = j), we get ∥w_i∥ = (γ/ρ)∥m_i∥. This implies that ∥w_i∥/∥m_i∥ = const, meaning that the ratio between the magnitudes of the ORNs → LN and LNs → LN synaptic weight vectors is the same for each LN. We call magnitude the square root of the sum of the squared connection weights, corresponding to the length of the synaptic weight vector and a proxy for the total synaptic strength of a synaptic weight vector.
Feeding ∥w_i∥ = (γ/ρ)∥m_i∥ into Eq. (S109), we get that θ^w_{ij} = θ^m_{ij}, meaning that the angle between w_i and w_j is the same as the angle between m_i and m_j. In other words, ∡(w_i, w_j) = ∡(m_i, m_j), where ∡(a, b) denotes the angle between two vectors a and b. Thus, two LNs with a similar (different) connectivity pattern with the ORNs have a similar (different) connectivity pattern with the LNs.
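The norm and angle consequences above follow directly from M^2 = (ρ^2/γ^2) W^⊤W. A minimal numerical check, using a hypothetical random W and the unique positive semidefinite square root for M:

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 21, 4
rho, gamma = 2.0, 1.0

W = rng.uniform(size=(D, K))            # hypothetical ORNs -> LN weights
G = (rho / gamma) ** 2 * (W.T @ W)      # (rho^2/gamma^2) W^T W, symmetric PSD
evals, evecs = np.linalg.eigh(G)
M = evecs @ np.diag(np.sqrt(evals)) @ evecs.T   # unique PSD square root of G

def cos_angles(A):
    """Matrix of cosines of angles between columns of A."""
    n = np.linalg.norm(A, axis=0)
    return (A.T @ A) / np.outer(n, n)

# M^T M = M^2 = (rho^2/gamma^2) W^T W, so pairwise angles agree and
# column norms satisfy ||w_i|| = (gamma/rho) ||m_i||.
assert np.allclose(cos_angles(W), cos_angles(M))
assert np.allclose(np.linalg.norm(W, axis=0),
                   (gamma / rho) * np.linalg.norm(M, axis=0))
```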

Relationship between ORN activity and ORN-LN connectivity (Eq. (1) in the main text)
In this section, for simplicity, we drop the * from M*, W*, Y*, Z*, and the related variables. Based on the expressions for W and M (Eq. (S103) and Eq. (S104)) we can write W as: where we used that U_Z^⊤ U_Z = I_K, and where U_{X|K} ∈ R^{D×K} is the submatrix with the first K columns of U_X. As stated above, U_Z is a free parameter and could be any orthogonal matrix.
In the case of a single LN, W is a column vector and corresponds to the first left singular vector of X. For multiple LNs, the column vectors of W span the same subspace as the top K loading vectors of X, U_{X|K}. However, because of the multiplication on the right by U_Z^⊤ M, the connection vectors do not necessarily correspond to specific PCA directions and are not orthogonal; they only span the top K-dimensional PCA subspace. Thus, the relation above links the left singular vectors of X, W, and M.
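The span statement can be illustrated numerically: for W = U_{X|K} Q with any invertible K × K mixing matrix Q (Q standing in for the free factor U_Z^⊤ M), the orthogonal projector onto the column space of W equals the projector onto the top-K PCA subspace, even though the columns of W are neither orthogonal nor individual PCA directions. A small sketch with stand-in data:

```python
import numpy as np

rng = np.random.default_rng(2)
D, K, T = 21, 4, 170

X = rng.uniform(size=(D, T))            # stand-in for ORN activity data
U, _, _ = np.linalg.svd(X, full_matrices=False)
U_K = U[:, :K]                          # top-K uncentered PCA directions

Q = rng.normal(size=(K, K))             # arbitrary invertible mixing
W = U_K @ Q

# Projectors onto span(U_K) and span(W) coincide.
P_pca = U_K @ U_K.T
P_w = W @ np.linalg.solve(W.T @ W, W.T)
assert np.allclose(P_pca, P_w)
```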

Decrease of the spread of PCA variances in ORN axons vs. somas in the LC
Here we show that the coefficient of variation (CV_σ, i.e., the spread) of the PCA variances ({σ^2_i}) is smaller at the ORN output (axons) than at the input (somas) in the LC model when the number of ORNs (D) equals the number of LNs (K), i.e., D = K. In that case, we have σ_X = σ_Y(1 + ρ^2 σ_Y^2). As we have shown, for small σ_X we have σ_Y ≈ σ_X, and for large σ_X we have σ_Y ≈ (σ_X/ρ^2)^{1/3}. Let X be a positive random variable representing the variances. We will show that for 0 < α < 1, CV(X) ≥ CV(X^α), which mimics our case.
The last inequality can be proven by using Hölder's inequality twice. First: which leads to: and second: which leads to: Combining inequalities Eq. (S117) and Eq. (S119) proves inequality Eq. (S115) and ends the proof. Thus, for an LC with the same number of LNs as ORNs (i.e., K = D), the computation in the LC decreases the spread of {σ^2_{Y,i}} relative to the spread of {σ^2_{X,i}}. Although for K < D only the variances of the top K PCA directions are decreased, in most cases the computation in the LC also leads to a decrease of CV_σ (Fig. 6D).
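Because Hölder's inequality applies to the empirical measure of any finite set of positive numbers, the compression CV(X) ≥ CV(X^α) can be checked directly on a hypothetical set of variances:

```python
import numpy as np

def cv(v):
    """Coefficient of variation of a positive sample."""
    return np.std(v) / np.mean(v)

# Hypothetical spread of PCA variances (illustrative values only).
sigma2 = np.array([0.05, 0.2, 1.0, 3.0, 12.0, 80.0])

# Raising to a power 0 < alpha < 1 compresses the spread.
for alpha in (0.75, 0.5, 0.33, 0.1):
    assert cv(sigma2) >= cv(sigma2 ** alpha)
```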

Numerical simulations of the LC and NNC
A. Numerical simulation of the LC offline. For the LC we have the theoretical solution, so numerical simulations are not necessary to obtain the optima Y* and Z*. Moreover, there is a manifold of solutions for Z*, W*, and M*. Nevertheless, to confirm the theoretical results, we simulate the LC as well. For that, we use the optimization problem that depends on Z only (Eq. (S40), with γ = 1 and u dropped): We use an algorithm similar to (7).
Algorithm 1: Finding the minimum of f(Z).
Inputs:
X ∈ R^{D×T}
K > 0: the number of dimensions of Z
ρ > 0: a constant encoding the strength of the inhibition by the LNs
0 < σ < 1: acceptance parameter (usually 0.1)
α_0 > 0: initial gradient step coefficient (usually 1 or 10)
0 < β < 1: reduction factor (usually 0.1 or 0.5)
0 < µ ≪ 1: tolerance parameter (usually ≈ 10^−6)
n_cycle ≈ 500: number of steps after which the value of α_0 is decreased
Initialize, then repeat: find a potential new Z through a gradient descent step; here ⊙ denotes element-wise multiplication and "sum" adds all the elements of the matrix. In the inner repeat loop of the algorithm it can happen that, because of limited numerical precision, no α is small enough to produce a decrease in f (i.e., to satisfy the acceptance condition on ∆f); in that case, the inner and outer repeat loops stop and the current Z (not Z_new) is returned.
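The listing above is a gradient descent with a backtracking (Armijo-type) line search. A minimal Python sketch of this control flow, with a hypothetical quadratic stand-in for the paper's objective f(Z) from Eq. (S40):

```python
import numpy as np

def f(Z):
    # Hypothetical stand-in objective; the actual f(Z) of Eq. (S40)
    # would be substituted here, together with its gradient.
    return 0.5 * np.sum((Z - 1.0) ** 2)

def grad_f(Z):
    return Z - 1.0

def minimize(Z0, sigma=0.1, alpha0=1.0, beta=0.5, mu=1e-6,
             n_cycle=500, max_iter=10000):
    Z = Z0.copy()
    alpha_start = alpha0
    for it in range(max_iter):
        g = grad_f(Z)
        alpha = alpha_start
        while True:                      # inner repeat loop (line search)
            Z_new = Z - alpha * g
            decrease = f(Z) - f(Z_new)
            # predicted decrease: sigma * alpha * sum(g ⊙ g)
            if decrease >= sigma * alpha * np.sum(g * g):
                break
            alpha *= beta                # shrink step by the factor beta
            if alpha < 1e-16:            # numerical precision exhausted
                return Z                 # return current Z, not Z_new
        if np.max(np.abs(Z_new - Z)) < mu:   # tolerance reached
            return Z_new
        Z = Z_new
        if (it + 1) % n_cycle == 0:      # periodically reduce alpha_0
            alpha_start *= beta
    return Z

Z_opt = minimize(np.zeros((3, 4)))       # converges to the minimizer Z = 1
```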
∇f(Z) is given by: Finally, the expression for Y is (Eq. (S33)):

B. Numerical simulation of the NNC offline. For the NNC, we do not have analytical expressions for Y and Z. To optimize the objective function, we perform alternating gradient descent/ascent steps on Y and Z, respectively. We start from the expanded expression of the optimization problem Eq. (S14) with nonnegativity constraints (with γ = 1 and u dropped):

Algorithm 2: Finding the optimum of f(Y, Z).
Inputs:
X ∈ R^{D×T}
K > 0: the number of dimensions of Z
ρ > 0: a constant encoding the strength of the inhibition by the LNs
0 < σ < 1: acceptance parameter (usually 0.1)
α_0 > 0: initial gradient step coefficient (usually 1 or 10)
0 < β < 1: reduction factor (usually 0.1 or 0.5)
0 < µ ≪ 1: tolerance parameter (usually ≈ 10^−6)
n_cycle ≈ 500: number of steps after which the value of α_0 is decreased
Initialize: where [a]_+ = max[0, a] is an element-wise rectification. In the case of the LC, this algorithm holds as well, with all the rectifications [.]_+ removed and the "abs" removed from the initialization. If in either of the inner repeat loops no α is small enough to produce a decrease/increase in f (i.e., to satisfy the acceptance condition on ∆f), the iterations stop and the current Y and Z are the output of the algorithm.
The gradients of f(Y, Z) are:

C. Numerical simulation of the circuits online. For Fig. S17, we simulated the circuit dynamics for a given W, M, and X. For that purpose, to find y* and z*, we performed gradient descent steps based on the discretized Eq. (S75) for the LC or Eq. (S78) for the NNC (correspondingly, Eq. (6) and Eq. (7) in the main text).
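A schematic Euler discretization of such circuit dynamics, in the rectified (NNC-style) form τ_y ẏ = x − y − Wz, τ_z ż = −Mz + W^⊤y with [.]_+ applied to the activities (W, M, and x are hypothetical stand-ins, not the paper's fitted values):

```python
import numpy as np

rng = np.random.default_rng(3)
D, K = 21, 4
tau_y, tau_z, dt = 1.0, 1.0, 0.05

W = rng.uniform(0.0, 0.3, (D, K))       # hypothetical ORN-LN weights
M = np.eye(K) + 0.1 * rng.uniform(size=(K, K))
x = rng.uniform(0.0, 1.0, D)            # one input pattern

y, z = x.copy(), np.zeros(K)
for _ in range(5000):
    # Euler steps of the discretized ODEs; np.maximum implements
    # the element-wise rectification [.]_+ of the NNC.
    dy = (x - y - W @ z) / tau_y
    dz = (-M @ z + W.T @ y) / tau_z
    y = np.maximum(0.0, y + dt * dy)
    z = np.maximum(0.0, z + dt * dz)

# After convergence, one more step no longer changes the activities.
y_next = np.maximum(0.0, y + dt * (x - y - W @ z) / tau_y)
```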
15. Simulation of the circuit with synaptic weights from the connectome (Fig. S15). We investigate the computation performed by a nonnegative ORN-LN circuit in which the synaptic weights are set proportionally to the synaptic counts from the connectome (1) (Section 15). Given that we have a connectome for the left and right sides of the larva, there are two such circuit models. We call this model NNC-conn; it has 8 LNs. Several caveats are worth mentioning regarding this model. Two main reasons might make the results of these simulations not entirely trustworthy, so the computation performed by this circuit might not represent the true computation in the real biological circuit. First, several physiological parameters are not available and must be guessed: the neuronal leaks and the ratios of the synaptic strengths of ORNs → LNs vs. LNs → ORNs vs. LNs → LNs. Second, the observed computation of a circuit strongly depends on the input it receives; since we do not know the true input statistics to which this circuit is adapted, the observed computation might be misleading. This simulation is therefore rather a control of whether the predictions of the NNC model are compatible with the potential computation implied by the synaptic counts.
To simulate this circuit, we thus first need to choose a scaling for the synaptic counts found in the connectome, in order to convert them to synaptic weights (note that the circuit contains both excitatory and inhibitory synapses, and their relative strength is unknown). To perform that transformation, we divide the ORN → LN counts by 80, the LN → ORN counts by 30, and the LN → LN counts by 60. These numbers are roughly the averages of the norms of the columns of the matrices W_ff, W_fb, and M, respectively. We choose these scaling factors to ensure that the synaptic strengths are comparable between the different directions of activity flow (i.e., ORNs to LN, LNs to ORN, LNs to LN). Next, we need to choose values for the diagonal of M, which correspond to the neural leaks of the LNs and are not known. We set those values to the maximum of each column of M, which makes the neural leak (i.e., self-inhibition) comparable to the inhibition coming from other LNs. We then simulate this circuit for the left and right sides of the larva. In Fig. S15, we show the average between the left and right sides for ORN activity, and we show the LN activity separately for the left and right. We use the same equations as for the NNC to simulate the circuit (Eq. (S78), Eq. (7) in the main text), having adapted the formulas to incorporate different feedforward and feedback connectivity.
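The count-to-weight conversion described above can be sketched as follows; the count matrices here are hypothetical random stand-ins for the connectome data of (1):

```python
import numpy as np

rng = np.random.default_rng(4)
D, K = 21, 8

# Hypothetical synaptic-count matrices standing in for the connectome.
counts_ff = rng.integers(0, 60, (D, K)).astype(float)   # ORNs -> LNs
counts_fb = rng.integers(0, 25, (K, D)).astype(float)   # LNs -> ORNs
counts_ll = rng.integers(0, 40, (K, K)).astype(float)   # LNs -> LNs

# Per-pathway scalings from the text (roughly the average column norms).
W_ff = counts_ff / 80.0
W_fb = counts_fb / 30.0
M = counts_ll / 60.0

# Unknown LN leaks: set each diagonal entry of M to the maximum of its
# column, making self-inhibition comparable to inhibition from other LNs.
np.fill_diagonal(M, M.max(axis=0))
```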
Finally, given the multidimensional space of unknown parameters, different modes of computation could arise in different regions of the parameter space, and these modes might not correspond to the true computation of the actual biological circuit. To be more accurate, this bottom-up approach would require an in-depth investigation of a large parameter space to determine what different modes of operation this circuit could have and then to evaluate their plausibility. More physiological recordings of this circuit would make such bottom-up models more reliable.

Optimization problem for circuit without LN-LN connections (Fig. S18)
The following optimization problem yields a circuit without LN-LN connections (8): where the variables and parameters are the same as for the optimization problem Eq. (S14); Z^⊤ZZ^⊤Z has been replaced with Z^⊤Z and the parameters rearranged accordingly. To see that this objective function implements whitening, it helps to rewrite it as follows: where Z^⊤Z acts like a Lagrange multiplier.
A. Online solution.Following a similar approach as with the optimization problem Eq. (S14), we find, analogously to Eq. (S80) that the online algorithm that can be implemented by a circuit model is: As one can see, there is no interactions between LNs.The synaptic updates are (see Eq. (S82)): Nikolai M. Chapochnikov, Cengiz Pehlevan, Dmitri B. Chklovskii Similar to Eq. (S81), in the nonnegative version of the optimization problem Eq. (S127), the circuit equations become B. Circuit computation.Using similar methods as above, we find that the solution of the optimization problem Eq. ( S127) is: This means that in the output Y * , all the top K (as the number of LNs) PCA variances become equal to u 2 ρ 2 or stay the same as in the input X if the original variance is smaller than u 2 ρ 2 .If K ≥ D (i.e., the number of LNs is equal or more than the number of input neurons) and all original variances are larger than u 2 ρ 2 , then the output Y * will be white: all variance will be u 2 ρ 2 .We have used the relationship between the PCA variance σ 2 and the singular value s: C. Numerical simulations.Numerical simulations for these objective functions are performed using the same methodology as for the original optimization problem.(A) Distribution of p-values arising from the significance testing in Fig. 2B.We observe that for the Broad Trio and Picky 0, the distribution of p-values is skewed towards small values, confirming that the significant correlations found are not solely a result of randomness and multiple comparisons.
(B-E) Black line: reconstruction error ∥ŵ_LNtype − Σ_t v_t x^(t)∥_2 as a function of the L1 norm of the coefficient vector v (see text for details); gray lines: same as black but for a shuffled ŵ_LNtype. Red line: proportion of randomly shuffled ŵ_LNtype that have a smaller reconstruction error for the same norm of v. Broad Trio and Picky 0 have significantly better reconstructions, as shown by the small p-values over an extended range of ∥v∥_1. (F-I) Red line: relative cumulative frequency (RCF) of the correlation coefficients r (from each row of Fig. 2B) between each w_LNtype and all the {x^(t)}_data. In other words, the RCF is a normalized cumulative histogram of all the correlation coefficients in one row of Fig. 2B. Black line and gray band: mean ± SD of the RCFs generated by 50,000 instances of shuffling the entries of w_BT. Blue line: normal fit to the shuffled distribution. Apart from the LN type P0, the distribution arising from shuffling is quite close to normal; this can be explained by the fact that P0 has sparse connectivity. Bin size: 0.004. (J-M) Same as (F-I) with the mean RCF subtracted. We define the maximum deviation as the maximum negative difference between the true and the mean RCF of correlation coefficients.
(N) RCF maximum deviation and log_10 of the multiple-comparison-adjusted p-values (9) for each of the four ORNs → LNtype synaptic count vectors w_LNtype. *: significance at 5% FDR (false discovery rate).

(A) Same as Fig. 2B, for all the w_LN and w_LNtype and with all the odors labeled. The label "Broad T" corresponds to the average ORNs → LN synaptic count vector over all Broad Trio LNs; similarly for "Broad D", "Keystone", and "Picky 0 [dend]". These correspond to the ones shown in Fig. 2B. The individual LNs have similar correlation patterns as the averaged ones. Same odor order. (G) Same as (F), for NNC-5. The small number of significant points in (E-G) results from the higher number of hypothesis tests, which increases the adjusted p-values in the FDR multi-hypothesis testing framework. (H) Same as Fig. 3A, for NNC-5.

This figure complements Fig. 6 in characterizing the difference in the PCA directions of the odor representations at ORN somas ({x^(t)}_data) vs. at ORN axons ({y^(t)}) in the LC, NNC, and NNC-conn models. We consider the models LC-1, LC-8, NNC-1, NNC-8, and NNC-conn, i.e., LC and NNC models with K = 1 and K = 8 LNs, as well as the NNC model constructed from the synaptic counts in the connectome. {u_{X,i}} and {u_{Y,i}} are the PCA directions of the uncentered activity at the somas ({x^(t)}_data) and axons ({y^(t)}), respectively. There are D = 21 PCA directions (as many as ORNs). To quantify the change of PCA directions, we calculate the scalar products between {u_{X,i}} and {u_{Y,i}}. A scalar product of 1 (or −1) means that the direction is exactly the same; 0 means that they are perpendicular.
Because PCA direction vectors are determined only up to sign, we show the absolute value of the scalar product. A change of PCA directions has implications for the stimulus representations: if the PCA directions are strongly altered, the cloud of representations in neural space is not only stretched but also rotated. A minimal rotation of the representations is potentially advantageous for downstream processing because, since the ORN axon representation is computed dynamically through LN activation, the original representation appearing in ORN axons before the effect of the LNs kicks in will be maximally close to the final, converged representation. Thus, downstream processing can be meaningful even before the representation converges. A lack of rotation is called "zero-phase". If the rotation of the stimulus between the original representation at the ORN somas and the converged representation at the ORN axons were substantial, the downstream computation at stimulus presentation could potentially be wasted and give the brain incorrect information about stimulus identity. (A-B) LC-1 and LC-8. For the LC, the identity of the PCA directions is conserved and only their order changes, as can be deduced from the fact that all scalar products between {u_{X,i}} and {u_{Y,i}} are either 1 or 0. Because the variance of the first (or first 8) PCA directions decreases, their global order changes.
(C-D) NNC-1, NNC-8. For the NNC, the PCA directions at the somas and at the axons are not exactly the same, but they approximately conserve their ordering. (E) NNC-conn model. Here, the PCA directions are even more intermixed than in the NNC-8 model, similar to the NNC'-8 model (Fig. S17), in which the LN-LN connections have been removed.

X = [x^(1), ..., x^(T)]: D × T matrix of the x^(t); depends on the {x^(t)} considered
Y = [y^(1), ..., y^(T)]: D × T matrix of the y^(t)
Z = [z^(1), ..., z^(T)]: K × T matrix of the z^(t)
A*: the optimal A for the optimization problem; A can stand for Y, Z, W, M, etc. The * is often dropped in the text to simplify the notation when it is clear that one refers to the optimal solution and not the variable; it is dropped in the results of the main text.

Using ∥X∥^2_F = Tr[X^⊤X] and Tr[A + B] = Tr[A] + Tr[B], where Tr[·] is the matrix trace (the sum of the diagonal elements of the matrix), we get: based on the von Neumann trace inequality, we know that Tr[AB] ≥ Σ_{i=1}^{N} a_i b_{N−i+1}.

Fig. S1. Full ORN connectivity and circuit selection. (A) Heat map of the ORN feedforward and feedback connections on the left side of the Drosophila larva. We focus on the neurons that synapse bidirectionally with ORNs (inside the red dashed rectangle): Broad Trios, Broad Duets, Keystones, and Picky 0. These neurons are all LNs. (B) Same as (A) for the right side.

Fig. S3. ORN soma activity from Si et al., 2019 (2). (A) ORN soma activity patterns {x^(t)}_data in response to 34 odors at 5 dilutions, acquired through Ca^2+ imaging. Different odors are separated by vertical gray lines. For each odor, there are 5 columns corresponding to the 5 dilutions 10^−8, ..., 10^−4. The odors and ORNs are ordered by the values of the second singular vectors of the left and right SVD matrices of this activity data, after centering and normalizing. These data were obtained by averaging the maximum responses of several trials to the same odor and dilution (as in Si et al., 2019 (2)). (B) Same as (A), with each x^(t) scaled between 0 and 1 to better portray the patterns.

Fig. S5. Alignment of activity patterns x^(t) in ORNs and ORNs → LN synaptic count vectors w_LN. (A) Same as Fig. 2B, for all the w_LN and w_LNtype and with all the odors labeled. The label "Broad T" corresponds to the average ORNs → LN synaptic count vector.

Fig. S6. PCA of ORN activity and NNC connectivity vs. data connectivity. (A) Percentage of the variance of the ORN activity patterns {x^(t)}_data explained by the uncentered PCA. The top 4 and 5 PCA directions explain 71% and 76% of the variance, respectively. (B) First 5 PCA loading vectors of {x^(t)}_data. (C-D) w_k from the NNC with K = 4, 5 and ρ = 1. (E) Same as Fig. 2H with all w_LN. (F) Same as (E), with w_k from NNC-4 instead of PCA directions.

Fig. S8. Alignment of activity patterns {x^(t)}_data in ORNs and connectivity weight vectors {w_k} from NNC-4. (A) Same as Fig. 2B, for the four ORNs → LN connection weight vectors w_k arising from NNC-4 simulations (ρ = 1). We see that the LNs of the NNC model, which is specifically adapted to this set of odors, have high and significant correlations with different sets of odors. w_1 most resembles w_BT, w_2 resembles w_BD, and w_4 resembles w_P0. (B-I) Same as Figs. S4F to M, for the four w_k arising from NNC-4 simulations and with an overlaid normal fit to the shuffled distribution. These plots are quite similar to the ones based on the connectome, showing an additional match between the model and the experimental data. In particular, we find two connectivity vectors (w_1 and w_4) that have, just as BT and P0, rather large deviations from the shuffled distribution, whereas the other two, just as BD and KS, are closer to the shuffled distribution.

Fig. S16. Input transformation by the LC and NNC with ρ = 10, continued. (F-I) Corresponds to Figs. 6E to H. Because LC-1 only affects a single PCA direction, the results for ρ = 2 and ρ = 10 are quite similar in terms of channel variances and pattern magnitudes for this model. For LC-8, although we observe a decrease in channel variances and pattern magnitudes, there is virtually no difference between ρ = 2 and ρ = 10 in terms of the CV of channel variances or pattern magnitudes. For the NNC models, we observe both a decrease in channel variances and pattern magnitudes and a decrease in their CV in comparison to ρ = 2. As for (E), the difference between the LC and NNC can be attributed to the fact that the LC only affects certain stimulus directions, whereas the NNC has a global effect. (J-L) Corresponds to Figs. 6I and J, Figs. S13K and L, and Figs. S14K and L. At ρ = 10, the channels are even more decorrelated than at ρ = 2, as seen in the correlation matrices and the histograms. For the LC, some channels become anti-correlated. (M-O) Corresponds to Figs. 6K and L, Figs. S13N and O, and Figs. S14N and O. At ρ = 10, the odor representations are even more decorrelated than at ρ = 2, as seen in the correlation matrices and the histograms. This can particularly be observed for correlation coefficients above 0.5, whose proportion is smaller than at ρ = 2.

RCF_c(x) = (1/n) Σ_i 1_{[−1,x]}(c_i): relative cumulative frequency function of a set of correlation coefficients; 0 ≤ RCF_c(x) ≤ 1
1_A(y): indicator function of a given set A; 1_A(y) = 1 if y ∈ A, and 1_A(y) = 0 otherwise
r: Pearson's correlation coefficient; −1 ≤ r ≤ 1
r_+ = max[0, r]: rectified correlation coefficient; 0 ≤ r_+ ≤ 1
r̄_+: average r_+, the mean rectified correlation coefficient; 0 ≤ r̄_+ ≤ 1
D: number of ORNs; 21
K: number of LNs in the different circuit models; from 1 to 8
w_LN: D-dimensional column vector containing the number of synapses in parallel (synaptic counts) between each of the D ORNs and a specific single LN; as in Berck et al., 2016 (1), see Fig. 1B
w_LNtype = (1/n) Σ_{LN∈LNtype} w_LN: D-dimensional column vector; each entry is the average synaptic count from an ORN onto a given LN type LNtype (which contains n members); for Broad Trio n = 6, Broad Duet n = 4, Keystone n = 4, Picky 0 n = 2; calculated from Berck et al., 2016 (1), see Fig. 1B
x^(t): D-dimensional column vector representing the activity of ORN somas; arbitrary
{x^(t)}: a set of T x^(t); can refer to any (abstract) dataset; arbitrary
{x^(t)}_data: set of the 170 x^(t) taken from the measurements (2), as the maximum Ca^2+ fluorescence activation; Si et al., 2019 (2)
y^(t): D-dimensional column vector representing the activity of ORN axons
{y^(t)}: a set of T y^(t)
z^(t): K-dimensional column vector representing the activity of LNs
{z^(t)}: a set of T z^(t)
Γ: measure of alignment between 2 subspaces A and B; 0 ≤ Γ ≤ min[dim(A), dim(B)]
p or pv: p-value; 0 ≤ p ≤ 1
T: number of inputs/samples x^(t); 170 for the Si et al., 2019 dataset (2), otherwise arbitrary
w_k: D-dimensional column vector containing the synaptic weights between each of the D ORNs and a specific single LN. Note that in the model, the feedforward connection weight vectors are ρ^2 w_k and the feedback connection weight vectors are w_k; usually arising from the model
W = [w_1, ..., w_K]: D × K matrix containing the (feedforward) synaptic counts or synaptic weights between ORNs and LNs; either from Berck et al., 2016 (1) or from model simulations
M = {m_{i,j}}_{i,j=1...K}: K × K matrix containing the synaptic counts or synaptic weights between LNs; m_{i,i} relates to the leak term of LN i; either from Berck et al., 2016 (1) or from model simulations
ρ: parameter of the circuit model that encodes the strength of the feedback inhibition relative to the feedforward excitation; in the simulations 0.1 ≤ ρ ≤ 10
γ: parameter of the circuit model that only scales the activity in the LNs and the synaptic weights, without affecting the nature of the computation; γ = 1 in the paper
u: unit with the same physical dimension as X, Y, and Z; e.g., spikes · s^−1
X = [x^(1), ..., x^(T)]: D × T matrix of the x^(t); depends on the {x^(t)} considered
{u_i}_{i=1...D}: D PCA directions of the uncentered dataset {x^(t)}; correspond to the left singular vectors of the matrix X; depend on the {x^(t)} considered
{σ^2_{X,i}}_{i=1...D}: D PCA variances of the uncentered dataset {x^(t)}; {σ_{X,i}}_{i=1...D} correspond to the squares of the singular values of the matrix X; depend on the {x^(t)} considered
{σ^2_{Y,i}}_{i=1...D}: D PCA variances of the uncentered dataset {y^(t)}; {σ_{Y,i}}_{i=1...D} correspond to the squares of the singular values of the matrix Y