The Dimensions of dimensionality

It is plausible that different brain regions emphasize different properties. The success of neuroscientists using imaging techniques with limited spatial and temporal resolution in uncovering embedding spaces suggests that in some cases neural representations are somewhat smooth and regular [124]. The brain itself is not uniform in its circuitry and function, which means that globally interpretable dimensions may be less likely in certain regions, such as prefrontal cortex [125]. Even within a region such as prefrontal cortex, the dimensionality found may be quite low [126] or high [127], perhaps due to task differences. For example, over-training on a task may lead to fine-grained, essentially overfit, representations. Even within the same region of visual cortex with the same stimuli, minor task differences in how many aspects of a stimulus are relevant for a decision can affect dimensionality estimates.

Multidimensional representational spaces can capture complex structures such as hierarchical relationships, lexical entailment, and compositional features.
Superficially different representations, such as graphs and multidimensional data, can capture the same structural relationships under appropriate settings.
The dimensionality of a representation conveys relatively little information on its own.

[...] exists for discovering latent dimensions [1][2][3][4][5][6][7][8]. Although all embedding algorithms share the same general objective, there are meaningful differences. On the input side, different algorithms place restrictions on the type of data they can ingest. On the output side, different algorithms prioritize different latent space properties, such as prediction performance, transformation type, interpretability, and compactness. Collectively, these differences create a space of trade-offs, where researchers must decide which properties best suit their goals. Seemingly minor algorithmic differences can have unintuitive consequences for the inferred latent space. For example, the latent space dimensions may have limited interpretability. Or the dimensionality of the latent space may be unrelated to the number of so-called 'true' dimensions of the task. Or structurally dissimilar latent spaces may fit the input data equally well. The remainder of this article expands on key considerations when inferring latent spaces. Before diving in, it is worth pausing for a moment to unpack the metamorphosis that occurs during an embedding transformation.

Dimensions before and after
A dimension is a basic building block that contributes to a multidimensional structure. But the meaning of a dimension changes when mapping from dimensions defining an observable space to those defining a latent space. Observable data is fundamentally anchored in reality; it is as close as we can get to an objective measure of something that physically exists. In the case of multidimensional observable data, each dimension quantifies how a specific observable feature varies, such as the wavelength of light absorbed by a single photodiode.
After observable data are embedded, the meaning of a dimension is altered in two fundamental ways. First, the new latent dimensions are not guaranteed to correspond to something globally interpretable. In other words, the new dimension may not have an easy-to-articulate label since each latent dimension may use a sprinkling of all observable dimensions. We return to the issue of interpretable dimensions later.
The second fundamental difference is that latent dimensions do not necessarily exist in the real world. Latent dimensions are unobserved variables that can be estimated from input data. When someone looks at a photograph of a bird, the latent features burst into awareness so rapidly that there is an illusion that the features always existed out there. But photons do not carry metadata stating 'the last thing I bounced off of was an orange beak'. Latent visual features like an orange beak do not actually exist in the visual world; they must be inferred and rendered by the perceptual machinery of our minds. You might be thinking, of course the bird beak exists. While that is true, the bird exists in terms of physical atoms. The visual features are inferred by our mind and are at least one processing step removed from reality. This distinction is made apparent by converting the pixels of a photograph into a table of numbers. Reformatted this way, our existing perceptual machinery is completely derailed and the latent dimensions fail to materialize in our mind's eye.
At times, it is easy to think nothing profound has occurred when mapping from observable dimensions to latent dimensions, but good latent dimensions are the heart of cognitive models. Despite the unquestionable utility of latent dimensions, 'latent' is still a polite way to refer to something made-up, albeit in a data-driven way. Given the data-driven aspect of embeddings, it is important to recognize that a new set of input data could drastically change the inferred latent space. There is a balance to be struck between researchers sharing imaginary (but promising) dimensions and reminding fellow researchers to hedge their bets. In the absence of extraordinary evidence, the research community should remain cautious of designating a particular latent space as having the final say. At any time, a new embedding approach could be invented or additional data could become available that shakes up the status quo.

Glossary
Dimensionality: the number of dimensions present in a given multidimensional representation, which is a consequence of both the input data and the embedding algorithm.
Embedding or encoding algorithm: an algorithm that trains a mathematical entity, sometimes called an encoder, using input data and (optionally) target data.
Encoder: a mathematical entity that maps from input space to a latent space.
Input data: data that is input into an embedding algorithm. The input data may be directly observed or the outcome of a previous processing step. Input data does not have to be multidimensional; for example, it may be text or graph-like.
Latent space (or embedding space): the learned multidimensional space that is a transformation of the input space. Typically, embedded data captures some notion of proximity (i.e., similar items from the input data tend to be neighbors in the embedding). The specific notion of proximity is determined by the particular algorithm used to infer the embedding.
Manifold: a relatively contiguous lower-dimensional space that exists within a higher-dimensional space.
Observable data: data associated with a directly observable measurement (e.g., a photodiode measuring photon wavelength).

Engineering desirable latent dimensions
Observable data by itself provides limited insight and must be processed using algorithms that reveal informative dimensions. It may be tempting to believe that the inferred latent dimensions are the objectively 'true' dimensions. However, it is important to recognize that different algorithms optimize different properties, meaning that there is no single latent space that is objectively correct. Rather, the optimal latent space is the one that best suits the research question. For example, both the Big Five Inventory [9,10] and HEXACO [11] provide low-dimensional latent spaces for describing personality traits, but HEXACO may make better predictions for moral and ethical behavior. Likewise, inferred latent dimensions are highly dependent on the input data. When the input data is primarily derived from undergraduates in western universities, the latent dimensions may not be representative of the wider world [12].
When selecting an embedding algorithm there are five key considerations. On the input side, (i) one must select an algorithm that accommodates the structure of the input data. On the output side, different algorithms prioritize different latent space properties: (ii) prediction performance, (iii) transformation type, (iv) interpretability, and (v) compactness. A small preview of embedding algorithms and their properties is summarized in Table 1. For a brief discussion of practical considerations see Box 1. These properties often trade off against one another, forcing researchers to prioritize based on their objectives. Each of these considerations is unpacked in more detail later.
Input source and structure
Observable data can be obtained from remarkably different sources. Intuitive examples include measurements of sensory modalities, such as photon wavelengths, audio frequencies, and the outputs of gas chromatography-olfactometry. Various techniques also yield observable [...] [13]. Embedding steps can also be chained to create an arbitrary sequence of data transformations where the output of one embedding step is passed as input to the next embedding step, as is the case in deep neural networks.
So far we have outlined how input data varies along two axes: (i) variety in source and (ii) variety in the extent of prior processing. A third axis concerns the structural form of input data. For example, the input data may be arranged as a table of numbers (i.e., multidimensional data), a set of ordinal relationships (e.g., human preference judgments), a graph of nodes and edges, or a large body of text. Historically, embedding algorithms have focused on ingesting multidimensional data, such as in the case of PCA [1,2], independent component analysis (ICA) [4], autoencoders [7,8], and t-SNE [3]. But embedding algorithms can embed arbitrary input data in a multidimensional space.
In cognitive science, a common source of data comes from collecting human similarity judgments (e.g., stimulus Q is more like stimulus A than stimulus B). A diverse family of embedding algorithms exists that transform ordinal similarity judgments into a multidimensional representation where each stimulus is represented as a point in the latent space [14][15][16][17][18][19][20][21][22][23][24][25][26][27]. Many of these algorithms do not have formal names, but fall into the general family of multidimensional scaling algorithms or psychological embeddings. The inferred latent dimensions formalize the notion of psychological distance, where similar items occur close together (Figure 1E). Analogous embedding algorithms also exist for categorization confusion matrices, pairwise ratings [28,29], odd-one-out judgments [30], arrangement data [31,32], and non-human paradigms [33,34]. Embedding algorithms can also be guided by asking participants to rate the degree to which a particular stimulus (image of a cat) exhibits a particular feature ('is stealthy') [29,35,36]. In all of these efforts, the similarity function chosen to model behavior may be pragmatic or reflect theoretical assumptions, such as how the brain represents and processes information [37].
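As a minimal sketch of how ordinal judgments relate to a latent space, the snippet below scores a candidate embedding by the fraction of 'Q is more like A than B' judgments it reproduces. The stimuli, coordinates, judgments, and the choice of Euclidean distance as the similarity function are all hypothetical assumptions for illustration; they are not taken from any of the cited algorithms.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_agreement(embedding, triplets):
    """Fraction of (query, closer, farther) judgments the embedding reproduces.

    embedding: dict mapping stimulus name -> coordinates (hypothetical data).
    triplets:  iterable of (q, a, b), meaning "q is more like a than b".
    """
    triplets = list(triplets)
    hits = sum(
        euclidean(embedding[q], embedding[a]) < euclidean(embedding[q], embedding[b])
        for q, a, b in triplets
    )
    return hits / len(triplets)


# Hypothetical two-dimensional embedding of four stimuli.
embedding = {
    "crow": (0.0, 0.0),
    "raven": (0.1, 0.0),
    "otter": (2.0, 1.0),
    "bear": (2.5, 1.5),
}
judgments = [
    ("crow", "raven", "otter"),  # crow is more like raven than otter
    ("otter", "bear", "crow"),   # otter is more like bear than crow
    ("raven", "bear", "crow"),   # a judgment this embedding does not reproduce
]
score = triplet_agreement(embedding, judgments)  # two of three judgments agree
```

An embedding algorithm of this family would adjust the coordinates to raise such a score (or a smoother likelihood-based version of it); the sketch only covers the evaluation step.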
Box 1. Practical considerations
In addition to key theoretical considerations, practical considerations must also be taken into account when selecting an embedding algorithm. The impact of computational efficiency varies by application, but typically becomes more important as the problem size grows. For small problems on the order of a few hundred data points, popular embedding algorithms easily find solutions within a few seconds on widely available hardware and most computational inefficiencies can be ignored. But as problems grow, it becomes increasingly difficult to ignore computational inefficiencies because poor design choices can preclude discovering solutions within a reasonable time-frame. Some algorithms will be more or less prone to getting stuck in local optima and finding degenerate solutions. Different algorithms will also have varying memory and storage requirements. For example, nonnegative embedding algorithms typically require substantially more dimensions, thus more memory and storage, than an equally accurate unconstrained embedding algorithm. In this case, there is a substantial trade-off between interpretability and model size. In a similar vein, estimating uncertainty using Markov chain Monte Carlo methods will scale poorly compared with variational inference [118].

Embedding algorithms can also ingest text corpora in order to produce multidimensional word embeddings [38,39] and sentence embeddings [40]. Perhaps easiest to understand are word embedding algorithms that leverage the co-occurrence statistics of words, which count how often a pair of words co-occur within a specified window (Figure 1F). Words that co-occur in similar contexts likely have similar meanings. For example, pet animal words are likely to co-occur with the phrase 'water bowl'. Word embeddings have also been extended so that every word is represented as a multivariate Gaussian distribution in the multidimensional space [41]. In such a space, the covariance matrix of the Gaussian can capture properties like lexical entailment. For example, the concept 'mammal' may exhibit a large Gaussian that overlaps the concepts 'bear', 'otter', and 'whale'. Modern multidimensional word embeddings have been shown to capture linguistic hierarchical structure [42], further demonstrating the breadth of possible meanings for multidimensional spaces.
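The co-occurrence statistics that word embedding algorithms start from can be sketched in a few lines. The toy corpus, the function name `cooccurrence_counts`, and the window size are hypothetical choices for illustration; real word embedding pipelines differ in tokenization, weighting, and how the counts are turned into coordinates.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look ahead only, so each pair of positions is counted once.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                counts[tuple(sorted((word, other)))] += 1
    return counts


# Toy corpus (hypothetical).
tokens = (
    "the dog drank from the water bowl and "
    "the cat drank from the water bowl"
).split()
counts = cooccurrence_counts(tokens, window=2)

# 'dog' and 'cat' never co-occur with each other, but both share the
# neighbor 'drank'; shared contexts of this kind are what would place
# the two words near each other in a latent space.
shared_context = counts[("dog", "drank")] > 0 and counts[("cat", "drank")] > 0
```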
The notion of latent dimensions can even capture graph structure, where connected nodes are embedded as nearby points (Figure 1D). Graph embeddings deserve special mention because hierarchical graphs have long been argued as a case that multidimensional spaces struggle to adequately model [43][44][45]. Modern embedding algorithms have demonstrated that graphs can be embedded in multidimensional spaces [46][47][48]. Hierarchical graphs can be embedded in multidimensional spaces by exploiting properties of hyperbolic spaces [49,50], which tend to require fewer dimensions relative to their Euclidean counterparts when modeling hierarchical data [49]. Some behavioral data, such as smells, are well-described by a hyperbolic space [51]. Graph embedding algorithms highlight how the distinction between graph data and multidimensional data can be superficial. Sometimes it is possible to generate topological signatures to assess whether data are better described by the curvature in a Euclidean or hyperbolic latent space [52], which have been used to argue that spatial maps are hyperbolic [53].
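To give a feel for why hyperbolic spaces suit hierarchies, the sketch below computes distances in the Poincaré ball, one common model of hyperbolic space. The choice of this model is an assumption made here for illustration; the text above does not state which model the cited work uses. The distance between points u and v inside the unit ball is arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))).

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((x - y) ** 2 for x, y in zip(u, v))
    return math.acosh(1 + 2 * sq_diff / ((1 - sq_norm_u) * (1 - sq_norm_v)))


# Two pairs of points with the same Euclidean separation (0.1).
near_center = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_edge = poincare_distance((0.85, 0.0), (0.95, 0.0))

# The same Euclidean gap corresponds to a much larger hyperbolic distance
# near the boundary of the ball than near its center.
edge_is_roomier = near_edge > near_center
```

Because distances stretch toward the boundary, there is effectively more room there, which is one intuition for how the many leaves of a tree can be spread out without adding dimensions.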
Prediction performance
Perhaps the most important property of an embedding algorithm is the preservation of information when mapping from input space to latent space. A good latent space will retain the key statistical regularities present in the input data while discarding noise. The appropriate metric for assessing preservation of information will vary by application, but can be loosely described as the prediction performance of the latent space. For example, after applying PCA one could measure the total variance explained by the latent dimensions. Likewise, if the latent representation is used in a downstream image classifier, then the embedding algorithm could be scored on its ability to embed a test image among neighbors of the same category. Prediction performance can be expounded further by considering whether an approach permits out-of-sample predictions and estimates of uncertainty.
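The 'variance explained' metric mentioned for PCA can be illustrated with a deliberately small case. The sketch below handles only two-dimensional input, where the eigenvalues of the covariance matrix have a closed form; the synthetic data and the function name `explained_variance_2d` are assumptions for illustration, and a practical analysis would use a numerical library instead.

```python
import math
import random


def explained_variance_2d(points):
    """Fraction of total variance captured by each principal component of 2-D data.

    Returns the two eigenvalues of the sample covariance matrix divided by
    their sum, largest first.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = (half_trace + radius, half_trace - radius)
    total = sxx + syy
    return tuple(ev / total for ev in eigenvalues)


# Synthetic data (an assumption for illustration): y is x plus a little noise,
# so the two observable dimensions are highly correlated.
rng = random.Random(0)
points = []
for _ in range(500):
    x = rng.gauss(0, 1)
    points.append((x, x + rng.gauss(0, 0.1)))

ratios = explained_variance_2d(points)
# ratios[0] is close to 1: a single latent dimension retains almost all variance.
```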
Prediction performance may vary for trained items (i.e., within-sample) versus test items (i.e., out-of-sample). Out-of-sample predictions are important because they enable an encoder to generalize to novel situations. For example, a mammogram encoder will have limited practical value if it cannot embed X-ray images obtained from new patients. Interestingly, many algorithms cannot perform out-of-sample generalization, although much work has been done to extend popular approaches [54,55]. By contrast, artificial neural networks naturally make predictions for novel inputs. Out-of-sample generalization may suffer from overfitting, which can be framed as a separability versus generalization trade-off [56] or an efficient versus robust trade-off [57].
A second factor in prediction performance is the ability to generate uncertainty estimates. Algorithms can learn point estimates or distributions for each embedded sample. Learning an embedding with uncertainty estimates has the advantage of describing both the most likely location (e.g., mode) of the embedding point and how confident we can be of the embedded location. Various techniques are available to learn uncertainty embeddings, such as variational inference autoencoders [8], variational psychological embeddings [23], and epistemic neural networks [58]. Uncertainty information can help inform downstream analysis, such as identifying latent space regions that exhibit individual differences, estimating the significance of a treatment condition, or computing expected information gain within an active learning paradigm [23,59].

Transformation type
The type of transformation has a large impact on the final inferred latent space. Broadly speaking, the type of transformation can be characterized by the linearity of the embedding transformation and the change in dimensionality. A transformation of the input space is typically (but not always) performed with the additional goal of discovering a lower dimensional latent space. A lower dimensional latent space is typically desirable because it transforms a large number of (likely) uninformative dimensions into a smaller number of informative dimensions. For example, face space is a low dimensional space that can be used to describe the perceived similarity between human faces and can help explain face distinctiveness [60]. By contrast, an embedding transformation can also be used to discover a larger dimensional latent space, such as when ICA is applied to image pixels to discover a rich set of latent dimensions that look remarkably like V1 receptive fields [61].
When restricted to linear transformations, the new dimensions embody relatively simple changes (Figure 1A,B). Popular methods of linear transformations include PCA [1,2], ICA [4], Fourier transform, factor analysis [62], and tensor-based dimensionality reduction [63,64]. By contrast, a nonlinear transformation, such as t-SNE [3], allows the new dimensions to describe a drastically different space, potentially twisting and turning through the original space (Figure 1C).
The consequences of dimensionality reduction are different for linear and nonlinear approaches. For linear approaches, dimensionality reduction may involve identifying a set of latent dimensions that explain the most variance and excluding any remaining latent dimensions (Figure 1A). For example, if two input dimensions are highly correlated then one of the dimensions can be dropped. By contrast, nonlinear dimensionality reduction algorithms often assume that data are not distributed in a uniform way, but that the data points are distributed in relatively contiguous sheets (i.e., manifolds) that twist and turn in a larger multidimensional space. When the manifold assumption is true, the empty parts of the original space can be 'squeezed' or 'flattened' out (see Box 2 for an example). This assumption is known as the 'manifold hypothesis'. Widely used nonlinear dimensionality reduction algorithms include ISOMAP [5], t-SNE [3], UMAP [6], and various flavors of autoencoders [7,8]. Other nonlinear dimensionality reduction algorithms include kernel methods [65], local linear embedding [66], Laplacian eigenmaps [67], Hessian eigenmaps [68], and local tangent space alignment [69].
Cognitive scientists have developed a diverse set of application-specific dimensionality reduction methods. Neuroscience has been particularly prolific in developing dimensionality reduction algorithms that can ingest neural population activity [70]. Neuroscience-specific methods include diffusion embedding for connectivity data [46], delayed latents across groups (DLAG) for neurophysiological recordings [71], and calcium imaging linear dynamical system (CILDS) [72]. The concept of low-dimensional manifolds has invigorated discussions around the brain's latent representations, proving fruitful for studying motor [73] and spatial representations [53].

Interpretability
Cognitive scientists are often interested in uncovering psychologically interpretable dimensions in order to advance mechanistic hypotheses. For example, latent dimensions can define a motor space with separate preparation and execution dimensions [73]. While the interpretation of an observable dimension is self-evident, the interpretation of a latent dimension is not straightforward. The specifics of the embedding algorithm, such as linear or nonlinear latent dimensions, influence the manner of interpretation.
There are two forms of interpretability. In its strong form, interpretability means that each dimension has a clear interpretation independent of the other dimensions. For example, the second dimension of a latent space may correspond to 'wingspan of a bird', where lower values mean smaller wingspan. The strong form can be framed as global interpretability: the interpretation holds for the entire dimension regardless of where you are on the dimension. This strong form of interpretability is often referred to as a disentangled representation in machine learning [74]. By contrast, a dimension may only have local interpretability: the interpretation of the dimension changes as you gradually move along the dimension (Figure 2C). Given sufficient input data that is not pure noise, an embedding should at least have local interpretability. In practice, an embedding may not appear to exhibit local interpretability if the input data contains too few samples.

Box 2. Manifold example
As a simple example of a twisty manifold, consider a set of images that are each 100 × 100 pixels, where each pixel can be either black or white. Each image depicts a single black circle on a white background. All images are unique since the circles vary in both size and location. In its input form, each image has 10,000 dimensions (one dimension for each pixel). With respect to input space, each image can be thought of as a point in the 10,000 dimensional space. Even with many unique images, vast regions of the 10,000 dimensional space will be unoccupied. For example, you will never encounter an image with an isolated black pixel, so all coordinates corresponding to an image with an isolated black pixel will be unused. Occupied parts of the space will tend to be clumpy; circles that have a similar radius or slightly different center will occur close together in pixel space. The vast unused regions of pixel space suggest that a lower dimensional latent space is possible. By construction, we know this to be true. The data associated with these images can be re-described using three latent dimensions: an x- and y-coordinate describing the center of the circle and the radius of the circle. The new latent space is more concise and the underlying structure of the stimuli is made clearer.
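The circle images of Box 2 are easy to generate, which makes the gap between the 10,000 observable dimensions and the three latent dimensions concrete. The sketch below is one possible rendering under stated assumptions: the function names, the pixel-membership rule, and the use of the number of differing pixels as a stand-in for distance in pixel space are all illustrative choices, not part of the original example.

```python
def render_circle(cx, cy, radius, size=100):
    """Return a flat list of size*size pixels (1 = black, 0 = white)."""
    return [
        1 if (col - cx) ** 2 + (row - cy) ** 2 <= radius ** 2 else 0
        for row in range(size)
        for col in range(size)
    ]


def hamming(u, v):
    """Number of pixels that differ between two images."""
    return sum(a != b for a, b in zip(u, v))


# Each image is a point in 10,000-dimensional pixel space,
# yet three latent numbers (cx, cy, radius) fully determine it.
image_a = render_circle(cx=30, cy=40, radius=10)
image_b = render_circle(cx=31, cy=40, radius=10)  # center shifted by one pixel
image_c = render_circle(cx=70, cy=60, radius=25)  # different place and size

n_pixels = len(image_a)

# Nearby latent coordinates give nearby points in pixel space (clumpiness).
small_change = hamming(image_a, image_b)
large_change = hamming(image_a, image_c)
```

A nonlinear dimensionality reduction algorithm given many such images would ideally recover something equivalent to the three generating parameters, although nothing guarantees that its axes align with them.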
For both global and local interpretability, although the dimensions capture a clear statistical regularity, it may be difficult to articulate a description of the dimension using natural language.
In our estimation, when cognitive scientists discuss dimensions they are often referring to globally interpretable dimensions, perhaps because such dimensions are the most intuitive and the simple stimuli of early methods yield such spaces. The colloquial prominence of globally interpretable dimensions is likely encouraged by cognitive models that deploy dimension-wide attention to expand or shrink the extent of a dimension, thus altering perceived similarity [75][76][77]. Relatedly, globally interpretable latent dimensions are a highly desirable outcome since such representations provide actionable insight, such as helping researchers understand how representation spaces in the brain change as a function of learning [78][79][80][81]. Interpretable latent dimensions could be used to design more efficient training programs. For example, if the latent dimensions reveal a diagnostic dimension of malignant skin lesions, then training can be structured to emphasize learning the diagnostic dimension.
If globally interpretable dimensions are a high research priority, this property should be deliberately balanced with other properties. To bias the inferred latent space towards globally interpretable dimensions, one option is to use algorithms that employ nonnegativity constraints that force the discovery of sparse, part-like representations [20,30,82], although this will likely come at the cost of needing a latent space with substantially more dimensions. By employing nonnegativity constraints, each dimension is biased to code for part-based features, promoting compositional representations that tend to be easy to articulate. Similarly, demixed PCA is capable of exposing the dependence of the neural representation on task parameters such as stimuli, decisions, or rewards [83,84].

Compactness
In light of the previously introduced properties, one can see why it is problematic to make absolute claims about the number of 'true' latent dimensions. For example, one could trade a lower dimensional, locally interpretable space for a method that yields a higher dimensional, globally interpretable space. Although both solutions may fit the input data equally well, they will differ in the number of dimensions recovered. Going the other direction, a researcher could employ nonlinear transformations or hyperbolic spaces in order to obtain a compact lower dimensional representation. When work indicates that the measured dimensionality of the neural representations in the prefrontal cortex is high [85], that neural codes are confined to low-dimensional latent spaces [70,84], or that neural codes actually exist in high-dimensional spaces [86], it is important that these results are also qualified by the embedding technique employed and the corresponding properties being prioritized.
A potential improvement over reporting dimensionality is to report the intrinsic dimensionality, which can be thought of as the smallest dimensionality possible at the expense of all other properties. However, methods for computing intrinsic dimensionality can still be sensitive to factors like the distribution of points and curvature of the latent space, so care must be taken to use modern methods that are robust to these complications [87,88].
Regardless of reporting dimensionality or intrinsic dimensionality, it is our view that researchers should make clear which latent space properties are being prioritized (e.g., prediction performance, globally interpretable dimensions, compactness). Dimensionality by itself is not a good indicator of the amount of information in a representational space. Rather, one should aim to make dimensionality comparisons between representations that are derived with comparable assumptions [79].
High-quality work that includes analyses of the functional or intrinsic dimensionality of a dataset [86,89] is still bounded by the assumptions baked into intrinsic dimensionality computations.

Selecting and comparing latent spaces
If one is seeking a one-size-fits-all embedding solution, the trade-offs introduced earlier should make it clear that such a solution does not exist. The natural course of research will inevitably lead to a need to select the latent space(s) that best meet the research objective. Model selection can be performed by directly comparing candidate embedding spaces or indirectly comparing embeddings via performance on a downstream task.
A direct comparison can identify how two latent spaces agree and diverge, which in turn suggests information that is shared or distinct between the two sources of input data. For example, one may want to know whether the latent representation of fMRI data from a particular brain region is capturing high-level semantic information (such as 'this item is edible') versus lower-level feature information ('this item is yellow'). One way to address this question is to compare the latent representation of fMRI data to latent representations based on category membership (limes and tennis balls will be in distinct clusters occurring far apart) versus latent representations based on low-level features (limes and tennis balls will be intermixed because they are both round and green) [90]. Likewise, one may want to compare the latent representation of fMRI data from a particular brain region with the latent representation of different layers of a deep neural network in order to find which parts of a neural network best correspond to specific regions of a brain [91]. Many different types of representation comparisons are popular in cognitive science, such as brain versus behavior, model versus brain, model versus model [92], and language versus language. For example, the function of a brain region can be clarified by finding a correspondence between a latent space derived from a region's brain activity and a latent space derived from a cognitive model fit to behavior [93]. All of these comparisons enable researchers to quantify differences and highlight deficiencies in a candidate embedding.
Latent spaces are in agreement if they display second-order isomorphism [94]. For example, the image of a crow is the nearest neighbor of the image of a raven in both latent spaces. Second-order isomorphism does not require the coordinate values or the dimensionality of the two spaces to match. Instead, the notion of match is more abstract: the patterns between the points in the two spaces need to display the same relations. For example, consider a set of points randomly arranged in a two-dimensional space. If one made a copy of this space and rotated it by 90 degrees, the new space would have a completely different set of coordinate values, suggesting that the two representations are different. If instead one considered the relationships between the points, such as pairwise distances, one would conclude that the two spaces describe an identical set of relationships. In this simple example, the two spaces are perfectly isomorphic, but in practice matches are imperfect and evaluating alignment between two spaces can be challenging.
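The rotation example above can be checked directly. The following sketch draws random two-dimensional points, rotates a copy by 90 degrees, and compares the two spaces first by coordinates and then by pairwise distances. The number of points, the random seed, and the helper names are arbitrary choices for illustration.

```python
import math
import random


def rotate_90(point):
    """Rotate a 2-D point by 90 degrees counterclockwise about the origin."""
    x, y = point
    return (-y, x)


def pairwise_distances(points):
    """Euclidean distances for every unordered pair of points, in a fixed order."""
    return [
        math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]


rng = random.Random(0)
original = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(10)]
rotated = [rotate_90(p) for p in original]

# First-order comparison: the coordinate values disagree.
coordinates_match = original == rotated

# Second-order comparison: the relations between points are preserved.
distances_match = all(
    math.isclose(a, b)
    for a, b in zip(pairwise_distances(original), pairwise_distances(rotated))
)
```

Here `coordinates_match` is false while `distances_match` is true, which is the sense in which the two spaces are second-order isomorphic.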
The optimal method for comparing representations is an ongoing research problem. Numerous techniques exist and each has limitations and built-in assumptions. Popular techniques for comparing representations include representational similarity analysis (RSA) [90,95] and canonical correlation analysis (CCA) [96][97][98]. Briefly, RSA is a method for comparing two representations that assesses the correlation between the implied pairwise similarity matrices. CCA is a method that compares two representations by finding a pair of latent variables (one for each domain) that are maximally correlated. While RSA and CCA remain popular, their limitations have spurred researchers to develop more robust methods [99,100]. Adjustments to RSA include the similarity metric centered kernel alignment (CKA; also known as the RV coefficient) [101][102][103], unbiased CKA [104], feature-reweighted RSA [105], and extensions for handling noise [106]. Numerous CCA variants have been introduced to make the approach more widely applicable, such as probabilistic CCA [107,108], kernel CCA [109], deep CCA [110], sparse CCA [111], and projection weighted CCA [92]. Techniques also exist beyond RSA and CCA, such as pattern component modeling [112].
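A bare-bones version of the RSA idea described above can be sketched as follows: compute the pairwise dissimilarities implied by each representation, then correlate them. Using Euclidean distance as the dissimilarity and Pearson correlation as the comparison are simplifying assumptions made here (rank correlations and other dissimilarity measures are also used in practice), and the toy representations are hypothetical.

```python
import math


def pairwise_distances(points):
    """Euclidean distances for every unordered pair of rows, in a fixed order."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        for i, p in enumerate(points)
        for q in points[i + 1 :]
    ]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rsa_score(rep_a, rep_b):
    """Correlate the pairwise dissimilarities implied by two representations.

    Rows of rep_a and rep_b must describe the same items in the same order;
    the two representations may differ in dimensionality.
    """
    return pearson(pairwise_distances(rep_a), pairwise_distances(rep_b))


# Hypothetical representations of the same five items.
space_2d = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (2.0, 3.0)]
# Same layout, scaled by 2 and placed in 3-D with a constant third coordinate.
space_3d = [(2 * x, 2 * y, 5.0) for x, y in space_2d]
# Same coordinates assigned to the wrong items: different relational structure.
scrambled = [space_2d[i] for i in (3, 0, 4, 1, 2)]

same_structure = rsa_score(space_2d, space_3d)
different_structure = rsa_score(space_2d, scrambled)
```

The score for `space_3d` is 1 up to floating-point error even though its coordinates and dimensionality differ from `space_2d`, whereas the scrambled assignment scores far lower.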
One practical strategy for selecting between different latent spaces is to assess performance on a downstream task. For example, one could evaluate the ability of deep neural network embeddings to correctly predict different aspects of human behavior, such as similarity judgments [18,23], categorization performance [113,114], categorization performance using degraded images [115], and trial-by-trial categorization error consistency [116]. The advantage of this strategy is that it allows comparing latent spaces with very different assumptions since the downstream task can be agnostic to the particular embedding details. The drawback of this approach is twofold: researchers must define how the embedded coordinates are mapped to observable behavior and the target behavioral data (e.g., categorization performance) may not be available. In practice, one can evaluate multiple methods for assessing correspondence and choose the one that performs best on some gold standard. Following machine learning best practice, one suggestion is to use holdout data or simulated data to select the most appropriate procedure.

Defaulting to a single comparison method like intrinsic dimensionality, CCA, or RSA risks overlooking novel insight. While this discussion focuses on comparisons between latent spaces for purposes of data analysis and theory evaluation, one exciting possibility is that biological systems also balance these trade-offs (Box 3) and learning itself may proceed by comparing and aligning different latent spaces in an unsupervised fashion [117].

Can the brain's representational spaces be dynamic and assembled on-the-fly? As the task context changes, different representational properties may be more or less important, as suggested by models of selective attention.

Box 3. Trade-offs in biological systems
Biological systems, like researchers, also aim to balance potentially competing demands. The balance can shift depending on the task and available resources. In some cases, interpretability may be of primary importance, whereas in other cases compactness may be paramount. Almost all of the earlier points also apply from an organism's point of view. For example, brain regions need to communicate with one another and eventually affect behavior. This 'inside view' is sometimes neglected but should be adopted to understand the function of biological systems.
The nature of the neural computation may determine which representational format is best. For example, when information is passed from one region to another via biological neural networks in a manner akin to a linear transformation, globally interpretable dimensions may offer advantages. In this case, a linear transformation can more readily extract the relevant information than when the same information is embedded in a lower-dimensional manifold where dimensions have no obvious interpretation. Moreover, globally interpretable dimensions may offer communicative benefits as they can readily align with language (e.g., the dimension of size easily maps to codified relations in language such as 'bigger than').
In other cases, the opposite may be true: the brain may choose not to decorrelate different dimensions in order to promote redundant communication across brain areas [119], or a brain region may favor cells that have mixed selectivity to boost coding capacity [89]. Likewise, low-dimensional manifolds with local interpretability may better support inter-region communication when the readout network functions as a tuned receptive field, as in artificial radial basis networks. For example, the sparse coding of cells that selectively respond to a particular celebrity [120] may be repackaged into a low-dimensional representation to facilitate inter-region communication.
Latent representations may also exhibit varying degrees of specialized compartmentalization. At one extreme, a region may create a vast unified latent space, such as recently proposed for category-selective visual regions [121]. At the other extreme, rather than extracting a single shared latent space, the brain may find it worthwhile to maintain a set of (nearly) orthogonal subspaces [122], as in the case of whisker contacts in rodents [123]. Resource costs in terms of required cells, metabolism, and wiring may lead the brain to favor (when possible) more compact solutions that lack global interpretability, but invest in more expensive solutions when it pays off.
Given the different trade-offs associated with latent space properties, it is plausible that different brain regions emphasize different properties. The success of neuroscientists using imaging techniques with limited spatial and temporal resolution in uncovering embedding spaces suggests that in some cases neural representations are somewhat smooth and regular [124]. The brain itself is not uniform in its circuitry and function, which means that globally interpretable dimensions may be less likely in certain regions, such as prefrontal cortex [125]. Even within a region such as prefrontal cortex, the dimensionality found may be quite low [126] or high [127], perhaps due to task differences. For example, over-training on a task may lead to fine-grained, essentially overfit, representations. Even within the same region of visual cortex with the same stimuli, minor task differences in how many aspects of a stimulus are relevant for a decision can affect dimensionality estimates [79]. Activity in some brain regions indicates that representations live in a hyperbolic latent space [128], such as spatial representation in the CA1 region of the rodent hippocampus [53] and early visual cortex [129]. One possibility is that people rely on multiple embedding spaces with the relative importance of each varying with context [130].

Figure 1. Examples of latent dimensions derived from six different datasets. Each sub-panel depicts the input data and any input dimensions (gray), followed by the latent dimensions (red) that best capture the structure of the data. (A) Latent dimensions are linear and orthogonal to one another [e.g., principal component analysis (PCA)]. (B) Latent dimensions are linear, but not necessarily orthogonal to one another [e.g., independent component analysis (ICA)]. (C) Latent dimensions can be nonlinear, curving through the original space (e.g., ISOMAP, t-SNE, UMAP). (D) Latent dimensions are derived from a hierarchical graph and embedded in a hyperbolic space (e.g., Poincaré embedding). (E) Latent dimensions are inferred from ordinal similarity relations. For example, s_1 : s_2 > s_3 means that given stimulus s_1, participants judge s_2 to be more similar than s_3. (F) Latent dimensions are inferred from a text corpus.

Figure 2. Example demonstrating the difference between global and local interpretability of multidimensional spaces. (A) Input dimensions (columns) of a set of input data composed of nine concepts (rows). A darker cell means more of a particular feature. (B) A two-dimensional projection of the first two input dimensions where each dimension is globally interpretable. (C) An analogous nonlinear embedding of the input data where the latent dimensions are locally, but not globally interpretable.

Table 1. Properties of a handful of latent space algorithms

Embedding algorithms provide an exciting and powerful technique for understanding the content of mental representations. Embedding arbitrary input data is a key component of cognitive science research with many open avenues for research (see Outstanding questions). When studying and communicating results, it is important to realize that different algorithms bestow different interpretations on the recovered dimensions. Multidimensional representations are extremely flexible: capable of condensing high-dimensional representations down to low-dimensional spaces, representing part-like features with high interpretability, and capturing graph-like relationships. The extreme flexibility of multidimensional spaces means that superficially different representations can capture underlying structure equally well. As a consequence, gross measures like dimensionality convey little on their own; more dimensions do not mean more information. Instead, researchers need to make controlled comparisons, paying attention to the strengths and weaknesses of different comparison methods.
Trends in Cognitive Sciences, Month 2024, Vol. xx, No. xx