New Perspectives on the Aging Lexicon

The field of cognitive aging has seen considerable advances in describing the linguistic and semantic changes that happen during the adult life span to uncover the structure of the mental lexicon (i.e., the mental repository of lexical and conceptual representations). Nevertheless, there is still debate concerning the sources of these changes, including the role of environmental exposure and several cognitive mechanisms associated with learning, representation, and retrieval of information. We review the current status of research in this field and outline a framework that promises to assess the contribution of both ecological and psychological aspects to the aging lexicon.

Past work suggests that normal and pathological aging are associated with changes in lexical and semantic cognition.
We review recent evidence on how life span changes in size and structure of the mental lexicon impact lexical and semantic cognition.
We argue that models of the aging mental lexicon must integrate both ecological and psychological factors and propose a research framework that distinguishes environmental exposure from cognitive mechanisms of learning, representation, and retrieval of information.
Our framework emphasizes the need for interdisciplinary collaboration between linguistics, psychology, and neuroscience to generate insights into the ecological and computational basis of the aging mental lexicon.

... and less efficient (i.e., the shortest path length between any two words in the network is greater relative to those of younger adults) ([17–19]; Figure 1).
Evidence is also mounting that lexical and semantic structure is crucial to understanding individual cognitive performance in a variety of domains ([7,20–22]; for a review, see [8]). For instance, low clustering in semantic networks (i.e., a weak tendency of nodes in a network to cluster together) has been linked to poorer performance in cued recall of words [23]. Table 1 provides an overview of work that has linked different aspects of semantic network structure to cognitive performance. It suggests that uncovering the structural characteristics of networks may be useful to describe and perhaps predict the cognitive performance of older individuals or to distinguish between normal and pathological aging [24–26].
Although evidence is mounting concerning the links between aging and semantic structure and the potential importance of lexical and semantic structure for cognitive performance, we have yet to gain a full understanding of the sources and mechanisms of these changes. Crucially, a variety of likely candidates have been proposed in the literature, including environmental factors, such as the cumulative nature of information exposure across the life span, and a suite of cognitive mechanisms, such as those concerning learning, representation, and retrieval of information. In what follows, we review past evidence for the role of such factors and discuss the need to assess the relative contribution of each in order to understand the aging lexicon.

There is now converging evidence that although network size appears to grow continuously across the life span [79], degree and shortest path length show mirrored nonlinear trends, with degree increasing across childhood and decreasing across adulthood and shortest path length decreasing across childhood and increasing across adulthood [17–19]. The findings for the clustering coefficient are more mixed [19]; however, the evidence points towards monotonically declining clustering coefficients throughout the life span [17,18].
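The structural measures discussed here (degree, shortest path length, and clustering coefficient) can be made concrete with a small sketch. The Python example below is only an illustration under stated assumptions: the toy word network, its edges, and all function names are hypothetical and are not taken from the cited studies, which analyze far larger networks.

```python
from collections import deque

# Hypothetical toy association network (undirected). The words and edges are
# illustrative assumptions, not data from any study cited in the text.
EDGES = [
    ("dog", "cat"), ("dog", "bone"), ("cat", "mouse"),
    ("dog", "mouse"), ("mouse", "cheese"), ("cheese", "milk"),
]


def build_graph(edges):
    """Return an adjacency mapping {node: set of neighbors}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph, node):
    """Number of other nodes directly connected to `node`."""
    return len(graph[node])


def local_clustering(graph, node):
    """Edges among the neighbors of `node`, divided by the maximum possible."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in graph[neighbors[i]]
    )
    return links / (k * (k - 1) / 2)


def shortest_path_lengths(graph, source):
    """Breadth-first search: steps from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist


def average_shortest_path_length(graph):
    """Mean shortest path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in graph:
        for target, d in shortest_path_lengths(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


if __name__ == "__main__":
    g = build_graph(EDGES)
    print("degree(dog):", degree(g, "dog"))
    print("clustering(dog):", local_clustering(g, "dog"))
    print("average shortest path length:", average_shortest_path_length(g))
```

In this framing, a network described as "less efficient" would show a larger average shortest path length, and a less clustered network would show lower local clustering values averaged over nodes.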

A Framework for Understanding the Aging Lexicon
We introduce a novel framework to help us discuss a number of mechanisms that have been linked to age differences in the mental lexicon. Our framework spans both ecological and psychological aspects and consists of four components (Figure 2): (i) the physical, social, and linguistic environment; (ii) the learning processes that build up a mental representation; (iii) the structure of the mental representation itself; and (iv) the processes of manipulating or retrieving information from the representation. Although our illustration may suggest a unidirectional information cascade from the environment to retrieval, our framework does not preclude a dynamic flow, with later components influencing earlier ones. For example, pronunciation tends to change with age, likely as a result of continued experience and efforts to optimize discrimination between words [27,28], and these perceptual/motor changes can be seen as influences on the linguistic environment of those exposed to the language of older speakers. In what follows, we review past evidence concerning each of these components.

Cumulative Exposure
Over the course of a lifetime, an average European attends about 10.9 years of schooling [29], watches more than 100 000 hours of TV [30], works 10 different jobs [31], and is part of a countless number of conversations with family, friends, and coworkers. These experiences are the fundamental basis for learning and shaping an individual's mental representations [32]. Some have argued that older adults can be considered experts in a general sense [33] in that they possess different memory representations because they have been exposed to more environmental input overall, and these have important implications for cognition [4,6,22]. Consistent with Heaps' law, which states that the number of word types grows with the amount of linguistic input [34], both simulation [35] and empirical work suggest individuals' vocabulary increases continuously across the life span [15,36]. Moreover, computational models of lifelong word and association learning have been shown to successfully account for performance declines in older adults relative to younger adults, for instance, in word-pair learning [22] and recognition [4], suggesting that the exposure to different amounts of information alone could account for age differences in word-pair memory performance [6].
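The link between cumulative exposure and vocabulary size can be sketched with a small simulation. Heaps' law is commonly written as V(n) = K n^β with 0 < β < 1, that is, the number of distinct word types V grows sublinearly with the number of tokens n. The Python sketch below is a hypothetical illustration only: it assumes that word tokens are drawn independently from a Zipf-like frequency distribution, and the vocabulary size, exponent, and function names are assumptions rather than details of the simulations in [35].

```python
import random
from itertools import accumulate


def zipf_weights(vocab_size, exponent=1.0):
    """Unnormalized Zipf-like weights: the word at rank r has weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]


def vocabulary_growth(n_tokens, vocab_size=50_000, exponent=1.0, seed=0):
    """Simulate exposure to `n_tokens` word tokens.

    Returns a list whose i-th entry is the number of distinct word types
    encountered after i + 1 tokens.
    """
    rng = random.Random(seed)
    cumulative = list(accumulate(zipf_weights(vocab_size, exponent)))
    tokens = rng.choices(range(vocab_size), cum_weights=cumulative, k=n_tokens)
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


if __name__ == "__main__":
    growth = vocabulary_growth(20_000)
    for n in (1_000, 5_000, 20_000):
        # Vocabulary keeps growing, but the type/token ratio shrinks.
        print(n, growth[n - 1], round(growth[n - 1] / n, 3))
```

Under these assumptions, a simulated learner with more cumulative exposure always knows at least as many word types as one with less exposure, while each additional token is less and less likely to be a new word.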

Different Environments
Older and younger adults differ not only in quantity of experience but also in its content. Younger and older adults differ in occupational status [37], social networks [38], and their use of the Internet and social media [39]. These differences in experience further contribute to shaping the contents of younger and older adults' lexical and semantic representations [40]. Regrettably, the extent to which differences in the amount and content of the information to which younger and older adults are exposed determine their lexical and semantic representations and cognitive performance remains largely unexplored.
We should note that the ecological approaches emphasized above do not logically exclude the contribution of additional mechanisms to age-related differences in cognitive performance, including age-related differences in learning and other factors that we review below. Given the body of knowledge concerning the biology of age-related cognitive decline [1], it is unlikely that ecological explanations alone provide a full understanding of differences in younger and older adults' mental lexicon. Nevertheless, the results above show that it would be naïve to neglect the role of ecological factors in models of the aging lexicon and that it remains to be tested to what extent additional psychological factors are needed to account for age differences in linguistic and semantic cognition.

Glossary
Aging lexicon: age-related changes in the mental lexicon (see mental lexicon).
Association: the relationship between words that is often based on, but not limited to, word co-occurrences in the environment (see environment). Free association tasks prompt participants to produce one or more words that come to mind when cued with another word.
Clustering coefficient: the (local) clustering coefficient of a node is defined as the number of edges between the neighbors of the node divided by the maximum possible number of edges.
Connectionist model: type of model that views cognitive processes as cooperative and competitive interactions among large numbers of simple computational units (Box 1).
Corpus: body of processed data derived from language, such as recorded conversations or written text (e.g., BNC), often including various metadata, such as the age of the source.
Degree: the degree of a node is the number of other nodes connected to it.
Environment: here, the entirety of language and language-related input to sensory organs.
Heaps' law: empirical law according to which vocabulary size (i.e., the number of distinct word types) in a document grows with document size (i.e., the number of tokens). Also called Herdan's law.
Learning: here, the processes involved in acquiring novel lexical and semantic information and storing it, at least temporarily, in the representation.
Lexical decision: task requiring participants to decide whether a string of letters spells a true word of the respective language or not.
Multiplex network: network containing multiple types of edges, permitting the simultaneous representation of qualitatively distinct information such as semantic and phonological information.
Mega-studies: large-scale behavioral studies involving hundreds or thousands of stimuli and/or participants.
Mental lexicon: repository of lexical and conceptual representations including semantic, phonological, orthographic, morphological, and other types of information [7]. Several computational accounts of lexical and semantic representation exist, including connectionist, network (see network), and vector-space models (see vector-space model) (Box 1).
Naming: task requiring individuals to name an object from its picture, description, or spoken form.
Network: collection of objects, called nodes, joined by edges. Nodes represent elementary components of the system (e.g., words) whereas edges represent the connections or associations between pairs of units (e.g., associations between a cue word and the word produced as a response).
Recall: task requiring participants to retrieve, with or without supporting cues, words from a previously learned word list.
Representation: here, the relatively stable storage of acquired lexical and semantic information.
Retrieval: here, the processes involved in retrieving lexical and semantic information from the representation.
Shortest path length: smallest number of steps required to connect a pair of nodes in the network.
Vector-space model: computational model that learns high-dimensional word representations from word co-occurrences in language corpora (see corpus).
Verbal fluency: constrained association task requiring participants to retrieve in a limited amount of time as many words as they can from a given category (e.g., animals; category fluency) or beginning with a certain letter (e.g., S; letter fluency).

Learning
Sensory Constraints
Sensory acuity declines with age [41,42] and differences in cognitive performance, including the ability to learn new associations, have been linked to changes in sensory acuity [43]. Proponents of the information degradation hypothesis have argued that degraded perceptual inputs can lead to errors in perceptual processing, which in turn may affect nonperceptual, higher-order cognitive processes [44]. However, changes in learning and cognitive performance are found even when controlling for sensory limitations during testing [45], implying that age differences in sensory acuity are more likely to reflect general senescent alterations in the aging brain rather than simply sensory deficits in the processing of training and assessment stimuli. Nevertheless, whether specific impairments (e.g., hearing) represent direct contributors to age differences in the aging lexicon remains largely unknown.

Attention/Encoding Failures
Older adults suffer from difficulties in sustaining attention across an encoding episode [2] and in encoding associations between words [46]. As a consequence, a generally held position is that learning depends on executive or cognitive control abilities that are impaired in older adults [47]. Given the important role of cognitive control structures in the processing of linguistic and semantic information, it is likely that age differences in cognitive control play a central role in information acquisition [13], for instance, by impacting how well older adults can focus on relevant information and suppress irrelevant information during the learning episode [2].

Prior Knowledge
The encoding of new information is also moderated by an individual's pre-existing knowledge [48], such as knowledge accumulated over the life span [32,33,49]. For instance, new associations with words that occur frequently overall in the environment and that already possess strong associations with other words are more difficult to form than are associations with infrequent words [6,22]. Experiences consistent with pre-existing larger schemata in semantic memory have been found to consolidate faster into a long-lasting memory trace relative to inconsistent ones [50,51]. Along these lines, older adults have been found to encode new material more efficiently than younger adults but only when the information is encapsulated in a context that is natural for the respective material, for instance, when a target word was placed within a meaningful sentence [52,53]. These results imply that older adults' exposure to past environments can also have an indirect influence on the mental lexicon by impacting how new information is encoded. Overall, age-related differences in encoding are likely important drivers of differences between younger and older adults' cognitive performance. Nevertheless, their mediating role in shaping the structure of the aging lexicon is still largely unexplored.

Representation
Decay
A longstanding hypothesis is that memory traces are subject to passive, gradual decay as a result of not using the particular trace [54,55]. Although decay accounts have been widely abandoned in memory research in favor of accounts focused on interference [56], the notion of passive decay has led to successful accounts of age-related changes in mental representations, in particular pathological ones. For instance, degrading the connection strength between words in an associative network could account for the increased semantic priming in patients with Alzheimer's disease (AD) [57]. Similarly, lesioning specific representational loci in a connectionist model could account for the behavior of patients with semantic dementia in both semantic and lexical tasks [58]. The notion of weakening connection strength lies at the heart of another representation-based account of age differences in cognitive performance. The so-called transmission deficit hypothesis [59] posits that as connections between nodes weaken with age, the transmission of activation between semantic and lexical word representations is especially affected. This progressive weakening is thought to produce states of semantic activation without lexical or phonological activation, resulting in a feeling of knowing without being able to actually pronounce a word, commonly known as a tip-of-the-tongue state.

Consolidation
Consolidation refers to the process by which an item in memory is transformed into a long-term form; it takes place both at the level of the synapse (synaptic consolidation) and at the level of the brain system (systems consolidation; [60]). Whereas the former operates on relatively small timescales, the latter is believed to be ongoing for months or even years [51], altering not only where but also how memories are represented in the brain, including the transformation of episodic representations into more semantic ones [61,62]. In particular, it has been argued that systems consolidation involves an active, well-organized decay process that systematically removes selective memories to produce sparser and more efficient memory representations [54]. Although its role across very long timescales (years to decades) is mostly unexplored [51], consolidation represents a promising alternative explanation for phenomena attributed to passive decay and, more generally, a plausible neurophysiological mechanism for age-related changes in the mental lexicon.

Cognitive Control
Models of memory and language typically view the productions of the cognitive system not as direct readouts of internal representations but rather the result of a response mechanism that operates on them [63]. This mechanism is thought to involve cognitive control and retrieval strategies. Cognitive control is conceptually related to working memory capacity [64] and, generally, refers to an executive ability that is needed to actively maintain relevant information and inhibit external and internal distractors [65]. Cognitive control is thought to mediate retrieval from memory by reducing interference and enhancing focus on currently activated, task-relevant representations [65,66]. Older adults typically exhibit lower cognitive control resulting in poorer memory retrieval performance in, for instance, verbal fluency or episodic memory tasks [67,68].

Trends in Cognitive Sciences

Search Strategies
Search in memory refers to the systematic, goal-directed foraging of memory representations [69] and is often modeled as a strategic combination of sustained, focused attention to local areas of the representation (e.g., a particular semantic category) and (random) global switches to distant areas of the representation [9,70,71]. Applications of this modeling approach to verbal fluency tasks have found older adults to exhibit shorter periods of local search than younger adults, which has been attributed to reduced levels of cognitive control [72].
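The combination of local search and global switches described above can be illustrated with a toy simulation. The Python sketch below is a deliberately simplified, hypothetical model and not an implementation of the models in [9,70,71]: it assumes that the representation consists of a few fixed semantic categories ("patches"), that local search draws a random unretrieved word from the current patch, and that a single parameter, switch_prob, governs global switches. All words, names, and parameters are assumptions made for illustration.

```python
import random

# Hypothetical semantic "patches" for an animal fluency task (illustrative only).
PATCHES = {
    "pets": ["dog", "cat", "hamster", "rabbit", "goldfish"],
    "farm": ["cow", "pig", "sheep", "horse", "chicken"],
    "safari": ["lion", "zebra", "giraffe", "elephant", "hippo"],
    "sea": ["whale", "shark", "dolphin", "octopus", "seal"],
}


def simulate_fluency(n_items, switch_prob, seed=0):
    """Produce up to `n_items` (word, patch) pairs.

    Before each retrieval the searcher leaves the current patch if it is
    exhausted, or with probability `switch_prob` (a global switch); otherwise
    it keeps retrieving from the current patch (local search).
    """
    rng = random.Random(seed)
    remaining = {name: list(words) for name, words in PATCHES.items()}
    current = rng.choice(list(remaining))
    produced = []
    while len(produced) < n_items and any(remaining.values()):
        if not remaining[current] or rng.random() < switch_prob:
            options = [p for p in remaining if remaining[p] and p != current]
            if options:  # stay put when no other patch has words left
                current = rng.choice(options)
        index = rng.randrange(len(remaining[current]))
        produced.append((remaining[current].pop(index), current))
    return produced


def mean_run_length(produced):
    """Average number of consecutive retrievals from the same patch."""
    runs = []
    for _, patch in produced:
        if runs and runs[-1][0] == patch:
            runs[-1][1] += 1
        else:
            runs.append([patch, 1])
    return sum(length for _, length in runs) / len(runs)


if __name__ == "__main__":
    for p in (0.1, 0.5):
        sequence = simulate_fluency(12, switch_prob=p, seed=42)
        print(p, [word for word, _ in sequence], mean_run_length(sequence))
```

In this sketch, a higher switch_prob tends to yield shorter periods of local search; whether such a parameter should be read as reduced cognitive control is an interpretive step that the simulation itself cannot settle.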
Search strategies and cognitive control do not concern the question of age-related differences in lexical representations directly, but they are nonetheless important; they represent the link between representations and behavior that must be understood to be able to make inferences about the representations underlying observable behavior [8,18,73,74]. Behavior is inevitably determined by both representation and retrieval mechanism, and both are powerful explanations, making it difficult to attribute the source of a particular age-related difference unequivocally to either one. This is a major challenge insofar as theoretical and empirical work has suggested age-related differences in both of these components. This has led, for instance, to very different accounts of age-related pathologies for similar types of behavior: studies have found that semantic cognition of patients with semantic dementia and semantic aphasia could be best accounted for by changes in a controlled retrieval process [66], whereas that of patients with AD had previously been successfully attributed to representational decay [57]. The difficulty with disentangling representation and process has recently been addressed explicitly in an exchange of papers centering on the nature of search in a verbal fluency paradigm [9,74], which culminated in two insights. First, representations created from behavioral data, such as free associations, can contain signals of the retrieval processes involved in producing the behavioral data. Second, understanding the contribution of each component requires independent sources of data, which are seldom available.

Table 1
Words with higher centrality in associative networks are retrieved more often as the first responses in letter fluency tasks (using PageRank) and are identified faster as a word (rather than a nonword) in a lexical decision task (using PageRank and node degree). [101,102]
Neighborhood
Words with many phonological or orthographic neighbors (or large neighborhood sizes) are more difficult to identify in spoken word recognition, are produced faster in a naming task, are more frequently involved in tip-of-the-tongue phenomena, and are subject to stronger inhibitory priming. [103–106]
Words with many semantic or associative neighbors are less likely to be remembered in free recall and cued recall tasks, trigger lower feelings of knowing, and are more likely to be accepted in new word combinations. [23,107–109]
Words with high phonological clustering are more difficult to identify in spoken word recognition and lexical decision tasks, whereas words with high associative clustering are remembered better in a cued recall task. [23,110]
Distance
Words with short semantic or associative distance are judged as more semantically related, remembered better in paired-associate learning tasks, retrieved closer to each other in free recall or verbal fluency tasks, produce stronger priming effects in naming tasks, and lead to faster sentence verification and recognition. [22,48,70,102,111–116]
Words with low phonological or orthographic distance produce stronger priming effects. [117,118]
Large-scale structure
Shorter average distances between words in a network are assumed to facilitate the exchange of information and have been empirically linked to creativity. [73,119–121]
Weak average connections between semantic and phonological representations of words are assumed to drive tip-of-the-tongue occurrences. [5,59]
Associative schemata facilitate new learning, but also false memory. [33,50]
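Table 1 mentions two centrality measures, PageRank and node degree. The Python sketch below illustrates how they can be computed and why they need not agree. It is a minimal sketch under stated assumptions: the directed cue-to-response network is hypothetical, the damping factor of 0.85 is a conventional default rather than a value taken from the cited studies, and the function names are invented for illustration.

```python
# Hypothetical free-association data: each cue maps to the responses it evokes.
# The words and links are illustrative assumptions, not published norms.
ASSOCIATIONS = {
    "cat": ["dog", "mouse"],
    "bone": ["dog"],
    "leash": ["dog"],
    "dog": ["cat"],
    "mouse": ["cat", "cheese"],
}


def in_degree(links):
    """Count how many cues produce each word as a response."""
    counts = {}
    for source, targets in links.items():
        counts.setdefault(source, 0)
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph.

    Rank held by nodes without outgoing links (dangling nodes) is spread
    evenly over all nodes, so the ranks always sum to 1.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    ranks = pagerank(ASSOCIATIONS)
    degrees = in_degree(ASSOCIATIONS)
    for word in sorted(ranks, key=ranks.get, reverse=True):
        print(word, round(ranks[word], 3), degrees[word])
```

In this toy network the word with the most incoming links is not the word with the highest PageRank, because PageRank weights each incoming link by the rank of its source. This is one reason the two measures can lead to different conclusions about which words are central.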

Box 1. Models of Lexical and Semantic Representation
Multiple frameworks exist for representing lexical information, and each approach offers a unique lens through which to view the lexicon. Three of the most prominent architectures in the current literature are complex networks, connectionist models, and vector-space models (Figure I).
Networks are a generic approach to represent relational data. In a network model of the lexicon each node represents a word, and the connections between nodes signify some form of lexical or semantic relation (for a recent overview, see [8]). Networks are commonly used in the cognitive literature to represent conceptual relations [90], morphological relationships such as neighborhood size [91,92], and behavioral relationships such as how likely a word is as a response to a cue in free association norms [93,94]. Rather than considering each relationship independently, recent work has begun to consider multiple relationships simultaneously via multiplex networks [95]. The utility of networks for modeling large datasets is bolstered by the availability of novel toolboxes for characterizing, comparing, and visualizing network representations [8].
Whereas networks are theory-agnostic, connectionist and vector-space models explicitly specify mechanisms by which lexical representations are learned. In a connectionist (also known as neural network) architecture, the lexical representation of a word is a distributed pattern across connected layers of nodes. A typical connectionist model has a layer for input, a hidden layer, and an output layer, and representations are learned using an error-correction mechanism such as back-propagation [96]. Connectionist models have frequently been used to understand deterioration of lexical knowledge [66] and age-related impairments of semantic memory [97].
Vector-space models represent words as distributed patterns over latent dimensions (or points in a high-dimensional space). A key distinction of vector-space models is that they learn their representations from statistical regularities in the environment, most typically a large-scale corpus of text. Words that frequently co-occur in text will develop similar representations, but so will words that frequently occur in similar contexts, even if they never directly co-occur (e.g., synonyms). Although classic vector models required batch learning [98], modern versions develop their representations continuously [75]. These continuous vector models are excellent candidates to study change in the lexicon as a function of environmental modulation and to evaluate candidate mechanisms of aging.

Box 2. Models of Lexical and Semantic Cognition and the Aging Brain
Research on the neural basis of linguistic and semantic cognition has a long history, going back to Paul Broca's work on the localization of language functions. Throughout the 20th century, models evolved considerably, with a shift from localizationist to associative models involving multiple brain areas. Currently, prominent models of linguistic processing distinguish parallel information streams, including a dorsal stream that maps phonological representations onto articulatory motor representations and involves parietotemporal and frontal brain areas, and a ventral pathway that maps phonological representations onto lexical and conceptual representations and involves mostly temporal brain areas [12]. Models that focus on semantic cognition postulate a distributed network associated with information representation. For example, the prominent hub-and-spokes model describes semantic cognition as emerging from the interaction of a transmodal hub situated in the anterior temporal lobes and linked to modality-specific areas (spokes) responsible for the representation of sound, affect, functional, and other attributes that are distributed across the neocortex [13]. Importantly, such models also postulate an important role for control processes involving a distributed neural network that interacts with, but is largely separate from, the network for lexical and semantic representation, and relies heavily on prefrontal brain structures [13].
Evidence about the role of aging in linguistic and semantic cognition is accumulating from studies comparing younger and normal (i.e., nonpathological) older adult populations using several different paradigms, such as lexical decision, naming, and semantic judgment tasks. A recent meta-analysis of neuroimaging (fMRI) studies identified an age-related reduction in activation in the left-hemisphere semantic network but an increase in right frontal and parietal regions during lexical and semantic tasks. These findings may be interpreted as an age-related shift from language processing-specific to domain-general neural resources, perhaps indicating neurodifferentiation and a role for cognitive control deficits in accounting for age-related differences in linguistic and semantic tasks ([99]; Figure I). We should note, however, that such cross-sectional findings are not always observed longitudinally [100]. Concerning pathological aging, various forms of dementia are known to be associated with impairments in linguistic and semantic cognition, including semantic dementia, the study of which has contributed significantly to the current understanding of temporal lobe functioning, in particular of the anterior temporal pole, which is known to be important for cross-modal semantic knowledge, and suggests a role for representational deficits in at least some forms of pathological aging [13].

Figure I. Age-Related Neural Differences in Lexical and Semantic Cognition. Activation likelihood maps for analyses comparing younger and older adults' lexical and semantic processing [99]. Overall, the results suggest that age groups activated similar left-lateralized regions, but older adults displayed less activation than younger adults in some elements of the typical left-hemisphere semantic network, and greater activation in right frontal and parietal regions. Adapted, with permission, from [99].

All Together Now: Integrative and Interdisciplinary Approaches to Understanding the Aging Lexicon
Extant explanations of age differences in the mental lexicon and their behavioral consequences have typically relied on only a subset of the four components described above: environment, learning, representation, and retrieval (Figure 2). For example, whereas some studies focused on the impact of cumulative experience to account for, for instance, paired-associate learning (environment; [6,22]), others considered damage to internal representations and controlled retrieval processes to account for semantic deficits [66], and yet others relied on a combination of attentional deficits (learning) and retrieval processes to account for age-related memory change [2].
Modeling approaches that encompass all four components as sources of age differences are lacking. Ideally, a full account of the aging lexicon should consider all four components to assess whether age differences can arise from each component independently, their cumulative action, or dynamic interactions among them. Modeling accounts omitting some of the components risk falsely attributing age differences to the subset of evaluated components, when their joint action is more likely.
The goal for future research should be to develop a more integrative formal account of the aging lexicon spanning all four components. To this end, we propose three steps for future research. First, we hope to see researchers build models that integrate ecological and cognitive accounts of age differences in the mental lexicon. Second, the field should deploy large-scale studies that investigate individual and age differences for several indicators of linguistic and semantic cognition to constrain these models. Third, we hope to see increased use of neuroimaging techniques to derive more detailed signals of the contribution of different cognitive components, such as learning, representation, and retrieval. In what follows, we outline a few steps in these directions.
Past research has modeled semantic cognition assuming that representational structure is shared among both younger and older adults [9,71]. Such approaches have favored accounts of aging in the mental lexicon focusing on cognitive aspects [66,71], rather than on the role of the environment, because such frameworks do not capture the impact of environmental exposure on individual and age differences in mental representations. As reviewed above, several results suggest that it is now essential to consider the role of environmental factors. Fortunately, tools to account for the influence of the environment are readily available. Researchers can now choose from a variety of off-the-shelf learning models that turn a continuous stream of environmental input, typically large amounts of digitized text, into distributed representations of words and concepts ([10,75]; Box 1). Recent research has demonstrated that varying the amount of text used for training such models can produce some behavioral patterns that are often otherwise attributed to cognitive decline [6,22]; however, the impact of qualitative differences in the environments remains unexplored. One reason for this is the lack of ambitious context-aware cross-sectional and longitudinal projects that could provide a characterization of the language environments of younger and older adults over time. Although unprecedentedly large amounts of contextualized text and speech data are becoming available with the digital revolution [72,76], few of these datasets differentiate age groups or individuals. Thus, one of the challenges for future research is to create age-annotated language corpora. These efforts will need to include measurements of nonlinguistic sensory information, such as pictures or videos of real-world scenes [77,78], if we are to distinguish the relevance of linguistic versus other types of input to learning and semantic representation.
Another challenge is to complement existing learning models to account for the changes in learning arising from the accumulation of knowledge and cognitive and sensory development [6,22].
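How text can be turned into distributed word representations (see Box 1) can be sketched with a very small example. The Python code below is a hypothetical illustration and not one of the learning models in [10,75]: it assumes raw co-occurrence counts within a two-word window as the representation and cosine similarity as the comparison, and the six-sentence corpus and all function names are invented for the purpose.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mini-corpus; "couch" and "sofa" never appear in the same sentence.
CORPUS = [
    "she sat on the couch and read",
    "he sat on the sofa and read",
    "the dog slept on the couch",
    "the dog slept on the sofa",
    "they paid the bill at the bank",
    "the bank approved the loan",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words found within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    vectors = cooccurrence_vectors(CORPUS)
    print("couch ~ sofa:", cosine(vectors["couch"], vectors["sofa"]))
    print("couch ~ bank:", cosine(vectors["couch"], vectors["bank"]))
```

Under these assumptions, words that occur in similar contexts end up with similar vectors even though they never directly co-occur. Training the same procedure on corpora of different size or content would be one simple way to mimic differences in environmental exposure.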
Representations that are created by training learning models using age-specific language environments have the potential to account for many age differences in cognitive function. To further dissociate the contribution of the four components, large-scale studies that capture a clear set of diverse empirical benchmarks are required. There is a recent trend to conduct so-called megastudies in the domain of memory and language; that is, studies involving the collection of behavioral data on a large number of linguistic stimuli, now typically in the order of tens of thousands [79,80]. However, some of these resources on the linguistic environment require considerable effort to collect and do not often focus on age differences (Box 3). Future studies may want to seriously consider individual and age differences and capture multiple outcomes across both laboratory and naturalistic settings [81,82] from the same individuals because these can be linked to different aspects of linguistic and semantic performance that give insight into the learning, representation, and retrieval components [83]. Crucially, researchers should be aware that naturalistic settings can provide increased room for the compensatory strategies and contextual cues that older adults use to optimize linguistic performance, which seem to contribute to differential age-related patterns of results between laboratory and naturalistic settings [81,82,84].
Moving forward, one particular challenge will be to distinguish the contribution of representational differences from the retrieval processes that operate on these representations. Past computational modeling approaches have found it challenging to separate these components [74].
Although it remains to be seen whether these issues can be addressed using computational modeling, neuroimaging approaches represent a promising source of data for dissociation [13]. For example, there has been some progress in using data-driven methods to provide a map of the neural representation of semantic information [85], and future work could use such techniques to quantify age differences in such representations and to assess the degree of longitudinal change across individuals' life spans. These neuroimaging techniques may also be used to distinguish or compare the neural representations of linguistic and nonlinguistic stimuli [86,87]. Finally, there is significant promise in linking neural biomarkers of age-related decline to both cognitive control and representational aspects of linguistic and semantic cognition (e.g., gray matter density [88]; functional connectivity [85]; AD-specific biomarkers [89]) to understand the contribution of different neural structures and processes to age differences in the mental lexicon.
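The difficulty of separating representational differences from retrieval processes can be illustrated with a toy example. The sketch below is not drawn from any of the cited models: the two small networks, the breadth-limited search, and all names (`retrieve`, `dense_network`, `sparse_network`, `max_depth`) are hypothetical and serve only to show how different combinations of structure and search can yield the same observable output.

```python
from collections import deque


def retrieve(network, cue, max_depth):
    """Return words reachable from `cue` within `max_depth` links, breadth-first."""
    seen = {cue}
    frontier = deque([(cue, 0)])
    retrieved = []
    while frontier:
        word, depth = frontier.popleft()
        if depth == max_depth:
            continue  # search budget exhausted along this path
        for neighbour in network.get(word, []):
            if neighbour not in seen:
                seen.add(neighbour)
                retrieved.append(neighbour)
                frontier.append((neighbour, depth + 1))
    return retrieved


# Two hypothetical representations of the same four words.
dense_network = {"dog": ["cat", "wolf", "bone"]}
sparse_network = {"dog": ["cat"], "cat": ["wolf"], "wolf": ["bone"]}

# A shallow search on the dense structure ...
shallow_on_dense = retrieve(dense_network, "dog", max_depth=1)
# ... and a deeper search on the sparse structure.
deep_on_sparse = retrieve(sparse_network, "dog", max_depth=3)

print(shallow_on_dense)
print(deep_on_sparse)
```

Both calls return the same list of words, so behavior on this single task cannot reveal whether two individuals differ in representation, in retrieval, or in both. Holding the search depth constant across the two networks does separate them in this toy case, which is one way of seeing why multiple tasks or additional data sources may be needed for dissociation.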

Concluding Remarks
This review suggests that it is important to consider several explanations in order to understand the development of the mental lexicon across the life span, including environmental exposure as well as age-related changes in learning, representation, and information retrieval. As a consequence, future work in this field will require interdisciplinary teams with expertise in linguistics, computational

Outstanding Questions
To what extent can purely environmental explanations account for reported age differences in lexical and semantic cognition? Are representational deficits necessary to account for differences in normal and pathological aging?
What is the level of dynamic interaction among the four components of our framework (environment, learning, representation, retrieval)? For example, to what degree can chronic differences in mnemonic retrieval strategies change mental representations? Does age-related change in the structure of mental representations change the linguistic environment that is used as input by other speakers?
To what extent are different types of representation models, such as network-based, connectionist, or distributional models, able to predict and explain the same underlying effects and account for the age differences observed in linguistic and semantic cognition?
How can we build on existing corpora or develop new resources to measure the most important properties of individuals' linguistic environments? Can we annotate existing corpora to include age and sociodemographic information to investigate aspects of the aging lexicon? Is it feasible to deploy mega-studies to capture linguistic, sociodemographic, and biological properties of individuals longitudinally over periods of decades?
How can we integrate the results of different tasks and paradigms, such as reading, language comprehension, and production, that may provide contradictory evidence concerning the role of specific mechanisms?
Which neuroimaging methods and analyses can provide the best ways to distinguish learning and search processes from representational deficits in the aging lexicon?
How can or should changes in motivation and goals that direct the cognitive system be captured in models of the aging lexicon? Past research and our overview primarily focused on the integrity and efficiency of information processing, without considering changes in motivation and goals.

Box 3. Resources on the Environment, Representations, and Behavioral Data for Studying the Mental Lexicon Across the Life Span iv
Capturing the (Linguistic) Environment
New natural language processing techniques are making large amounts of richly annotated data increasingly available [76]. Currently, describing a single individual's linguistic environment is still challenging, as large-scale corpora of written language derived from newspapers and online media have remained mostly aggregate and anonymous and may not be representative of an individual's natural environment. The advent of Internet-based resources and individual tracking is a promising avenue to address these issues [122].
Child-directed speech is already available through the CHILDES corpus [123], whereas adult speech across the life span is covered in a variety of corpora such as the Switchboard I corpus [124]. Written corpora for children based on children's books have also been collected in various languages [125,126]. Written corpora for adults are more comprehensive than those for children, although annotations are often incomplete. For example, the widely used British National Corpus (BNC; [127]) contains information about author age for only 26% of the sources.

Measuring and Modeling the Mental Lexicon across the Life Span
Mega-studies have become common to sample large amounts of lexical and semantic knowledge from individuals [79]. In most cases they have not directly targeted questions about age differences and do not sample individuals across the full adult life span. In turn, new modeling resources are becoming increasingly available and will facilitate and spur on future computational modeling of aging in linguistic and semantic cognition, most prominently open-source software for learning representations i,ii [10] and simulating retrieval iii [128].
Comprehensive datasets on vocabulary development are increasingly available. Wordbank contains measurements of vocabulary in early life derived from over 75 000 children in 29 languages [129]. Also available are extensive measures of vocabulary size and prevalence obtained from hundreds to thousands of adults in various languages [130,131]. Semantic knowledge can be assessed from word association norms such as the Small World of Words project, which currently includes age-annotated word association corpora derived from adult Dutch and English speakers [93,102].
Other mega-studies that cover the life span have focused mainly on behavioral measures. These include age-annotated lexicon projects (naming and lexical decision reaction times) in a variety of languages (see [131] for an overview).

Trends in Cognitive Sciences
modeling, psychological measurement and testing, and neuroscience that can simultaneously tackle a description of individuals' linguistic ecologies as well as the cognitive representations and processes that build on an individual's lifelong experiences (see Outstanding Questions).