Occipitotemporal Representations Reflect Individual Differences in Conceptual Knowledge

Through selective attention, decision-makers can learn to ignore behaviorally irrelevant stimulus dimensions. This can improve learning and increase the perceptual discriminability of relevant stimulus information. Across cognitive models of categorization, this is typically accomplished through the inclusion of attentional parameters, which provide information about the importance assigned to each stimulus dimension by each participant. The effect of these parameters on psychological representation is often described geometrically, such that perceptual differences over relevant psychological dimensions are accentuated (or stretched), and differences over irrelevant dimensions are down-weighted (or compressed). In sensory and association cortex, representations of stimulus features are known to covary with their behavioral relevance. Although this implies that neural representational space might closely resemble that hypothesized by formal categorization theory, to date, attentional effects in the brain have been demonstrated through powerful experimental manipulations (e.g., contrasts between relevant and irrelevant features). This approach sidesteps the role of idiosyncratic conceptual knowledge in guiding attention to useful information sources. To bridge this divide, we used formal categorization models, which were fit to behavioral data, to make inferences about the concepts and strategies used by individual participants during decision-making. We found that when greater attentional weight was devoted to a particular visual feature (e.g., “color”), its value (e.g., “red”) was more accurately decoded from occipitotemporal cortex. We also found that this effect was sufficiently sensitive to reflect individual differences in conceptual knowledge, indicating that occipitotemporal stimulus representations are embedded within a space closely resembling that formalized by classic categorization theory.

magnetic resonance imaging [fMRI] data). In doing so, we aim to bridge behavioral and neural levels of analysis at the individual level using cognitive models.
When identifying specific objects, agents must typically consider all stimulus features, and the psychological distance between stimuli closely reflects their perceptual attributes (Shepard, 1957; Townsend & Ashby, 1982). During categorization, however, groups of distinct stimuli must be treated equivalently, and both learning and generalization can be improved by selectively attending to relevant stimulus dimensions (Nosofsky, 1986; Shepard, Hovland, & Jenkins, 1961). Although categorization models differ in how stimuli are represented in memory (e.g., as individual exemplars, as prototypes, or as clusters that flexibly reflect environmental structure; Love et al., 2004; Minda & Smith, 2002; Nosofsky, 1987; Nosofsky & Zaki, 2002; Smith & Minda, 1998; Zaki, Nosofsky, Stanton, & Cohen, 2003), they similarly assume that categorization involves learning to distribute attention across stimulus features so as to optimize behavioral performance. Although they differ in their mathematical details, these models also posit that endogenous (i.e., "top-down") attentional control (Miller & Cohen, 2001; Tsotsos, 2011) can modulate the influence of the exogenous (or perceptual) stimulus dimensions on the behavioral choice. The attentional parameters play a key role in allowing the models to capture patterns of human generalization across different goals and different rules. As they also predict human eye movements during category decision-making (e.g., Rehder & Hoffman, 2005a, 2005b), they are thought to reflect the strategies used by individual decision-makers to integrate information from the external world. In the brain, effects of endogenous attention have been observed across the visual cortical hierarchy (Buffalo, Fries, Landman, Liang, & Desimone, 2010; Jehee, Brady, & Tong, 2011; Kamitani & Tong, 2005; Luck, Chelazzi, Hillyard, & Desimone, 1997; Motter, 1993).
A general finding is that when attention is devoted to a specific visual feature, its neural representation is more accurately decoded. For instance, in human fMRI, when multiple visual gratings are concurrently presented, representations of attended orientations in areas V1-V4 are more easily decoded than those that are unattended (Jehee et al., 2011; Kamitani & Tong, 2005). Similarly, when random dot stimuli move in multiple directions, representations of attended motion directions in area MT+ are more easily decoded than those that are unattended (Kamitani & Tong, 2006). Whereas these studies have relied on explicit cues to guide attention to relevant aspects of the stimulus array, in real-world environments, decision-makers must typically rely on knowledge gained through past experience in order to selectively attend to relevant information sources.
Categorization tasks mirror this aspect of real-world environments; decision-makers must rely on learned conceptual knowledge in order to selectively attend to relevant stimulus dimensions. Several studies have investigated whether neural representations of exogenous information sources are modulated by learned conceptual knowledge (e.g., Folstein, Palmeri, & Gauthier, 2013; Li, Ostwald, Giese, & Kourtzi, 2007; Sigala & Logothetis, 2002). Sigala and Logothetis (2002), for instance, trained macaques to categorize abstract images, which varied according to four stimulus dimensions. Neural representations of the two behaviorally relevant stimulus dimensions (i.e., the dimensions that reliably predicted the correct response) in the inferior temporal lobe were enhanced relative to those of the irrelevant dimensions. Using fMRI with human participants, Li et al. (2007) investigated whether neural representations of stimulus motion and shape were influenced by their relevance to the active categorization rule. Using multivariate pattern analysis (MVPA), they similarly found that representations of these stimulus dimensions reflected their relevance to the active rule.
Across studies involving explicit attentional cues and categorization studies involving learned conceptual knowledge, a general finding is that occipitotemporal representations of behaviorally relevant information sources are enhanced relative to those that are irrelevant (this may not hold for integral stimulus dimensions; Garner, 1976). These effects are compelling, as they imply that occipitotemporal representational space may closely resemble that conceptualized by classic cognitive theory (e.g., Kruschke, 1992; Love et al., 2004; Nosofsky, 1986). Specifically, it may expand and contract, along axes defined by perceptually separable stimulus dimensions (Garner, 1976), in ways that closely reflect the idiosyncratic concepts and strategies used by individual participants during decision-making.
Previous studies have relied on contrastive analyses, in which neural representations of attended stimulus dimensions are compared to those of unattended dimensions. Although statistically powerful, this approach defines selective attention in terms of the experimental paradigm (but see O'Bryan, Walden, Serra, & Davis, 2018), and therefore sidesteps effects associated with individual differences in conceptual knowledge (e.g., Craig & Lewandowsky, 2012; Little & McDaniel, 2015; McDaniel, Cahill, Robbins, & Wiener, 2014; Raijmakers, Schmittmann, & Visser, 2014). These effects can be substantial, particularly for ill-defined categorization problems (such as the 5/4 categorization task), which are common in everyday life (Hedge, Powell, & Sumner, 2017; Johansen & Palmeri, 2002). Here, we bridge this divide by combining model-based fMRI (Turner, Forstmann, Love, Palmeri, & Van Maanen, 2017) with multivariate pattern analyses. This allowed us to abstract away from individual differences in neural topography (Haxby et al., 2001; Haynes, 2015; Kriegeskorte & Kievit, 2013), to investigate whether neural stimulus representations reflect individual differences in conceptual knowledge. Specifically, we sought to investigate whether the attentional parameters derived from formal categorization models predict contortions of occipitotemporal representational space during decision-making.
Figure 1. Example: Attention influences psychological space. Left: In an object identification task, both psychological dimensions should receive equivalent attention, as they are equally relevant. Right: In a one-dimensional rule-based categorization task, only a single dimension is relevant (in this example, size), and decision-makers could ignore the irrelevant dimension (shape). This is often described as "warping" psychological space such that differences along relevant dimensions are accentuated (or "stretched"), and differences along irrelevant dimensions are down-weighted (or "compressed").
We investigated this hypothesis using two publicly available data sets (osf.io). In the first (Mack, Preston, & Love, 2013), participants categorized abstract stimuli that varied according to four binary dimensions (Figure 2A), according to a categorization strategy they learned prior to scanning. In the original paper, the authors fit both the Generalized Context Model (GCM; Nosofsky, 1986) and the Multiplicative Prototype Model (Nosofsky, 1987; Nosofsky & Zaki, 2002) to the behavioral data, and used them to compare exemplar and prototype accounts of occipitotemporal representation. Using representational similarity analysis (Kriegeskorte, Mur, & Bandettini, 2008), Mack et al. (2013) additionally identified regions of the brain (lateral occipital cortex, parietal cortex, inferior frontal gyrus, and insular cortex) sensitive to the attentionally modulated pairwise similarities between stimuli. Although these results (particularly those in lateral occipital cortex) imply that neural representations of the individual stimulus features might be modulated by selective attention, in principle, this could also reflect modulation within an abstract representational space where stimulus features are not individually represented. For instance, although visual cortex reflects sensory input (and is known to represent individual stimulus dimensions), prefrontal cortex can flexibly represent conjunctions of features, abstract rules, and category boundaries in a goal-directed manner. Representations in parietal cortex display intermediate characteristics, as they can reflect both sensory and decisional factors (Brincat, Siegel, Nicolai, & Miller, 2018; Jiang et al., 2007; Li et al., 2007).
In the second dataset (Mack, Love, & Preston, 2016), participants learned, while scanning, to categorize images of insects that varied according to three binary perceptual dimensions (Figure 2B), according to Types I, II, and VI problems described by Shepard et al. (1961).¹ Importantly, although the same stimuli were included in each task, the degree to which each of the features predicted the correct choice differed between rules. The authors fit the SUSTAIN learning model (Supervised and Unsupervised STratified Adaptive Incremental Network; Love et al., 2004) to the behavioral data and used it to investigate hippocampal involvement in the development of new conceptual knowledge. Using representational similarity analysis, they found that SUSTAIN successfully predicted the pairwise similarities between hippocampal stimulus representations across rule-switches. This suggests that hippocampal representations are updated according to goal-directed attentional selection of stimulus features.

Description of Data Sets
In both experiments, participants categorized stimuli that were characterized by multiple perceptually separable stimulus dimensions. As the mapping of perceptual attributes to their role in each category structure was randomized for each participant, it is possible to differentiate effects associated with intrinsic perceptual stimulus attributes from effects of behavioral relevance. For example, while color strongly predicted the correct category choice for some participants, it provided unreliable information for others. In both experiments, participants were not instructed as to which cues were informative and learned to perform each task through trial-and-error.
We used the GCM for the first dataset (the winning model from Mack et al., 2013), as participants learned how to perform the categorization task prior to scanning. We used SUSTAIN for the second dataset, as it learns on a trial-by-trial basis, and participants learned to perform each task during scanning. SUSTAIN was additionally fit in such a way that the learning of one task carried over to the next. Importantly, although the GCM and SUSTAIN differ in how stimuli are represented in memory (i.e., as exemplars or clusters), they similarly posit that attention "contorts" psychological space, as illustrated in Figure 1. Thus, these studies and models provide a good test of whether attention weights in successful cognitive models are plausible at both behavioral and neural levels of analysis.
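The way attention weights "contort" psychological space can be illustrated with a minimal sketch. It assumes the standard textbook form of the GCM (attention-weighted city-block distance, exponential similarity, and a summed-similarity choice rule); the exact equations in Appendix B may differ, and the stimuli, weights, and the sensitivity parameter `c` below are hypothetical values chosen for illustration only.

```python
import math

def gcm_similarity(x, y, weights, c=1.0):
    """Similarity under an attention-weighted city-block distance (assumed GCM form)."""
    distance = sum(w * abs(a - b) for w, a, b in zip(weights, x, y))
    return math.exp(-c * distance)

def gcm_prob_a(probe, exemplars_a, exemplars_b, weights, c=1.0):
    """Probability of a Category A response from summed similarity to stored exemplars."""
    sim_a = sum(gcm_similarity(probe, e, weights, c) for e in exemplars_a)
    sim_b = sum(gcm_similarity(probe, e, weights, c) for e in exemplars_b)
    return sim_a / (sim_a + sim_b)

# Hypothetical two-dimensional stimuli coded as (size, shape), each 0 or 1.
small_circle, large_circle, small_triangle = (0, 0), (1, 0), (0, 1)

equal_attention = (0.5, 0.5)   # identification-like: both dimensions matter
size_attention = (1.0, 0.0)    # one-dimensional rule: only size matters

# Attending to size "stretches" the size axis (similarity drops) ...
sim_size_equal = gcm_similarity(small_circle, large_circle, equal_attention)
sim_size_attended = gcm_similarity(small_circle, large_circle, size_attention)
# ... and "compresses" the shape axis (stimuli become indistinguishable).
sim_shape_ignored = gcm_similarity(small_circle, small_triangle, size_attention)

# Choice probability for a large probe when large items belong to Category A.
p_a = gcm_prob_a(large_circle, [(1, 0), (1, 1)], [(0, 0), (0, 1)], size_attention)
```

In this toy case, shifting all attention to size lowers the similarity between stimuli that differ in size and raises the similarity between stimuli that differ only in shape, which is the geometric intuition shown in Figure 1.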
The "5/4" Dataset. The first dataset (Mack et al., 2013) was collected while 20 participants (14 female) categorized abstract stimuli (Figure 2A), which varied according to four binary stimulus dimensions (size: large vs. small, shape: circle vs. triangle, color: red vs. green, and position: left vs. right). Prior to scanning, they learned to categorize the stimuli according to the "5/4" categorization task (Medin & Schaffer, 1978) through trial-and-error. During this training session, participants were shown only the first nine stimuli shown in Table 1 (i.e., five Category A members: A1-A5 and four Category B members: B1-B4), and experienced 20 repetitions of each stimulus. During the anatomical scan, they additionally performed a "refresher" task, involving four additional repetitions of each training item. Each training trial involved a 3.5-s stimulus presentation period in which participants made a button press. Following the button press, a fixation cross was shown for 0.5 s, and feedback was then presented for 3.5 s. Feedback included information about the correct category, and about whether the response was correct or incorrect. During scanning, participants were required to categorize not only the training items, but also the seven transfer stimuli (i.e., T1-T7). In the scanner, stimuli were presented for 3.5 s on each trial, no feedback was provided, and stimuli were separated by a 6.5-s intertrial interval. Over six runs, each of the 16 stimuli was presented three times. The order of the stimulus presentations was randomized for each participant. Whole-brain images were acquired using a Medical Systems Signa scanner. Structural images were collected using a T2-weighted, flow-compensated spin-echo pulse sequence (TR = 3 s, TE = 68 ms, 256 × 256 matrix, 1 × 1 mm in-plane resolution, 33 slices, 3-mm slice thickness, gap = 0.6 mm).
An additional T1-weighted 3D SPGR structural image was also collected (256 × 256 × 172 matrix, 1 × 1 × 1.3 mm voxels). Functional images were collected using an echo planar imaging sequence (TR = 2 s, TE = 30.5 ms, flip angle = 73°, 64 × 64 matrix, 3.75 × 3.75 mm in-plane resolution, bottom-up interleaved sequence, gap = 0.6 mm).
¹ In their paper, Mack et al. (2016) focused on effects associated with the Type I and Type II rules.

[Figure 2, Panel A: 5/4 experiment]
The SHJ Dataset. In the second dataset (Mack et al., 2016), 23 right-handed participants (11 female, mean age = 22.3 years) categorized images of insects (Figure 2B) varying along three binary dimensions (legs: thick vs. thin, antennae: thick vs. thin, and mandible: pincer vs. shovel). We excluded data from two participants who each had corrupted data on one run. This resulted in 21 participants for the final analyses. During scanning, participants learned to categorize the stimuli according to the Types I, II, and VI problems described by Shepard et al. (1961; Table 2). In the Type I problem, the optimal strategy required attending to a single stimulus dimension (e.g., "legs") that perfectly predicted the category label, while ignoring the other two dimensions. In the Type II problem, the optimal strategy was a logical XOR rule, in which two stimulus features had to be considered together. In the Type VI problem, all stimulus features were relevant to the decision, and participants had to learn the mapping between individual stimuli and the category label. To maximally differentiate endogenous and exogenous factors, the irrelevant feature in the Type II rule was used as a relevant feature of the Type I problem for each participant.
Each problem was performed across four scanner runs. Although all of the participants learned to perform the Type VI problem first, the order of the Types I and II problems was then counterbalanced across participants. Each trial consisted of a 3.5-s stimulus presentation period, a jittered 0.5-4.5 s fixation period, and feedback. Feedback was presented for 2 s and consisted of an image of the presented insect, as well as text indicating whether the response was correct or incorrect. Trials were separated by a jittered intertrial interval (4-8 s), which consisted of a fixation cross. Each run included four presentations of each of the eight stimuli.
For consistency across data sets, we used the group-derived region of interest (ROI) used in the "5/4" dataset (Figure 3B) and performed a similar analysis. As participants in the SHJ experiment learned to perform the Types I, II, and VI problems during scanning, we mirrored the strategy used by the original authors, and divided the scanning sessions into early (first two runs of each problem) and late (last two runs of each problem) learning epochs. We investigated the relationship between occipitotemporal representation and attention only during the late learning epoch, in which behavior had largely stabilized.
SUSTAIN was initialized with no clusters, and with equivalent weights assigned to each stimulus dimension. Its learning parameters were first fit to the learning performance of each participant using a maximum-likelihood genetic algorithm procedure. The model was fit in such a way that, after learning one problem, the resultant model state was used as the initial state for the subsequent problem. In this way, the model was fit under the assumption that learning of one task would influence later behavior. Once the learning parameters of the model were optimized, they were fixed, and the attentional parameters were extracted from the second two runs of each task (in which learning had largely stabilized). This yielded distinct sets of attentional parameters for each participant and each task. More information about the model can be found in Appendix B.
Table 1. The "5/4" Category Structure
Note. Prior to scanning, participants learned, through trial and error, to categorize the first nine stimuli (Category A: A1-A5; Category B: B1-B4) illustrated in Figure 2A. During scanning, they categorized both the training and the transfer (T1-T7) stimuli. Perceptual stimulus dimensions (Figure 2) were pseudo-randomly assigned to category dimensions for each participant.

Image Processing
Preprocessing included motion correction, and coregistration of the anatomical images to the mean of the functional images (using statistical parametric mapping [SPM], Version 6470). All MVPA analyses were performed in native space without smoothing. For group-level analyses, the statistical maps from each participant were warped to Montreal Neurological Institute (MNI) atlas space using Advanced Normalization Tools (ANTs; Avants, Tustison, & Johnson, 2009), and then smoothed with a 6 mm full-width at half maximum Gaussian kernel. The ROI derived from the group-level analyses was transformed back into each participant's native space for ROI-level analyses. We performed MVPA on the unsmoothed, single-trial, t-statistic images (Misaki, Kim, Bandettini, & Kriegeskorte, 2010) derived from the least-squares separate procedure (LSS; Mumford, Turner, Ashby, & Poldrack, 2012). We used SPM to estimate the LSS images for the "5/4" dataset but used the NiPy Python package (http://nipy.org/nipy/index.html) for the SHJ dataset, as it tends to run more efficiently, and this study used a multiband sequence with smaller voxel dimensions.

Representations of Individual Visual Features
To identify regions most strongly representing the stimulus features, we performed a cross-validated searchlight analysis (sphere radius = 10 mm; Kriegeskorte, Goebel, & Bandettini, 2006)² in which we decoded each of the four visual features (position, shape, color, and size). We performed the analysis in native anatomical space, using a linear support vector classifier (SVC; C = 0.1; using the Scikit-learn Python package; Pedregosa et al., 2011) in conjunction with a fivefold, leave-one-run-out, cross-validated procedure. This involved repeatedly training the model on four of the five runs, and testing whether it could accurately predict the stimulus features associated with the held-out neuroimaging data.
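The leave-one-run-out decoding step can be sketched for a single sphere or ROI as follows. This is an illustrative sketch rather than the analysis code used here: the data are synthetic, the array sizes and the helper name `decode_feature` are hypothetical, and only the classifier type (linear SVC) and the C value follow the text.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

def decode_feature(patterns, labels, runs, C=0.1):
    """Leave-one-run-out accuracy of a linear SVC, one score per held-out run."""
    classifier = SVC(kernel="linear", C=C)
    return cross_val_score(
        classifier, patterns, labels, groups=runs, cv=LeaveOneGroupOut()
    )

# Synthetic stand-in data (hypothetical sizes): 5 runs x 16 trials, 50 voxels.
rng = np.random.default_rng(0)
n_runs, trials_per_run, n_voxels = 5, 16, 50
runs = np.repeat(np.arange(n_runs), trials_per_run)      # run label per trial
labels = np.tile([0, 1], n_runs * trials_per_run // 2)   # binary feature value
signal = np.zeros(n_voxels)
signal[:10] = 2.0                                         # 10 informative voxels
patterns = rng.normal(size=(runs.size, n_voxels)) + np.outer(labels, signal)

fold_accuracy = decode_feature(patterns, labels, runs)
accuracy_above_chance = fold_accuracy.mean() - 0.5        # center at 50% chance
```

Grouping the folds by run keeps all trials from a given run on the same side of the train/test split, which is the point of the leave-one-run-out scheme.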
After centering each of the resultant statistical maps at chance (50% for each visual feature), we created a single map for each participant, which reflected the average, above-chance decoding accuracy across features. We then normalized each map to MNI space and, in order to identify regions supporting above-chance feature decoding, performed a group-level permutation test. This involved randomly flipping the sign of the statistical maps 10,000 times (using the randomise function from the Oxford Centre for Functional MRI of the Brain Software Library [FSL]; Winkler, Ridgway, Webster, Smith, & Nichols, 2014). The familywise error rate was controlled using a voxelwise threshold of p < .001. This identified right middle frontal gyrus (BA9) and left postcentral motor cortex, as well as widespread visual and association cortex, extending dorsally from occipital pole to the bilateral superior extrastriate cortex and bilateral intraparietal sulcus, and ventrally into the bilateral lingual gyrus (see Appendix Table A1). As this procedure yielded a spatially distributed pattern of activity, we increased the minimum t-statistic threshold (from 6.24 to 9) to isolate voxels most strongly representing the individual stimulus features. This removed voxels belonging to the bilateral inferior occipital cortex, left lingual gyrus, bilateral intraparietal sulcus, and bilateral precuneus. The resultant ROI is illustrated in Figure 3B.
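The logic of a sign-flipping permutation test can be sketched for a single voxel or ROI value per participant. This is a simplified illustration and not a reimplementation of FSL's randomise, which operates on whole maps and handles familywise error correction; the function name and the accuracy values below are hypothetical.

```python
import math
import random

def sign_flip_p_value(values, n_permutations=10_000, seed=0):
    """One-sided p value for mean(values) > 0 under random sign flips."""
    rng = random.Random(seed)
    n = len(values)
    observed = math.fsum(values) / n
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Under the null (accuracy at chance), each centered value is equally
        # likely to be positive or negative, so its sign can be flipped.
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if math.fsum(flipped) / n >= observed:
            at_least_as_large += 1
    # Count the observed statistic as one member of the null distribution.
    return (at_least_as_large + 1) / (n_permutations + 1)

# Hypothetical chance-centered decoding accuracies for 10 participants.
accuracy_minus_chance = [0.10, 0.06, 0.08, 0.03, 0.12, 0.05, 0.09, 0.07, 0.04, 0.11]
p_value = sign_flip_p_value(accuracy_minus_chance)
```

Because the maps were centered at chance beforehand, a mean reliably above zero across participants is what the test treats as evidence of above-chance decoding.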

Effects Associated with Conceptual Knowledge
"5/4" Dataset. First, we confirmed that each stimulus feature could be decoded significantly above chance from the ROI illustrated in Figure 3B. Although estimating effect sizes on voxels selected through nonorthogonal criteria is circular, testing significance at the ROI level has been recommended to confirm that information exists, not only at the level of the searchlight sphere, but also at the level of the ROI (Etzel, Zacks, & Braver, 2013). This analysis also allows us to illustrate the individual feature decoding accuracies for each participant (Figure 3C). The analyses were performed in the native anatomical space of each participant using the cross-validated SVC analysis described above (but setting the C parameter to 1 instead of 0.1, which was chosen for the searchlight analysis to improve computational efficiency).³ Each feature could be decoded at rates significantly above chance (shape: M = 0.60, SE = 0.02, t(19) ...).
Next, we investigated whether the decoding accuracy of the individual perceptual dimensions covaried with the GCM attentional parameters. To do so, we fit a mixed-effects linear regression model (as implemented in the lme4 package for R) using restricted maximum likelihood. We included fixed-effects terms for the intercept, the attentional weights, and each visual dimension (e.g., "color"). We also included random-effects terms (which were free to vary between participants) for the intercept and the attention weight parameters. This allowed us to control for baseline differences in decoding accuracy between participants, and for shared (group-level) differences in decoding accuracy between visual dimensions. We used the Kenward-Roger approximation (Kenward & Roger, 1997) to estimate degrees of freedom (reported below) and used single-sample t tests to calculate p values for each coefficient (using the pbkrtest package for R; Halekoh & Højsgaard, 2014).⁴ We computed 95% confidence intervals (CIs) using bootstrap resampling (1,000 simulations). The decoding accuracy of each stimulus dimension positively covaried with the behaviorally derived GCM parameters (b = 0.08, 95% CI [0.01, 0.16], SE = 0.04, t(28.71) = 2.26, p = .032), indicating that the decoding accuracy of these representations reflected their importance during decision-making.
² This involves moving an imaginary sphere throughout the brain, repeatedly investigating how well the voxels within the sphere can decode a variable of interest.
³ The C parameter modulates the penalty associated with training error. With large values, the classifier will choose a small-margin hyperplane, and training accuracy will be high. With smaller values, out-of-sample performance is often improved, but more training samples may be misclassified. C = 1 is a common default setting for fMRI.
Note to Table 2. Participants learned by trial-and-error to perform the Type I (a one-dimensional rule-based categorization task), Type II (a two-dimensional XOR rule-based categorization task), and Type VI (a three-dimensional task requiring memorization of the individual stimuli) problems during scanning. For each participant, perceptual stimulus dimensions (Figure 2) were randomly assigned to these abstract category dimensions.
To investigate the sensitivity of occipitotemporal feature representations to individual differences in GCM attentional weights, we conducted a permutation test. This involved shuffling the attentional weight parameters between participants (i.e., swapping the weights derived from one participant with those derived from another), and repeating the regression analysis (described above) 10,000 times. On each permutation, the correspondence for category dimensions (i.e., the dimensions depicted in Table 1, as opposed to the stimulus dimensions illustrated in Figure 1) was preserved, such that the dimensional weights derived from the behavior of one participant were assigned to the same dimensions, but to a different participant.
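The between-participant shuffling procedure can be sketched as follows. To keep the example self-contained, an ordinary least-squares slope is swapped in for the mixed-effects regression described above, so this shows only the permutation logic; the function names and the toy weights and accuracies are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def shuffle_between_participants(weights, accuracies, n_permutations=10_000, seed=0):
    """weights and accuracies hold one row per participant and one column per
    category dimension. Whole rows of weights are reassigned to other
    participants, so each weight stays attached to its own dimension."""
    rng = random.Random(seed)
    flat_accuracy = [a for row in accuracies for a in row]
    observed = ols_slope([w for row in weights for w in row], flat_accuracy)
    order = list(range(len(weights)))
    n_below = 0
    for _ in range(n_permutations):
        rng.shuffle(order)
        shuffled = [w for i in order for w in weights[i]]
        if ols_slope(shuffled, flat_accuracy) < observed:
            n_below += 1
    # Proportion of null slopes that fall below the unpermuted slope.
    return observed, n_below / n_permutations

# Toy data: each of 10 participants attends mostly to one of four dimensions,
# and decoding accuracy increases with that participant's own weights.
weights = [[0.7 if d == p % 4 else 0.1 for d in range(4)] for p in range(10)]
accuracies = [[0.5 + 0.2 * w for w in row] for row in weights]
observed_slope, proportion_below = shuffle_between_participants(
    weights, accuracies, n_permutations=2000
)
```

A proportion near 1 means the slope is larger when each participant is paired with their own weights than with another participant's weights, which is the sense in which the effect reflects individual differences.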
The unpermuted beta coefficient (b = 0.08) was significantly greater than those composing the null distribution (P = .994), indicating that the decoding accuracy of the occipitotemporal representations was sensitive to between-subjects differences in the attentional weights. This could reflect idiosyncratic differences in behavioral strategy, and/or effects associated with perceptual saliency. Therefore, to investigate whether visual salience may have influenced attention, we conducted a repeated measures analysis of variance (ANOVA) for the perceptual features. There was no significant relationship between these visual features and the attentional parameters, F(3, 57) = 0.68, p = .56. A Bayesian repeated-measures ANOVA (Rouder, Morey, Verhagen, Swagman, & Wagenmakers, 2017) additionally indicated that the null model was 4.65 times more likely than the alternative hypothesis. These results provide evidence that the observed effects were not driven by visual characteristics of the stimulus features.
SHJ dataset. First, we confirmed that each stimulus feature could be decoded significantly above chance from the ROI illustrated in Figure 3B. Using a fourfold, leave-one-run-out cross-validation strategy, we used a linear support vector classifier (C = 1) to decode each visual feature across all runs (including both early and late learning epochs), retaining only estimates for the last two runs (which corresponded to the late-learning phase in which behavior had largely stabilized). This fourfold cross-validation strategy yielded better decoding accuracy than a twofold approach based on only the last two runs. This improvement reflects the increased amount of training data available in the fourfold approach and suggests that the multivariate patterns reflecting the individual visual features were stable across learning. Each feature could be decoded at rates significantly above chance (Figure 3D).
Next, we investigated whether the decoding accuracy associated with the features covaried with SUSTAIN's attentional parameters. To do so, we used a mixed-effects linear regression analysis to predict decoding accuracy from attention weight, visual dimension, run, and rule. As described in the Methods section, distinct attentional weights were derived for each subject and each rule. The decoding accuracy for each separate run was included in the analysis. The model included fixed-effects parameters for these four variables, and random-effects parameters for the intercept, attention weight, and run (which were free to vary by participant). This allowed us to control for differences in decoding accuracy across visual dimensions and participants (as with the model used for the "5/4" dataset), while additionally controlling for effects of rule and idiosyncratic differences in behavioral performance during the last two runs.
Mirroring the findings from the "5/4" dataset, we found that the decoding accuracy of these patterns positively covaried with the attention parameters derived from SUSTAIN (b = 0.09, 95% CI [0.004, 0.17], SE = 0.04, t(61) = 2.13, p = .038).
To investigate the sensitivity of occipitotemporal feature representations to individual differences in SUSTAIN's attentional parameters, we conducted a permutation test similar to that described above (i.e., for the "5/4" experiment). This involved shuffling the attentional weight parameters between participants 10,000 times (preserving the correspondence for both rule and abstract feature). This means that the attentional weight derived from the behavior of one participant, for one particular rule and one particular category feature, was assigned to the same rule and feature, but to a different participant. The slope parameter associated with the unpermuted data (b = 0.09) was significantly greater than those composing the permuted null distribution (P = .979), suggesting that the visual feature representations were sensitive to idiosyncratic differences in attentional weights. A repeated measures ANOVA indicated that the perceptual dimensions did not influence the attentional parameters, F(2, 44) = 1.27, p = .291. A Bayesian repeated measures ANOVA additionally indicated that the null model was 1.98 times more likely than the alternative hypothesis, providing evidence that the attentional weights were not influenced by visual properties of the stimulus features.

Overview
Although differing substantially in how concepts are represented (e.g., as exemplars, prototypes, or clusters), formal categorization theories (e.g., Kruschke, 1992; Love et al., 2004; Nosofsky, 1986) tend to share a similar conception of selective attention. In these models, conceptual knowledge contorts multidimensional psychological space such that differences along behaviorally relevant dimensions are accentuated, and differences along irrelevant dimensions are down-weighted (Figure 1 and Equations 1 and 4 in Appendix B). In two data sets (Mack et al., 2013, 2016), we evaluated the neurobiological plausibility of this idea by investigating whether occipitotemporal stimulus feature representations covaried with attention parameters derived from formal categorization models. We found that this effect was not only apparent at the group level but was sufficiently sensitive to reflect individual differences in conceptual knowledge. Several previous studies have demonstrated that occipitotemporal stimulus representations are modulated by selective attention (e.g., Buffalo et al., 2010; Jehee et al., 2011; Kamitani & Tong, 2005; Luck et al., 1997; Motter, 1993; Reynolds & Chelazzi, 2004; Reynolds, Pasternak, & Desimone, 2000) and by learned conceptual knowledge (e.g., Folstein et al., 2013; Li et al., 2007; Sigala & Logothetis, 2002). These studies have relied on statistically powerful contrastive approaches, in which representations of attended stimulus dimensions are compared to those of unattended dimensions. A general finding is that attended stimulus dimensions are more easily decoded than those that are unattended. This implies that occipitotemporal representational space might resemble that conceptualized by formal categorization theory (e.g., Kruschke, 1992; Love et al., 2004; Nosofsky, 1986). Specifically, the expansion and contraction of this space might closely reflect individual differences in the importance assigned to each stimulus dimension.
However, as the contrastive approach defines selective attention with regard to the experimental paradigm, it is insensitive to individual differences in categorization strategy (e.g., Craig & Lewandowsky, 2012; Little & McDaniel, 2015; McDaniel et al., 2014; Raijmakers et al., 2014). Here, we link individual differences in behavior to individual differences in neural representation through consideration of the attentional parameters derived from formal categorization models.
We are not the first to link brain and behavior via latent model parameters. In the perceptual decision-making literature, for instance, several groups have fit the drift diffusion model (Ratcliff, 1978) to behavioral data, and identified regions of the brain where the BOLD response reflects variation in its drift rate, bias, and threshold parameters (e.g., Forstmann et al., 2008; Mulder, Wagenmakers, Ratcliff, Boekel, & Forstmann, 2012; Purcell et al., 2010). As in the present study, several of these studies demonstrated that individual differences in behavioral strategy are reflected in the brain. Instead of linking latent model parameters to univariate BOLD amplitude, however, we used MVPA to link latent parameters to multivoxel representations of the stimulus features. This provided a precise test of the idea that selective attention contorts neural representational space.
Finally, it is worth noting that, although we observed effects of selective attention across two different stimulus sets (abstract shapes in the "5/4" experiment, and insects in the SHJ experiment), and across multiple category structures (the "5/4" problem described by Medin & Schaffer, 1978, and the Types I, II, and VI problems described by Shepard et al., 1961), these effects might not be apparent for all stimuli and tasks. For instance, although category training can improve perceptual discriminability of relevant stimulus features when stimuli consist of perceptually separable features (Garner, 1976), this may not occur for integral dimensions (Op de Beeck, Wagemans, & Vogels, 2003) or for stimuli defined according to "blended" stimulus morphspaces (Folstein et al., 2013). More work is needed to better understand how attention influences occipitotemporal representations for such stimuli. One possibility is that selective attention does not warp perceptual representations of integral stimulus dimensions but might operate on abstract cognitive or "decisional" representations in higher-order cortex (Jiang et al., 2007;Nosofsky, 1987).
By linking brain and behavior through the latent attentional parameters of cognitive models, we also link two (somewhat) disparate literatures. In the neuroscience literature, effects of selective attention are typically examined using highly structured decision problems, and selective attention is investigated by contrasting different aspects of the experimental design (i.e., relevant vs. irrelevant stimulus dimensions). In the cognitive categorization literature, researchers have focused on developing models that accurately account for behavioral patterns of generalization across different goals and tasks. Our results indicate that these cognitive models can be used to examine effects of selective attention in the brain. This is the case, even for ill-defined decision problems (such as the "5/4" task), as the models are able to successfully account for individual differences in conceptual knowledge.

Context
Brad Love has a longstanding interest in models of categorization. He developed the SUSTAIN model (Love et al., 2004) used here and subsequently became interested in how to theoretically relate such models to the brain (Love & Gureckis, 2007). Later, he used category learning models in model-based fMRI analyses, such as in the two papers from which this contribution draws its data (Mack et al., 2013, 2016). Through several papers, Kurt Braunlich has investigated neurobiological mechanisms associated with categorization and generalization. Recently (Braunlich, Liu, & Seger, 2017), he found that occipitotemporal category representations are highly flexible, in that they are sensitive to transient generalization demands (i.e., strict vs. lax decision criteria). This dovetails with the present work, which examines attentional effects associated with task demands.
where the γ parameter (which is always non-negative) modulates the influence of the λ parameters on the choice outcome. When γ is large, attended dimensions (which are associated with large λ values and narrow RFs) dominate the activation function (Eq. 5); when γ is zero, the λ parameters are ignored, and all dimensions exert equal influence on the choice.
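The role of γ can be illustrated numerically. The following is a minimal sketch, assuming the activation function has the attention-weighted form used in SUSTAIN (Love et al., 2004), in which each dimension's receptive-field response exp(-λ_i μ_i) is weighted by λ_i^γ and normalized by the sum of those weights. Eq. 5 itself is not reproduced here, and the function names, argument names, and example values are hypothetical.

```python
import numpy as np


def dimension_weights(lam, gamma):
    """Normalized influence of each dimension: lam_i**gamma / sum_k lam_k**gamma."""
    raw = np.asarray(lam, dtype=float) ** gamma
    return raw / raw.sum()


def cluster_activation(lam, mu, gamma):
    """Attention-weighted cluster activation (assumed SUSTAIN-style form).

    lam:   attentional tuning per dimension (larger = narrower receptive field)
    mu:    distance between stimulus and cluster on each dimension
    gamma: non-negative exponent controlling how strongly lam shapes the weights
    """
    lam = np.asarray(lam, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Response of each dimension's receptive field to the stimulus.
    rf_response = np.exp(-lam * mu)
    # Weighted average of those responses across dimensions.
    return np.sum(dimension_weights(lam, gamma) * rf_response)


# Hypothetical two-dimension example: dimension 0 is strongly attended.
lam = [4.0, 1.0]
mu = [0.0, 1.0]  # stimulus matches the cluster on dimension 0 only

equal_weighting = cluster_activation(lam, mu, gamma=0.0)
attended_dominates = cluster_activation(lam, mu, gamma=10.0)
```

With γ = 0 every weight equals 1 before normalization, so the two dimensions contribute equally and the activation is the plain average of their receptive-field responses. As γ grows, the weight on the strongly attended dimension approaches 1 and the activation is governed almost entirely by the match on that dimension.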
SUSTAIN was fit to the SHJ dataset in a supervised fashion, using the same trial order experienced by the participants; it was also fit across rule-switches, such that learning from one task was carried over to the next. Thus, SUSTAIN was capable of reflecting learning, as well as carry-over effects associated with previously learned rules.