Legitimate Reactivity in Measuring Social Phenomena: Race and the Census

As a result of being measured, individuals sometimes alter their behavior and attitudes to such an extent that subsequent measurement results are affected. This ‘reactivity’ to measurement problematizes prediction and explanation, but some reactivity is nevertheless legitimate. Using the example of the measurement of race in the US Census, this article demonstrates that some forms of reactivity do not affect the accuracy of research. The article argues that legitimacy of reactivity depends on the metaphysical status of the phenomenon being measured. It is argued that reactivity’s legitimacy is affected by the arbitrariness of a measure and the voluntariness of measurement results.


Introduction
If a research subject knows they are being studied, this knowledge may affect their attitudes or behavior to such an extent that it affects subsequent research. This broad range of effects is called reactivity. For instance, reactivity occurs in the study of race when individuals or whole groups respond to census instruments and the social categories therein to such a degree that subsequent census categories must evolve. As I will show below, reactivity can take many forms: from a subject reinterpreting the key terminology in a survey (leading to different answers), to recalibrating their own position relative to others, to simply altering their behavior because they know they are observed.
Reactivity problematizes prediction and explanation since the content of the measure changes from one measurement to the next. Therefore, it is common to try to mitigate reactivity, since reactivity might make measurement results 'unstable', e.g., by measuring different things from one survey to the next (French and Sutton 2010; 2011; Jiménez-Buedo 2021; Webb et al. 1966). Strategies that mitigate reactivity often center on finding some biological mechanisms that underlie the phenomenon of interest (cf. Tsou 2007). What makes an answer 'right,' it is argued, depends in large part on whether these mechanisms remain respected. A measurement result which tracks the underlying mechanisms is more accurate than one that does not. But what should we do when such mechanisms do not exist, e.g., in the case of race? What then determines whether reactivity is acceptable?
In this article, I argue that some reactivity is legitimate, that is, some cases of reactivity do not affect the accuracy of research. I show that the extent to which one accepts reactivity as legitimate will depend on one's metaphysical commitments. In the case of the US Census, legitimacy of reactivity depends on one's metaphysics of race.
The article is set up as follows. In section 2, I lay the foundation for the article, describing the stages that measurement consists of and the different forms reactivity can take. In section 3, I show that when a measure chosen for a phenomenon of interest is to some degree arbitrary (a term I define in detail), reactivity may be legitimate, meaning that this reactivity does not change the accuracy of the measure. In such cases, individuals can reinterpret phenomena in the measurement tool without thereby being 'false to nature.' In section 4, I use the example of the measurement of race in the US Census to show that the arbitrariness of the measure alone is not sufficient to decide whether reactivity is legitimate. Reactivity in the measurement of race by census instruments is only legitimate under certain metaphysical assumptions. Specifically, as I will argue, legitimacy also depends on a second dimension of voluntariness. In section 5, I combine arbitrariness and voluntariness in a single taxonomy and discuss how it can be applied to studying the (legitimacy of) reactivity in the measurement of social phenomena in general. In section 6, I briefly discuss two topics for further research, viz., the idea of reactivity as giving authority to the 'known' and the difference between individual and group reactivity.

Stages of Measurement
To start, let us investigate in brief the process of measurement and connect this to the different forms that reactivity can take. The first methodological step in measurement of any phenomenon is to delineate the phenomenon's boundaries in such a way that it is fit for measurement, i.e., to design a measurement concept. Let us call this conceptualization (Cartwright and Runhardt 2014; Cartwright, Bradburn, and Fuller 2017; Runhardt 2021). A conceptualization goes together with a particular representation (e.g., as a two-valued variable, a list of mutually exclusive categories, a continuous variable, or a table of indicators). The final stage of measurement is choosing apt procedures for collecting data that fit with this conceptualization-representation pair.
To illustrate the three stages of measurement, consider the example of the U.S. Census data collection on race, which is tethered to guidelines from the Office of Management and Budget (OMB), in particular its Directive No. 15 and the revisions to this directive in 1997 (cf. Office of Management and Budget 1997, 2000). In 2020, the Census asked the question "What is this person's race?", with the five main racial categories from OMB guidelines as pre-printed options to choose from, viz., White, Black or African American, American Indian or Alaska Native, Asian, and Pacific Islander (with different subcategories for both the Asian and Pacific Islander categories based on national origins). Following OMB guidelines, the respondent could mark more than one of these categories and/or print their own race or origin (Olmsted-Hawala and Nichols 2020). Households received instructions for filling in the Census in the mail and were able to respond online, by mail, or by phone.
In the 2020 US Census (and the underlying OMB guidelines), race can be seen as referring to an identity, conceptualized as a racial self-classification. 1 This is not the only possible conceptualization for race; as Wendy Roth (2016) shows, race can refer to self-defined identity, observed race, reflected race, phenotype, or ancestry. In general, the conceptualization of race for measurement practice, such as the Census, can be seen as an attempt at transforming complex racial identities "into a more familiar language, followed by a purposeful recoding of race using a comprehensible, standardized, easier-to-understand syntax" (Thompson 2016, 13-14). This initial step includes normative choices "about what characteristics are most important for determining racial similarity and difference and what criteria should be used to evaluate claims to identity" (Thompson 2016, 14).
While conceptualization concerns what phenomena the term race is thought to refer to, representation concerns the subcategories of the term and their boundaries. Until the 2000 US Census, respondents were not allowed to report more than one race, since the OMB representation of race at the time considered the categories to be mutually exclusive. As a result, one could not, e.g., mark both White and Black. When the OMB revised its representation of race in Directive No. 15 in 1997, insisting that surveys ought to allow respondents to report more than one race, the 2000 US Census representation changed accordingly (American Anthropological Association 1997). One could now check multiple categories, allowing for multiracial self-identification and thereby greatly expanding the number of possible answers.
For procedures, we must look, among other things, at the effects of the circumstances under which a measurement is performed. Measurement is affected by the structure of answers: in the US Census, individuals are given categories from which they must choose their identity, rather than asked what that identity is without any suggested answers. Table 1 below sums up the three stages of measurement.

Types of Reactivity
In the remainder of this section, I will outline three different types of reactivity which can be found in the literature (cf. Golembiewski, Billingsley, and Yeager 1976; Runhardt 2021).
Alpha change refers to cases when an individual changes something about themselves because of being measured. An earlier categorization has resulted in changes in behavior or individual properties, which affects subsequent measurement results. For example, an initial look at the average number of steps an individual takes daily, as measured by that individual's smartphone pedometer, makes some individuals start walking more (Bravata et al. 2007; 2010; Michie et al. 2009). In other words, the next time the individual consults their phone, the average number of steps taken daily will likely be higher.
To show alpha change occurs in the measurement of race, we would have to show that there is a clear link between, firstly, initial categorization and a change in behaviors or individual properties, and secondly, between these changed behaviors or properties and later categorization results. There exists evidence that at least this second link exists for racial self-identification (the measurement target of the Census). Aliya Saperstein and Andrew Penner (2012) discuss several factors that may cause individuals to change how they identify themselves, e.g., upwards social mobility causes some individuals to move away from self-identification as Black. Note that as defined above, for true alpha reactivity the first causal link (between initial categorization and the change in behaviors or properties) must exist as well. In this example, one's attempt at social mobility would have to be the result of an earlier Census categorization. To the best of my knowledge, this link has not yet been investigated, and thus requires further research. However, the Saperstein and Penner conclusion that racial population membership at least is not "fixed for each individual" (Saperstein and Penner 2012, 679) is a crucial first step in this direction.
In beta change, an individual recalibrates their place on a scale, or changes their categorization, because of being measured. For example, several studies show that if a quality of life survey prompts the research subject to make social comparisons (e.g., downward comparisons to people worse off), this impacts on the quality of life this subject reports (Gibbons 1999; Sprangers and Schwartz 1999; VanderZee, Buunk, and Sanderman 1995; Wood, Taylor, and Lichtman 1985). The research subject shifts their answer because they interpret the scale underlying the concept for quality of life anew; the subject 'recalibrates' their position. Beta change can thus be linked to the representation stage of measurement design, above. Some scientific research has shown that multiracial individuals' racial identity is also fluid in this sense (Rockquemore and Brunsma 2002; Saperstein and Penner 2012).
Finally, in gamma change, the individual changes their interpretation of a key term because of being measured: "gamma change involves a redefinition or recharacterization of some domain, a major change in the perspective or frame of reference within which phenomena are perceived and classified, in what is taken to be relevant in some slice of reality" (Golembiewski, Billingsley, and Yeager 1976, 135). Thus, this change is intricately linked to the conceptualization stage of measurement. Carina Marsay and colleagues (Marsay, Manderson, and Subramaney 2018) report such reactivity in the measurement of antenatal anxiety and depression: in their case study, women gained self-knowledge of their mental health during initial measurement, and subsequently reported different levels of depression since they were better able to put their own feelings into words (cf. Runhardt 2021). There is evidence that gamma reactivity exists in measurement of race: Saperstein and Penner (2012) have shown that individuals will, under specific measurement instructions, believe a question on their race refers to 'observed race', and answer accordingly. Under different instructions a research subject will instead answer with the race they personally identify as. Such changes are related to the conceptualization stage of measurement, and thus gamma reactivity. To sum up, Table 2 outlines the different types of reactivity and their associated measurement stage.

Arbitrariness
Whether reactivity is legitimate depends on the phenomenon being measured. Intuitively, we may think that accurate measurement of a simple phenomenon such as 'number of steps taken each day' becomes inaccurate if the underlying concept were reinterpreted (gamma change) to refer to any kind of movement, including car rides, or if the person's place on a scale or placement in a category were recalibrated (beta change) by making a pedometer more sensitive, e.g., counting twice as many steps as the person is physically taking. The reason that reactivity is illegitimate here is that the choice to use a pedometer for measuring daily steps instead of, say, an odometer or surveyor's wheel, is not arbitrary. Given that we are interested in actual steps taken, the solution of using a pedometer is especially apt.
Compare the above to measuring 'happiness,' e.g., in a simple self-report tool like the UK Annual Population Survey question "Overall, how happy did you feel yesterday?" Arbitrariness is pervasive in this example. The representation, on a scale of 1-10, for how happy one feels is at least in part the result of chance occurrences in our social history, as is the choice to conceptualize happiness a certain way (life satisfaction, hedonic state, emotional state, or something else). Even given that we are interested in happiness, the solution to use this question in the Annual Population Survey could certainly have been otherwise. As Dan Haybron points out, "Investigators [of happiness] may never enjoy the precision of the 'hedonimeter' once envisaged by Edgeworth to show just how happy a person is (Edgeworth 1881). Indeed, such a device might be impossible even in principle, since happiness might involve multiple dimensions that either cannot be precisely quantified or summed together." (Haybron 2020)
The intuitive distinction between measures that are more or less arbitrary can be made more rigorous by utilizing a concept from the philosophical literature on conventions. There, Cailin O'Connor has recently developed a helpful arbitrariness concept (O'Connor 2021) defined in terms of "an information-theoretic measure intended to capture the degree to which a solution to a certain social problem could have been otherwise" (O'Connor 2021, 579). 2 Her scale of arbitrariness ranges from fully 'functional' to fully 'arbitrary/conventional'. O'Connor uses her arbitrariness concept to explain the evolution of social traits and the degree to which such traits could have been otherwise. For instance, O'Connor applies her measure to the gendered division of labor and argues that there are some functional constraints there: the parent who breastfeeds can best take care of infants, the one who does not can hunt big game. This division is strongly constrained by biological factors and therefore on the functional end of the arbitrariness scale. On the other hand, O'Connor cites empirical evidence that shows there exist gendered labor divisions for which there are no clear functional constraints, e.g., the making of rope, which is seen as a male task in many societies without there existing a clear biological reason (Murdock and Provost 1973). In this case, the gendered division of the task is arguably due to chance occurrences and therefore on the more arbitrary/conventional end of the scale.
2 The specific information-theoretic details in O'Connor's definition, including the formula for calculating the degree of arbitrariness exactly, are not as relevant for this article as her more fundamental argument that arbitrariness comes in degrees. For more details on the information-theoretic measure she uses, see O'Connor (2021, 586-90).
In order to capture the intuitive differences in arbitrariness between, e.g., the pedometer and happiness survey question, I suggest we expand O'Connor's definition of arbitrariness to the degree to which a particular measurement (i.e., the trio of conceptualization, representation, and procedures) could have been otherwise. For instance, this would allow us to rank not just the arbitrariness of some gendered division of labor (a solution to a social problem) but also the arbitrariness of some measurement of gender (e.g., that in the U.S. Census Bureau's Household Pulse Survey). Is this particular measurement functionally constrained or are its conceptualization, representation, and procedures due to chance occurrences in our social history? If we were to 'restart history', could we end up with, e.g., a measurement of gender that uses a different but equally valid set of gender categories?
Note that in this expansion of O'Connor's arbitrariness concept, arbitrariness is a property not of the underlying phenomenon 'gender' but rather of some measurement developed to capture this phenomenon. Yet the degree of arbitrariness of a measure is not completely independent of the phenomenon being measured. My new notion of arbitrariness deals well with the distinction between phenomena which have certain central properties (a fixed set of necessary and sufficient conditions) and phenomena which have no such properties (sometimes called 'Ballung concepts'). A measurement of the former needs to track the fixed set of necessary and sufficient conditions and should therefore not be arbitrary, if developed to accurately reflect this set. A formalization of the latter, on the other hand, will involve somewhat arbitrary choices, e.g., a particular delineation with necessary and sufficient conditions being determined by context and research purpose, say, rather than the phenomenon's inherent properties.
Summing up, if we were to apply O'Connor's degree of arbitrariness to the examples in the preceding section, a measure like the happiness question in the UK Annual Population Survey has a greater degree of arbitrariness than, say, a smart phone's pedometer for 'steps taken.' The latter is more readily determined by one concrete physical aspect of our world and there is a sense in which the conceptualization, if accurately modeling this aspect, could not have been otherwise. This directly affects the legitimacy of reactivity, as can be represented with the simple scale in Figure 1.
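The idea that arbitrariness comes in degrees can be illustrated with a small numerical sketch. The sketch below is not O'Connor's formula (for that, see O'Connor 2021, 586-90); it assumes, purely for illustration, that the degree to which a measure "could have been otherwise" can be approximated by the normalized Shannon entropy of a probability distribution over the alternative measures that a rerun of history might have produced. The function name and both probability lists are hypothetical and carry no empirical weight.

```python
import math


def normalized_entropy(probabilities):
    """Return the Shannon entropy of a distribution, scaled to [0, 1].

    0 means one outcome was certain (fully 'functional' in this sketch);
    1 means all outcomes were equally likely (fully 'arbitrary').
    The normalization is an assumption made for this illustration.
    """
    if len(probabilities) < 2:
        return 0.0  # a single possible outcome leaves no room for alternatives
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return entropy / math.log2(len(probabilities))


# Hypothetical distributions over four alternative measures each.
# Pedometer-like case: one measure is overwhelmingly likely to emerge.
pedometer_like = [0.97, 0.01, 0.01, 0.01]
# Happiness-question-like case: four alternatives are equally likely.
happiness_like = [0.25, 0.25, 0.25, 0.25]

print(normalized_entropy(pedometer_like))
print(normalized_entropy(happiness_like))
```

Under these assumed inputs, the pedometer-like case scores near the functional end of the scale and the happiness-like case at the arbitrary end, which mirrors the qualitative ranking given in the text.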

How Arbitrary is the Measurement of Race?
Coming back now to the main example of this article, the measurement of race in the US Census, we ought to ask how arbitrary this measurement is. Many have argued that race phenomena do not have a straightforward biological basis (Block 1995; Glasgow 2003; Goodman 2000; Rosenberg et al. 2002; Templeton 2013; Yudell et al. 2016), with the consensus position in the literature being 'anti-essentialist', against race-as-biology, arguing that instead race is socially defined. As Charles Mills puts it: "Race is not 'metaphysical' in the deep sense of being eternal, unchanging, necessary, part of the basic furniture of the universe. But race is a contingently deep reality that structures our particular social universe (…)." (Mills 1998, 48).
An essentialist theory of race would be on the functional end of the spectrum in Figure 1. Take the US Census. If race were a purely biological characteristic (and thus measures of it were not arbitrary), certain proxies would be better than others. If developed accurately to model this characteristic, the measure could not have been otherwise. This also implies, for example, that there would be a single right answer to the 2020 US Census question "What is this person's race?". However, if we assume that race is not essentialist, this means that we should put the US Census measurement of race on the arbitrary end of the Figure 1 spectrum.
Figure 1. Degree of arbitrariness of a measure.
A possible objection to this argument derives from Quayshawn Spencer's biological racial realist interpretation of specifically the Office of Management and Budget (and by extension the US Census's) use of 'race' and its five racial categories (Spencer 2019a). Spencer argues that OMB race talk refers to a real set of five human continental populations (Africans, East Asians, Eurasians, Native Americans, and Oceanians), i.e., five empirically verifiable "continent-level human genetic clusters" (Spencer 2016, 791). So, while Spencer agrees with the anti-essentialists that some racial classifications are indeed not 'biologically real', he argues that OMB classifications do make biological sense, in terms of "genomic ancestry" of each of the racial categories. For example:

"[T]he [OMB] meaning of 'Black' is the African population. Thus, a Black person is a person with genomic ancestry from the African population. That's it. In other words, if any allele in a person's genome originated from the African population, that person is Black. Furthermore, the degree to which a person is Black is equal to the proportion of her alleles that originated from the African population." (Spencer 2019a, 101)
Spencer supports his claims, among other things, by referencing several geneticists who are "able to predict the self-reported OMB race of most US adults with very high accuracy" (Spencer 2019a, 102).
A critic may now argue, based on Spencer's biological racial realist view, that there is a single correct way of measuring race in the Census question. Though previous iterations of the Census did not accurately measure race (being influenced by racialist views), the OMB is getting closer to this 'single correct way'. Crucially, this objection would imply that reactivity to measurement by the US Census is always illegitimate.
To see how this objection fails, we should recognize that the decision by the OMB to conceptualize race as these five continental populations is itself a decision that has some degree of arbitrariness as we defined it in the previous section. Spencer himself has argued convincingly that "[t]he OMB's meaning of 'race' is not the only meaning of 'race' used in US race talk" (Spencer 2019b). 3 Spencer's view, which he calls radical racial pluralism, is that "there's a plurality of natures and realities for race" in US race talk (Spencer 2019b, 27). I would argue that radical racial pluralists, like anti-essentialists, would recognize the degree of arbitrariness in the OMB definitions of race and the resulting US Census question. 4
What are the consequences of the arbitrariness of measurement of race for our discussion of reactivity? Given the scale of arbitrariness in Figure 1, both philosophers of race who argue that there is no biological basis to (the measurement of) race and radical racial pluralists like Spencer may be tempted to argue that some, or indeed any, amount of reactivity in the measurement of race is acceptable. In the next section, however, I will show that this view is too simplistic. To do so, I will focus on two different anti-essentialist metaphysical positions under which reactivity is legitimate and illegitimate, respectively, and use this to expand on Figure 1.
3 Katherine Jenkins has shown that this has direct implications for interpreting respondents' answers to OMB-based survey questions. Consider Jenkins' discussion of the Becca Khalil case (Jenkins 2019, 55-56). Khalil's response to an (OMB-based) college application question on her racial self-classification does not match with her actual self-classification, but rather with her reflected race (how she thinks others, in this case the college, will classify her). This is arguably a (complex) case of reactivity in which it would be difficult to defend that Khalil's answer was somehow 'incorrect'. On the one hand, if Spencer is right, geneticists would be able to predict Becca's answer with very high accuracy. On the other hand, it does not match with Becca's own use of racial terms.
4 Further discussion of metametaphysical positions like Spencer's radical racial pluralism is beyond the scope of the paper.

Reactivity and the Social Definition of Race
To see what is missing from the picture if we only consider arbitrariness of conceptualization, I will now discuss two different metaphysical positions within racial anti-essentialism. I will show that depending on whether one accepts subjectivism or constructivism, a different degree of reactivity is legitimate. After introducing the two positions, I will consider each variant of reactivity (i.e., alpha, beta, and gamma reactivity) in turn. I describe whether either metaphysical position would find this type of reactivity acceptable. I then use these various positions to develop a second dimension to the legitimacy of reactivity, the degree of voluntariness.
Subjectivism (sometimes called 'voluntarism') and constructivism are both responses to racial essentialism. 5 However, while subjectivists about race argue that racial designations are freely up to the individual, constructivists argue that this is not the case. Constructivists believe that racial designation is instead determined by our social history, a factor external to the individual's judgement. As Ron Mallon puts it, for a constructivist: "folk concepts [of race] seem to apply even to people who do not believe they do, even among people who do not believe they do" (Mallon 2004, 656).
To see how either position would respond to reactivity, let us begin by discussing alpha change. Recall that alpha change would mean that the individual changes something about themselves because of being measured, which results in a new measurement result even though the underlying concept and representation remain stable. In the case of the 2020 US Census, alpha change would mean that because of behavior or attitude change from, say, the 2010 US Census to the 2020 US Census, a person's answer to the race question has changed.
5 For a helpful introduction, consider Mills (1998). Mills also discusses relativist and error theorist responses to essentialism. To show that arbitrariness alone is not sufficient to judge the legitimacy of reactivity, however, the subjectivist/constructivist distinction is sufficient. I will briefly discuss relativism in section 6.2.
Whether such a change is legitimate, however, depends on one's metaphysical position. Is one able to choose one's self-identification freely, or is it determined by outside factors? This question is different from asking whether self-identification is determined by biology: even in a non-essentialist metaphysics there are potential external factors that determine which identification choices are acceptable. While a subjectivist is fine with any alpha change (since racial designation is voluntary), a constructivist will only accept this as legitimate if it fits with the social history of race.
Aliya Saperstein and Andrew Penner are clear on the stakes here: "If one assumes that race is fixed, that everyone has a 'true' racial origin, and that this inherent characteristic predicts attitudes and important life outcomes, then observing fluidity is problematic. It compromises both the reliability and validity of survey data by distorting the objective information being sought." (Saperstein and Penner 2012, 681) Given that race is not fixed by biology, however, they continue: "If, instead, race represents an evolving social hierarchy, the divisions of which have been shaped by the legacies of past domination (...) - that is, if race is real to the extent that we believe it is and construct our social interactions accordingly - then both the stability of racial identification (or classification) and the collective belief that these perceptions should be stable have been created as part of that process. Thus, far from being problematic, data on racial fluidity present an opportunity to study the active 'construction' and often hidden meaning of race in the United States." (Saperstein and Penner 2012, 682)
Similar metaphysical considerations affect the legitimacy of beta change. Recall that beta change means that the individual recalibrates their categorization because of being measured, while the underlying concept itself remains stable. Unlike in alpha reactivity, this change is not due to an individual's changes in attitudes or behavior, but due to an individual's changing their perspective on what the correct boundaries between categories are for them. In the context of the US Census, beta change would only be legitimate if we believe that the boundaries between racial categories (such as between answering Black and answering both White and Black) are a matter of personal choice, unconstrained by external factors. Again, constructivists and subjectivists will differ in whether they consider this acceptable: for the constructivist, the fluidity would only be acceptable if it still fits with the social history of race. For a subjectivist, there are no such restrictions, and a change in perspective is freely up to the individual.
Finally, consider under what conditions gamma change is legitimate. Recall that gamma change means that a concept term means something different to the individual because of being measured. Just as with alpha and beta reactivity, gamma change is only legitimate to the extent that one believes that an individual can choose a particular conceptualization.
The above considerations all show that the degree of voluntariness matters for the legitimacy of reactivity. Here, I define voluntariness as follows: a property P is voluntary to the extent that an individual's being P is the result of that individual judging that they are P, rather than the result of external factors. A subjectivist about race will argue that one can freely choose one's racial self-identification without any external pressures and thus that racial self-identification has a high degree of voluntariness. A constructivist, on the other hand, will judge the degree of voluntariness of racial self-identification to be much lower; even self-identification, in this view, should fit with external factors, which in this example include the social history of race. 6

The Taxonomy
Let us sum up the argument so far. We have seen that according to the anti-essentialist philosophers of race, there does not exist a single correct measure for race phenomena. As such, they place the US Census measurement of race far on the 'arbitrary' end of our scale from Figure 1. However, we have also seen that the arbitrariness of the measure does not always mean that reactivity is legitimate. Whether we believe there is room for changes in the measurement that do not impact on this measurement's accuracy depends on our metaphysical position. As I have argued above, if we do not believe that race is freely chosen by the individual, there are still limitations on what reactivity is acceptable, dependent on the social history of the concept race (Mills 1998), because of the second dimension, viz., the degree of voluntariness. Depending on whether we accept subjectivism or constructivism, a different degree of reactivity is legitimate. Now that we have laid bare both dimensions of legitimate reactivity, we can expand the scale from Figure 1 with a second dimension, resulting in Figure 2.
In the remainder of this section, I will show how this two-dimensional taxonomy of reactivity can be applied and what its limitations are. Then, in the concluding section of the article, I turn to the key questions this taxonomy raises for future research.

How to Apply the Taxonomy
How does the taxonomy in Figure 2 help researchers measure? The taxonomy does not tell us the entire story. Before one can figure out where in this two-dimensional picture a given measure falls, we need to go through the three stages of measurement (conceptualization, representation, and procedures). For instance, if we decide to focus on 'self-identified race', we will end up in a different part of the figure (e.g., in the bottom right corner, for a voluntarist) than if we focus on the OMB's 'genomic ancestry' (where, arguably, we end up in the top left corner, given that a measurement of ancestry is not arbitrary and ancestry not something one freely chooses). There are, then, different levels of analysis for which we can determine the legitimacy of reactivity.
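To make the two dimensions concrete, the placement of a measure can be sketched as a pair of coordinates. The sketch below is only an illustration under stated assumptions: the class name, the 0-to-1 scales, the example coordinates, and the rule of taking the minimum of the two coordinates are all hypothetical choices made here, not part of the taxonomy itself. The minimum is used as one simple way to express that arbitrariness alone is not sufficient and that legitimacy also depends on voluntariness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurePlacement:
    """Hypothetical placement of a measure in the two-dimensional taxonomy."""

    name: str
    arbitrariness: float  # assumed scale: 0 = functional, 1 = arbitrary/conventional
    voluntariness: float  # assumed scale: 0 = externally determined, 1 = freely chosen

    def __post_init__(self):
        # Reject coordinates outside the assumed 0-to-1 scales.
        for value in (self.arbitrariness, self.voluntariness):
            if not 0.0 <= value <= 1.0:
                raise ValueError("coordinates must lie between 0 and 1")


def room_for_legitimate_reactivity(placement):
    """Illustrative rule (an assumption): both dimensions must be high."""
    return min(placement.arbitrariness, placement.voluntariness)


# Illustrative coordinates only; none of these numbers come from data.
placements = [
    MeasurePlacement("self-identified race (subjectivist reading)", 0.9, 0.9),
    MeasurePlacement("self-identified race (constructivist reading)", 0.9, 0.2),
    MeasurePlacement("genomic ancestry", 0.1, 0.1),
]

for placement in placements:
    print(placement.name, room_for_legitimate_reactivity(placement))
```

On these assumed coordinates, the subjectivist reading of self-identified race leaves the most room for legitimate reactivity, the constructivist reading leaves less despite equal arbitrariness, and genomic ancestry leaves the least.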
Having said this, however, there arguably exist phenomena that must always be measured at the functional end of the horizontal axis. Thinking back to the pedometer example from Section 3.1, we have already seen that in measuring 'distance travelled by foot' there is little arbitrariness in the choice of an appropriate conceptualization (though arguably some degree of arbitrariness creeps into both representational and procedural choices). On the other hand, in measuring thick concepts such as 'happiness,' the measure chosen is always going to have a greater degree of arbitrariness, since the concept chosen depends on different value judgements. Generalizing here, a socially constructed measure is going to be more or less arbitrary depending on its path-dependency (O'Connor 2021).
Where a measure falls on the voluntariness scale will depend on context as well. If we found evidence that one's race is more strictly societally determined (less voluntary) in some country of interest than in others, this would directly affect where one places race on the vertical axis. For instance, some empirical evidence suggests that race is more strictly societally determined in the US than it is in Brazil (cf. Loveman 2014).
Besides which society is under study, there are other context-related questions that can help us determine where in Figure 2 a measure is placed. The research goals and the research discipline matter as well: for instance, there will be a difference between measurement with a census (which is used primarily for legal and political purposes) and measurement for public health purposes. In the latter case, Yudell et al. (2016) have argued, race is a problematic variable that has no clear referent. For this reason, the authors advise that public health researchers avoid measuring race altogether and instead focus on 'population' or 'ancestry.'

Conclusion
In this article, I argued that whether reactivity affects the accuracy of research will depend on one's metaphysical assumptions. I presented a taxonomy of reactivity, based on the two dimensions of arbitrariness and voluntariness, and highlighted the assumptions that must be met for the different types of reactivity to be legitimate, given particular metaphysical commitments. For example, I showed that if we believe that racial categorizations are arbitrary and that one can choose one's race voluntarily (subjectivism), then any form of reactivity is acceptable. On the other hand, if we believe that "the arbitrariness of racial designation is rooted in a particular social history and cannot be overturned by individual fiat" (Mills 1998, 49), then individuals cannot respond to measurement of race with legitimate reactivity. In this section, I highlight two important limitations to my framework to be explored in further research, viz., the matter of who has epistemic authority in measurement and the matter of reactivity at the level of groups rather than individuals.

Epistemic Authority
While so far I have defined a reactive effect as legitimate if it does not affect the accuracy of a measure, there is a second sense in which reactivity may be legitimate. If we believe that the 'known,' i.e., the research subjects, have epistemic authority on how they ought to be measured, this authority should be respected. In the politically charged context of race phenomena, this second aspect of legitimacy is especially relevant. For example, reactivity is more legitimate in this sense if it empowers the individual who displays it. Reactivity will be less legitimate in this new sense if it reflects, for instance, the individual's sudden awareness of discriminatory practices against them, or if it is associated with stereotype threat or stereotype vulnerability (cf. Steele and Aronson 1995). This relation between epistemic authority and legitimacy should be explored in further research.
[...] reactivity cannot be reduced to an 'unorganized' sum of individual reactive responses. Nevertheless, I expect that the framework presented in this article provides a fruitful jumping-off point for future analysis of the interplay between individual and group reactivity.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.