Fictitious consumer responsibility? Quantifying social desirability bias in corporate social responsibility surveys

Corporate social responsibility (CSR) surveys repeatedly indicate significant consumer interest in products and services of businesses that follow virtuous business practices. Yet the existence of a causal relationship between company responsibility and its financial performance is a contested area, and clear evidence that CSR would create a competitive advantage is missing. As ethical evaluations are deeply embedded in responsibility, this discrepancy casts doubt on the genuineness of the integrity consumers express in surveys. A social desirability (SD) bias leading to fictitious responsibility—be it an intentional attempt to appear ethical or an unconscious tendency to exaggerate moral behaviour—is thus plausible and it threatens the reliability of research in the field. Despite this, a SD bias construct is mostly excluded from CSR marketing research, an omission likely due to a lack of appropriate measurement tools. The aim of this article is to narrow this gap and construct a SD bias variable that can be employed to control statistical analysis. Data collected using the Balanced Inventory of Desirable Responding is used to develop a new, continuous scale variable for SD bias. The results support the reliability of the new variable and the robustness of the development process. The subsequent ability to include SD bias in statistical models opens exciting opportunities to improve the value of consumer-oriented CSR surveys by highlighting the difference between fictitious and genuine consumer integrity and offering tools to quantify the severity of the former. This article is published as part of a collection on integrity and its counterfeits.


Introduction
Corporate social responsibility (CSR) has been highlighted as a potential source for strategic advantage in business (McWilliams and Siegel, 2001). Similarly, surveys repeatedly suggest that consumers are willing to choose products and services from responsible companies and to pay more for them (Edelman, 2012; Accenture et al., 2014; Nielsen, 2014). However, as the impact of CSR on company financial performance is unclear (Orlitzky et al., 2003; Peloza, 2009; Wang et al., 2015), we lack decisive evidence to support these notions. This article delves into why consumer interest in CSR does not translate into improved financial performance for responsible companies by analysing how survey participant attitudes may distort results by invoking insincere response behaviour. The focus is to improve the reliability of research on consumer preferences over CSR by developing a tool that can help to control for such behaviour and deepen knowledge on consumer ethics and integrity expressed in studies.
Integrity and CSR are intertwined. This link is evident in corporate decision-making related to responsible practices (Veríssimo and Lacerda, 2015), and when CSR reports are analysed (Sethi et al., 2015). Scherer and Palazzo (2011) highlight the need for businesses to gain moral legitimacy through ethical conduct. For Veríssimo and Lacerda (2015), the term integrity signified moral or ethical behaviour, and their results suggested an indirect link between leader integrity and CSR practices. This definition echoes the early conceptualization of CSR by Carroll (1979), who positioned ethical responsibilities as one of the four fundamental building blocks in the field. While a company must fulfil its profit and legal expectations, it cannot be responsible without adhering to ethical norms. Even Friedman (1970), a fierce opponent of encompassing social responsibilities in the corporate domain, agreed that companies must adopt ethical business conduct. Thus, integrity is profoundly embedded in the concept of CSR and business in general, and questions on responsibility inevitably lead to ethical evaluations.
Another cornerstone of the CSR field, stakeholder theory, emphasizes equal and ethical treatment of all parties affected by the operations of a company (Freeman and Reed, 1983). This suggests that a truly responsible company would not let interests of powerful stakeholders trump those of minority groups, particularly when ethical considerations are involved. Yet as postulated by Mitchell et al. (1997), it is possible to deduce the salience of various stakeholders based on management decisions. Öberseder et al. (2013) maintained that at least some consumers are interested in balanced treatment of stakeholders and thus entrenched in ethical evaluations when choosing products or services. The results of the consumer-oriented CSR surveys mentioned earlier serve as examples of such ethical interest. However, ethics as a domain is prone to bias and Devinney et al. (2006) ask whether we are "as individuals as noble as we say in the polls" (p. 2). The authors conclude by emphasizing the importance "to understand not what a consumer is concerned about, but how much they are willing to pay to care in circumstances as close to those they will be facing in reality" (p. 10). They maintain that survey results measuring willingness to support ethical business practices through purchase decisions are biased, and this bias suggests consumers project fictitious integrity when responding to surveys, be this intentional or not.
This study employs the definition of integrity as ethical conduct, but instead of companies it focuses on consumer integrity and how responsibility impacts purchase decisions. An inconsistency between stated attitudes and real-life choices in relation to ethical products seems evident. Hence the integrity manifested in surveys as responsible attitudes or choices could be counterfeit, defined as fictitious, imitation or insincere. The cause for this could lie in social desirability that "refers to a need for social approval and acceptance and the belief that this can be attained by means of culturally acceptable and appropriate behaviors" (Crowne and Marlowe, 1960: 109). In survey research, this is known as social desirability (SD) bias.
Questions that may prompt answers considered socially undesirable are a form of sensitive enquiry (Tourangeau and Yan, 2007). Drug use or political views are the most common topics associated with sensitivity, and to avoid undesirable answers respondents may become biased. SD bias is also likely in consumer studies that focus on ethical behaviour, even when researchers take precautions such as anonymity and a nonthreatening survey situation (Fernandes and Randall, 1992). CSR surveys, intertwined with ethics, are profoundly prone to this bias and it likely contributes to the attitude-behaviour gap between stated and real consumer choices linked with responsibility (Roberts, 1996; Kuokkanen and Sun, 2016). Of even greater concern are claims that the methodology applied in a CSR study will dictate research results; Beckmann (2007) suggested that quantitative studies produce a positive link between responsibility and purchase decisions while qualitative inquiries lean toward no connection. She attributed this phenomenon to the increased honesty of an interview situation. Despite these issues, SD bias is regularly excluded from marketing research (Steenkamp et al., 2010). A primary reason is likely the long-standing controversy over whether SD measurement scales are a valid tool to control for the bias (Barger, 2002), and thus Kuncel and Tellegen (2009) call for new measures of the construct. As CSR surveys offer a way to appear ethical without corresponding actions, the risk of counterfeit integrity because of socially desirable responding seems evident.
On the basis of these arguments, this study aims to progress SD bias measurement in marketing survey research and sets out to create a novel quantitative variable that can be employed to control CSR studies for biased response behaviour. The potential for counterfeit, or fictitious, consumer integrity is a major source of unreliability for responsibility studies independent of whether the respondents behave in an outright insincere manner or are merely unaware of their fictitious response patterns. The ability to control for the bias would narrow the frequently encountered attitude-behaviour gap in CSR research and encourage the inclusion of this behavioural aspect in research. It would also allow researchers to recognize the extent of fictitious conduct. The next section will review the evolution of the SD bias construct and its measurement to lay the foundation for the methodological development.

Literature
After their original definition of social desirability as the pursuit for social approval and acceptance presented earlier, Crowne and Marlowe (1960) became the seminal authors in the field for several decades. In 1991, Paulhus highlighted that social desirability bias is "a systematic tendency to respond to a range of questionnaire items on some other basis than the specific item content" (Paulhus 1991: 17). More recently, Kuncel and Tellegen (2009) complemented the field by defining "socially desirable responding as behaving in a manner that is consistent with what is perceived as desired by salient others" (p. 202). Authors broadly agree on the definition of SD bias, and this study adopts the definition by Kuncel and Tellegen. However, the seminal work by Marlowe and Crowne assumed SD bias to be a single construct, and subsequent attempts to conceptualize this behaviour in more detail created controversy.
ARTICLE PALGRAVE COMMUNICATIONS | DOI: 10.1057/palcomms.2016
Social desirability bias conceptualization. Paulhus (1984) initiated the quest for more thorough understanding of SD bias with his two-component model that attempted to explain different flavours of biased answers. To reflect the underlying behavioural patterns that lead to distorted responding, he formulated two subconstructs to SD bias: self-deception and impression management. While persons who engage in self-deception truly believe their real behaviour is to match their reported responses, impression management is a conscious effort to appear merely to behave well. In 2002, Paulhus divided the construct into egoistic and moralistic biases, with both items further accounting for intentional and unintentional exaggeration of desirable qualities. In his new model, an egoistic SD bias was linked with "unrealistically positive self-perceptions on such agentic traits as dominance, fearlessness, emotional stability, intellect and creativity" (Paulhus, 2002: 63). The second sub-construct, a moralistic SD bias or a "tendency to deny socially-deviant impulses and claim sanctimonious, 'saintlike' attributes" (Paulhus, 2002: 64) is highly relevant to CSR and ethical surveys. Following the conceptualization by Paulhus, the pursuit of moral high ground may lead to unintentional (self-deceptive denial) or intentional (communion management) moralistic bias when responding to surveys and distort the answers to seem more responsible than real behaviour. Paulhus's model created an active debate around the nature of SD bias with findings that support and reject the existence of his proposed division. Conceptualization of the construct is still perceived weak because of the complexity of the underlying behaviour (Kuncel and Tellegen, 2009; Lee and Woodliffe, 2010).
Findings that support the model exist (Jowett, 2008), but opposite conclusions suggesting a model with only one construct are also common (Leite and Beretvas, 2005; Lönnqvist et al., 2007). A recent study by Dodaj (2012) found partial support for the division between egoistic and moralistic biases, but it did not maintain differentiation between the conscious and unconscious levels. Similarly, Steenkamp et al. (2010) supported the division between the moralistic and egoistic types of bias. A particular challenge in the field is the scarcity of alternative models, with the five-dimension construct of Lee and Woodliffe (2010) a notable exception. Thus the discussion has centred on the Paulhus model.
Research on organizational behaviour (OB) has also contributed to the debate over SD bias measurement and conceptualization. Ganster et al. (1983) presented three potential models of SD impact on OB findings but concluded that the effects are not widespread. They advocated including SD effect in responses as a separate variable. Zerbe and Paulhus (1987) connected various OB research areas with the two types of biased behaviour and urged inclusion of the concepts in organizational research. Empirical work contradicted this postulation when Moorman and Podsakoff (1992) found that inclusion of SD bias had no significant impact on their results even when Paulhus's two-component model was employed. Chan (2001) advocated focus on impression management, though his findings supported only a limited impact of the construct. Donaldson and Grant-Vallone (2002) criticized methods employed to control for SD bias and called for new analytic techniques to be developed. While in organizational research the problem SD bias poses is well noted, further attempts to support the Paulhus conceptualization have not yielded results.
Methods to cope with social desirability bias in surveys. Researchers have employed a multitude of ways to cope with SD bias in survey situations, roughly split into two groups: survey design measures and the inclusion of separate measurement questions to quantify bias (Paulhus, 1991; Tourangeau and Yan, 2007). While this study focuses on the latter group, it is essential to visit briefly the first to highlight quantification as an essential approach in marketing research.
Survey design methods. Gordon (1987) suggested that instructions to emphasize anonymity of a survey and the need for honest answering will reduce biased responding. Fisher (1993) demonstrated that indirect questioning reduces distortion from SD bias. A typical example of indirect questioning is to ask for a respondent's opinion on how "other people", or "people in general" would perceive a situation instead of inquiring about the respondent's personal attitudes. However, this manipulation may sometimes risk the validity of the questions (Fisher and Tellis, 1998). More sophisticated methods for SD bias reduction include the randomized response (RR) and unmatched count (UC) techniques (for example, used by De Jong et al., 2010 and Lippit et al., 2014). In both techniques, responses and respondents are disconnected to reduce the incentive for bias. The bogus pipeline method, which creates an illusion that deception is impossible (for example, by employing a fake polygraph), is not commonly employed due to the element of deception (Tourangeau and Yan, 2007).
In RR and UC many additional or unrelated questions are asked, and this would add complexity to quantitative marketing surveys and risk respondent fatigue. Indirect questions would shift the focus from the individual to a generalized population, and not reveal heterogeneity in preferences. While respondents can be urged to answer questions as honestly as possible with anonymity guaranteed, computerized online surveys may weaken trust in such promises as tracking respondents could be possible. Tourangeau and Yan (2007) and Krumpal (2013) provide in-depth reviews on survey design methods to reduce SD bias. As such, the methods described are likely to reduce SD bias when employed, and in a CSR context Beckmann (2007) supports this by noting that consumer interviews tend to result in a less positive view of responsibility compared to quantitative surveys. However, for many marketing studies in-depth interviews of customers, or lengthy surveys with a design to increase anonymity, are not feasible, and this article focuses on how to quantify SD bias, or consumer integrity, through traditional survey questions.
Quantitative measurement of social desirability bias. The Marlowe-Crowne Social Desirability Scale (MCSDS, Crowne and Marlowe, 1960) has dominated the quantification of SD bias in surveys. The length of the original instrument at 33 questions prompted the development of multiple shorter versions (for example Strahan and Gerbasi, 1972; Reynolds, 1982; Fischer and Fick, 1993; Stöber, 2001; Andrews and Meyer, 2003). The validity of such shorter forms has created debate; some findings indicated improvement over the full scale (Fischer and Fick, 1993; Loo and Thorpe, 2000), yet other studies argued there were significant problems with shortening the MCSDS (Barger, 2002; Beretvas et al., 2002).
Complementing his conceptual formulation of SD bias dimensions, Paulhus (1984) developed the Balanced Inventory of Desirable Responding (BIDR). He updated the original instrument along with the conceptual development, and the latest version separates between egoistic and moralistic response tendencies (Paulhus, 2002). A benefit of the BIDR is the division of questions into categories targeting the different sub-constructs, and because of this Steenkamp et al. (2010) strongly supported the use of BIDR over MCSDS in marketing research. However, both scales are rarely employed; an analysis of empirical articles published in three leading marketing journals for 1968-2008 revealed that out of the approximately 190 survey-based articles only 26 employed the MCSDS, while seven included the use of BIDR (Steenkamp et al., 2010).
Both of the leading measurement scales employ binary answering to a range of statements. MCSDS respondents indicate whether the statements are true or false of them, while the BIDR employs a seven-point Likert-scale of agreement; the two extreme choices imply the potential for biased behaviour while the rest do not. As noted by Kuncel and Tellegen (2009), a person prone to SD bias may avoid the most extreme alternative, and thus a Likert scale cannot be interpreted in the conventional sense where "strongly agree" would indicate heavier bias than "agree". However, binary responding limits the incorporation of the data in statistical analysis. As synthesized by Beretvas et al. (2002), there are three common ways to employ SD bias measurement results in quantitative analysis. These are (1) to calculate the correlation between the SD scale and the focal instrument, (2) to conduct a factor analysis simultaneously on both instrument results to reveal biased behaviour, and (3) to remove responses that indicate high levels of bias. With the first two, an analyst hopes not to find significant joint variation, and the SD measurement cannot be employed to reduce bias if identified. The last option will reduce the amount of data and potentially lead to a loss of valuable insights. None of the options incorporate SD bias in an analysis model as a control variable, depriving researchers of knowledge on how severely the results are biased.
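Options (1) and (3) can be illustrated with a minimal Python sketch. The function names, the toy scores and the cut-off value are hypothetical and serve only to show the mechanics; they are not taken from Beretvas et al. (2002) or from the data of this study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_high_bias(sd_scores, focal_scores, cutoff):
    """Option (3): keep focal scores of respondents at or below the SD cut-off."""
    return [f for s, f in zip(sd_scores, focal_scores) if s <= cutoff]


# Hypothetical toy data: SD scale counts and scores on a focal CSR instrument.
sd_scores = [2, 9, 4, 11, 6]
focal_scores = [3.0, 6.5, 4.0, 7.0, 5.0]

r = pearson(sd_scores, focal_scores)                       # option (1)
kept = drop_high_bias(sd_scores, focal_scores, cutoff=8)   # option (3)
```

In this toy case the correlation is high, which signals possible bias, but neither step adjusts the remaining focal scores for it; the second step simply discards two of the five respondents.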
SD bias in CSR surveys: the challenges. It seems plausible that a SD bias may influence results of consumer-oriented CSR surveys. The discrepancy between such results and the literature that reveals the mixed relationship between corporate social and financial performances supports this argument. Respondents to surveys likely fall prey to partly fictitious integrity and exaggerate their ethical and responsible attitudes or behaviours either intentionally or without being aware of this. Both psychology and CSR research suggest that interviews could paint a more realistic picture of the situation, but quantitative methods are a tool far too common in marketing research to cast aside. The challenges SD bias creates in quantitative marketing research can be synthesized in three categories.
First, many of the non-numerical methods to combat SD bias are not applicable when surveys target consumers. For example, student samples may be subjected to techniques such as randomized response or bogus pipeline, but consumer studies need to be brief and clear. The non-numerical methods tend to create unwanted complexity. The need for simple methods supports the use of measurement scales for SD bias.
The second challenge is the need for shorter SD instruments (Blake et al., 2006). While psychology research with designated samples for a particular project may administer long SD instruments separately, marketing researchers focusing on consumers cannot rely on this. However, the accuracy of existing short forms is disputed. In addition, the existing true/false methodology prevents shades of bias from being discovered, and this is particularly problematic for the short forms as fewer questions allow for less variation.
The final challenge connects directly with the second one: Binary answers allow only limited options for employing the results and narrow the alternatives to using SD bias data in analyses. Without a continuous measurement scale, it is hard to control statistical models for the tendency to report socially desirable opinions or analyse whether the construct mediates or moderates behaviour. This gap weakens the reliability of CSR survey findings and limits our understanding of real consumer integrity.
To address these three issues this study transforms traditional SD bias measurement methods to a variable that can reflect shades of bias. This tool can be employed as a control variable in various statistical techniques to improve their accuracy. Furthermore, the measurement of shades of bias will allow for a shorter, yet valid, measurement instrument, similar to many common questionnaire-based variables employed in most surveys.
Together these advances are aimed to improve the reliability of surveys focused on responsible and ethical attitudes and to narrow the attitude-behaviour gap evident in the domain of CSR.

Methodology
The aim of this study was to develop a SD bias measurement variable on a continuous scale that can be employed to control statistical models for distortion. However, the goal was not to create new measurement questions, a laborious process involving multiple stages and revisions, but to transform the use of an existing scale to deliver the desired outcome. Steenkamp et al. (2010) suggested the BIDR to be the preferred SD bias scale in marketing research, and the twenty original BIDR questions that measure moralistic response tendencies (MRT) formed the basis for the questions in this study. They were deemed applicable because CSR is likely to prompt "saint-like" rather than dominance-related biases. Furthermore, Steenkamp et al. concluded that consumers in Switzerland, the data collection country of this study, are prone to moralistic over egoistic bias. Yet this choice was made as consideration was necessary to create the intended short SD bias measurement instrument, and it does not rely on or imply the existence of the two biases. Future research should investigate whether further evidence on the division between moralistic and egoistic biases can be discovered, and the method developed in this article may offer a tool for this work.
Both the MCSDS and the BIDR scales have been shortened previously (Barger, 2002; Steenkamp et al., 2010). The 10 questions selected by Steenkamp et al. were complemented with two additional ones from the original BIDR, chosen based on the likelihood that respondents of a marketing survey would feel comfortable answering them. The partner company providing access to the sample contributed to this evaluation. As with the original instrument, half of the questions in the shortened 12-question version were negatively keyed to increase the reliability of the results. To achieve this balance, one of the original questions was adapted from its original positive form "I sometimes drive faster than the speed limit" to negative "I never drive faster than the speed limit". Box 1 presents the questions selected for the study.
The study was conducted among customers of a medium-sized tour operator in Switzerland, defined as individuals belonging to the marketing distribution list of the company. The response rate matched with the low response rates normally experienced with internet-based questionnaires. Qualtrics survey platform was used to collect 379 responses and after removing incomplete responses and inspecting the data, 370 qualifying responses were available. These were split into two groups in chronological order based on the time the responses were received, as this was considered equivalent to a random order. The variable was developed using the first group (n = 185), and tested and validated with the second (n = 185). The questions were administered in May 2016 as part of a CSR survey on tourism destinations. The study was initially developed in English and subsequently translated into German and French to accommodate the population. A professional service provider did the initial translations to both languages. The German version was subsequently reviewed by a native speaker employee of the partner company to verify that the terminology and language were comprehensible to the respondents. Next, the French draft translation was reviewed against the German revised translation by another employee fluent in both languages. Finally, the revised French version was retranslated to English and compared with the original version. The survey link was sent out by email as a customer letter, with the anonymity of the study emphasized. No identifying details on the respondents were collected, and no financial incentive to participate offered.

Box 1 Questions from the balanced inventory of desirable responding (BIDR)
Question | Key
I never take things that don't belong to me. |
I always obey laws, even if I am unlikely to get caught. |
I have received too much change from a salesperson without telling him or her. | Neg
When I hear people talking privately, I avoid listening. |
I never drive faster than the speed limit.* |
I don't gossip about other people's business. |
I sometimes try to get even rather than forgive and forget. | Neg
I never cover up my mistakes. |
I have said something bad about a friend behind his or her back. | Neg
When I was young I sometimes stole things. | Neg
I have done things that I don't tell other people about. | Neg
I sometimes tell lies if I have to. | Neg
Variable development process. Figure 1 depicts the development process of a continuous SD bias variable. The 12 BIDR questions selected from the MRT subscale were answered on a 7-point Likert scale ("strongly agree" to "strongly disagree"). Negatively keyed questions were reversed and, following Paulhus (2002), the two extreme answers to each question were considered to indicate a potential tendency for SD bias. These answers were coded with value 1, while the rest were marked 0 (phase 1). This transformation created 12 binary SD variables per respondent. Next, the mean SD intensity in each question was calculated as an ordinary average of the binary results:

$$SDAve_j = \frac{1}{n}\sum_{i=1}^{n} SDBin_{j,i} \qquad (1)$$

where $SDBin_{j,i}$ denotes the binary SD variable of question j for respondent i, n is the number of respondents, and $0 \le SDAve_j \le 1$ for each j = 1,…,12. The binary variables were ranked according to their average SD intensity, and based on this ranking the 12 original binary variables were recoded into binary ranked variables (phase 2); the variable ranked first represented the lowest average SD tendency.
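Phases 1 and 2 can be sketched in a few lines of Python. The sketch assumes answers coded 1 to 7 with 7 as "strongly agree", so that after reversal the values 6 and 7 are the two extreme answers; this coding, the function names and the toy data are illustrative assumptions rather than details reported in the study.

```python
# Sketch of Phases 1-2. Assumption: answers are coded 1-7, 7 = "strongly agree".

def binary_sd_matrix(answers, negative_items, scale_max=7):
    """Phase 1: reverse negatively keyed items, then flag the two extreme answers."""
    matrix = []
    for row in answers:
        coded = []
        for j, answer in enumerate(row):
            if j in negative_items:
                answer = scale_max + 1 - answer  # reverse a negatively keyed item
            coded.append(1 if answer >= scale_max - 1 else 0)
        matrix.append(coded)
    return matrix


def mean_sd_intensity(binary):
    """Ordinary average of the 0/1 flags per question."""
    n = len(binary)
    return [sum(row[j] for row in binary) / n for j in range(len(binary[0]))]


def rank_order(sd_ave):
    """Phase 2: question indices ordered from lowest to highest mean SD intensity."""
    return sorted(range(len(sd_ave)), key=lambda j: sd_ave[j])


def apply_ranking(binary, order):
    """Recode each respondent's flags into the ranked column order."""
    return [[row[j] for j in order] for row in binary]


# Hypothetical toy data: 4 respondents, 3 questions, second question negatively keyed.
answers = [[7, 2, 4], [6, 4, 7], [3, 5, 6], [7, 3, 2]]
binary = binary_sd_matrix(answers, negative_items={1})
sd_ave = mean_sd_intensity(binary)
order = rank_order(sd_ave)
ranked = apply_ranking(binary, order)
```

Keeping `rank_order` separate from `apply_ranking` makes it possible to reuse an ordering obtained from one sample when recoding another.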
Next, the 12 ranked variables were divided into three groups to facilitate the creation of 5-point measurement scales (phase 3). All the questions of the BIDR MRT subscale are aimed to measure the same phenomenon of moralistic SD bias, and thus combining questions with extreme answers would minimize artificial variation among the new groups. Questions ranked 1 and 2 were combined with the two last ones, and questions ranked 3 and 4 with ranks 9 and 10. Finally, questions with ranks between 5 and 8 were grouped together. This logic facilitated the creation of three SD measurement variables on interval scales while maintaining heterogeneity between the components. The three new variables were calculated as:

$$SDLik_{1,i} = 1 + SDRank_{1,i} + SDRank_{2,i} + SDRank_{11,i} + SDRank_{12,i} \qquad (2)$$

$$SDLik_{2,i} = 1 + SDRank_{3,i} + SDRank_{4,i} + SDRank_{9,i} + SDRank_{10,i} \qquad (3)$$

$$SDLik_{3,i} = 1 + SDRank_{5,i} + SDRank_{6,i} + SDRank_{7,i} + SDRank_{8,i} \qquad (4)$$

for each respondent i = 1,…, n, where $SDRank_{k,i}$ denotes the binary variable ranked k for respondent i.
If none of the questions in the group indicated a SD bias, the new variable would receive a value of one, indicating no SD bias tendency. Each question with a value of one (potential SD bias) would increase the target variable by one, and should all questions in the group suggest a tendency for SD bias, the variable would receive a value of five, indicating strong SD bias. Employing this method, all three groups of questions were recalculated into variables on a five-point scale. Finally, the three subscales were averaged to create a continuous measurement variable for SD bias (phase 4).
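Phases 3 and 4 can be illustrated as follows. The sketch assumes that each respondent's 12 binary flags are already in ranked order and that a subscale equals one plus the number of flagged questions in its group, which is inferred from the description above (one for no flags, five for four flags). The names `GROUPS`, `subscales` and `sd_bias` are hypothetical.

```python
# Rank positions (1-based) that form the three subscales, as described in the text.
GROUPS = ((1, 2, 11, 12), (3, 4, 9, 10), (5, 6, 7, 8))


def subscales(ranked_row):
    """Phase 3: three five-point scores, each 1 plus the number of flags in the group."""
    return [1 + sum(ranked_row[rank - 1] for rank in group) for group in GROUPS]


def sd_bias(ranked_row):
    """Phase 4: average of the three subscales, a value between 1 and 5."""
    scores = subscales(ranked_row)
    return sum(scores) / len(scores)


# Hypothetical respondent with flags on ranks 3, 5, 6 and 8 to 12.
example = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
example_subscales = subscales(example)
example_bias = sd_bias(example)
```

Because three four-item groups are averaged, the resulting variable can take 13 distinct values between 1 and 5, compared with the two values of each underlying binary variable.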
Two alternatives were employed to test for reliability of the new variable. First, Cronbach's α, likely the most common measure of internal consistency in organizational research (Cho and Kim, 2015), was calculated. However, the validity of Cronbach's α as a measure of reliability has been criticized, and techniques based on structural equation modelling (SEM) have been suggested superior for analysing the reliability of measurement scales (Graham, 2006; Bonett and Wright, 2015; Cho and Kim, 2015). Following the guidelines of Cho and Kim (2015), a confirmatory factor analysis (CFA) was conducted with AMOS 22.
The second part of the development employed the remaining half of the responses to create a test SD bias variable with the same process. The only exception was that the questions were not re-ranked based on the average bias indices of the test sample; instead, the rankings of the development sample were used. For example, if binary variable four was ranked and recoded in the first position, the same ranking was applied to the test sample to avoid overfitting the data. Apart from this exception, the calculation of the SD bias test variable followed the process described earlier.
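For reference, Cronbach's α for a small set of subscales can be computed with the standard variance-based formula, as in the sketch below. The function name and the toy scores are hypothetical, and the CFA-based alternative is not reproduced here because it requires SEM software.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents x items matrix of scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: five respondents, three five-point subscales.
scores = [[1, 2, 2], [3, 3, 4], [2, 2, 3], [5, 4, 5], [4, 4, 4]]
alpha = cronbach_alpha(scores)
```

The formula treats all items as equally strong indicators of the construct, which is the tau-equivalence assumption that the SEM-based approach allows a researcher to test.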
Both the development and the test SD bias variables measure the same attitude, and as both samples originate from the same population, there should not be a significant difference between their values. A difference would suggest that the calculation process creates a distortion in the values, and this would not support the validity of the new variable. Thus the means of the development and test SD bias variables were tested for statistical difference. On the basis of the above, an insignificant test result would suggest that the two variables measure the same population and support the robustness of the development process.

Results
The first half of the sample (n = 185) was employed to develop the variable. After transforming the original answers into binary variables (Phase 1), the binary variables were ranked based on their average SD index in equation (1). Table 1 presents this ranking and indicates the potential average SD bias of the respondents per question. The index values ranged from 0.168 to 0.768; depending on the question, 17 to 77% of respondents could be prone to socially desirable response behaviour. The values indicate a good range of questions, supporting the possibility to measure degrees of biased behaviour. The standard deviations of the variables were clearly more stable, with a range within 0.125, suggesting a stable quality of responding within the sample. On the basis of the index the variables were recoded as indicated in Table 1 (Phase 2).
During Phase 3, the 12 ranked binary variables were recalculated, employing equations (2), (3) and (4), into three interval scale variables that measure tendency for socially desirable behaviour. As seen in Table 2, the first two variables demonstrated means close to each other, but the standard deviations suggested that the variables differ and capture heterogeneity among the respondents (SDLik_1, M = 2.827, SD = 1.001; SDLik_2, M = 2.832, SD = 1.132). Cronbach's α (α = 0.662) supported combining the three variables into a reliable measurement variable. As indicated earlier, the criticism expressed toward the alpha led to the use of SEM to explore variable reliability further.
Following Cho and Kim (2015), the three variables were first examined to define the optimal reliability measure. Two unidimensional CFA models were tested to define whether the items could be considered tau-equivalent, which would suggest the use of Cronbach's alpha. In both models, the variance of the latent SD bias variable was fixed to 1 for identification purposes. In addition, in the congeneric model (Fig. 2) the factor loading for the third measurement variable was fixed to one, while the tau-equivalent model assumed all factor loadings to equal one. As presented in Table 3, a χ² test of overall congeneric model fit (χ² = 0.315, p = 0.576) supported the model, with the goodness-of-fit index (GFI) and root-mean-square error of approximation (RMSEA) providing additional backing. The significant chi-square test on the difference between the fit of the two models (χ² = 30.427, p < 0.001) further suggested the model not to be tau-equivalent, and therefore Cronbach's α was deemed unsuitable for measuring reliability. Instead, a congeneric reliability coefficient (ω) was calculated (Cho and Kim, 2015). This coefficient is based on the squared sum of non-standardized factor loadings and the sum of estimated error variances (Table 4), and it further supported the reliability of the new SD bias measure (ω = 0.694). With this support, the three five-point interval scale SD variables were averaged to create the new variable that measures tendency for socially desirable responding on a continuous scale (Phase 4).
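The two calculations in this step can be sketched as follows. The congeneric coefficient is written as the squared sum of loadings divided by that quantity plus the sum of error variances, which assumes a latent variance of one. The loadings and error variances in the example are hypothetical placeholders, not the values of Table 4, and the degrees of freedom (one for the congeneric fit test, two for the difference test) are assumptions inferred from the model description rather than figures reported in the text.

```python
import math


def congeneric_omega(loadings, error_variances):
    """Congeneric reliability, assuming the latent variance is fixed to one."""
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(error_variances))


def chi2_pvalue(chi2, df):
    """Upper-tail p-value of a chi-square statistic; closed forms for df 1 and 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("closed form implemented only for df = 1 and df = 2")


# Hypothetical loadings and error variances for three indicators.
omega = congeneric_omega([0.6, 0.7, 0.5], [0.64, 0.80, 0.75])

# Reported statistics with assumed degrees of freedom.
p_fit = chi2_pvalue(0.315, df=1)
p_diff = chi2_pvalue(30.427, df=2)
```

Under these assumed degrees of freedom the fit test gives a p-value close to the reported one and the difference test gives a p-value far below 0.001.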
Validation of the process. The second half of the data (n = 185) was used to test the robustness of the development process; a bold typeface indicates a test variable. In Phase 2, the binary variables were recoded into the ranked binary variables as presented in Table 2, despite a few minor differences in the SD bias indices of the binary variables in the second sample. As noted earlier, this approach was chosen to avoid overfitting the data. A comparison of the resulting three interval scale test variables (Table 5) with the original variables (Table 2) revealed minor differences in both means and standard deviations. Cronbach's α (α = 0.700) supported the view that the three variables reliably reflect the same construct. As with the development variables, CFA indicated the model to be congeneric, with the congeneric reliability coefficient (ω = 0.732) providing further support for reliability. Thus the three variables were considered a valid test measurement scale for SD bias and were averaged to create the test variable.
An independent samples t-test was employed to test the equality of the two variable means. Before this, the normality of the variables was investigated. Both SD bias variables failed the Shapiro-Wilk test for normality (SDBias, p = 0.002; SDBias, p < 0.001). An inspection of the descriptive statistics revealed issues with both skewness and kurtosis (Table 6). A square root transformation of both variables was conducted to address these issues, but the transformed variables still failed the Shapiro-Wilk test (SDBias_sqr, p = 0.007; SDBias_sqr, p = 0.001). However, both transformed variables were acceptable in terms of skewness and kurtosis as demonstrated in Table 6, with absolute values of skewness divided by kurtosis less than one and the statistic divided by its standard error less than two. Visual inspection of the Q-Q plots further supported the use of a parametric t-test.
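The skewness screening and square root transformation described above can be sketched as follows. This is an illustration only: the data are invented, the function names are hypothetical, and the bias-adjusted skewness estimator and its standard error are the conventional formulas reported by common statistics packages, which is assumed here to match the statistics in Table 6.

```python
from math import sqrt


def sample_skewness(xs):
    """Bias-adjusted sample skewness (G1)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1


def skewness_se(n):
    """Standard error of the skewness statistic for sample size n."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Hypothetical right-skewed scores (NOT the study data).
raw = [1, 1, 1, 2, 2, 3, 9]
transformed = [sqrt(x) for x in raw]

skew_raw = sample_skewness(raw)
skew_sqrt = sample_skewness(transformed)
# Ratio of the statistic to its standard error, as used in the screening rule.
ratio_sqrt = skew_sqrt / skewness_se(len(transformed))
```

With positively skewed data such as these, the square root transformation pulls in the right tail and lowers the skewness statistic, although it need not bring it within the screening thresholds.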
The independent samples t-test supported the expectation that the transformed SD bias variables (SDBias_sqr, M = 2.706; SDBias_sqr, M = 2.586) come from the same population (t = 1.448, p = 0.149), while Levene's test for equality of variances (F = 1.635, p = 0.202) supported the assumption that the variances of the two variables are equal. Thus, the proposed process for developing a continuous variable for measuring social desirability bias can be interpreted as robust and as producing consistent results.
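The two test statistics reported above can be computed as in the following sketch, which assumes the pooled-variance (equal variances) form of the t-test and the mean-centred form of Levene's test. Both are assumptions, since the variant used is not stated here. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Independent samples t statistic with pooled variance, plus its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def levene_two_groups(a, b):
    """Mean-centred Levene statistic for two groups.

    With two groups, the one-way ANOVA F on the absolute deviations from
    each group mean equals the squared pooled t statistic on those deviations.
    """
    dev_a = [abs(x - mean(a)) for x in a]
    dev_b = [abs(x - mean(b)) for x in b]
    t, df = pooled_t(dev_a, dev_b)
    return t * t, df


# Hypothetical development and test samples (NOT the study data).
development = [1, 2, 3, 4, 5]
test_sample = [2, 3, 4, 5, 6]

t_stat, t_df = pooled_t(development, test_sample)
levene_f, levene_df = levene_two_groups(development, test_sample)
```

The p-values follow from the t distribution with `t_df` degrees of freedom and the F distribution with 1 and `levene_df` degrees of freedom, which a statistics package would supply.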

Discussion and conclusions
There is an evident gap between the attitudes consumers express toward CSR in marketing surveys and the relative financial performance of companies that embrace responsibility. Ethics and ethical treatment of stakeholders are an integral part of responsibility, and frequent positive survey results would suggest high levels of consumer integrity. Company financial performance does not indicate that these results are fully realistic, and the gap can be partly explained by social desirability bias that causes some consumers to express a fictitious positive attitude toward responsibility. The underlying cause could be a conscious attempt to fake integrity, or an unintentional exaggeration of moral beliefs, but in either case the reliability and value of quantitative CSR research are reduced. To narrow the gap, this study aimed to develop a new variable for measuring SD bias that allows survey results to be controlled for its impact.
On the basis of the review of current SD bias reduction methods, three objectives were defined for this study. The first was to contribute to bias detection methods applicable to CSR surveys targeted at consumers, and the second was to develop a short measurement instrument for this purpose. The third objective was to add the opportunity to express shades of SD bias and create a variable that can be incorporated in statistical models as a control item. Instead of the laborious process of developing new questions for SD bias measurement, the study was based on existing measurement scales, with the BIDR (Paulhus, 2002) selected as the most relevant for the purpose. The overall goal was to develop a tool to detect fictitious integrity in questions that require ethical evaluation.
The SD bias variable developed in this paper addresses the goals set at the beginning. A marketing survey can include the 12 measurement questions without extending completion time too far. While earlier proposals for short forms of SD bias instruments at comparable lengths exist, this process deviates significantly from its predecessors. It transforms binary (true/false) answers to questions into a continuous SD bias variable, and the resulting opportunity to measure shades of bias will increase the validity of a shorter instrument and provide more opportunities to incorporate SD bias in marketing analyses. The earlier long forms of measurement have relied on the number of questions to reveal biased behaviour. With a continuous variable available, the need for long (30+) question instruments, unfeasible in a marketing study, will disappear. Crucially, the creation of a continuous SD bias variable will provide the opportunity to incorporate it in a range of statistical analyses from ANOVA to regression and further to various moderation and mediation models. Thus statistical analysis requiring continuous variables becomes an option when socially desirable responding is expected or suspected, as is often the case with CSR research. As a result of the ability to differentiate between real and fictitious consumer integrity, the reliability of CSR surveys should increase in the future, and results should better reveal the type of responsibility that interests consumers.
While the main focus of this study was to offer the possibility to include SD bias as a control variable in CSR analysis, the findings also provide some interesting insights into how prone to biased behaviour consumers are on average. The results suggested an average SD bias of 2.585-2.701, and measured on a scale of "no bias" to "very strong bias" this would indicate a moderate bias among the respondents. However, earlier research has not been able to quantify this, and further studies should validate the finding and define what moderate SD bias means when stated choices are compared with real ones. Such research would contribute to understanding the depth of fictitious ethical responding. Another exciting avenue would be to generalize SD bias tendencies in different populations. While future studies can include the questions proposed here in a new survey instrument, such generalizations could open interesting avenues for retrospectively adjusting previous consumer-oriented CSR studies for the bias. The adjustment could potentially shed new light on the controversy between the survey findings on consumer interest in responsibility and the analyses of causality between corporate social and financial performance, which support such interest only occasionally. Furthermore, outside the context of responsibility and ethics, researchers in the field of psychology might be able to use the new variable to develop moderation or mediation models where SD bias enters as one of the variables explaining human behaviour.
This study relies on the validity of the original BIDR questions selected. These questions have undergone a multiple phase development process including panels of leading experts in the field. Whether a situation prompts tendencies to indicate behaviour that differs from reality, be it intentionally or instinctively, is always debatable. Further research into which questions are best for measuring shades of responsibility, and a comparison between continuous variables based on the BIDR and the MCSDS, would significantly contribute to this debate. The methodology could also be employed to continue investigation of whether multiple SD bias constructs exist. This research used questions directed at measuring a moralistic bias, as they have been deemed fitting for marketing research, without addressing the issue of the constructs, but a comparative study employing SD bias measurement questions from each subscale could nudge the debate over SD bias conceptualization further.
The hope is that the method presented in this paper advances the debate over the reliability of consumer-oriented CSR surveys and highlights the importance of including SD bias in the methodology employed. Fictitious consumer integrity is manifested as favourable responses to CSR-related survey questions without corresponding actions in a purchase situation, also known as the attitude-behaviour gap. The bias may be outright insincere, with consumers consciously pretending to care about good business practice, but it could also be a result of "saint-like" views of personal behaviour projected in surveys in a false but sincere manner. Both paths lead to unreliable survey results, and while further work is required before the attitude-behaviour gap in responsibility research can be bridged, this study aims to contribute to the discovery of the true value of responsibility.