The evaluation of personnel selection methods by HR practitioners: The effect of reference and its interaction with information about validity

Abstract
This study is an experiment that examines the effects of positive reference, information about predictive validity, and their interaction on how HR professionals evaluate selection methods. It contributes to understanding why HR practitioners use personnel selection methods that are considered to have low predictive validity. A sample of 173 HR professionals from the Czech Republic was asked to evaluate six selection methods that could be used to select a project manager for a telecommunications company. Each participant was randomly assigned to two experimental conditions as the selection methods were presented together with/without positive reference and with/without information about their predictive validity. The results of repeated measures ANOVAs with two between-subjects factors, one within-subject factor, and their interactions showed that information about predictive validity did not significantly influence how HR professionals evaluated selection methods. The analyses also did not support the effect of positive reference on the evaluation of methods with low validity. In contrast, the analyses provided support for the effect of positive reference on the evaluation of selection methods with high predictive validity. The interaction of reference and information about validity had no significant effect on the evaluation of selection methods by HR professionals.


Introduction
Studies by Mann and Chowhan (2011) and König et al. (2010) have questioned why professionals do not follow scientific knowledge and instead use invalid methods and procedures when selecting new employees.
According to Rynes (2012), employee selection is identified as having the largest discrepancies between research and practice. The gap between research and practice in employee selection has become a major issue in human resource management and related fields (König et al., 2010; Rousseau & Barends, 2011) and has led scholars to call for research that would explain this gap and help find ways to narrow it (e.g. König et al., 2010; Rynes, 2012). Although some studies have attempted to explain the gap (e.g. Diab et al., 2011; Roulin & Bangerter, 2012; Rynes et al., 2007), only a limited amount of research focused on the gap has been conducted (Furnham & Jackson, 2011; Nolan & Highhouse, 2014), and the question of why practitioners ignore research findings and use methods that are not appropriate for a valid selection process remains. This study investigates two effects that might influence why HR professionals use selection methods with low predictive validity. The first of these effects is the direct influence of reference (i.e. information that a particular method is used by major companies). The second is the moderating effect of the reference. More specifically, we are interested in whether the presence of a reference weakens the influence of information on the predictive validity of a selection method on the evaluation of the quality of that method. This study aims to contribute to explaining the gap between research and practice. To the best of our knowledge, it is the first experiment focusing on the effect of reference and validity information on HR professionals' evaluation of selection methods.
Some research studies support the assumption that a reference influences an HR professional's perception of a selection method. Williamson and Cable (2003) showed that on an organizational level, the hiring patterns of Fortune 500 firms are shaped by 'mimetic isomorphism', which is the process of imitating to become similar to another organization (Yang & Hyland, 2012). DiMaggio and Powell (1983) introduced mimetic isomorphism as one of three mechanisms of institutional isomorphic change besides coercive and normative isomorphism. Mimetic isomorphism occurs when an organization faces uncertainty and decides to copy the practices of another organization it perceives to be successful (DiMaggio & Powell, 1983). Organizations and HR professionals face a lot of uncertainty when it comes to selecting employees, as even the best selection methods have a limited correlation with an employee's future performance (Schmidt & Hunter, 1998) and there is a high risk of wrong selection. Thus, mimicking a well-known and successful organization can bring both certainty and justification to a chosen selection procedure.
On an individual level, König et al. (2010) showed that an individual's perception of a selection method being diffused in the field relates to the likelihood of them using that method. König et al. (2010) explain this effect using institutional theory. They argue in line with Klehe (2004) that institutional pressures affect the adoption of personnel selection procedures and point out that mimicking others leads to a feeling of legitimacy. Mimicking or imitation is a part of the acquiescence strategy, which is consistent with the concept of mimetic isomorphism and which is therefore preferred in a situation of uncertainty (Oliver, 1991).
From the perspective of behavioral economics, mimicking others can be attributed to the social-proof heuristic, which can be defined as viewing 'a behavior as correct in a given situation to the degree to which we see others performing it' (Cialdini, 2007, p. 88). People affected by social proof do not carefully consider the pros and cons but are subject to social influence. They care more about the popularity of their position or decision than the evidence (MacCoun, 2012). In the case of selection methods, HR professionals may assume that the method is of high quality if it is used by major companies in the field and may disregard research-based information on its validity.
In our research, we wanted to test the effect of social influence on HR practitioners' evaluations of employee selection methods. König et al. (2010) showed that methods that HR managers consider to be diffused in the field are used more often in their own organizations. However, their research was not able to provide evidence on the causality of this relationship. HR professionals might also perceive a method as being more diffused in the field if they use it themselves or if they consider it an appropriate method. Therefore, we used an experimental design to test how reference about the use of a method by a major company influences the perception of the quality of that method.

H1:
A positive reference about a selection method positively influences the assessment of the quality of that method.
We also wanted to test if a reference to a major company using a particular selection method influences the effect of information on the predictive validity of that method. The survey by König et al. (2010) found that selection procedures that are perceived to be valid by HR managers are also more likely to be used in their own organizations. According to an experiment by Highhouse et al. (2017), information on predictive validity that is presented in an understandable context affects how HR professionals evaluate a selection method. In this experiment, the authors presented two types of selection interviews to 201 HR professionals. The participants did not express a preference for the more valid structured interview over the less valid unstructured interview when the information on predictive validity was presented in a difficult-to-evaluate manner. However, when the validity coefficients of the structured and unstructured interviews were presented in an easy-to-evaluate way (i.e. it was possible to compare them with each other and with coefficients of methods with very high or very low predictive validity), HR professionals preferred the structured interviews over unstructured interviews.
Our study follows the evidence that understandable information on predictive validity influences the attitude towards a particular method (Highhouse et al., 2017) and focuses on what can weaken this effect in organizational practice. We are interested in what causes HR professionals to disregard understandable information on the low predictive validity of a selection method and leads them to the conclusion that it is high quality and appropriate for the selection process.
We assumed that the factor that moderates the effect of information on predictive validity might be the reference that a particular selection method is used by influential companies. HR practitioners are poorly informed of the latest research findings (Carless, 2009). HR periodicals cite managers and consultants much more frequently than academics to support the case for recommended recruitment and selection practices. They also cite surveys carried out by organizations and consulting firms much more often than academic research studies (Rynes et al., 2007). König et al. (2010) showed that the perceived diffusion of a method in the field is more strongly related to the use of that method than its perceived validity. From the point of view of an HR professional, the reference might be a sufficient clue for a positive evaluation of the method and might suppress ideas arising from the available information on validity, as people care less about the evidence if they are under the effect of social proof (e.g. MacCoun, 2012).

H2:
A positive reference is a moderator of the effect of information about low predictive validity on the assessment of the quality of a personnel selection method.
More specifically, we hypothesize that a positive reference weakens the effect of information about validity on the evaluation of a selection method.
We wanted to test our hypotheses on a sample of real HR professionals. We did not want to present them with misleading information regarding existing selection methods, as this could influence their future perception and usage of such methods. Therefore, we needed to choose various methods with low and high predictive validity and ones with an available reference that the method was used by a major company. The methods we chose are used to select new employees in the Czech Republic where we recruited respondents for our research. However, we did not choose methods that are used in almost all selection processes (i.e. interviews and selection based on CV details, such as work experience, education, and interests). To test the first hypothesis, we used three methods with high predictive validity (assessment centers, GMA tests, and work sample tests, see Schmidt & Hunter, 1998) and three methods for which there was no evidence of high predictive validity in the selection of employees (Lüscher's color test, MBTI, and graphology). To test the second hypothesis, we used only the three latter methods for which there was no evidence of high predictive validity and which are unsuitable for selecting personnel. From a meta-analysis by Schmidt and Hunter (1998), we chose graphology with a mean predictive validity of r = .02. We could not include the other methods in Schmidt and Hunter (1998) because they had at least weak predictive validity (e.g. reference checks), or they were not used in the Czech Republic (e.g. T & E point method), or they are part of widely used assessment based on CV details (e.g. education or interests). Therefore, we chose widely used methods (Urbánek, 2012) for which there is no convincing evidence of their predictive validity (in relation to organization-related outcomes) and whose validity and reliability are called into question in the academic literature. The Lüscher color test is considered a discredited method (Norcross et al., 2006). 
There have been no studies regarding its predictive validity while existing empirical studies do not support its construct validity (Braun & Bonta, 1979). MBTI is considered an inappropriate method for personnel selection (e.g. Coe, 1992; Druckman & Bjork, 1991). The outcome of MBTI, classification into one of 16 personality types, has rather low test-retest reliability (Pittenger, 2005) and the test fails to predict job performance (e.g. Furnham & Stringfield, 1993).

Procedure
The study was an online-administered between-subjects experiment that consisted of two parts. One part focused on selection methods with high predictive validity and the other part on methods with low predictive validity. Both parts contained two manipulations of the information that the respondents received. The participants completed a questionnaire on personnel selection methods, and we manipulated the presence of a reference (i.e. with/without reference) and of information on the predictive validity (i.e. with/without information about validity) of six personnel selection methods.
At the beginning of the questionnaire, we gave the participants a scenario for selecting applicants: A telecommunications company needs to choose a suitable candidate for the position of project manager. Your task is to select a suitable method for the selection process. Judge the suitability of each of the following methods for ensuring the high-quality selection of a suitable candidate for this position, regardless of cost (e.g. time, expense, management input, etc.).
Then, we gradually introduced the six methods in a fixed order: the Lüscher color test, assessment centers, GMA tests, MBTI, graphology, and work sample tests. Each method was introduced with a brief description of the method. Depending on the experimental condition, the description might have been followed by information on predictive validity and/or a reference to an organization that uses the method (see Appendix). The participants assessed the suitability of each method by answering one general question ('Do you consider [name of the method] to be a quality method for selecting new employees?') and one specific question linked to the scenario ('Do you consider [name of the method] to be a suitable method for selecting a project manager in a telecommunications company?'). The answer scale ranged from 1 (very poor quality/completely unsuitable) to 7 (very high quality/completely suitable). The answers to the two questions for each of the six selection methods strongly correlated (Spearman's rho ranged from .50 to .77; the Spearman-Brown coefficient was .82 for the Lüscher test, .65 for assessment centers, .73 for GMA tests, .87 for MBTI, .88 for graphology, and .81 for work sample tests). We operationalized the dependent variable 'perceived quality of the selection method' as the mean of the answers to both questions.
The participants were then asked whether they had been familiar with each of the methods before completing the questionnaire (the answer scale ranged from '1 - Did not know it at all' to '5 - Knew it very well'). In the preliminary analyses, we tested whether knowledge of the method was similar in each experimental condition so that the results were not distorted due to unequal experimental groups. At the end of the questionnaire, we also asked whether the participants had looked for additional information on the presented selection methods during the experiment. As the experiment was not conducted under laboratory conditions, unsupervised searching for information on the methods could have distorted the results. The participants then answered the socio-demographic questions we needed to describe the sample. The experiment was conducted as a part of the bachelor thesis of the second author. The project was approved by the supervisor and by the head of the Department of Psychology.
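The two-item scoring described above can be illustrated with a minimal sketch. It assumes the standard Spearman-Brown formula for a two-item scale, 2r / (1 + r), applied to an inter-item correlation r; the function names and the input values are hypothetical, and the coefficients reported in the text need not follow from the Spearman correlations quoted alongside them, since such coefficients are commonly computed from Pearson correlations.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Reliability of a scale made of k parallel items with inter-item correlation r.

    For k = 2 this reduces to the two-item form 2r / (1 + r).
    """
    return k * r / (1.0 + (k - 1.0) * r)


def perceived_quality(general: int, specific: int) -> float:
    """Mean of the general and the scenario-specific rating (both on the 1-7 scale)."""
    for score in (general, specific):
        if not 1 <= score <= 7:
            raise ValueError("ratings must lie on the 1-7 response scale")
    return (general + specific) / 2


# Hypothetical inputs, not taken from the study's dataset.
print(round(spearman_brown(0.70), 3))  # two-item reliability for r = .70
print(perceived_quality(5, 6))         # dependent variable for one respondent and method
```

Averaging the two items keeps the dependent variable on the original 1-7 metric, which makes the group means directly comparable with the response-scale anchors.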

Treatments
Upon opening the online questionnaire, one of four randomly selected versions of the questionnaire appeared. These versions differed in the presence or absence of a positive reference and the presence or absence of information on the predictive validity of the individual methods. The reference and/or information on predictive validity appeared with either all or none of the six selection methods.
Information on predictive validity included an explanation of the term predictive validity and information on the strength of the relationship, including an interpretation of its strength.
The presence of a positive reference was based on informing participants that the method was used by a well-known organization and briefly describing the organization (see Appendix for an example).
We accessed information on the use of methods from the websites of the organizations we referred to and from articles on this topic.

Sample
By means of the LinkedIn network and emails obtained from its website, we contacted 1,200 HR professionals who worked in various companies in the Czech Republic. We asked them to complete a questionnaire on methods for selecting employees. When they opened the link, they were presented with basic information regarding the research, including an assurance that their data would be processed anonymously and that there was no financial remuneration for their participation in the research. The only benefit was the opportunity to participate in a draw for a six-month subscription to a professional journal (Profi HR). The participants had to provide consent before starting the questionnaire. The questionnaire was completed by 174 (14.5%) of the HR professionals contacted, most of whom were women (80.5%). Only 20.7% of participants did not have a university degree. Their average experience in personnel selection was 6.2 years (SD = 6). See Table 1 for detailed characteristics of the sample.
Most of the participants answered all questions in the electronic questionnaire. One participant was excluded from the data processing because she had provided an answer to a question on the quality of only one of the six selection methods and failed to answer those regarding the other methods. One participant did not answer the question regarding the quality of one of the methods (work sample tests). Therefore, the analyses contained data from 173 participants and data from 172 participants for the work sample test. The sample enabled the testing of between-subjects effects in a 2 × 2 repeated measures ANOVA with 98% power for a medium-effect size (f = .25, α = .05, correlation among repeated measures estimated at .20) and with 80% power for a weak-to-medium-effect size (f = .175, α = .05, correlation among repeated measures estimated at .20) (G*Power 3.1; Faul et al., 2009).
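As a rough guide to how the reported power values arise, the following sketch assumes that the computation used the G*Power procedure for between-subjects effects in a repeated measures ANOVA, with m = 3 repeated measures (the three methods in each model) and k = 4 groups. Both m and k are our reading of the design, not values stated in the text.

```latex
% Noncentrality parameter and degrees of freedom (assumed G*Power convention)
\lambda = f^{2}\, N\, \frac{m}{1 + (m - 1)\rho},
\qquad df_{1} = k - 1, \qquad df_{2} = N - k

% With N = 173, m = 3, \rho = .20:
%   f = .25  gives \lambda \approx 0.0625  \times 173 \times 2.143 \approx 23.17
%   f = .175 gives \lambda \approx 0.03063 \times 173 \times 2.143 \approx 11.35
```

Power is then the probability that a noncentral F variable with these parameters exceeds the critical value of the central F distribution at α = .05.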

Preliminary analyses and descriptive statistics
The participants were most aware of assessment centers (M = 4.43, Med = 5; on the response scale 1-5), work sample tests (M = 3.72, Med = 4), graphology (M = 3.34, Med = 3), and MBTI (M = 3.08, Med = 3), and least aware of GMA tests (M = 2.90, Med = 3) and the Lüscher color test (M = 1.94, Med = 1). Prior knowledge of the selection method only very weakly correlated with the assessment of the quality of the methods, except for MBTI (Spearman's rho = .32, p < .001) and work sample tests (rho = .32, p < .001), for which it moderately correlated with the assessment of the methods. However, prior knowledge of the methods did not differ significantly across the groups with and without information on validity (rho ≤ .04) and with or without a reference (rho ≤ .10). Note. Respondents could provide multiple answers for 'Position' and 'Organization/sector'.
The groups with differing experimental conditions had similar prior knowledge about the selection methods.
Only 19 participants (11%) admitted that they had searched for information about the selection methods during the experiment. Searching for information had a weak negative correlation with the assessment of the quality of all six methods (the weakest was r = −.05, p = .515 for graphology, and the strongest were r = −.17, p = .023 for the Lüscher test and r = −.19, p = .014 for the GMA test). As there was a similar negative effect of searching for information on the evaluation of both low and high validity methods, it does not seem to relate to the effect of new information, but rather to the tendency of the specific group of participants who searched for information to evaluate the methods more critically. The proportion of people who had searched for information did not differ significantly across the experimental conditions (Information on predictive validity: Spearman's rho = .10, p = .212; Reference: rho = .00, p = .960).
Generally, participants perceived the methods with low predictive validity as being of lower quality than the methods with high predictive validity (see Table 2). Among the methods with low predictive validity, only MBTI had an average score above 4 on the 1-7 response scale and was evaluated similarly to GMA tests, which had the lowest score of the methods with high predictive validity.
Although graphology and the Lüscher test had an average evaluation that was lower than 4, more than 16% of participants (16.8% for the Lüscher test and 22.5% for graphology) rated them as medium-quality methods (i.e. a score of 5 or higher) and more than 12% of participants (12.2% for the Lüscher test and 12.1% for graphology) considered them quite suitable (i.e. a score of 5 or higher) for the selection procedure described in the scenario. Only 16.8% of participants (15.5% of those who received information about the low predictive validity) considered the Lüscher test to be completely unsuitable for the selection procedure in the scenario. Graphology was deemed to be completely unsuitable by 28.3% of participants (26.2% of those who received information about the low predictive validity) and MBTI by only 3.5% (3.6% of those who received information about the low predictive validity) (see the dataset for more detailed descriptive statistics).

Hypotheses testing
We tested the hypotheses using repeated measures ANOVAs with one within-subject factor (i.e. selection method), two between-subject factors (i.e. reference and information on predictive validity), and their interactions. We designed two separate models: one for methods with low predictive validity and one for methods with high predictive validity. This is because information on predictive validity should have opposite effects when evaluating methods with high and low predictive validity. As can be seen in Table 3, neither reference nor information on predictive validity nor their interaction significantly predicted the evaluation of the three methods with low predictive validity. Therefore, we did not find support for our hypotheses. The only significant effects were among the within-subject predictors (i.e. the effect of the type of selection method).
Also, as Table 2 shows, the participants perceived MBTI as a more appropriate selection method than graphology (t(172) = 13.68, p < .001) and the Lüscher color test (t(172) = 13.78, p < .001). The significant interaction between the method and information on validity is connected to the fact that the effect of information on validity is weakly positive for graphology and weakly negative for the Lüscher test and MBTI (see Table 2). However, the post-hoc test (with Holm correction) did not find any of these effects to be significant. The effect for the Lüscher test was close to zero (d = −.06, p_Holm > .999), and the effects for MBTI (d = −.12, p_Holm = .592) and graphology (d = .14, p_Holm = .415) were very small and nonsignificant. The second model for methods with high predictive validity showed a weak effect of reference on the evaluation of the selection methods (see Table 3). This result provides partial support for H1 (i.e. the positive effect of reference). We did not test H2 (i.e. reference as a moderator) within the model for methods with high validity as the hypothesis is relevant only for non-valid methods. Nevertheless, the effects of information on validity and the interaction between validity and reference were very small and nonsignificant. As with the first model, only the same two within-subject effects were significant. As can be seen from Table 2, GMA tests were perceived as a less appropriate method when compared to assessment centers (t(172) = 7.35, p < .001) and work sample tests (t(171) = 12.34, p < .001). The significant interaction between method and information on validity is connected to the fact that the effect of information on validity is weakly positive for GMA tests and weakly negative for assessment centers and work sample tests (see Table 2). However, as with the first model, the post-hoc test did not find any significant effect for information on validity across all three methods.
The effect for work sample tests was close to zero (d = .02, p_Holm = .795), and the effects for assessment centers (d = .17, p_Holm = .139) and GMA tests (d = −.09, p_Holm = .491) were very small and nonsignificant.
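For readers unfamiliar with the Holm correction applied in the post-hoc tests, the following minimal sketch shows the standard step-down adjustment. The function name and the example p-values are hypothetical; the unadjusted p-values of the study are not reported in the text, so the sketch does not attempt to reproduce the values above.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order of the input.

    The smallest p-value is multiplied by m, the next by m - 1, and so on.
    A running maximum keeps the adjusted values monotone and each is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical unadjusted p-values for three post-hoc comparisons.
raw = [0.01, 0.04, 0.03]
print([round(p, 3) for p in holm_adjust(raw)])
```

The cap at 1 is why an adjusted value can be reported as "> .999": a large unadjusted p-value multiplied by the number of remaining comparisons reaches the upper bound.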

Discussion
Our study partly supported the hypothesis concerning the effect of positive reference on a method's perceived quality. Information on the use of a selection method by a well-known organization increased the perceived quality of the method in the case of methods with high predictive validity. However, the same effect was not observed in the case of methods that had low predictive validity. It is possible to explain the difference in the influence of reference for high-validity and low-validity methods through confirmation bias, which refers to the tendency to seek and prefer evidence that supports previous attitudes (Stone & Wood, 2018). Even the participants in the control group (i.e. without information about predictive validity and reference) were able to distinguish the high-validity selection methods from those that had low predictive validity. When a participant received a positive reference about a method they had perceived positively before the experiment, it could have further strengthened their positive attitude. However, if they received a positive reference about a method they had perceived negatively or ambivalently beforehand, cognitive dissonance (see, e.g. Stone & Wood, 2018) may have occurred and their attitude did not change. Our study shows that one positive reference is not enough to make an HR professional uncritically accept a low-validity selection method. Therefore, we believe that more than just one reference is needed to induce mimicking of non-valid selection procedures. In the study of Diekmann et al. (2015), HR professionals were resistant to irrelevant 'neuroscientific' descriptions when evaluating selection methods. Our study shows that they are similarly resistant to a lone reference that actually says nothing about the quality of the method itself.
We did not find any support for the second hypothesis that a positive reference weakens the effect of information about low predictive validity on the perceived quality of a selection method. This might be because the effect of information on predictive validity was very small and nonsignificant, and, therefore, there was no relevant effect to be weakened. This is a surprising result as a study by Highhouse et al. (2017) found evidence of an influence of interpreted information about the validity of a selection method on the evaluation of that method by HR professionals. The difference between the results of our study and the conclusions of Highhouse et al. (2017) may be due to differences in research design and materials. Our study compared the evaluation of the same method with and without information on predictive validity. Highhouse et al. (2017) based their conclusions on the fact that their participants preferred one selection method (i.e. structured interview) over the other (i.e. unstructured interview) when they informed them about their different predictive validity. The effect that they found thus comprised both the positive effect of information on the high validity of structured interviews and the negative effect of information on the lower validity of unstructured interviews.
Recent research by Voss et al. (2020) showed that the effect of information about validity on the evaluation of a selection method may differ across various populations and contexts. In particular, the authors showed that the level of numeracy of respondents moderates the effect positively, because the respondents with high numeracy understand the information better than the respondents with low numeracy. We assume that the level of numeracy was rather high in our sample as 79% of our respondents had a university degree. We also tried to present the information about validity in a very understandable manner, as we have included its interpretation. However, other so far unknown factors may have weakened the effect of information about the validity in our study. We believe that further research should focus on factors that may moderate the effect of information about validity on the evaluation of a selection method. Such research should distinguish the effect of information about high validity from the effect of information about low validity. A better knowledge of moderators would help researchers both to better communicate their research findings to practitioners and to better design their future studies that operate with the information on validity.
The descriptive statistics presented in our study provide some good news about HR professionals in the Czech Republic, as the participants in our research generally assessed the high validity methods more positively than those of lower validity. However, a more detailed examination of our participants' assessment of the quality and suitability of methods sends out a warning sign. More than one-tenth of the HR professionals in our research had a positive attitude towards methods that are inappropriate for selecting new employees. And as our results demonstrate, their attitudes were not even swayed by information about the low predictive validity of those methods.

Limitations and recommendations for future research
One of the limitations of our study is associated with the sample. Most importantly, it is impossible to consider it representative in terms of the population of Czech HR professionals. Only 14.5% of those people we contacted participated in the research, so the sample was biased by self-selection. On the other hand, we contacted all of the HR professionals we were able to find via the internet and social networks and analyzed the data from 173 of them, which allowed us to conduct an analysis with relatively high test power. However, the generalizability of our results could be enhanced by replication studies conducted on further samples, ideally from other countries, and including evaluations of other selection methods than the six methods that we chose for our study.
The HR professionals in our sample probably had some idea about the predictive validity of the selection methods that were featured in our research and about the use of those methods in various organizations. Therefore, our results reflect the influence of information that was not necessarily new to the participants. The results might have been different if we had asked the participants to assess methods that they did not know (e.g. new methods or fictional methods, see, e.g. Diekmann et al., 2015), or if we had carried out the research with students of I/O psychology or with students of HR management. According to Voss et al. (2020), both students and professionals (hiring managers) are similarly influenced by information about the validity of a selection method, but the effect of context on the evaluation of a selection method might differ across these groups. Nevertheless, we believe that our approach ensured the high ecological validity of the results. It simulated the influence of reference and partial information on predictive validity on the attitudes of people who actually use these methods in practice. A similar influence on their attitudes could be induced by an article in an HR magazine or a marketing campaign by a test producer. However, further research could focus on the effect of reference when evaluating new, unfamiliar methods to eliminate the influence of prior experience on selection method evaluation. Such research would have to include quality debriefing if it deceived participants about the references and the evidence on the predictive validity of unknown methods, as such information typically does not exist for methods that are completely new to HR practitioners.
According to DiMaggio and Powell (1983), companies tend to imitate successful companies that operate in the same industry. Therefore, the effect of reference found in our experiment could vary depending on whether the reference company operated in the same industry as the respondent. We were not able to control such moderation as our sample included a large number of HR professionals who operated in multiple industries. However, we did not expect the moderating effect of the industry to be large. Recent research focused specifically on isomorphism in recruitment (Simón & Esteves, 2016) did not find a significant effect of industry in a cross-industry sample. This can be explained by the fact that recruitment and other HR practices do not differ much across industries and a successful company can easily be a model for HR from another industry. For example, research by Zibarras and Woods (2010) found differences in the use of selection methods between the private and public sectors, but only negligible differences across manufacturing, business services, and other services industry sectors.

Conclusions
This study contributes to the understanding of the research and practice gap. It showed that HR professionals are generally able to distinguish between high and low-quality methods and that only one partial reference does not affect their (negative) evaluation of methods with low validity. The results show that a positive reference may slightly strengthen HR professionals' positive attitudes toward methods with high predictive validity. Therefore, promoting high-validity methods in marketing campaigns is worthwhile, even if only by highlighting that a major company uses a particular method for selecting personnel. On the other hand, a single positive reference is not enough to change attitudes towards a selection method which is generally perceived as having low predictive validity. However, this conclusion does not mean that HR professionals cannot be influenced by a marketing campaign based on multiple positive references, as our research focused on the influence of a single reference only. Multiple references could also activate other effects, such as normative pressure, and influence the attitude of practitioners more strongly than a reference to just one major company.