From Likert scales to images: Validating a novel creativity measure with image based response scales

Article history: Received 31 May 2016; received in revised form 6 October 2016; accepted 8 October 2016; available online 22 October 2016.
The use of image-based testing to assess individual differences has increased substantially in recent years, with proponents arguing that image-based tests offer a more engaging alternative to text-based psychometric tests. Yet research examining the validity of these tests is nearly non-existent. Traditional image-based formats have been little more than an adaptation of self-reports, with images replacing questions but not response options. The current study develops a novel image-based creativity measure, where images replace conventional response scales, and scores on the measure are obtained using a linear regression scoring algorithm to predict three self-reported creativity measures. Using sequential forward selection on a set of 77 image-based items, an optimal solution of 14 items that were valid predictors of self-reported creativity scores was identified. The image-based measure had good test-retest reliability. Implications are discussed in terms of the usefulness of image-based testing for practitioners seeking engaging and short test formats.
© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Perhaps in response to these criticisms, and fuelled by technological advances, recent years have seen mounting interest in more engaging forms of assessment (Attali & Arieli-Attali, 2015), including gamification (Chamorro-Premuzic & Steinmetz, 2013; Landers & Callan, 2011; Reeves & Read, 2013) and social media analytics (Kosinski, Matz, & Gosling, 2015; Pennebaker, 2011). However, innovative assessment tools often serve entertainment purposes, with little indication of their validity (Naglieri et al., 2004). The increase in the quantity of these instruments has not been matched by an increase in research into their quality, that is, their reliability and validity. Indeed, professionals' desire to use innovative assessment has outpaced the peer-reviewed literature (e.g., Roth, Bobko, Van Iddekinge, & Thatcher, 2013). This gap between research and practice is problematic if tests are used to make hiring decisions or provide clinical diagnoses.
Consequently, developing scientific evidence for the validity and utility of image-based tests is critical, not only from an academic, but also from an applied perspective. The current study takes a step in this direction. Specifically, an image-based creativity assessment and a predictive scoring algorithm are developed. The test-retest reliability of the assessment, as well as its concurrent validity in relation to three text-based, self-report creativity measures, is assessed, so that practitioners may better understand how such image-based tests compare to traditional tests.

Advantages of image-based formats
One of the most common innovations in psychological assessment formats has been to replace item questions with visual representations, thereby increasing user engagement (Barrett & Ebbeling, 2003; Downes-Le Guin, Baker, Mechling, & Ruylea, 2012; Hamari, Koivisto, & Sarsa, 2014; Lugtigheid & Rathod, 2005). Beyond engagement, image-based formats could provide theoretical and practical advantages over text-based psychometric tests. First, they may be more suitable for culturally and linguistically diverse test takers, and remove misunderstanding of text items (Paunonen, Jackson, & Keinonen, 1990). Second, responding to image-based items may require less attention, reducing test taker fatigue. Finally, image stimuli evoke stronger preferences in respondents than verbal stimuli, allowing image-based tests to be shorter (Lugtigheid & Rathod, 2005; Meissner & Rothermund, 2015).

Past research on image-based tests
Despite being innovative, image-based formats in assessment are not new. Geist's (1959) Pictorial Interest Inventory pictures a person engaged in three activities, of which respondents pick the most appealing one. More recent image-based tests adapt text-based personality measures such that the question is replaced with an image: The Nonverbal Personality Questionnaire (Paunonen et al., 1990) measures Murray's (1938) psychological needs such that participants report the likelihood that they would engage in visually displayed behaviours. A version of the test measuring the Big Five also exists (Paunonen, Ashton, & Jackson, 2001).
These adaptations of verbal personality tests have gained support in the academic literature for their internal reliabilities and validities (Hong, Paunonen, & Slade, 2008; Moore, Schermer, Paunonen, & Vernon, 2010; Paunonen, 2003; Paunonen, Jackson, Trzebinski, & Forsterling, 1992; Paunonen, Zeidner, Engvik, Oosterveld, & Maliphant, 2000). However, research examining the validity of image-based tests is scarce, and their use is mostly limited to special populations, such as children or illiterates. In addition, the use of response scales and scoring methodologies developed for verbal formats is not ideal: by using images to replace the question stem, questions are limited to those that can be visually represented.

Assessment of creativity
Creativity encompasses both personality and cognitive aspects related to the production of unique and useful ideas (Runco & Jaeger, 2012; Simonton, 2000). Three of the many components associated with creativity are: Cognitive Flexibility, the ability to switch cognitive sets to adapt to changing environmental stimuli (Scott, 1962); Curiosity, the recognition, pursuit, and intense desire to explore novel and uncertain events; and Openness to Experience, the Big Five personality trait considered as a proxy of creativity (Feist, 1998; Furnham & Bachtiar, 2008; Martindale, 1989).
Because of the broadness of the construct, multi-trait, multi-method approaches have been proposed as most suitable (Cropley, 2000; Plucker & Makel, 2010). An image-based measure of creativity may add to this array of measurement methodologies available for creativity testing. In addition, image-based response scales may be particularly effective in measuring creativity because images elicit aesthetic preferences, such as preferences for complexity, which in turn are indicative of self-reported creativity and aesthetic styles (Barron, 1953; Chamorro-Premuzic, Reimers, Hsu, & Ahmetoglu, 2009; Rawlings, 2003; Swami, Stieger, Pietschnig, & Voracek, 2010; Wiersema, van der Schalk, & van Kleef, 2012). A preference for complex polygons is associated with higher self-reported creativity, such that Robinson (1967, 1968) suggested the use of polygons varying in their level of complexity as measures of creativity. Accordingly, the present research aimed to a) develop a novel format image-based creativity measure, b) investigate its concurrent validity in relation to three text-based measures of creativity, and c) assess its test-retest reliability.

Measures
Curiosity and Exploration Inventory-II (CEI-II)
A 10-item, five-point Likert self-report scale. The CEI-II measures two traits: stretching (e.g., 'I actively seek as much information as I can in new situations') and embracing (e.g., 'I am the type of person who really enjoys the uncertainty of everyday life'). The CEI-II demonstrates reliability estimates of 0.85, construct validity, discrimination, desirable breadth of difficulty, and predictive validity for task performance (Kashdan, Rose, & Fincham, 2004).

Cognitive Flexibility Inventory (CFI; Dennis & Vander Wal, 2010)
A 20-item, seven-point Likert scale, self-report measure of adaptive thinking in stressful situations. Thirteen items assess behaviours related to alternatives (e.g., 'I consider multiple options before making a decision'), and seven items behaviours related to control ('When I encounter difficult situations, I feel like I am losing control'). The CFI shows a reliable factor structure, internal consistency, test-retest reliability, and concurrent validity (Dennis & Vander Wal, 2010).

Openness to experience (Goldberg, 1999)
Measured on a five-point Likert scale ('very inaccurate' to 'very accurate') using the 10-item Openness scale from the International Personality Item Pool (e.g., 'I enjoy hearing new ideas').

Item design
The question stem of image-based items retained its verbal format, but the response scale presented a range of images (see Fig. 1). Each item consisted of a text-based question and between two and eight image response options. The image response options took one of two forms: they either assessed varying levels of the same trait, or they represented different traits. Seventy-seven items were designed to reflect Cognitive Flexibility, Curiosity, and Openness.

Scoring
The scoring algorithm was developed on a sample of 964 participants, recruited using a UK panel company, and compensated for their participation. The panel had an equal distribution of males and females, and participants were UK residents. Approximately half of the users were 18-25 and the other half 25-36 years old. Participants completed the three creativity measures as well as all 77 image-based items.
Rather than stipulating which responses were indicative of which underlying trait, responses to image-based items were scored in relation to standard measures. This method is commonly used in measure validation procedures when testing concurrent validity between new and existing measures (Rust & Golombok, 2009), as well as for predictive personality measures (Bachrach, Kosinski, Graepel, Kohli, & Stillwell, 2012; Boyd et al., 2015; Lambiotte & Kosinski, 2014; Youyou, Schwartz et al., 2013; Wang, Kosinski, Stillwell, & Rust, 2014).
Fig. 1. Example image response scales. Example item stems: 'Which is more like you?' and 'You're visiting a new country. How immersed do you get in local culture?'
Responses to all 77 items were dummified, with each image response option being transformed into a binary variable. This resulted in 321 dummy variables. Dummified responses were used as the independent variables (predictors) and the creativity scores as the respective dependent (predicted) variables in linear regression models to estimate the creativity scores.
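The dummy coding described above can be illustrated with a minimal sketch. The paper does not give its implementation, so the function name `dummify`, the item identifiers, and the toy responses below are hypothetical and serve only to show how each image response option becomes one binary variable.

```python
def dummify(responses, n_options):
    """One-hot encode image choices (illustrative helper, not the authors' code).

    responses: list of dicts mapping item id -> index of the chosen image (0-based)
    n_options: dict mapping item id -> number of image options for that item
    Returns (column_names, rows), where each row is a list of 0/1 values.
    """
    items = sorted(n_options)
    columns = [f"{item}_opt{k}" for item in items for k in range(n_options[item])]
    rows = []
    for resp in responses:
        row = []
        for item in items:
            chosen = resp[item]
            # One binary variable per image option: 1 if chosen, else 0
            row.extend(1 if k == chosen else 0 for k in range(n_options[item]))
        rows.append(row)
    return columns, rows


# Hypothetical toy data: two items with 2 and 3 image options, two participants
n_options = {"q1": 2, "q2": 3}
responses = [{"q1": 0, "q2": 2}, {"q1": 1, "q2": 0}]
columns, rows = dummify(responses, n_options)
```

Applied to all 77 items, the same expansion would yield one column per image option, which is how a set of items grows into several hundred binary predictors.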

Item selection
As the large number of dimensions resulting from 321 dummy variables can cause over-fitting, two methods of feature selection were used. The first method, LASSO (Least Absolute Shrinkage and Selection Operator) regression with 10-fold cross validation was applied to reduce the number of image response options. LASSO is a regularized regression, which penalizes variables with large coefficients and discounts variables with inconsistent performance across the sample. Thereby LASSO selects response options that are most indicative of creativity.
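As a rough illustration of how LASSO shrinks and drops predictors, the sketch below fits a LASSO model by coordinate descent with soft-thresholding. It is not the authors' implementation: the function names, the fixed penalty `lam`, the iteration count, and the toy data are all assumptions. In the study the penalty would presumably be tuned through the 10-fold cross validation, which is omitted here.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * sum((y - b0 - X @ beta)^2) + lam * sum(|beta|).

    X is a list of rows, y a list of outcomes. The intercept b0 is not penalised.
    """
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    beta = [0.0] * p
    resid = [y[i] - intercept for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # option never chosen: nothing to estimate
            # Covariance of column j with the residual that excludes column j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
        # Re-centre the residuals by updating the unpenalised intercept
        shift = sum(resid) / n
        intercept += shift
        resid = [r - shift for r in resid]
    return intercept, beta


# Hypothetical toy data: one binary predictor, outcome 1 or 3
X = [[0.0], [1.0], [0.0], [1.0]]
y = [1.0, 3.0, 1.0, 3.0]
b0_ols, beta_ols = lasso_fit(X, y, lam=0.0)   # no penalty: ordinary least squares
b0_big, beta_big = lasso_fit(X, y, lam=10.0)  # heavy penalty: coefficient dropped
```

With no penalty the fit approaches the least-squares solution, whereas a sufficiently large penalty sets the coefficient to exactly zero. That zeroing is the mechanism by which response options are removed from the model.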
LASSO cannot take into account that some image response options were taken from the same question. In order to account for the contribution of single questions, a second feature selection method, Sequential Forward Selection, was used (Devijver & Kittler, 1982). Starting with an empty set of questions, questions were added one at a time; at each step, LASSO regression with 10-fold cross validation was used to estimate the relevant scale, and the predicted and measured scores were correlated. Questions were added until no new question improved the correlation by more than 0.1. Questions with individual correlations higher than 0.2 were also retained. This resulted in a final set of 14 questions, or 64 dummy variables.
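The greedy logic of Sequential Forward Selection can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' code: `forward_select`, the `score` callback, and the toy gains are hypothetical. In the study, the score would be the correlation between cross-validated LASSO predictions and the measured scale score, whereas here a simple additive stand-in is used so that the example is self-contained.

```python
def forward_select(candidates, score, min_gain):
    """Greedy forward selection.

    candidates: iterable of question ids
    score: function taking a list of question ids and returning a quality score
           (e.g. a predicted-vs-measured correlation)
    min_gain: stop when the best remaining question improves the score
              by no more than this amount
    """
    selected = []
    best = 0.0
    remaining = list(candidates)
    while remaining:
        # Score every candidate when added to the current set
        scored = [(score(selected + [c]), c) for c in remaining]
        top_score, top = max(scored, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break  # no question adds enough; stop
        selected.append(top)
        remaining.remove(top)
        best = top_score
    return selected, best


# Hypothetical stand-in score: each question contributes a fixed amount
toy_gain = {"q1": 0.30, "q2": 0.15, "q3": 0.05}


def toy_score(questions):
    return sum(toy_gain[q] for q in questions)


selected, best = forward_select(["q3", "q1", "q2"], toy_score, min_gain=0.1)
```

Because selection operates on whole questions rather than individual dummy variables, all response options of a chosen question enter the model together.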

Results
With the selected 14 questions as predictor variables, LASSO regression with 10-fold cross validation was performed to predict creativity scores. Coefficients for the models predicting Cognitive Flexibility, Curiosity, and Openness, are presented in Table 1.
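For readers unfamiliar with k-fold cross validation, the sketch below shows how a sample can be partitioned so that every participant is held out exactly once. How the folds were formed in the study is not reported. The interleaved assignment without shuffling and the name `k_fold_indices` are assumptions made for illustration.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    Participants are assigned to folds in an interleaved fashion
    (0, k, 2k, ... in fold 0; 1, k+1, ... in fold 1; and so on).
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        yield train, held_out


# Hypothetical example: 20 participants, 10 folds
splits = list(k_fold_indices(20, 10))
```

In each split, the model is fitted on the training indices and its predictions for the held-out indices are compared with the measured scores, so that the reported fit is not inflated by evaluating the model on the data used to estimate it.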

Validation
1071 participants (605 females) were recruited using Amazon's Mechanical Turk (MTurk). Participants completed the text-based creativity measures and the 14-item image-based measure, which is part of the Red Bull Wingfinder assessment. To assess test-retest reliability, a subset of 162 participants retook the test after 60 days. MTurk panellists were US citizens paid for their participation. 15% were aged 18-24, 47% aged 25-34, 24% aged 35-44, and 14% aged 45 to 59. Responses to the image-based measure were scored using the algorithm described in Section 2.3.
Creativity scores were normally distributed on the text- and image-based measures (see Table 2). The three text-based creativity scores had moderate intercorrelations (average r = 0.47, with p < 0.001), as had the three image-based scores (average r = 0.5, with p < 0.001) (see Table 2).
Correlations between text- and image-based scores were moderate to high. Concurrent validity was higher for Curiosity and Openness than for Cognitive Flexibility (see Fig. 2). The average test-retest reliability of the image-based measures was r = 0.63 (p < 0.001) (see Fig. 2).
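Both concurrent validity and test-retest reliability are reported here as Pearson correlations. A minimal sketch of that computation is given below. The helper `pearson_r` and the toy score vectors are hypothetical and are not taken from the study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy scores for five participants at test and retest
test_scores = [3.1, 4.0, 2.5, 3.8, 4.4]
retest_scores = [3.0, 3.7, 2.9, 3.9, 4.1]
r_retest = pearson_r(test_scores, retest_scores)
```

For concurrent validity, the two lists would instead hold the image-based predicted score and the corresponding text-based scale score for each participant.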

Discussion
The aim of this study was to examine the psychometric properties of a newly developed image-based creativity measure. Results obtained from two large samples provided preliminary evidence for the test-retest reliability and concurrent validity of the 14-item measure.
The developed scoring algorithm accurately predicted creativity scores on two of the three existing scales. This finding is in line with studies demonstrating the use of predictive models for measuring personality (Chen, Hsieh, Mahmud, & Nichols, 2014; Lambiotte & Kosinski, 2014; Yarkoni, 2010) and indicates that predictive scoring algorithms are suitable for scoring image-based response scales. Moderate correlations between the image-based and the text-based measures for Curiosity and Openness demonstrated good concurrent validity of the image-based format (r = 0.5, p < 0.001). Furthermore, the measure exhibited good test-retest reliability (r = 0.65, p < 0.001), indicating that the selected image-based items are able to reliably measure aspects of creativity. On the other hand, the concurrent validity for Cognitive Flexibility was relatively low (r = 0.35, p < 0.001), suggesting that the selected images may not assess this particular aspect of creativity equally well.
The predictive scoring algorithm used fewer items than established text-based measures, assessing all three creativity aspects with 14 items, compared to 40 items on the text-based measures. Both the predictive scoring algorithm and stronger associations evoked by images may be reasons for achieving shorter length (Meissner & Rothermund, 2015).
The image-based measure demonstrated good test-retest reliability (average r = 0.63, p < 0.001), in particular taking into account factors that might have reduced the correlation, including the small number of items, long interval between test and retest (six weeks), and the small to moderate sample size. Indeed, the observed test-retest reliability for the image-based Openness measure was higher than that reported in other studies for the ten-item, text-based Openness measure (reported r = 0.55 in Kosinski, Stillwell, & Graepel, 2013). Participants were more likely to consistently select the same image than they were to consistently select the same point on a Likert scale. This could be due to the relatively broader construct of creativity as compared to Openness (i.e., broader constructs tend to display better reliability; Chamorro-Premuzic, 2011). In addition, some image-based items had only two images as response options, compared with five to seven response options on Likert scales, which could lower the probability of changing responses.

Implications
The current study has a number of implications for the development of image-based assessments, particularly those focusing on creativity, but also beyond. For researchers and practitioners interested in alternatives to traditional self-report Likert response scales, this study provides preliminary support for the validity and reliability of an image-based measure. Although additional research is needed to replicate and extend these findings, this study takes a step towards providing evidence for the utility of innovative psychological assessments.
The image-based measure has a number of advantages in practice. The measure is shorter than both existing text- and image-based creativity measures. Image response scales may be less obvious in what they are measuring than Likert scales. As a consequence, image-based scales could be less prone to faking and appear less intrusive to the test taker.

Limitations and future research
The current study has a number of limitations. It assessed the concurrent validity of image-based measures in relation to text-based, self-report inventories only. Although this provides initial evidence of the validity of image-based measures, additional studies are needed to investigate their relationship to non-self-report measures of creativity, such as divergent thinking tests. In addition, reliability of the image-based measure should be improved by identifying additional image-based items, in particular if the measure is to be used in selection contexts. The incremental validity of the image-based measures in predicting creative performance, beyond existing tests, should be established. This research would be necessary for demonstrating the value of using image-based measures alongside, in addition to, or as a replacement of, current creativity measures.
There has been a call for a multi-method, multi-trait approach to the study of creativity (Batey & Furnham, 2006; Cropley, 2000; Park, Chun, & Lee, 2016; Plucker & Makel, 2010). Image-based measures may tap into the same performance variance as other creativity measures. But provided that image-based measures are more engaging than self-reports and easier to administer than divergent thinking tests, they could provide an alternative to existing creativity measures. Otherwise, image-based tests may predict distinct performance variance from current measures. In this scenario, they could be used alongside traditional tests. Accordingly, the purpose of developing an image-based assessment is not only to provide more engaging alternatives for established (self-report) methods, but to provide an optional methodology for a valid multi-method approach to assessing creativity (and perhaps other personality traits like the Big Five).

Conclusion
This study supports the proposition that creativity can be measured via preferences for image-based stimuli. It may encourage research into innovative assessment formats and help practitioners in applying alternative assessments in settings where evidence of validity and reliability is required. Image-based assessments may provide a solution to an evolving need for alternative assessments, and this study was one of the first to attempt to bridge the gap between practice and research.