Computers in Human Behavior

Volume 22, Issue 5, September 2006, Pages 791-800

Equivalence of standard and computerized versions of the Raven Progressive Matrices Test

https://doi.org/10.1016/j.chb.2004.03.005

Abstract

The present study examined the equivalence of the computer-administered version of the Raven Standard Progressive Matrices (RSPM) with the standard paper-and-pencil version of the RSPM. In addition, the effects of state and trait anxiety as well as computer anxiety were investigated. Fifty undergraduate volunteers were administered the RSPM twice under one of four conditions: computer–computer, standard–standard, computer–standard, or standard–computer. No significant differences in mean scores or standard deviations were found across administrations or formats. Rank-order correlations revealed similar rankings across formats. Tentative support for the equivalence of the computerized version of the RSPM was found. Analyses revealed no significant differences in anxiety across formats and no significant correlations between anxiety and RSPM performance. Explanations and implications for further research are discussed.

Introduction

Computers have been used in psychology for decades. Computerized testing is a growing field, and the availability of computers to the public is greater than ever and will continue to grow. As the use of computers in psychological assessment continues, it is important to understand which factors interact with the computerized testing format. Computer anxiety is one of these factors. Most of the research on computer anxiety took place in the 1980s, when computer use increased dramatically. Although computer administration offers numerous advantages (e.g., control over the presentation of material, accurate response times), the most common question is whether traditional tests can be transferred to computer administration without developing new norms.

One problem facing a test developer is establishing evidence of equivalence between formats of administration. The two forms of equivalence of concern are experiential equivalence and psychometric equivalence (Honaker, 1988). Experiential equivalence matters as a check on construct validity: two versions of a test could be equivalent in terms of producing similar scores, yet the two may be measuring different constructs. Psychometric equivalence is determined by showing no significant differences between group means and distributions. Showing that mean scores are statistically similar demonstrates that differing formats will not bias overall group scores.

In addition, it is important to examine variance in relation to format. Extreme differences in variance indicate that the range and standard deviation are not comparable to the norms established for the standard format, which calls equivalence into question. Rank-order correlations provide evidence of reliability and similarity by directly comparing the ordering of scores under computer administration with the ordering under the standard format. Experiential equivalence is determined by showing no significant differences in test takers' attitudes toward, and perceptions of, the test.
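
As a concrete illustration of these psychometric checks, the sketch below uses hypothetical placeholder data (not the present study's analysis code) to compare group means with an independent-samples t test, variances with Levene's test, and rank ordering with a Spearman correlation.

```python
# Illustrative sketch with hypothetical data (not the study's analysis code):
# basic psychometric-equivalence checks between two administration formats.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
paper_scores = rng.normal(45, 6, size=25)      # hypothetical paper-and-pencil RSPM scores
computer_scores = rng.normal(45, 6, size=25)   # hypothetical computerized RSPM scores

# 1. Mean equivalence: no significant difference between group means.
t, p_mean = stats.ttest_ind(paper_scores, computer_scores)

# 2. Variance comparability: Levene's test for equality of variances.
w, p_var = stats.levene(paper_scores, computer_scores)

# 3. Rank-order similarity: Spearman correlation across two administrations
#    of the same (hypothetical) participants.
first_admin = rng.normal(45, 6, size=25)
second_admin = first_admin + rng.normal(0, 3, size=25)
rho, p_rank = stats.spearmanr(first_admin, second_admin)

print(f"means:      t = {t:.2f}, p = {p_mean:.3f}")
print(f"variances:  W = {w:.2f}, p = {p_var:.3f}")
print(f"rank order: rho = {rho:.2f}, p = {p_rank:.3f}")
```

Under this logic, equivalence would be supported by non-significant mean and variance tests together with a high rank-order correlation.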

In general, tests that present verbal items and require verbal responses have been shown to be psychometrically equivalent. However, nonverbal-type tests have had mixed results in terms of equivalence. None of the studies that investigated the equivalence of nonverbal-type tests considered the effects of individual perceptions of the test (experiential equivalence). Experiential equivalence, particularly the effect of computer anxiety, has been studied with verbal-type tests. However, nonverbal tests have not been studied with regard to computer anxiety. It could be that visual-spatial items are more susceptible to the effects of computer anxiety than verbal items. Kubinger, Formann, and Farkas (1991) suggest that computer-related anxiety might explain the performance deficit found on a computerized form of the Raven Standard Progressive Matrices (RSPM). Computer anxiety is assumed to cause greater randomness in responding and to interfere with the ability to learn techniques for solving the matrices and to apply those techniques to each additional item presented.

Most of the computer administered psychological tests that are in use today are personality tests such as the MMPI and other tests that involve reading simple questions and answering with simple responses. The computerization of tests involving complex graphics and manipulation of materials on the computer screen, such as the Wechsler series of intelligence tests or the RSPM, has not been as widespread. One of the first attempts to automate the administration of IQ tests was the use of the Totally Automated Psychological Assessment Console (TAPAC) (Gilberstadt, Lushene, & Buegel, 1976). The TAPAC was a machine that used a reel-to-reel tape player, a slide projector, and a console with buttons to be pushed for responses. The TAPAC was used to administer a battery of tests designed to correlate with the Wechsler Adult Intelligence Scale (WAIS). This battery consisted of the Shipley–Hartford Vocabulary subtest, RSPM Standard 1938 edition, WAIS Digit Span, WAIS Digit Symbol, and the Halstead Category test. Unfortunately, this battery was not compared to the standard versions of the included tests. It was assumed that the automated versions were comparable to their standard versions.

Two other studies addressed the issue of automating psychological assessment (Calvert and Waterfall, 1982; Watts et al., 1982). These two studies specifically addressed the equivalence of the standard format of administration and automated administration for the RSPM. Calvert and Waterfall used an apparatus similar to the TAPAC. They found no significant differences between formats of administration in group means or in time to complete the test. Watts et al., however, did find significant differences between formats of administration in both scores and completion time, with the automated version being faster during both testing sessions. Watts et al. used an apparatus almost identical to the TAPAC and to the apparatus used by Calvert and Waterfall.

A noticeable difference between the two studies was Watts et al.'s use of a tailored, adaptive version of the RSPM for the automated administration. Watts et al. decreased the number of items from 60 to 29 for the automated version; they did not adapt the standard administration. Given the decreased length of the automated test, it is not surprising that individuals completed the automated form significantly faster than the traditional version. Watts et al. also suggested that shortening the automated version would make the test easier; however, they found that individuals performed approximately five points higher on the traditional version. Their results are unclear because they were attempting to compare tests that differed not only in format of administration but also in content and process of administration. The latter issues were not addressed in the study.

The technology to transfer performance or nonverbal-oriented tests directly to computers has recently become available. However, the little research done on nonverbal tests has found mixed results. Most of the studies that did determine equivalence have serious methodological shortcomings. Rock and Nolen (1982) found no significant differences between a computerized version of the Raven Coloured Progressive Matrices and the traditional version of the test. This study did not address computer anxiety, which is assumed to be lower in subjects from younger generations (Loyd et al., 1987, Rosen et al., 1987).

The cognitive processing involved in a task such as drawing is continuous and requires feedback on responses (Levy & Barowsky, 1986). It may be that computerized tests involving tasks such as spatial relations and visualization are affected by the format change more than tests that involve simple presentation of text on a screen. Kubinger et al. (1991) investigated the equivalence of a computerized form of the RSPM to the standard version. Effects of format difference were investigated as one of the questions of the study. The authors concluded that the computerized version underrated the IQ scores of the standard version, on average by 13 IQ points. They stated that one possible explanation of this discrepancy could be “the stress-evoking characteristic of a computer in general, as for instance it may induce a testee to precipitate item responses” (Kubinger et al., 1991, p. 300). They suggest that certain items may be more susceptible to bias or computer anxiety than other items.

The purpose of the present study is to evaluate the equivalence between the standard form of the RSPM and a computer-administered version, considering possible participant response differences due to computer anxiety. To investigate equivalence, a repeated-measures, counterbalanced design was used (a sketch of such an assignment scheme follows the hypotheses below). This design yields information about reliability and about the similarity of mean scores, distributions, and rankings across formats. In addition, it is hypothesized that:

  • (1) Computer anxiety will increase as scores on the computer administered RSPM decrease.

  • (2) Participants with low anxiety and/or low computer anxiety will show similar results regardless of administration format.

  • (3) Participants, regardless of level of computer anxiety, will complete the computerized version faster than the standard version.
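
The following sketch illustrates one way the counterbalanced assignment described above could be implemented; the function, seed, and group sizes are hypothetical and are not taken from the study's procedure.

```python
# Minimal sketch (an assumption, not the study's actual procedure) of randomly
# assigning participants to the four counterbalanced order conditions.
import random

# (format for session 1, format for session 2)
CONDITIONS = [
    ("computer", "computer"),
    ("standard", "standard"),
    ("computer", "standard"),
    ("standard", "computer"),
]

def assign_counterbalanced(participant_ids, seed=42):
    """Return {participant_id: (session1_format, session2_format)} with
    approximately equal group sizes across the four order conditions."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}

groups = assign_counterbalanced(range(1, 51))  # 50 hypothetical participants
```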

Section snippets

Participants

The participants in this study were undergraduate volunteers enrolled in psychology courses at a local state university. Participation in a study was a required part of their coursework. The total sample consisted of 30 females and 20 males, with a mean age of 18.3 years. All but 2 of the 50 participants were freshmen. The students were randomly assigned to one of four groups and tested on two occasions. Groups did not show a significant difference in terms of gender [χ2(1) = 2.00], or age [F
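
As an illustration of such group-comparability checks, the sketch below uses hypothetical counts and ages (consistent only with the reported totals of 30 females, 20 males, and a mean age of 18.3 years) to run a chi-square test of gender by group and a one-way ANOVA on age; the exact tests and degrees of freedom reported in the article may differ.

```python
# Illustrative sketch with hypothetical data: checking that randomly assigned
# groups do not differ on gender composition or age.
import numpy as np
from scipy import stats

# Hypothetical 4 (group) x 2 (gender: female, male) contingency table;
# column totals match the reported 30 females and 20 males.
gender_by_group = np.array([
    [8, 5],
    [7, 5],
    [8, 5],
    [7, 5],
])
chi2, p_gender, dof, expected = stats.chi2_contingency(gender_by_group)

# Hypothetical ages (mean ~18.3 years) for the four groups, one-way ANOVA.
rng = np.random.default_rng(1)
ages = [rng.normal(18.3, 0.8, size=n) for n in (13, 12, 13, 12)]
f, p_age = stats.f_oneway(*ages)

print(f"gender by group: chi2({dof}) = {chi2:.2f}, p = {p_gender:.3f}")
print(f"age by group: F = {f:.2f}, p = {p_age:.3f}")
```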

Outliers

Fig. 1 shows the difference scores between administrations 1 and 2 of the RSPM. Based on these difference scores, three participants’ data were excluded from further analyses; these outliers fell more than 2.5 standard deviations below the mean difference score. In addition, 1 participant was excluded due to missing data.
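
A minimal sketch of this exclusion rule, with hypothetical scores rather than the study's data, is shown below: difference scores more than 2.5 standard deviations below the mean difference are flagged for exclusion.

```python
# Minimal sketch (hypothetical scores): excluding participants whose
# administration-1 minus administration-2 difference score falls more than
# 2.5 standard deviations below the mean difference.
import numpy as np

rng = np.random.default_rng(2)
admin1 = rng.normal(45, 6, size=50)          # hypothetical first-session scores
admin2 = admin1 + rng.normal(0, 3, size=50)  # hypothetical second-session scores

diff = admin1 - admin2
cutoff = diff.mean() - 2.5 * diff.std(ddof=1)
excluded = np.where(diff < cutoff)[0]        # indices flagged as outliers
retained = np.setdiff1d(np.arange(len(diff)), excluded)
```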

RSPM mean scores

Mean scores and standard deviations for the RSPM were compared among the four group conditions and are summarized in Table 1. In order to compare similarity of scores across formats, a

Discussion

Results of the current study support the view that the RSPM standard version and the RSPM computerized version are compatible and equivalent. No significant differences were found between individuals taking the RSPM on the standard version or the computerized version. The lack of significant difference in mean scores corresponds with the idea that different groups of random individuals will obtain relatively similar scores regardless of the format. The finding that distribution and variances


Cited by (55)

  • Neocortical Age and Fluid Ability: Greater Accelerated Brain Aging for Thickness, but Smaller for Surface Area, in High Cognitive Ability Individuals

    2021, Neuroscience
    Citation Excerpt:

    Also, (a) one hundred and twenty average cognitive ability (ACA) individuals were carefully selected from the Human Connectome Project (HCP) database (http://www.humanconnectomeproject.org) and (b) nine ACA individuals were selected because of their comparable sociodemographic features to those of the recruited HCA individuals – these were scanned with the same MRI scanner. The 189 individuals completed the same general reasoning ability test: the 24-item Penn Matrix Reasoning Test (PMAT-24) (Williams and McCord, 2006; Bilker et al., 2012). One ACA participant from the HCP database failed to pass the image processing algorithm and, therefore, was removed from the sample.

  • Development of inductive reasoning in students across school grade levels

    2020, Thinking Skills and Creativity
    Citation Excerpt:

    A number of media effect studies have indicated that if the tasks maintain their main characteristics (content as well as visual appearance) after the digitization process, the media effect is insignificant. The media effect was not found for IR (Csapó, Molnár, & Tóth, 2009) or for the Raven Progressive Matrices Test (Williams & McCord, 2006). For young children who still cannot read, tests are individually administered face to face.

  • An eye-controlled version of the Kaufman Brief Intelligence Test 2 (KBIT-2) to assess cognitive functioning

    2016, Computers in Human Behavior
    Citation Excerpt:

    To this end, we assessed whether scores obtained from a group of healthy volunteers on the eye-controlled version of the KBIT-2 paralleled those gained using the standard paper-based version. Available research investigating the equivalence between computer-based cognitive assessments has highlighted that psychometric equivalence across modalities should not be assumed (Arce-Ferrer & Guzmán, 2009; Schulenberg & Yutrzenka, 2004; Thompson, Ennis, Coffin, & Farman, 2007; Williams & McCord, 2006). Indeed, while scores on traditional and computerized versions of assessments have been found to be well-correlated in a number of studies (e.g. Chen, White, McCloskey, Soroui, & Chun, 2011; Choi, Kim, & Boo, 2003; Williams & McCord, 2006), suggesting that they are measuring the same construct, investigations into the differences between correlated scores, which gauge the relative difficulty of the test medium, have produced mixed results (Newton, Acres, & Bruce, 2013).

  • Employment testing online, offline, and over the phone: Implications for e-assessment

    2016, Revista de Psicologia del Trabajo y de las Organizaciones
    Citation Excerpt:

    The current study built on previous research (Grieve & de Groot, 2011) examining the equivalence of electronic assessment methods in a vocational context by including telephone administration and a specific applicant profile. Overall, the results support previous research indicating equivalence between online and pen-and-paper test administration (e.g., Bates & Cox, 2008; Carlbring et al., 2007; Casler et al., 2013; Williams & McCord, 2006), and between online, pen-and-paper, and telephone administration (Knapp & Kirk, 2003). When responding as the ideal police applicant, scores did not differ between administration modes on any of the personality scales.
