Use of computerized test batteries for quantifying neurobehavioral outcomes.

Neurobehavioral testing provides noninvasive assessment of the functional integrity of the nervous system. Neurobehavioral tests have been used as quantitative outcome measures in a number of epidemiologic investigations of the potential effects on the nervous system of exposure to organic solvents, heavy metals, and pesticides. Because of the functional complexity of the nervous system, sets of tests assessing a range of functions have been used, with inconsistency from one study to another. Although there has been recent progress in standardizing a core set of tests for use in occupational epidemiology, major consensus on testing methods has not emerged. Standardization of test methods is essential to provide a consistent database for risk analysis. Automation provides not only standardization but also improved time efficiency of data collection and analysis. The computerized Neurobehavioral Evaluation System (NES) has been developed to address the need for standardized, efficient acquisition of data on a range of neurobehavioral variables. Examples of application of NES in epidemiologic studies of workers exposed to solvents are summarized. The need for use of NES as a tool for surveillance and in prospective epidemiologic investigations is emphasized.

Environmental Neurotoxicity As a Problem
Certainly, there are substantial environmental exposures to potential neurotoxicants, particularly in the workplace. Neurotoxic diseases are on the list of the top ten leading work-related diseases and injuries prepared by the U.S. National Institute for Occupational Safety and Health (NIOSH) (1). Of the more than 60,000 chemicals in commerce, at least 750 have been reported to have adverse effects on the nervous system. Sixty-five of these 750 chemicals are also on the list of 200 chemicals to which more than 1 million U.S. workers are exposed (2).
It is well known that clinically overt effects occur from high exposures to a number of solvents, heavy metals, and pesticides. Indeed, these effects are what determine that the exposures are high. There are also many reports of neurobehavioral dysfunction in people exposed to concentrations of these agents lower than those producing clinically overt symptoms. Of the 91 NIOSH criteria documents, 36 cite effects on the nervous system, often at concentrations lower than those required to produce effects on other organ systems (2).
It is commonly believed that subtle neurobehavioral deficits caused by lower level exposure can, with prolonged exposure, progress to more severe effects. This belief implies that early detection offers an opportunity to take remedial action before dysfunction progresses to irreversible damage. Neurobehavioral dysfunction can then be considered, in the language of some, a marker for neurotoxic disease.

Impediments to Assessing Risks of Neurotoxicity
There are theoretical and practical impediments to assessing neurotoxic risks in humans. The largest procedural impediment to quantitative assessment of risk from exposure to neurotoxicants is the absence of a large, consistent database in humans. A large database in humans does not exist because measuring neurologic or neurobehavioral outcomes is difficult and expensive, and what data do exist are difficult to interpret because they have been generated by a wide variety of testing methods. Theoretical and (interrelated) procedural reasons for the use of so many methods include the following: the nervous system is complex, with many separate functions to be tested; there is no explicit, generally accepted functional model of the nervous system; there is no general neurotoxicologic model that generates explicit behavioral hypotheses to be tested; there is no standardized, commonly accepted tool to generate the data; and investigators are rewarded for application of novel, rather than standard, techniques.
To create the needed database, work will be required at the interface of four fields: neurology, psychology, toxicology and epidemiology (Fig. 1). Advances and consensus on theory will be needed from both neuropsychology and neurotoxicology so that practical methods from psychology and epidemiology can be optimized.

Need for Advancements in Theory
There is great need for major consensus on theoretical models at the neuropsychologic and neurotoxicologic levels. At the neuropsychological level, agreement must be reached on an explicit model of the entire range of neuropsychological functions. Then the neuropsychological functions tested by existing behavioral tests could be specified. (No behavioral test assesses a single function; all tap some blend of sensory, motor, and cognitive functions.) Neuropsychologists currently use implicit neurobehavioral models, but these must be made more explicit and become more universally accepted. Perhaps information processing theory (3) can provide a unifying theme, as Williamson has suggested (4). In addition to providing a context for specification of neurobehavioral tests, a consensus neuropsychological model would provide some context from which to view any subclinical deficits in performance found in epidemiologic investigations. Agreement at the neuropsychological level would also allow development of new behavioral tests, tailored to individual neuropsychologic functions more specifically than existing composite tests. This approach has been taken by Eckerman et al. (5) to develop a computer-based neurobehavioral testing system from a theoretic framework provided by Carroll (6). Unfortunately, neither the theoretic framework nor the computerized system has found widespread acceptance in neurotoxicology or neuroepidemiology.
In addition, more explicit neurotoxicologic theories are needed than those currently available. The range of behavioral functions that may be affected by exposure to a toxic agent is extremely wide, and for this reason investigators typically use sets, or batteries, of tests. In practice, no study of a particular exposure situation can realistically sample very many of these functions, so neurotoxicologic theory should tell us where to concentrate our testing efforts and which neurobehavioral tests to choose.
In addition to the need to develop new theoretic tools and build consensus on them, there is great need to develop new practical tools and standardize them. The World Health Organization (WHO), and its European Office in particular, has had a major impact by sponsoring meetings on some of these needs, both in terms of neurotoxic disease definition (7) and standardization of neurobehavioral methods (8-10).
There are hundreds of behavioral tests from which to choose when assembling a test battery. In addition, such tests are often modified ad hoc, either through ignorance of protocol details or to improve them for particular studies. As a result, divergent results from apparently similar studies are hard to reconcile. One attempt to address this lack of test standardization and to provide some consensus emerged from a WHO-NIOSH conference held in Cincinnati in 1983 (10). At that meeting, a set of seven behavioral tests that has become known as the WHO Neurobehavioral Core Test Battery (WHO-NCTB) was recommended for use in all epidemiologic investigations of workers exposed to potential neurotoxicants. Manually administered tests were selected so that they might be used in developing countries as well. It was intended that investigators would supplement this core with other tests as the time and equipment available for testing permitted. A major study is currently underway in eight countries to determine norms and explore cultural differences in performance for the seven WHO-NCTB tests.

Computerized Neurobehavioral Testing
The general topic of computerized psychological assessment has been summarized recently (11). Most of the effort has been devoted to clinical concerns, where the commercial market resides. Such applications include computerized scoring of questionnaires, particularly the Minnesota Multiphasic Personality Inventory, computerized patient report generation, and cognitive rehabilitation software. Computerized neurobehavioral testing for clinical purposes is not well developed, probably due to the need for intense clinician-patient interaction in that setting.
Computerized neurobehavioral testing offers several advantages in the epidemiologic situation. The primary advantages are rigid standardization of the testing protocol through the computer program and efficiency of data collection. The data collected are objective and quantitative. Since conclusions in this context are on a group basis, study objectives can be met by less intense testing of a greater number of individuals than in the clinical testing situation. Thus, with computerized tests, sample sizes can be increased easily from the 20 to 50 subjects typical of past studies of this type. Larger sample sizes will allow better modeling of covariates, reduced sampling bias, and increased statistical power.
There are some disadvantages of computerized testing. The most obvious problem involves potential fear of the computer by the subject. This apparent problem can be controlled by proper hardware design (using simple responses, covering unused keys, etc.), proper software design (simple, consistent instructions and smooth program flow), and keeping the nature of the behavioral tests simple and obvious to the subjects. Perhaps the most significant criticism of currently available computerized tests is that they tap only a limited range of the full behavioral repertoire. Visual presentation of stimuli and manual responding are emphasized in most currently available tests. It is fortuitous that, in the two best-studied exposure areas, lead and solvents, deficits in psychomotor and visuomotor functions are among those reported most often.
Only a few computerized test systems that assess a wide range of behavioral functions have been developed for environmental epidemiologic applications. Many investigators have applied individual computerized neurobehavioral tests. For example, special-purpose laboratory reaction time tests have been implemented on general purpose microcomputers, although hidden technical difficulties may often be ignored. The computerized test systems that contain a number of tests and have been developed for, or used in, epidemiologic applications include: MicroTox System by Eckerman et al.

Neurobehavioral Evaluation System
The computerized neurobehavioral system that has been used most widely for application in epidemiologic studies is NES (12). It consists of over 15 computerized neurobehavioral tests and questionnaires that tap the broad functional domains of psychomotor speed and control, perceptual speed, learning and memory, attention, and affect. A subset of these tests is chosen for each study situation, depending upon the toxic agent in question, the study design, and the time available for testing. The test instructions have been translated from English into eight other languages, and more than 50 investigators have joined the NES Users' Group and obtained the NES software (16). At this point, more than 10 laboratory and epidemiologic studies that used NES have been published.
A brief review of the studies in which NES tests have been applied has recently been provided (17). NES tests have been used in studies of more than 5000 subjects. Groups exposed to potential neurotoxicants that have been studied include painters (18,19), pesticide applicators (20), and mercury-exposed workers (21). Other epidemiologic studies of painters, floorlayers, and dry cleaners have been completed but are not yet published. Some NES tests are also being used in the Third National Health and Nutrition Examination Survey (NHANES III).

Neurobehavioral Data Analysis
For most neurobehavioral tests, the major modifier of performance is age, which may account for 5 to 40% of the total variance, depending upon the particular test and age range of the sample. Education may account for up to 30% of the total variance. These effects should be controlled for in all epidemiologic investigations employing neurobehavioral tests. Other effects such as gender and time of day are commonly thought to be important factors and may occasionally be statistically significant predictors for some neurobehavioral tests, but they rarely account for more than 5% of the total variance in test scores. All these effects combined, including age and education, never account for more than 50% of the total variance of scores for a particular test. The other major source of variance in neurobehavioral test scores is within-subjects error, which may be 10 to 50% of the total variance. Of this, much may be actual instrumental error, again depending on the test. However, human performance is always subject to a degree of noninstrument noise.
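The share of variance attributed to covariates such as age and education can be estimated as the R² of a regression of test score on those covariates. The following is a minimal sketch on simulated data, not an analysis from any NES study: the variable names, effect sizes, and noise level are all hypothetical, and the two-predictor R² is computed from pairwise correlations, which is equivalent to an ordinary least-squares fit with an intercept.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_squared_two_predictors(y, x1, x2):
    """R^2 of a least-squares fit of y on x1 and x2 (with intercept)."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


# Hypothetical simulated sample: reaction-time-like scores (ms) that
# worsen with age and improve with years of education.
rng = random.Random(0)
n = 500
age = [rng.uniform(20, 60) for _ in range(n)]
education = [rng.uniform(8, 18) for _ in range(n)]
score = [300 + 2.0 * a - 5.0 * e + rng.gauss(0, 30)
         for a, e in zip(age, education)]

r2_age = pearson(score, age) ** 2
r2_both = r_squared_two_predictors(score, age, education)
print(f"variance explained by age alone:       {r2_age:.2f}")
print(f"variance explained by age + education: {r2_both:.2f}")
```

The remainder, 1 minus the combined R², lumps together unmeasured between-subjects factors and within-subjects error; separating those two requires repeated measurements on the same subjects.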
Alcohol intake is commonly considered to have a major impact on neurobehavioral test performance. Certainly, acute alcohol intoxication and chronic alcoholism resulting in nutritional deficit should be considered criteria for exclusion from data analysis. However, the utility of self-reported alcohol consumption as a predictor variable in regression analyses of neurobehavioral outcomes is not clear. Although negative effects of moderate drinking have been reported (22), others have found no such effects on a variety of neurobehavioral measures in neurotoxically exposed, but otherwise healthy, populations (19,23). In addition, paradoxical positive effects of greater alcohol consumption are found on occasion (18).
The most troublesome source of variance in neurobehavioral test performance (computerized or manual) is the motivational state of the subject at the time of testing. Malingering is a potentially important cause of suboptimal performance, although its frequency of occurrence in epidemiologic studies is unknown. It is possible that the frequency of malingering will increase as the use of neurobehavioral tests in neurotoxic exposure situations and the number of litigations increase. While techniques exist for detecting malingering (24), their true efficacy is unknown. Such techniques have not been implemented in currently available computerized test systems.
Individuals with excessive within-test variability can be identified and excluded from the data analysis as a way of handling subjects with submaximal effort. This procedure assumes that submaximal effort results in increased variability. However, individuals with true neurotoxic impairment may also have more variable performance, and excluding them would bias analyses toward the null hypothesis (25). Finally, poor effort may be the result of depression, which itself may or may not be caused by exposure. This issue is a difficult one for both traditional behavioral tests and computerized tests, but one that may be handled better in the traditional testing situation, where there is more direct subject-examiner interaction.
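One way such a screening step might be implemented is to compute each subject's coefficient of variation (standard deviation divided by mean) across the trials of a test and flag subjects above a cutoff. This is only an illustrative sketch: the function names, the example reaction times, and the cutoff of 0.5 are hypothetical and are not taken from NES or from the cited studies.

```python
from statistics import mean, stdev


def coefficient_of_variation(trials):
    """Sample standard deviation of the trials divided by their mean."""
    return stdev(trials) / mean(trials)


def flag_variable_subjects(trials_by_subject, cv_cutoff=0.5):
    """Return IDs of subjects whose within-test CV exceeds cv_cutoff.

    cv_cutoff is a hypothetical value; in practice it would have to be
    justified for each test and population.
    """
    return [subject for subject, trials in trials_by_subject.items()
            if coefficient_of_variation(trials) > cv_cutoff]


# Hypothetical reaction times (ms) over four trials for two subjects.
reaction_times = {
    "A": [300, 310, 290, 305],   # consistent performance
    "B": [250, 900, 300, 1200],  # highly variable performance
}
print(flag_variable_subjects(reaction_times))
```

Because truly impaired subjects may also be flagged, it is probably safer to report results both with and without the flagged subjects than to exclude them silently.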
The reliability of computerized neurobehavioral outcomes is moderately high and generally comparable to that of manually administered neurobehavioral tests. For example, the average reliability for NES tests in field epidemiologic investigations is about 0.7 (17). The reliability of computerized neurobehavioral test outcomes can be improved in laboratory investigations by increasing the amount of training for subjects and increasing the length of the tests.
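The gain in reliability from lengthening a test can be approximated with the Spearman-Brown formula from classical test theory, which is a standard psychometric relation and not a method named in this paper. The sketch below assumes that the added trials are parallel to the existing ones (same difficulty and error variance), which will hold only approximately once fatigue and practice effects set in; the starting value of 0.7 is the average NES field reliability quoted above.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying test length by length_factor."""
    return (length_factor * reliability
            / (1 + (length_factor - 1) * reliability))


def length_factor_for(reliability, target):
    """Length multiplier needed to raise reliability to target."""
    return target * (1 - reliability) / (reliability * (1 - target))


# Starting from a reliability of 0.7, as reported for NES field studies.
print(f"doubled test length: {spearman_brown(0.7, 2):.2f}")
print(f"length factor for 0.9: {length_factor_for(0.7, 0.9):.1f}")
```

The formula shows diminishing returns: each additional increment of test length buys less reliability, so training and test length have to be traded off against the testing time available in the field.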
Since more than 50% of the total variance in performance on most neurobehavioral tests is due to between-subjects factors, a within-subjects (test-retest, cross-over) study design should usually be employed, if the critical hypotheses can be addressed by such a design. This is easily accomplished in investigations of effects of acute exposures. The implication for studies of effects of chronic exposure is that large sample sizes will be necessary to observe subtle effects in studies with between-subjects designs, or that pre-exposure baseline performance should be assessed in prospective studies. The practical efficiency of computerized neurobehavioral testing is helpful in both of these instances, i.e., testing large numbers of subjects or implementing routine testing. The efficiency of obtaining pre-exposure baseline performance would be a bonus to the usual advantages of prospective study design (e.g., increased epidemiologic validity). In addition, collection of baseline behavioral performance information would allow greater power for detecting effects of accidental exposures as well as greater confidence in making decisions about changes in the performance of individuals (16).
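The sample-size advantage of a within-subjects design can be illustrated with the usual normal-approximation formulas for comparing two means. This is a rough sketch under stated assumptions, not a calculation from the paper: the effect size of 0.3 standard deviations is hypothetical, the reliability of 0.7 quoted earlier is treated as the test-retest correlation, and the function names are introduced here for illustration only.

```python
from math import ceil
from statistics import NormalDist


def _z_sum(alpha, power):
    """Sum of the two-sided critical value and the power quantile."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


def n_per_group_between(effect_sd_units, alpha=0.05, power=0.80):
    """Subjects per group for a two-group between-subjects comparison."""
    return ceil(2 * _z_sum(alpha, power) ** 2 / effect_sd_units ** 2)


def n_within(effect_sd_units, retest_correlation, alpha=0.05, power=0.80):
    """Subjects for a paired (test-retest) comparison.

    The variance of a within-subject difference is 2 * (1 - r) times
    the single-score variance, where r is the test-retest correlation.
    """
    return ceil(2 * (1 - retest_correlation)
                * _z_sum(alpha, power) ** 2 / effect_sd_units ** 2)


# Hypothetical subtle effect of 0.3 SD; r = 0.7 is an assumed
# test-retest correlation based on the reliability quoted in the text.
print("between-subjects, per group:", n_per_group_between(0.3))
print("within-subjects, total:     ", n_within(0.3, 0.7))
```

Under these assumptions the between-subjects design needs several times as many subjects in each of its two groups as the within-subjects design needs in total, which is the practical argument for collecting pre-exposure baselines.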

Concluding Remarks
Several computerized systems capable of generating neurobehavioral outcomes in epidemiologic studies of the effects of exposure to potential neurotoxicants have been developed and are being applied. NES is the most widely used system and has been found useful in a variety of exposure situations.
There is still much work to be done. Advances in theory are needed. Theoretic advances will trigger additional test system development and provide a context for assessing the meaning of test outcome deficits that are found. Prospective studies are needed. They will allow determination of the biological significance of subtle deficits in neurobehavioral test outcomes and allow prevention of disease. In addition, such studies will provide a necessary database for estimating effect modifiers and external comparison groups for other studies. Consensus on optimal outcome measures and other statistical methods is needed. Finally, standardization of at least a few computerized tests among all test systems should facilitate comparison of results from diverse studies.
This work was supported in part by a center grant from the National Institute of Environmental Health Sciences (ES00928-15). Many members of the NES Users' Group have contributed much to the ideas and data presented here. Special thanks go to F. Gerr and D. Hershman for their critical comments.