Methods of assessing neurobehavioral development in children exposed to methyl parathion in Mississippi and Ohio.

Methyl parathion (MP), an organophosphate pesticide, was sprayed illegally for pest control in U.S. residences and businesses in Mississippi and Ohio. To evaluate the association between MP exposure and neurobehavioral development, children 6 years of age or younger at the time of the spraying and local comparison groups of unexposed children were assessed using the pediatric environmental neurobehavioral test battery (PENTB). The PENTB is composed of informant-based procedures (parent interview and questionnaires) and performance-based procedures (neurobehavioral tests for children 4 years of age or older) that evaluate each of the four broad domains (cognitive, motor, sensory, and affect) essential to neurobehavioral assessment. Children were classified as exposed or unexposed using urinary p-nitrophenol (PNP) levels and environmental wipe samples for MP. Exposure was defined as a urinary PNP level greater than or equal to 100 ppb for the child or any other individual living in the household. Environmental wipe sample levels greater than or equal to 150 µg MP/100 cm² and greater than or equal to 132.9 µg MP/100 cm² were used to define MP exposure for children living in Mississippi and Ohio, respectively. The PENTB was first administered in summer 1999 (year 1) and was readministered in summer 2000 (year 2) to children who had participated in year 1. This article describes the study methods; results of data analyses for both years of the study will be presented in a separate publication.

Methyl parathion (MP) is an organophosphate insecticide also known as "cotton poison" (ATSDR 1999; U.S. EPA 2002). It is currently licensed in the United States to control infestations of insects on certain agricultural crops in open fields. It is most commonly used on cotton, but is also used on field corn, peaches, wheat, barley, soybeans, and rice fields (Anonymous 1997a, 1997c; ATSDR 1999). MP was classified as a restricted-use pesticide in 1978 because of its potential to harm humans and birds (ATSDR 1999; U.S. EPA 2002).
MP was illegally used indoors as a pesticide for cockroaches in nine states (Alabama, Arkansas, Illinois, Louisiana, Michigan, Mississippi, Ohio, Tennessee, and Texas) (Anonymous 1997c). All sprayed areas in these states have been designated as Superfund sites by the U.S. Environmental Protection Agency. Although not licensed for indoor use, MP may have been used illegally as a pesticide for cockroaches because it is effective against these pests, it is relatively inexpensive, and it persists for long periods of time when used indoors so that frequent respraying may not be necessary (Anonymous 1997b; U.S. EPA 2002).
Humans can be exposed to MP through inhalation, ingestion, and dermal absorption (Anonymous 1997a, 1997c; ATSDR 1999). Acute exposure to high levels of the insecticide affects the nervous system. Signs and symptoms of acute high-dose exposure include loss of consciousness, headache, dizziness, confusion, difficulty breathing, loss of coordination, muscle twitching, tremor, nausea, vomiting, abdominal cramps, diarrhea, blurred vision, and excessive perspiration and salivation (Anonymous 1997a, 1997b; ATSDR 1999; Eskenazi et al. 1999; U.S. EPA 2002).
In 1984, seven children in Mississippi became ill (two of whom ultimately died) after acute indoor exposure to MP at a concentration nearly three times that used for agricultural spraying. Two of the children were in respiratory arrest, and five showed various degrees of lethargy, increased salivation, increased respiratory secretions, abdominal pain, and pinpoint pupils (CDC 1984).
Animal studies have shown that exposure to organophosphate pesticides may affect neurologic functioning in developing rats (ATSDR 1999; Eskenazi et al. 1999). A study of 90 pesticide applicators suggested that organophosphate exposure was associated with a loss of peripheral nerve function (Stokes et al. 1995). In another cohort, pesticide applicators exposed to organophosphates for 10-15 years were more likely than an unexposed comparison population to have difficulties with memory and motor reflexes and to experience mood changes (Savage et al. 1988). Agricultural workers in Nicaragua who were tested 2 years after exposure to organophosphate pesticides performed worse than an unexposed comparison population on tests that measured verbal and visual attention, visual memory, visuomotor speed, sequencing and problem solving, and motor steadiness and dexterity (Rosenstock et al. 1991). A study of 146 sheep farmers with long-term exposure to organophosphates found subtle adverse neurologic effects (Beach et al. 1996). Together, these studies suggest that workers occupationally exposed to organophosphate pesticides may develop neurologic deficits.
Children may be more likely than adults to be exposed to MP because crawling and play activities put them close to the ground, where they have increased contact with contaminated surfaces such as baseboards (ATSDR 1999; Bearer 1995; Eskenazi et al. 1999; Guzelian et al. 1992; Landrigan and Carlson 1995). Children may also be more susceptible to health effects from MP exposure because their brains are still developing (ATSDR 1999; Eskenazi et al. 1999; Kolb and Fantie 1995). To our knowledge, no studies have examined the neurobehavioral effects of MP exposure in children.
The Agency for Toxic Substances and Disease Registry (ATSDR) conducted a study in Mississippi and Ohio to examine the association between low-dose, subacute MP exposure in children and neurobehavioral development. In this article, we describe the methodology of the study, including selection of participants, exposure assessment, and neurobehavioral evaluation. Results of the study will be presented in a separate publication.

Materials and Methods
Study population. Potential participants were identified through data files provided to ATSDR by the Mississippi and Ohio state health departments. Mississippi and Ohio were selected as data collection sites because environmental data and urine test results were available in these states. Sprayed homes had been identified through a mass media public education campaign encouraging residents of sprayed homes to call a hotline, door-to-door canvassing of the area, and confiscated records from unlicensed exterminators who illegally sprayed the pesticide (Esteban et al. 1996). Children residing in Mississippi and Ohio who were 6 years of age or younger at the time of MP spraying were eligible for inclusion in the study. In one Ohio county, spraying occurred in a multifamily, subsidized housing facility that was last sprayed in 1994. In Mississippi, spraying was more widespread and included 29 counties, with residences sprayed as recently as late 1996.
Results of environmental wipe samples for MP taken from residences (household MP) and of urine testing for creatinine-adjusted p-nitrophenol (PNP, a metabolite of MP) were provided by the state health departments. Samples were collected for all residents in areas known to have been illegally sprayed with MP. Testing was conducted in Ohio in 1994 and in Mississippi from late 1996 through mid-1997. Exposure status was defined on the basis of both household MP and urinary PNP levels.
In Mississippi, exposure was defined as at least one household MP sample ≥ 150 µg/100 cm² or urinary PNP ≥ 100 ppb for at least one person in the household. For Ohio, exposure was defined as household MP ≥ 132.9 µg/100 cm² or urinary PNP ≥ 100 ppb for at least one person in the household. To include enough exposed children in Ohio, it was necessary to lower the cutoff value for household MP.
Comparison groups of unexposed children residing in the same communities as exposed children were also identified. Local comparison groups were chosen to minimize confounding from sociocultural factors (e.g., regional variations in education, IQ, race, and cultural factors).
In Mississippi, unexposed children were selected through state records from houses that tested < 25 µg/100 cm² for household MP; no urine testing was done for children at those levels of MP. In Ohio, unexposed children were first selected through state records from houses that tested < 35 µg/100 cm² for household MP and where no one in the household had a urinary PNP level > 25 ppb. The household MP cutoff in Ohio was set higher than in Mississippi to include enough unexposed children. Because too few unexposed children were identified through existing records, a special census was conducted in the sprayed complex after it was remediated, and in a nearby housing complex that had not been sprayed, to identify additional unexposed children.
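The exposure and comparison-group definitions above amount to a small decision rule. The sketch below encodes it in Python under stated assumptions: the function name `classify_household` is hypothetical, each household's wipe results (µg MP/100 cm²) and creatinine-adjusted PNP results (ppb) are assumed to be available as lists, a missing PNP result is assumed not to disqualify an Ohio comparison household, and households that meet neither definition return `None`.

```python
from typing import List, Optional

# Cutoffs from the study definitions (wipe samples in ug MP/100 cm^2, urinary PNP in ppb).
EXPOSED_WIPE_CUTOFF = {"MS": 150.0, "OH": 132.9}
UNEXPOSED_WIPE_CUTOFF = {"MS": 25.0, "OH": 35.0}
EXPOSED_PNP_CUTOFF = 100.0      # any household member at or above this level
OHIO_UNEXPOSED_PNP_MAX = 25.0   # Ohio comparison households: nobody above this level


def classify_household(state: str, wipe_samples: List[float],
                       pnp_levels: List[float]) -> Optional[str]:
    """Return "exposed", "unexposed", or None if neither definition is met.

    Hypothetical helper for illustration; `state` is "MS" or "OH".
    """
    max_wipe = max(wipe_samples) if wipe_samples else None
    max_pnp = max(pnp_levels) if pnp_levels else None

    # Exposed: any wipe sample or any household member's PNP at or above the cutoff.
    if max_wipe is not None and max_wipe >= EXPOSED_WIPE_CUTOFF[state]:
        return "exposed"
    if max_pnp is not None and max_pnp >= EXPOSED_PNP_CUTOFF:
        return "exposed"

    # The record-based comparison definitions require a low household MP result.
    if max_wipe is None or max_wipe >= UNEXPOSED_WIPE_CUTOFF[state]:
        return None
    if state == "MS":
        # No urine testing was done in Mississippi homes below 25 ug/100 cm^2.
        return "unexposed"
    # Ohio (assumption): a missing PNP result does not disqualify the household.
    if max_pnp is None or max_pnp <= OHIO_UNEXPOSED_PNP_MAX:
        return "unexposed"
    return None
```

Note the asymmetry the text describes: a Mississippi household whose highest wipe sample is 140 µg/100 cm² is neither exposed nor a comparison household, whereas the same reading in Ohio counts as exposed.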
In Mississippi, household MP levels ranged from 0.5 to 5,000.0 µg/100 cm², with a mean of 374.3 µg/100 cm²; in Ohio, levels ranged from 1.0 to 8,195.5 µg/100 cm², with a mean of 709.7 µg/100 cm². In Mississippi, urinary PNP levels ranged from 18 to 1,900 ppb, with a mean of 283.7 ppb; in Ohio, they ranged from 2.3 to 1,374 ppb, with a mean of 217.2 ppb.
We identified 365 children in Mississippi (147 exposed and 218 unexposed) and 328 children in Ohio (104 exposed and 224 unexposed). In Mississippi, 181 children completed or partially completed testing (85 exposed and 96 unexposed) in summer 1999; in Ohio, 146 children completed or partially completed testing (49 exposed and 97 unexposed) in summer 1999. All testing protocols were approved by the institutional review board of the Centers for Disease Control and Prevention.
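The counts above also imply participation proportions (children who completed or partially completed testing in summer 1999 divided by children identified). The following sketch only recomputes those ratios from the reported numbers; treating all identified children as the denominator is an assumption, because the text does not say how many identified children were reached or invited, and the names `COUNTS`, `participation_rate`, and `site_totals` are hypothetical.

```python
from typing import Tuple

# (children identified, children who completed or partially completed testing in 1999)
COUNTS = {
    ("Mississippi", "exposed"): (147, 85),
    ("Mississippi", "unexposed"): (218, 96),
    ("Ohio", "exposed"): (104, 49),
    ("Ohio", "unexposed"): (224, 97),
}


def participation_rate(identified: int, tested: int) -> float:
    """Share of identified children who were tested."""
    return tested / identified


def site_totals(site: str) -> Tuple[int, int]:
    """Sum identified and tested children over both exposure groups at one site."""
    identified = tested = 0
    for (s, _group), (n_identified, n_tested) in COUNTS.items():
        if s == site:
            identified += n_identified
            tested += n_tested
    return identified, tested


for (site, group), (identified, tested) in COUNTS.items():
    rate = participation_rate(identified, tested)
    print(f"{site:<12}{group:<10}{tested:>4}/{identified:<4}{rate:7.1%}")
for site in ("Mississippi", "Ohio"):
    identified, tested = site_totals(site)
    rate = participation_rate(identified, tested)
    print(f"{site:<12}{'total':<10}{tested:>4}/{identified:<4}{rate:7.1%}")
```

On the reported counts this gives about 57.8% of identified exposed children and 44.0% of identified unexposed children in Mississippi, and about 47.1% and 43.3% in Ohio; the group totals reproduce the 365/181 and 328/146 figures in the text.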
Data collection. To assess the long-term health effects of MP exposure, data were collected in summer 1999 (year 1) and summer 2000 (year 2). Parents or guardians of eligible children invited to take part in the study were initially contacted by letter, which was followed up with a telephone call. All children who participated in year 1 were invited to be retested in year 2. Parents or guardians who agreed to participate in the study were scheduled for an on-site appointment at a nearby testing center. All parents or guardians provided written informed consent for their child's participation in the study. Children 7 years of age or older provided assent for their participation in the study.
A computer-assisted personal interview was administered to the parent or guardian to obtain information on potential confounders. The interview asked about demographic and personal characteristics such as the parents' and child's medical histories, the mother's history of her pregnancy with the index child, parental occupational histories, workplace chemical use, and the child's residential history. For each test, each potential confounder will be entered individually into a regression model with exposure status; variables that change the parameter estimate for exposure status by 10% or more will be included in the final model.
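The change-in-estimate rule above can be sketched for a continuous test score, adding one candidate confounder at a time. This is a minimal illustration rather than the study's analysis code: it assumes ordinary least squares with linear models, the names `exposure_coefficient` and `select_confounders` are hypothetical, and the closed-form two-predictor solution stands in for whatever statistical package the analysts use.

```python
from typing import Dict, List, Optional, Sequence


def _cross(a: Sequence[float], b: Sequence[float]) -> float:
    """Centered cross-product sum: sum((a_i - mean a) * (b_i - mean b))."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def exposure_coefficient(y: Sequence[float], x: Sequence[float],
                         z: Optional[Sequence[float]] = None) -> float:
    """OLS coefficient for exposure x on outcome y, optionally adjusted for covariate z."""
    sxx, sxy = _cross(x, x), _cross(x, y)
    if z is None:
        return sxy / sxx  # crude estimate
    szz, sxz, szy = _cross(z, z), _cross(x, z), _cross(z, y)
    # Closed-form OLS solution for the exposure coefficient with two predictors.
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


def select_confounders(y: Sequence[float], x: Sequence[float],
                       candidates: Dict[str, Sequence[float]],
                       threshold: float = 0.10) -> List[str]:
    """Keep candidates whose inclusion shifts the exposure estimate by >= threshold."""
    crude = exposure_coefficient(y, x)  # assumes the crude estimate is nonzero
    selected = []
    for name, z in candidates.items():
        adjusted = exposure_coefficient(y, x, z)
        if abs(adjusted - crude) / abs(crude) >= threshold:
            selected.append(name)
    return selected


# Toy data: z_related is associated with exposure and the outcome; z_balanced is not.
exposure = [0, 0, 0, 0, 1, 1, 1, 1]
z_related = [0, 0, 0, 1, 0, 1, 1, 1]
z_balanced = [1, 0, 1, 0, 1, 0, 1, 0]
score = [e + 2 * z for e, z in zip(exposure, z_related)]
print(select_confounders(score, exposure,
                         {"z_related": z_related, "z_balanced": z_balanced}))
# ['z_related']  (crude estimate 2.0, adjusted 1.0: a 50% change)
```

For dichotomous outcomes the same rule would be applied to a logistic model's coefficient; that variant is not shown here.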
The pediatric environmental neurobehavioral test battery (PENTB) was used to assess the neurobehavioral functioning of the children (Amler and Gibertini 1996). The PENTB is a screening battery adopted by ATSDR for use in large-scale studies to evaluate neurobehavioral effects of toxicants in children. The PENTB was developed by recognized experts in the fields of neurotoxicity, neuropsychology, neurology, psychology, pediatrics, and epidemiology. A field test of the PENTB was conducted on a group of children living in an urban community located near a former hazardous waste site (Amler and Gibertini 1996). The PENTB consists of interviews and questionnaires for the parent or guardian (informant-based procedures) and neurobehavioral testing of children 4 years of age or older (performance-based procedures). The test battery is not intended for use as a specific marker of neurotoxicity in any individual.
Examiners were trained in administration of the PENTB during an intensive weeklong course conducted 1 month before data collection. Training included extensive practice in administering each test of the PENTB. At the completion of training, each examiner was evaluated while administering the test battery to a child, and only examiners who collected data reliably participated in data collection. To be considered reliable, the examiner had to administer each test correctly, using the verbatim script and appropriate probes when needed. The examiner also had to complete the test record correctly, that is, score each item on the test properly. In addition, the examiner needed to meet more subjective criteria, such as establishing rapport with the child and parent, administering the test battery smoothly, appearing comfortable during the test administration, and using appropriate praise. The examiner had to meet both the objective and subjective criteria to be certified. Examiners were blinded to the exposure status of the child. For quality assurance and quality control, all performance-based portions of the PENTB (except vision tests) were videotaped.

PENTB Performance-Based Tests
In Table 1 we present the domains covered by the specific PENTB tests.
Contrast sensitivity. Visual contrast sensitivity (CS) is an indicator of neurologic functioning of the visual pathways from the retina to the cortex. This test measures the least amount (threshold) of luminance difference (contrast) between adjacent areas that is necessary for an observer to detect a visual pattern. The child views a series of circles through an OPTEC vision screener (Stereo Optical Co. Inc., Chicago, IL, USA). The circles contain lines with varying contrast. The child is asked to identify the direction in which the lines in individual circles are pointing (Cot 1994 test consists of 24 geometric figures arranged in order of increasing difficulty. The child is asked to copy each of the figures, beginning with the easiest, without erasing. Detailed scoring criteria allow an assessment of each figure drawn by the child. The child receives an all-or-none score for each figure, depending on whether the figure meets scoring criteria (Beery 1997).
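The threshold idea behind the contrast sensitivity test can be made concrete with the standard Michelson definition of grating contrast, in which sensitivity is the reciprocal of the threshold contrast and is often reported on a log10 scale. This is a general illustration only: the OPTEC charts come with their own contrast values and norms, and the function names here are hypothetical.

```python
import math


def michelson_contrast(l_max: float, l_min: float) -> float:
    """Contrast of a grating from its peak and trough luminance (Michelson definition)."""
    return (l_max - l_min) / (l_max + l_min)


def contrast_sensitivity(threshold: float) -> float:
    """Sensitivity is the reciprocal of the lowest contrast the child can still detect."""
    return 1.0 / threshold


def log_contrast_sensitivity(threshold: float) -> float:
    """Contrast sensitivity on the log10 scale commonly used for reporting."""
    return math.log10(contrast_sensitivity(threshold))


# Bars at 60 and 40 cd/m^2 give a contrast of 0.2; a child whose threshold is
# 1% contrast has a sensitivity of 100, i.e. a log contrast sensitivity of 2.
print(michelson_contrast(60.0, 40.0),
      contrast_sensitivity(0.01),
      log_contrast_sensitivity(0.01))
```

Lower thresholds mean higher sensitivity, which is why a child who can still identify the line direction in very faint circles scores better on this test.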
Kaufman Brief Intelligence Test. The Kaufman Brief Intelligence Test (K-BIT) is an efficient measure of general intelligence, verbal ability, and nonverbal reasoning. The K-BIT is composed of two subtests: vocabulary and matrices. The vocabulary subtest is composed of two sections: expressive vocabulary and definitions. In the expressive vocabulary section, the child is asked to name a pictured item. In the definitions section, the child is asked to provide a word that fits two clues: a phrase description and a partial spelling of the word. In the matrices subtest, the child is asked to identify, either verbally or by pointing, which of several options best shows the relationship among meaningful items, best completes an analogy, or best completes a pattern of dots. The K-BIT IQ composite score is derived from the child's performance on both the vocabulary and matrices subtests (Kaufman and Kaufman 1990).
Purdue Pegboard. The Purdue Pegboard (PP) test measures visual-motor coordination, manual dexterity, and motor speed. During the test, metal pegs are retrieved one at a time from wells at the top of the pegboard and placed in the holes, starting at the top of the pegboard and moving down. The child, seated in front of the pegboard, is given 30 sec per trial to place as many pegs as possible in the holes, in separate trials using the preferred hand, the nonpreferred hand, and both hands simultaneously. The standard PP is 45 cm long (25 holes in each column). A shortened version of the pegboard, measuring 29.5 cm (15 holes in each column), is used for children 7 years of age or younger (Lafayette Instrument Company, Lafayette, IN, USA).

Story memory and story memory delay from the Wide Range Assessment of Memory and Learning. These are tests of verbal memory. For the story memory (SM) subtest, two short stories are read to the child, and the child's immediate recall of both specific and general components of each story is assessed. For the story memory delay (SD) subtest, the child's recall of the same two stories is assessed after a time delay: the examiner reminds the child of the topic of each story and asks the child to recall as much about each story as possible. During the interval between the SM and SD subtests, the child completes other PENTB tests that require nonverbal performance (Adams and Sheslow 1990). Both raw and scaled scores were calculated for the immediate recall portion of the SM test. Raw scores were required because scaled scores were not available for children who were 4 years of age at the time of testing. Difference scores were also calculated between the raw immediate recall and raw delay portions of the SM test.
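The score bookkeeping described for the story memory subtests can be sketched as follows. The function name and dictionary keys are hypothetical, and the sign convention (immediate minus delayed recall, so larger values mean more material lost over the delay) is an assumption; the text says only that difference scores were computed between the two raw scores.

```python
from typing import Dict, Optional


def story_memory_scores(age_years: int, immediate_raw: int, delay_raw: int,
                        immediate_scaled: Optional[int] = None) -> Dict[str, int]:
    """Collect the story memory scores used in the analysis for one child."""
    scores = {
        "immediate_raw": immediate_raw,
        "delay_raw": delay_raw,
        # Assumed convention: positive values indicate loss of material over the delay.
        "difference_raw": immediate_raw - delay_raw,
    }
    # Scaled scores were not available for 4-year-olds, so only raw scores are kept.
    if age_years >= 5 and immediate_scaled is not None:
        scores["immediate_scaled"] = immediate_scaled
    return scores


print(story_memory_scores(4, immediate_raw=20, delay_raw=15, immediate_scaled=9))
print(story_memory_scores(6, immediate_raw=25, delay_raw=22, immediate_scaled=11))
```

Keeping the raw immediate score for every child is what allows 4-year-olds to be included in analyses of this subtest at all.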

Trail Making Test, parts A and B. The Trail Making (TM) Test assesses multistep processing involving more than one cognitive function area (visual perception, motor speed, sequential skills, and symbol recognition). This test is administered to children 9 years of age or older and contains two separately administered forms. The score for each form is the number of seconds required for its completion. In part A, the child is asked to connect numbers in sequence by drawing a continuous line from one number to the next. In part B, the child is asked to draw a continuous line in a sequence that alternates between numbers and letters (1, A, 2, B, and so on) until the end of the sequence is reached (Reitan 1992).
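The alternating target order for part B, and a simple check of where a child's trail first departs from it, can be sketched as below. The number of circles on the children's form is not given in the text, so it is left as parameters; the function names are hypothetical, and the study's score for each part remains the completion time in seconds.

```python
import string
from typing import List, Optional, Sequence


def part_b_sequence(n_numbers: int, n_letters: int) -> List[str]:
    """Target order for part B: 1, A, 2, B, ... alternating numbers and letters."""
    sequence = []
    for i in range(max(n_numbers, n_letters)):
        if i < n_numbers:
            sequence.append(str(i + 1))
        if i < n_letters:
            sequence.append(string.ascii_uppercase[i])
    return sequence


def first_sequencing_error(response: Sequence[str],
                           target: Sequence[str]) -> Optional[int]:
    """Index of the first out-of-order connection, or None if the trail is complete."""
    for i, (got, expected) in enumerate(zip(response, target)):
        if got != expected:
            return i
    if len(response) == len(target):
        return None
    # The trail stopped early (or ran past the end) without an ordering error.
    return min(len(response), len(target))


target = part_b_sequence(3, 2)
print(target)                                           # ['1', 'A', '2', 'B', '3']
print(first_sequencing_error(["1", "A", "B"], target))  # 2
```

Part A corresponds to the degenerate case `part_b_sequence(n, 0)`, a plain run of numbers.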
Verbal cancellation test. The verbal cancellation (VC) test measures sustained selective attention. Two forms are used for each child, an ordered form and a nonordered form. The stimulus array for each form is composed of letters, 60 of which are the letter A. In the ordered form, the letters appear in regularly spaced rows and columns; in the nonordered form, the letters are distributed with no apparent order. The ordered form is presented first, and the child is given 90 sec to circle as many A's as he or she can find. The nonordered form is then administered with the same instructions. The examiner records whether the child's search strategy is organized or disorganized (Mesulam 1985).
Visual acuity. The visual acuity (VA) test screens for gross vision problems. A standard eye chart (Snellen type) developed for preschool children uses pictures rather than letters to measure visual acuity in children 4-6 years of age. Children 7 years of age or older are assessed using an OPTEC vision screener (Stereo Optical) that requires the identification of a series of letters for each acuity level (Neff 1991). The test identifies vision problems that might affect the child's performance on other tests.

PENTB Informant-Based Tests
Family Resources Scale. The Family Resources Scale (FRS) is a self-administered 30-item rating scale. The FRS assesses the adequacy of resources in households with young children. Items query the availability of food, shelter, money, transportation, time with family and friends, health care, and other basic resources (Dunst and Leet 1987).
Parenting Stress Index. The Parenting Stress Index (PSI) is a 101-item self-administered questionnaire. The PSI asks parents to estimate the occurrence of common signs and symptoms of child and family dysfunction. The index yields a child domain score composed of six subscales (adaptability, acceptability, demandingness, mood, distractibility/hyperactivity, and reinforces parent) and a parent domain score composed of seven subscales (depression, attachment, restrictions of role, sense of competence, social isolation, relationship with spouse, and parental health). The index also yields a total stress score (Abidin 1995). The PSI was completed by parents or guardians of children 1-3 years of age.
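A minimal sketch of how the PSI summary scores described above relate to the subscales, assuming that each domain score is the sum of its subscale raw scores and that the total stress score is the sum of the two domain scores (the scoring rules of the PSI edition used should be checked against its manual). The subscale keys are hypothetical identifiers, and normative percentiles are not reproduced here.

```python
from typing import Dict

CHILD_SUBSCALES = ("adaptability", "acceptability", "demandingness", "mood",
                   "distractibility_hyperactivity", "reinforces_parent")
PARENT_SUBSCALES = ("depression", "attachment", "restrictions_of_role",
                    "sense_of_competence", "social_isolation",
                    "relationship_with_spouse", "parental_health")


def psi_summary(subscales: Dict[str, int]) -> Dict[str, int]:
    """Domain and total stress raw scores from subscale raw scores (assumed additive).

    Raises KeyError if any of the 13 subscales is missing.
    """
    child = sum(subscales[name] for name in CHILD_SUBSCALES)
    parent = sum(subscales[name] for name in PARENT_SUBSCALES)
    return {"child_domain": child, "parent_domain": parent,
            "total_stress": child + parent}


example = {name: 10 for name in CHILD_SUBSCALES + PARENT_SUBSCALES}
print(psi_summary(example))
```

Failing loudly on a missing subscale, rather than silently summing what is present, keeps incomplete questionnaires from producing misleadingly low stress scores.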
Personality Inventory for Children. The Personality Inventory for Children (PIC) is a 280-item self-administered questionnaire. The PIC assesses the child's behavior, affect, and cognitive status. Questions are presented in a yes/no format. The PIC has been used extensively in clinical practice and in research for its ability to identify children who have emotional, behavioral, or cognitive disturbances. The inventory yields four factor scores (undisciplined/poor self-control, social incompetence, internalization/somatic symptoms, and cognitive development) that serve as an overall measure of important developmental domains. A variety of validity scales for assessing response bias are included in this test (Wirt et al. 1991). The PIC was completed by the parent or guardian of children 4 years of age and older.
Vineland Adaptive Behavior Scales. The Vineland Adaptive Behavior Scales (VABS) is a semistructured interview administered to a respondent (parent or guardian) who is familiar with the child's behavior. The VABS is a conversational interview that produces four domain scores (communication, daily living skills, socialization, and motor skills), as well as the summary adaptive behavior composite. The VABS spans adaptive behavior components from birth to adulthood, with the interview focusing on behavioral components relevant to the individual child (Sparrow et al. 1984). The VABS was administered to the parent or guardian of all children participating in the study.

PENTB Scoring
For the individual PENTB tests, both raw scores and age-scaled scores (where appropriate) were computed using the scoring manuals. In addition to scores for each of the three trials, we calculated a total PP score by averaging the number of pegs placed using the preferred hand, the nonpreferred hand, and both hands simultaneously. Similarly, in addition to scores for each form, we calculated a total VC score by averaging the number of A's circled on the ordered and nonordered forms. Raw and age-scaled scores were then placed into one of seven categories: upper extreme (at or above the 98th percentile), well above average, above average, average, below average, well below average, or lower extreme (at or below the 2nd percentile). For every test except VC, scores were categorized using the test-specific norms in its manual, which were based on large national samples. For the VC test, norms were developed from the performance of unexposed children during the first year of testing.
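The composite scores and the seven-category placement can be sketched as follows. The averaging of trials and forms follows the text. The category cutoffs are assumptions: the text fixes only the extremes (98th and 2nd percentiles, which correspond roughly to standard scores of 130 and 70 on a scale with mean 100 and SD 15), so the inner bands below use a common 10-point convention, and the local VC norms are assumed to be the mean and standard deviation of unexposed children's year 1 totals. Each manual's own tables would take precedence, and all function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def pp_total(preferred: float, nonpreferred: float, both_hands: float) -> float:
    """Total Purdue Pegboard score: mean pegs placed over the three trials."""
    return mean([preferred, nonpreferred, both_hands])


def vc_total(ordered: float, nonordered: float) -> float:
    """Total verbal cancellation score: mean A's circled over the two forms."""
    return mean([ordered, nonordered])


def local_standard_score(raw: float, reference: Sequence[float]) -> float:
    """Standard score (mean 100, SD 15) against a local reference group.

    Assumption: VC norms are the mean and SD of unexposed children's year 1 totals.
    """
    return 100 + 15 * (raw - mean(reference)) / stdev(reference)


# Assumed lower bound of each band on a mean-100, SD-15 scale.
BANDS = (
    (130, "upper extreme"),
    (120, "well above average"),
    (110, "above average"),
    (90, "average"),
    (80, "below average"),
    (70, "well below average"),
)


def categorize(standard_score: float) -> str:
    """Place a standard score into one of the seven descriptive categories."""
    for lower_bound, label in BANDS:
        if standard_score >= lower_bound:
            return label
    return "lower extreme"


unexposed_year1 = [40, 45, 50, 55, 60]  # hypothetical reference totals
print(pp_total(10, 8, 6), vc_total(50, 40))
print(categorize(local_standard_score(vc_total(50, 40), unexposed_year1)))
```

Converting VC totals to the same standard-score scale as the nationally normed tests lets a single `categorize` function serve every test, which is the main reason for this design.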
Children were also assigned to one of four overall PENTB outcome groups (expected, equivocal, below expected, or undetermined) based on the number of tests completed and the scores on the individual tests. For children younger than 4 years of age, the informant-based test results were used. For children 4 years of age or older, only results of performance-based tests were used, and a minimum of four tests, including the K-BIT, needed to be completed. Regardless of the number of tests completed, a child who scored in the lower extreme on the K-BIT IQ was classified as below expected. Children who consistently scored in the average range or better, with only one or two test scores below average, were classified as expected. Children who scored average or better on some tests but below average on three or four tests, or well below average on one or two tests, and who showed no pattern or consistency, were classified as equivocal. Children who scored below average on five or more tests, well below average on three tests, or in the lower extreme on two tests were classified as below expected. Children 4 years of age or older who did not complete the K-BIT or at least four tests were classified as undetermined.
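For children 4 years of age or older, the outcome-group rules above can be expressed as an ordered decision procedure. This sketch is one reading of those rules, with interpretations that are assumptions: counts of "below average" include any worse category, "well below average on three tests" is read as three or more, and the clinical judgment about whether scores show "no pattern or consistency" is not modeled. Informant-based classification for younger children is outside its scope, and the function name and test keys are hypothetical.

```python
from typing import Dict

# Higher rank = better performance.
CATEGORY_RANK = {
    "upper extreme": 6, "well above average": 5, "above average": 4,
    "average": 3, "below average": 2, "well below average": 1, "lower extreme": 0,
}


def pentb_outcome_group(categories: Dict[str, str]) -> str:
    """Outcome group for a child aged 4 or older.

    `categories` maps each completed performance-based test (e.g. "K-BIT") to
    its descriptive category.
    """
    kbit = categories.get("K-BIT")
    # A lower-extreme K-BIT IQ overrides every other rule.
    if kbit == "lower extreme":
        return "below expected"
    # At least four tests, including the K-BIT, are needed for a classification.
    if kbit is None or len(categories) < 4:
        return "undetermined"

    ranks = [CATEGORY_RANK[c] for c in categories.values()]
    n_below = sum(r <= CATEGORY_RANK["below average"] for r in ranks)
    n_well_below = sum(r <= CATEGORY_RANK["well below average"] for r in ranks)
    n_lower = sum(r == CATEGORY_RANK["lower extreme"] for r in ranks)

    if n_below >= 5 or n_well_below >= 3 or n_lower >= 2:
        return "below expected"
    if n_below >= 3 or n_well_below >= 1:
        return "equivocal"
    return "expected"  # at most two below-average scores and none worse


print(pentb_outcome_group({"K-BIT": "average", "PP": "below average",
                           "VC": "below average", "SM": "below average",
                           "TM": "average"}))  # equivocal
```

Checking the "below expected" conditions before the "equivocal" ones matters: a child with three well-below-average scores also satisfies the equivocal count, and the order of the checks is what assigns the more severe group.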
In year 1, each PENTB test was scored by one of two persons trained in scoring procedures and then reviewed by a neuropsychologist. In year 2, tests were scored independently by two trained persons. All scoring discrepancies were resolved by a neuropsychologist. Quality assurance and quality control measures were conducted both during data collection and after data collection was complete. To check that data collection was appropriate and that the collected data were properly recorded, 15% of the collected data at each study site were reviewed during data collection.
At the conclusion of data collection, a psychologist reviewed for accuracy the videotapes and test scores of all children who scored at or below the 15th percentile on five or more of the tests and of all children classified as undetermined. A random selection of 10% of all other children was also reviewed for accuracy.
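The post-collection review sample can be sketched as a mandatory set plus a seeded random draw from the remaining children. The record layout, the field names `n_low_tests` and `outcome`, the rounding of the 10% sample size, and the fixed seed are all assumptions for illustration, as is the function name `select_for_review`.

```python
import random
from typing import Dict, List


def select_for_review(children: Dict[str, dict], fraction: float = 0.10,
                      seed: int = 0) -> List[str]:
    """Return the IDs of children selected for psychologist review.

    `children` maps an ID to {"n_low_tests": int, "outcome": str}, where
    n_low_tests counts tests scored at or below the 15th percentile
    (hypothetical layout).
    """
    mandatory = {cid for cid, rec in children.items()
                 if rec["n_low_tests"] >= 5 or rec["outcome"] == "undetermined"}
    others = sorted(set(children) - mandatory)
    k = round(fraction * len(others))  # assumed rounding rule for the 10% sample
    sampled = random.Random(seed).sample(others, k)
    return sorted(mandatory.union(sampled))


roster = {f"c{i:02d}": {"n_low_tests": 0, "outcome": "expected"} for i in range(25)}
roster["c00"]["n_low_tests"] = 5
roster["c01"]["outcome"] = "undetermined"
print(select_for_review(roster))  # c00, c01, plus 2 of the other 23 children
```

Seeding the random generator makes the audit sample reproducible, so a second reviewer can regenerate exactly the same list.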
In year 1, 74 children were selected for quality assurance review after data collection was complete. Scoring was accurate for 95% (n = 70) of the children reviewed. For 4% (n = 3) of the children, scoring on one PENTB test was corrected, which changed the overall PENTB outcome group. Additionally, the overall PENTB outcome group was changed for one child (1%) because, although the tests had been scored accurately, too few tests had been completed to warrant a classification.
In year 2, 74 children were selected for quality assurance review after data collection was complete. Scoring was accurate for 82% (n = 61) of the children reviewed. For 12% (n = 9) of the children, scoring on one or more of the PENTB tests was corrected, but none of these corrections changed the child's overall PENTB outcome group. For three children (4%), the overall PENTB outcome group was changed because, although the tests had been scored accurately, a test had been classified inaccurately. Additionally, the overall PENTB outcome group was changed for one child (1%) because, although the tests had been scored accurately, too few tests had been completed to warrant a classification.

Data Analysis
Assessing the neurobehavioral development of children is important because the central and peripheral nervous systems are sensitive to chronic, low-dose exposure to toxic substances (Amler and Gibertini 1996). Although the PENTB monograph describes the design and implementation of the test battery, it does not provide a framework for analyzing the data. Apart from field trials, this study is the first in which data collected using the PENTB will be analyzed. We will analyze the individual test scores as well as the overall PENTB outcome groups. The individual test scores will be analyzed both as continuous variables and dichotomously, comparing children who scored in the worst 10% on a test with children scoring in the other 90%. We will also explore whether there are site-specific differences. The results will be presented in a future publication.
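The planned dichotomization can be sketched with the standard library's `statistics.quantiles` (Python 3.8 or later). Assumptions: the cutoff is computed from the pooled scores of all tested children, ties at the cutoff are counted in the worst group, and a direction flag handles tests such as the Trail Making Test, where more seconds means worse performance. The function name `flag_worst_decile` is hypothetical.

```python
from statistics import quantiles
from typing import List, Sequence


def flag_worst_decile(scores: Sequence[float],
                      higher_is_worse: bool = False) -> List[bool]:
    """Flag scores in the worst 10% of the pooled sample.

    quantiles(..., n=10) returns nine cut points (10th, 20th, ..., 90th percentiles)
    using the default 'exclusive' interpolation method.
    """
    cuts = quantiles(scores, n=10)
    if higher_is_worse:
        cutoff = cuts[-1]  # 90th percentile, e.g. completion time in seconds
        return [s >= cutoff for s in scores]
    cutoff = cuts[0]  # 10th percentile, e.g. pegs placed or items recalled
    return [s <= cutoff for s in scores]


scores = list(range(1, 21))
print(sum(flag_worst_decile(scores)))                        # 2
print(sum(flag_worst_decile(scores, higher_is_worse=True)))  # 2
```

The resulting flags can then serve as the binary outcome in the comparison of the worst 10% with the other 90%; the choice of interpolation method shifts the cutoff slightly in small samples.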